Running Hardened WordPress in Kubernetes


How to deploy a hardened WordPress site to Kubernetes using Ansible.

Previously I’ve discussed the basics of init containers and shown how to deploy WordPress locally. But if you’ve already got a domain handy and are ready to move your WordPress site to a staging or production environment, please continue. I’ll use DigitalOcean, but you can use any hosting provider you like, so long as it lets you deploy a few VPS instances of your own during the setup below. I’ve been using this method to host Chicago Gang History for over three years.

This guide assumes you are not using a "managed" Kubernetes solution or cloud provider and instead want to create your own cloud from VPS instances you manage yourself.

When you are finished you will have a hardened WordPress site deployed to Kubernetes using Ansible, capable of handling up to 80K users per month.

Provision nodes

In this section we’ll provision three VPS nodes for use by the cluster and perform some basic security maintenance and hardening. At the end you will have three hardened Ubuntu nodes.

Start by provisioning three nodes: one 4GB RAM server and two 1GB RAM workers:

doctl compute droplet create --image ubuntu-22-10-x64 \
  --size s-2vcpu-4gb --region nyc1 k3s-server-1 \
  --tag-names k3s,k3s-server --ssh-keys $SSH_KEY
  
doctl compute droplet create --image ubuntu-22-10-x64 \
  --size s-1vcpu-1gb --region nyc1 k3s-agent-1 \
  --tag-names k3s,k3s-agent --ssh-keys $SSH_KEY

doctl compute droplet create --image ubuntu-22-10-x64 \
  --size s-1vcpu-1gb --region nyc1 k3s-agent-2 \
  --tag-names k3s,k3s-agent --ssh-keys $SSH_KEY
If you’re using gpg-agent for SSH authentication, there may be no need to specify `--ssh-keys` when provisioning nodes for this cluster. If you’re not using GPG for SSH, have a look at Michał Górny's 2018 post titled On OpenPGP (GnuPG) key management.
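
If you do pass --ssh-keys, the $SSH_KEY variable above needs to hold the ID or fingerprint of a key already registered with your DigitalOcean account. A minimal sketch, assuming the first key listed is the one you want installed on the droplets:

# Grab the ID of the first registered SSH key (adjust if you have several)
export SSH_KEY=$(doctl compute ssh-key list --format ID --no-header | head -n 1)
echo "$SSH_KEY"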

Take inventory of the nodes:

doctl compute droplet ls --tag-name k3s

Wait for public IP addresses to become available:

ID           Name                    Public IPv4        Private IPv4    Public IPv6    Memory    VCPUs    Disk    Region    Image               VPC UUID                                Status    Tags              Features                            Volumes
281189459    k3s-server-1            159.223.158.116    10.114.0.2                     4096      2        80      nyc1      Ubuntu 22.10 x64    fda185a1-5304-4165-860d-2c1dbb2fdca9    active    k3s,k3s-server    droplet_agent,private_networking
281189419    k3s-agent-1             159.223.158.112    10.114.0.3                     1024      1        25      nyc1      Ubuntu 22.10 x64    fda185a1-5304-4165-860d-2c1dbb2fdca9    active    k3s,k3s-agent     droplet_agent,private_networking
281189380    k3s-agent-2             159.223.158.104    10.114.0.4                     1024      1        25      nyc1      Ubuntu 22.10 x64    fda185a1-5304-4165-860d-2c1dbb2fdca9    active    k3s,k3s-agent     droplet_agent,private_networking

Export variables for each node’s IP address:

export SERVER1=159.223.158.116
export AGENT1=159.223.158.112
export AGENT2=159.223.158.104

Begin hardening the server node, rebooting to apply kernel updates:

ssh root@"$(echo $SERVER1)"
root@k3s-server-1:~# apt update && apt upgrade -y
root@k3s-server-1:~# reboot

ssh root@"$(echo $SERVER1)"
root@k3s-server-1:~# adduser deployer
New password: ******************
root@k3s-server-1:~# mkdir /home/deployer/.ssh
root@k3s-server-1:~# cp .ssh/authorized_keys /home/deployer/.ssh/
root@k3s-server-1:~# chown deployer:deployer /home/deployer/.ssh/authorized_keys
root@k3s-server-1:~# usermod -aG sudo deployer

Start a second login shell as deployer and edit the SSH config using sudo privileges:

ssh deployer@"$(echo $SERVER1)"
deployer@k3s-server-1:~$ sudo vim /etc/ssh/sshd_config
[sudo] password for deployer: ******************

Disable password auth and root login:

PasswordAuthentication no
PermitRootLogin no

Restart SSH server and exit:

deployer@k3s-server-1:~$ sudo service --status-all
deployer@k3s-server-1:~$ sudo service ssh restart
deployer@k3s-server-1:~$ exit

Confirm that only deployer can log in now and has root access:

ssh root@"$(echo $SERVER1)"
root@k3s-server-1: Permission denied (publickey)
ssh deployer@"$(echo $SERVER1)"
deployer@k3s-server-1:~$ su -
Password: ******************
root@k3s-server-1:~# exit

Switch back to the first shell session, set the root password, and exit:

root@k3s-server-1:~# passwd
New password: ******************
Retype new password: ******************
passwd: password updated successfully
root@k3s-server-1:~# exit

Now only deployer may log in via SSH to access the root account. Use this account to configure banning with Fail2ban:

ssh deployer@"$(echo $SERVER1)"

deployer@k3s-server-1:~$ sudo apt install fail2ban -y
[sudo] password for deployer: ******************
deployer@k3s-server-1:~$ sudo cp /etc/fail2ban/jail.conf /etc/fail2ban/jail.local
deployer@k3s-server-1:~$ sudo service fail2ban restart
deployer@k3s-server-1:~$ sudo fail2ban-client status sshd
Status for the jail: sshd
|- Filter
|  |- Currently failed:	2
|  |- Total failed:	2
|  `- File list:	/var/log/auth.log
`- Actions
   |- Currently banned:	0
   |- Total banned:	0
   `- Banned IP list:	
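
The copied jail.local can also be tuned before restarting Fail2ban. As a rough sketch (the values below are illustrative rather than the settings used for this site), a drop-in that Fail2ban reads after jail.local might look like:

# Illustrative sshd jail overrides; fail2ban reads jail.d/*.local last
sudo tee /etc/fail2ban/jail.d/sshd-hardening.local <<'EOF'
[sshd]
enabled  = true
maxretry = 3
findtime = 10m
bantime  = 1h
EOF
sudo service fail2ban restart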

After a week your Fail2ban sshd stats may look more like this:

deployer@k3s-server-1:~$ sudo fail2ban-client status sshd
Status for the jail: sshd
|- Filter
|  |- Currently failed:	15
|  |- Total failed:	14228
|  `- File list:	/var/log/auth.log
`- Actions
   |- Currently banned:	0
   |- Total banned:	2105
   `- Banned IP list:	

14 thousand failed SSH attempts in 7 days? Good thing we hardened the server node. Now repeat the above process for the two worker nodes, AGENT1 and AGENT2. Once you’re finished you’ll be ready to network the nodes into a Kubernetes cluster.

Provision cluster

With nodes provisioned we will now network them into a Kubernetes cluster. To accomplish this task we’ll leverage K3s for Kubernetes and use a handy CLI tool called k3sup (pronounced “Ketchup”) to bootstrap the cluster.

Note: K3s installation via the deployer account requires passwordless sudo access to complete. Give deployer passwordless sudo on each of the three nodes before continuing, either with visudo or a sudoers drop-in as sketched below.
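
One way to grant that temporarily is a drop-in under /etc/sudoers.d, run as root on each node; this is only a sketch, and the Hardening section later removes it again:

# As root on each node: temporary passwordless sudo for deployer
echo 'deployer ALL=(ALL) NOPASSWD:ALL' > /etc/sudoers.d/deployer
chmod 0440 /etc/sudoers.d/deployer
visudo -c   # validate sudoers syntax before logging out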

Install or upgrade k3sup on host machine (macOS example using Homebrew):

HOMEBREW_NO_AUTO_UPDATE=1 brew install k3sup

Verify auth socket for ssh on host (example using gpg-agent for auth):

env | grep SOCK
SSH_AUTH_SOCK=/Users/host/.gnupg/S.gpg-agent.ssh
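
If the variable is empty or points somewhere unexpected, and you do use gpg-agent for SSH, something like the following (assuming gpgconf is on your PATH) points the current shell at the agent’s SSH socket:

export SSH_AUTH_SOCK="$(gpgconf --list-dirs agent-ssh-socket)"
ssh-add -L   # should list the public key served by gpg-agent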

Create a single-node cluster from host machine using SSH_AUTH_SOCK:

k3sup install --ip $SERVER1 --user deployer \
  --context nyc1-k3s-cluster-1

This node will become the control plane for the cluster, so be sure the node used has at least 4GB RAM and is not one of the two agent nodes provisioned earlier.

Exception: If the above command fails with the error "sudo: a terminal is required to read the password", see the passwordless sudo note above.

A successful installation will show you output like:

Running: k3sup install
2023/03/05 13:21:32 159.223.158.116
Public IP: 159.223.158.116
[INFO]  Finding release for channel stable
[INFO]  Using v1.25.6+k3s1 as release
[INFO]  Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.25.6+k3s1/sha256sum-amd64.txt
[INFO]  Downloading binary https://github.com/k3s-io/k3s/releases/download/v1.25.6+k3s1/k3s
[INFO]  Verifying binary download
[INFO]  Installing k3s to /usr/local/bin/k3s
[INFO]  Skipping installation of SELinux RPM
[INFO]  Creating /usr/local/bin/kubectl symlink to k3s
[INFO]  Creating /usr/local/bin/crictl symlink to k3s
[INFO]  Creating /usr/local/bin/ctr symlink to k3s
[INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[INFO]  Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO]  env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO]  systemd: Creating service file /etc/systemd/system/k3s.service
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
[INFO]  systemd: Enabling k3s unit
[INFO]  systemd: Starting k3s
Result: [INFO]  Finding release for channel stable
[INFO]  Using v1.25.6+k3s1 as release
[INFO]  Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.25.6+k3s1/sha256sum-amd64.txt
[INFO]  Downloading binary https://github.com/k3s-io/k3s/releases/download/v1.25.6+k3s1/k3s
[INFO]  Verifying binary download
[INFO]  Installing k3s to /usr/local/bin/k3s
[INFO]  Skipping installation of SELinux RPM
[INFO]  Creating /usr/local/bin/kubectl symlink to k3s
[INFO]  Creating /usr/local/bin/crictl symlink to k3s
[INFO]  Creating /usr/local/bin/ctr symlink to k3s
[INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[INFO]  Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO]  env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO]  systemd: Creating service file /etc/systemd/system/k3s.service
[INFO]  systemd: Enabling k3s unit
[INFO]  systemd: Starting k3s
 Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.

Saving file to: /Users/host/kubeconfig

# Test your cluster with:
export KUBECONFIG=/Users/host/kubeconfig
kubectl config use-context nyc1-k3s-cluster-1
kubectl get node -o wide

🐳 k3sup needs your support: https://github.com/sponsors/alexellis

With K3s installed on your control plane, join the two agent nodes to the cluster:

k3sup join --ip $AGENT1 --server-ip $SERVER1 --user deployer
k3sup join --ip $AGENT2 --server-ip $SERVER1 --user deployer

For each agent joined you should see output like:

Running: k3sup join
Server IP: 159.223.158.112
K108aa4be9816cb05bc00668dcd44bab239552dab93ad86a97b92723dfdcfc5469c::server:a758035006d7788060bc3f7330ee1d0d
[INFO]  Finding release for channel stable
[INFO]  Using v1.25.6+k3s1 as release
[INFO]  Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.25.6+k3s1/sha256sum-amd64.txt
[INFO]  Downloading binary https://github.com/k3s-io/k3s/releases/download/v1.25.6+k3s1/k3s
[INFO]  Verifying binary download
[INFO]  Installing k3s to /usr/local/bin/k3s
[INFO]  Skipping installation of SELinux RPM
[INFO]  Creating /usr/local/bin/kubectl symlink to k3s
[INFO]  Creating /usr/local/bin/crictl symlink to k3s
[INFO]  Creating /usr/local/bin/ctr symlink to k3s
[INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[INFO]  Creating uninstall script /usr/local/bin/k3s-agent-uninstall.sh
[INFO]  env: Creating environment file /etc/systemd/system/k3s-agent.service.env
[INFO]  systemd: Creating service file /etc/systemd/system/k3s-agent.service
[INFO]  systemd: Enabling k3s-agent unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s-agent.service → /etc/systemd/system/k3s-agent.service.
[INFO]  systemd: Starting k3s-agent
Logs: Created symlink /etc/systemd/system/multi-user.target.wants/k3s-agent.service → /etc/systemd/system/k3s-agent.service.
Output: [INFO]  Finding release for channel stable
[INFO]  Using v1.25.6+k3s1 as release
[INFO]  Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.25.6+k3s1/sha256sum-amd64.txt
[INFO]  Downloading binary https://github.com/k3s-io/k3s/releases/download/v1.25.6+k3s1/k3s
[INFO]  Verifying binary download
[INFO]  Installing k3s to /usr/local/bin/k3s
[INFO]  Skipping installation of SELinux RPM
[INFO]  Creating /usr/local/bin/kubectl symlink to k3s
[INFO]  Creating /usr/local/bin/crictl symlink to k3s
[INFO]  Creating /usr/local/bin/ctr symlink to k3s
[INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[INFO]  Creating uninstall script /usr/local/bin/k3s-agent-uninstall.sh
[INFO]  env: Creating environment file /etc/systemd/system/k3s-agent.service.env
[INFO]  systemd: Creating service file /etc/systemd/system/k3s-agent.service
[INFO]  systemd: Enabling k3s-agent unit
[INFO]  systemd: Starting k3s-agent

🐳 k3sup needs your support: https://github.com/sponsors/alexellis

You now have a three-node Kubernetes cluster and can control it via the kubectl CLI or an IDE for Kubernetes. Inspect the output from the k3sup install command issued above to find your KUBECONFIG file.

Cluster provisioning complete. You may disable passwordless sudo now if desired.

Configure Lens

To simplify management of the cluster we’ll use Lens. See my Lens app primer for a basic introduction to Lens for cluster management. With Lens we’ll manage the Kubernetes cluster more effectively, and with less effort, than with the CLI alone, thanks to Lens analytics including push-button Prometheus installation.

Start by installing Lens and importing your KUBECONFIG to connect to the K3s cluster created in the last section. Once the cluster is accessible from within Lens, install Lens metrics including Prometheus. Once the metrics server is finished installing you should see an interface similar to the following:

lens interface showing metrics server installed

Confirm metrics-server appears under the Pods section of Workloads in the kube-system namespace. Without it you will not be able to introspect the cluster with Lens.
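
You can also confirm from the CLI using the kubeconfig saved by k3sup:

export KUBECONFIG=/Users/host/kubeconfig   # path reported by k3sup install
kubectl -n kube-system get pods | grep -i metrics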

Configure chart

With provisioning complete and Lens ready we can configure and deploy the Helm chart. The chart we’ll be deploying is the Stackspin WordPress Helm chart. A functional configuration example file is provided below for reference.

Start by copying the example config file:

cp values-local.yaml.example values-local.yaml

Update configuration for your cluster:

wordpress:
  config:
    db:
      prefix: wp_
    adm:
      usid: admin
      pssw: ******************
  site:
    # NOTE: Make sure you use underscore and that the localisation is in full caps
    default_language: en_US
    # NOTE: Optionally set a Wordpress version number to override the default
    # version: LOCAL-WORDPRESS-VERSION-NUMBER-OR-DELETE-THIS-LINE
    # NOTE: This is the URL that points to your WordPress installation. If this
    # URL is set incorrectly your site will most likely not work. You can not
    # change it after you have run helm install once because WordPress saves the
    # site URL in its database. To change this value, you would need to helm
    # delete and then helm install the chart again, or manually change the
    # WordPress database fields that contain the URL.
    url: "https://www.chicagoganghistory.com"
    title: "Chicago Gang History"
    # If you are including a plugin to alias wp login then set an alt.path and set alt.config options
    # NOTE: The value of alt enabled must be set as true or false
    alt:
      enabled: false
    # config: PATH-SETTING-IN-OPTIONS-TABLE
    # path: SOME-LOGIN-PATH
    # # Path used by the liveness and readiness probes to see if the site runs
    # # correctly. Defaults to `/wp-login.php`. Be sure to make this the same as
    # # alt_path if you use it!
    # probe_path: /wp-login.php
    # debug:
    #   # Set to true to set WP_DEBUG on in the configuration. This enables the
    #   # usage of wp_debug_log and wp_debug_display.
    #   wp_debug: false
    #   # IF debug.wp_debug, log to wp-content/debug.log
    #   wp_debug_log: true
    #   # IF debug.wp_debug, log to display (i.e. inside the site)
    #   wp_debug_display: false

  # Uncomment and edit this list to add plugins that will be installed before
  # the WordPress container starts.
  #
  # plugins:
  #   # Download plugin by slug from WordPress
  #   - classic-editor
  #   # Download plugin from extraordinary location
  #   - https://github.com/level73/stackspin-plugin/archive/refs/heads/main.zip
  plugins:
    # Download plugin by slug from WordPress
    - classic-editor
    # Download plugin from extraordinary location
    - https://github.com/level73/stackspin-plugin/archive/refs/heads/main.zip
    # Download plugins added for Chicago Gang History
    - cryout-serious-slider
    - wp-migrate-db
    - pods
    - pods-seo
    - photoswipe-masonry
    - akismet
    - newsletter
    - easy-google-adsense
    - honeypot

  # Install includes all parent, child, default, active and fallback themes
  # NOTE: Use theme *slugs* here. Can also be a URL to a theme zip file.
  themes_install:
   - twentytwentythree
   - bravada

  wp_content:
    # The directory to mount the files placed in wp-content. You shouldn't have to
    # change this.
    mount_path: /var/www/wp-content-mount
  wp_upload:
    # Contents of the .htaccess file that is mounted in the `wpUploadMnt` directory
    htaccess: |
      # Disable access to all file types except the following
      Require all denied
      <Files ~ ".(woff|xml|css|js|jpe?g|png|gif)$">
        Require all granted
      </Files>      

  ## Set default permissions given to files and directories by Wordpress
  ## Strong and writeable defaults are 750 and 640
  #permissions:
  #  directory_mode: 0750
  #  files_mode: 0640

  ## mu_plugins are installed as hidden and cannot be updated from the UI
  ## mu_dir  'mu-plugins' maps to wp-content/mu-plugins
  ## mu_plugins supplies a detailed list of mu values and plugins with versions

  mu_plugins_enabled: true
  mu_plugins_dir: mu-plugins
  mu_plugins:
    - name: Block Bad Queries
      # Path to the PHP files inside the plugin zip
      path: block-bad-queries
      # Entrypoint to plugin inside `path`
      phpfile: block-bad-queries.php
      url: https://downloads.wordpress.org/plugin/block-bad-queries.20230303.zip

  ## Enable externally triggered cron for an MU cron plugin
  # NOTE: mu_plugins *must* be enabled if you want to enable mu_cron
  mu_cron:
    enabled: false
    secret: ******************
    # # By default cron runs every 3 minutes, but you can change the schedule
    # # here:
    # cronjob:
    #   schedule: "*/3 * * * *"

# These settings make sense to overwrite if you want to use the OpenID connect
# plugin
openid_connect_settings:
  enabled: false
  client_id: OPENID_CLIENT_ID
  client_secret: OPENID_CLIENT_SECRET
  endpoint_login: https://login-endpoint-url
  endpoint_userinfo: https://userinfo-endpoint-url
  endpoint_token: https://token-validation-endpoint-url
  endpoint_end_session: https://end-session-endpoint-url
  no_sslverify: "0"
  enable_logging: "1"
  role_mapping_enabled: false
  role_key: roles

smtp_settings:
  # Enable using these SMTP settings
  enabled: false
  # Username for SMTP authentication
  smtp_user: admin@example.com
  # Password for SMTP authentication
  smtp_pass: password
  # Hostname of the mailserver
  smtp_host: smtp.example.com
  # SMTP from email address
  smtp_from: admin@example.com
  # SMTP from name
  smtp_name: WordPress Admin
  # SMTP port number - likely to be 25, 465 or 587
  smtp_port: 587
  # Encryption system to use - ssl or tls
  smtp_secure: 'tls'
  # Use SMTP authentication (true|false)
  smtp_auth: true
  # Can be set to 1 or 2 for debug logs
  smtp_debug: 0

database:
  auth:
    username: wordpress
    password: ******************
    rootPassword: ******************
    replicationPassword: ******************

# Set this to true to have a Redis container next to your WP. The WP will be
# configured to connect to this Redis and `Redis Object Cache` plugin will be
# installed as a conventional plugin and configured to connect to this Redis
# Change Redis MU plugin configurations to use MU instead
redis:
  # Set redis.enabled to true to have a Redis container next to your WP. The WP will be
  # configured to connect to this Redis and `Redis Object Cache` plugin will be
  # installed as a conventional plugin.
  enabled: false
  auth:
    password: ******************
  master:
    persistence:
      # Set persistence to true if you want Redis to have persistence
      enabled: false
    # disableCommand is set as null to enable FLUSHALL and FLUSHDB and allow cache purge and flush
    disableCommands: []
  # Set architecture to "replication" to have a primary and secondary redis. Not necessary for caching
  architecture: "replication"



# This will add a cronjob that performs backups of the wordpress
# database, copying an sql file created by `wp db export` to the given target.
#
# The backups are saved as yyyy-mm-dd and as such if you do several backups per
# day, previous backups of the same day will be overwritten. There is no cleanup
# of backups: it is assumed that cleanup is done on the target server.
# backup:
#   enabled: true
#   # The target location of the backup. This can be a local directory (not
#   # advisable) or a remote directory reachable over SSH. backup command uses
#   # this value as the second argument for `rsync`
#   target: <username@server.example.org:backup-dir/>
#   # If `backup.target` is an SSH address, use this private key:
#   sshPrivateKey: |
#     -----BEGIN OPENSSH PRIVATE KEY-----
#     ...
#     -----END OPENSSH PRIVATE KEY-----
#   # This string is mounted as a text file to /etc/ssh/ssh_known_hosts.
#   # Required when `sshPrivateKey` is provided. Required for SSH host key
#   # verification. If this is ill-configured, expect a host key verification
#   # error in the logs of the wordpress-backup container.
#   # Read the SSH documentation for the correct contents of the ssh_known_hosts
#   # file. You can use `ssh-keyscan` on a trusted network to find host keys.
#   sshKnownHosts: |
#     <hostname> <keytype> <key>
#   # when isDate is true or unset the database is backed up as wp-db-RELEASE-DATE.sql
#   # If isDate is set to false then backup names are a 2 week cycle of A(day number) or B(day number)
#   # A monthly database backup and monthly wordpress manifest are always made with monthnumber prefix
#   isDate: true
#   # The cron schedule that determines when backups are made.
#   # Run at 3:37 every day.
#   schedule: "37 3 * * *"

# customCron:
# - schedule: "5 * * * *"
#   command: "echo test"

# It's advisable to set resource limits to prevent your K8s cluster from
# crashing
# resources:
#   limits:
#     cpu: 100m
#     memory: 512Mi
#   requests:
#     cpu: 50m
#     memory: 256Mi

ingress:
  # If this is false, no ingress is created by the helm chart
  enabled: true
  # Example annotation to make cert-manager manage the TLS certificates for
  # this ingress (Don't supply crt and key to the tls config in this case).
  # annotations:
  #   kubernetes.io/tls-acme: "true"
  annotations:
    kubernetes.io/ingress.class: traefik
    kubernetes.io/tls-acme: "true"
    cert-manager.io/cluster-issuer: letsencrypt-staging
    traefik.ingress.kubernetes.io/redirect-regex: "^http://(?:www.)?chicagoganghistory.com/(.*)"
    traefik.ingress.kubernetes.io/redirect-replacement: https://chicagoganghistory.com/$1
  path: /
  hosts:
    - www.chicagoganghistory.com
    - chicagoganghistory.com
  tls:
    - hosts:
        - www.chicagoganghistory.com
        - chicagoganghistory.com
      secretName: wordpress-cert
      # crt: |
      #   Optionally insert your certificate here, it will be saved as a
      #   Kubernetes secret. You can insert the whole certificate chain here.
      #   NOTE: Don't do this if you use cert-manager to get your certificates!
      # key: |
      #   If you use a custom certificate, insert your TLS key here, it will be
      #   saved as a Kubernetes secret.

# Set this for use with Minikube:
# service:
#   type: NodePort
#   port: 12345

# Labels that will be added to pods created by the wordpress StatefulSet.
# podLabels:
#   key: value
#   something: else

# Labels that will be added to the wordpress StatefulSet itself.
# statefulSetLabels:
#   someCustom: labelValue

# This will restrict WordPress deployment to the specified node. This is useful
# when you are working with limited resources and do not want to run WordPress
# on a resource-constrained node.
#
# See https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/
# for more information on assigning pods to nodes.
# nodeSelector:
#   kubernetes.io/hostname: k3s-server-1

Leave Redis disabled for now if you plan to restore from backup. In a later section we’ll come back and explicitly enable it.

Once satisfied with your chart configuration, deploy it to the cluster using the letsencrypt-staging cluster issuer as shown in the example config. In a later section we’ll create the correspondingly named ClusterIssuer and a certificate for HTTPS access to the site.

Deploy chart

Before deploying, create an A record at your DNS provider so the domain name specified in the Helm chart config points to the IP of the $SERVER1 control plane node. Bear in mind it can take some time for the change to propagate.
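
If your DNS zone also happens to be hosted at DigitalOcean, the records can be created with doctl. A sketch, assuming the domain has already been added to your account (record names and TTL are illustrative):

doctl compute domain records create chicagoganghistory.com \
  --record-type A --record-name @ --record-data $SERVER1 --record-ttl 1800

doctl compute domain records create chicagoganghistory.com \
  --record-type A --record-name www --record-data $SERVER1 --record-ttl 1800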

Then install or upgrade Helm on host machine (macOS example using Homebrew):

HOMEBREW_NO_AUTO_UPDATE=1 brew install helm

With Helm installed, deploy the chart to a new wordpress namespace:

cp install.sh.example install.sh
chmod +x install.sh
./install.sh --namespace wordpress --create-namespace

You should see output like the following:

releaseName="wordpress-production"

# Upgrade or install application using the current git branch as docker tag
helm upgrade $releaseName . --install -f values-local.yaml "$@"
Release "wordpress-production" does not exist. Installing it now.
NAME: wordpress-production
LAST DEPLOYED: Sun Mar 19 10:26:22 2023
NAMESPACE: wordpress
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. Get the application URL by running these commands:
  https://www.chicagoganghistory.com/
  https://chicagoganghistory.com/
Note: Omitting --namespace from the installation command will cause the chart to attempt installation in the default namespace instead, which is not ideal.

Each time the chart is deployed, a WordPress init container runs and builds WordPress from scratch based on the included Ansible script. Before you can access your site, use Lens to ensure the init container completes successfully:

lens ide showing wordpress init container running

Leverage Pod logs in Lens to identify and debug any errors:

lens ide showing wordpress init container logs access
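
If you prefer the CLI, the same logs can be tailed with kubectl (the Pod name below is a placeholder; copy the real one from the get pods output):

kubectl -n wordpress get pods
kubectl -n wordpress logs -f <wordpress-pod-name> -c init-wordpress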

When the init-wordpress container finishes, your WordPress installation will be deployed and publicly accessible via its domain name. Confirm with:

curl --head --insecure https://chicagoganghistory.com

You should see output like:

HTTP/2 301
content-type: text/html; charset=UTF-8
date: Sun, 19 Mar 2023 14:29:53 GMT
location: https://www.chicagoganghistory.com/
server: Apache/2.4.54 (Debian)
x-powered-by: PHP/8.1.12
x-redirect-by: WordPress
Note: Response assumes your Ingress is configured as shown in the example, and a DNS A record pointing at the server node has already propagated.

Confirm Helm chart created Ingress resource in wordpress namespace:

kubectl get Ingress --all-namespaces

You should see output like:

NAMESPACE   NAME                   CLASS    HOSTS                                               ADDRESS                                           PORTS     AGE
wordpress   wordpress-production   <none>   www.chicagoganghistory.com,chicagoganghistory.com   159.223.158.116,159.223.158.112,159.223.158.104   80, 443   22h

At this point you may wish to configure any plugins and themes you plan to use by modifying values-local.yaml and redeploying with ./install.sh -n wordpress. Repeat this process until you’re ready to create an SSL certificate for the site.

Create certificate

Next we’ll use cert-manager to create a certificate issuer for the cluster.

First take inventory of any issuers with:

kubectl get Issuer,ClusterIssuer --all-namespaces

Since this is a new cluster you should see an error like:

error: the server doesn't have a resource type "Issuer"

Use Helm to install Cert Manager and its CRDs to resolve the error:

helm repo add jetstack https://charts.jetstack.io && \
helm repo update && \
helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.11.0 \
  --set installCRDs=true

Note that the version will change over time; use the latest release when installing. See github.com/cert-manager/cert-manager/releases for available releases. To uninstall Cert Manager and start over, use kubectl delete namespace cert-manager.

You should see output like:

NAME: cert-manager
LAST DEPLOYED: Sun Mar 12 16:06:02 2023
NAMESPACE: cert-manager
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
cert-manager v1.11.0 has been deployed successfully!

In order to begin issuing certificates, you will need to set up a ClusterIssuer
or Issuer resource (for example, by creating a 'letsencrypt-staging' issuer).

More information on the different types of issuers and how to configure them
can be found in our documentation:

https://cert-manager.io/docs/configuration/

For information on how to configure cert-manager to automatically provision
Certificates for Ingress resources, take a look at the `ingress-shim`
documentation:

https://cert-manager.io/docs/usage/ingress/
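
Before creating any issuers, it can help to confirm the cert-manager Pods are up and Ready:

kubectl -n cert-manager get pods
kubectl -n cert-manager wait --for=condition=Ready pods --all --timeout=120s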

Next create an Issuer or ClusterIssuer for obtaining TLS certificates from LetsEncrypt. Here we’ll use a ClusterIssuer so we can assign certs to Ingress resources in any namespace using the HTTP01 Ingress solver. We’ll start by issuing a test certificate and, once that’s functional, move to production.

Warning: When requesting certificates from LetsEncrypt, always start with a test certificate to avoid having the cluster’s control plane IP rate-limited or banned if too many production certificate requests fail during setup.

Create test ClusterIssuer resource definition and store in a file:

cat <<EOF > letsencrypt-issuer-staging.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    # The ACME server URL
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: user@example.com
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-staging
    # Enable the HTTP-01 challenge provider
    solvers:
    # An empty 'selector' means that this solver matches all domains
    - selector: {}
      http01:
        ingress:
          class: traefik
EOF
Tip: The above code block is runnable. Edit it to set a valid email address, then execute it in a terminal to create a declarative configuration file.

Apply configured test issuer to cluster and get resource details:

kubectl apply -f letsencrypt-issuer-staging.yaml && \
kubectl get CertificateRequests,Orders,Challenges --all-namespaces

You should see output like:

NAMESPACE   NAME                                                      APPROVED   DENIED   READY   ISSUER                REQUESTOR                                         AGE
wordpress   certificaterequest.cert-manager.io/wordpress-cert-bfwsp   True                False   letsencrypt-staging   system:serviceaccount:cert-manager:cert-manager   25s

NAMESPACE   NAME                                                         STATE     AGE
wordpress   order.acme.cert-manager.io/wordpress-cert-bfwsp-2053340489   pending   25s

NAMESPACE   NAME                                                                        STATE     DOMAIN                       AGE
wordpress   challenge.acme.cert-manager.io/wordpress-cert-bfwsp-2053340489-90611677     pending   www.chicagoganghistory.com   22s
wordpress   challenge.acme.cert-manager.io/wordpress-cert-bfwsp-2053340489-1678907108   pending   chicagoganghistory.com       22s

If you have an Order stuck in a pending state, try describing the resource as shown in Troubleshooting Orders in the Cert Manager docs:

kubectl describe order -n wordpress wordpress-cert-6pnlc-739176700

Using describe on an Order stuck in pending will give a debug Event like:

Events:
  Type     Reason  Age   From                 Message
  ----     ------  ----  ----                 -------
  Warning  Solver  16m   cert-manager-orders  Failed to determine a valid solver configuration for the set of domains on the Order: no configured challenge solvers can be used for this challenge

A successful Order will have an Events section like this:

Events:
  Type    Reason    Age   From                 Message
  ----    ------    ----  ----                 -------
  Normal  Created   10m   cert-manager-orders  Created Challenge resource "wordpress-cert-bfwsp-2053340489-1678907108" for domain "chicagoganghistory.com"
  Normal  Created   10m   cert-manager-orders  Created Challenge resource "wordpress-cert-bfwsp-2053340489-90611677" for domain "www.chicagoganghistory.com"
  Normal  Complete  10m   cert-manager-orders  Order completed successfully

And a rerun of:

kubectl get CertificateRequests,Orders,Challenges --all-namespaces

Will yield a successful result like:

NAMESPACE   NAME                                                      APPROVED   DENIED   READY   ISSUER                REQUESTOR                                         AGE
wordpress   certificaterequest.cert-manager.io/wordpress-cert-bfwsp   True                True    letsencrypt-staging   system:serviceaccount:cert-manager:cert-manager   15m

NAMESPACE   NAME                                                         STATE   AGE
wordpress   order.acme.cert-manager.io/wordpress-cert-bfwsp-2053340489   valid   15m

Once the test CertificateRequest shows READY of True, you’re ready to copy the staging issuer for production and apply it to the cluster:

cp letsencrypt-issuer-staging.yaml letsencrypt-issuer-prod.yaml && \
sed -ie 's/acme-staging-v02.api.letsencrypt.org/acme-v02.api.letsencrypt.org/g' \
    letsencrypt-issuer-prod.yaml && \
sed -ie 's/-staging/-prod/g' letsencrypt-issuer-prod.yaml && \
kubectl apply -f letsencrypt-issuer-prod.yaml

Then verify both cluster issuers are deployed and READY for use:

watch kubectl get ClusterIssuer

Wait until it looks like:

NAME                  READY   AGE
letsencrypt-staging   True    13m
letsencrypt-prod      True    15s

When you have READY of True for both the test and production ClusterIssuer, update the Ingress annotation in values-local.yaml to use letsencrypt-prod and then run ./install.sh -n wordpress to update the WordPress deployment.
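
The change itself is a one-line edit to the cluster-issuer annotation shown in the sample config. A sketch using the same sed -ie style as earlier (or simply edit the file by hand):

sed -ie 's/cluster-issuer: letsencrypt-staging/cluster-issuer: letsencrypt-prod/' \
    values-local.yaml && \
./install.sh -n wordpress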

Remember to wait for the init container to complete and, once WordPress is back up, hit the site with curl, this time without the --insecure flag:

curl -IsS https://chicagoganghistory.com http://chicagoganghistory.com \
    https://www.chicagoganghistory.com http://www.chicagoganghistory.com

You should see successful responses for all four URL configurations:

HTTP/2 301
content-type: text/html; charset=UTF-8
date: Mon, 13 Mar 2023 02:11:22 GMT
location: https://www.chicagoganghistory.com/
server: Apache/2.4.54 (Debian)
x-powered-by: PHP/8.1.12
x-redirect-by: WordPress

HTTP/1.1 301 Moved Permanently
Content-Type: text/html; charset=UTF-8
Date: Mon, 13 Mar 2023 02:11:23 GMT
Location: http://www.chicagoganghistory.com/
Server: Apache/2.4.54 (Debian)
X-Powered-By: PHP/8.1.12
X-Redirect-By: WordPress

HTTP/2 200
content-type: text/html; charset=UTF-8
date: Mon, 13 Mar 2023 02:11:24 GMT
link: <https://www.chicagoganghistory.com/index.php?rest_route=/>; rel="https://api.w.org/"
server: Apache/2.4.54 (Debian)
x-powered-by: PHP/8.1.12

HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Date: Mon, 13 Mar 2023 02:11:25 GMT
Link: <https://www.chicagoganghistory.com/index.php?rest_route=/>; rel="https://api.w.org/"
Server: Apache/2.4.54 (Debian)
X-Powered-By: PHP/8.1.12

Or in summary something like:

HTTP/2 301
HTTP/1.1 301 Moved Permanently
HTTP/2 200
HTTP/1.1 200 OK

content-type: text/html; charset=UTF-8
date: Mon, 13 Mar 2023 01:31:12 GMT

Now visitors may access the site using the https scheme. And later, when your cert is set to expire, Cert Manager will renew it for you automatically.

With the site up and encryption enabled it’s time to restore from backup.

Restore content

See the Stackspin repo README for an overview. My recommendation is to follow their instructions for importing the database but, unless you’ve got a content backup with only a few dozen files, use krsync to recover persistent volumes by attaching a data access container. Below I describe the recommended steps.

First create a Pod spec to access WordPress content:

dataaccess-wp-content.yml
---
apiVersion: v1
kind: Pod
metadata:
  name: dataaccess-wp-content
  namespace: wordpress
spec:
  containers:
  - name: alpine
    image: alpine:latest
    command: ['tail', '-f', '/dev/null']
    volumeMounts:
    - name: wordpress-production-wp-content
      mountPath: /data
  volumes:
  - name: wordpress-production-wp-content
    persistentVolumeClaim:
      claimName: wordpress-production-wp-content

Ensure the namespace is the same one used while deploying WordPress and that claimName corresponds to a Persistent Volume Claim created during the first deployment.

If everything checks out, apply the spec:

kubectl apply -f dataaccess-wp-content.yml

Once the Pod starts, shell into the data access container:

kubectl exec -n wordpress -it dataaccess-wp-content -- sh

Use ls to verify the contents of the /data directory:

/ # ls -1 data
cache
logs
mu-plugins
object-cache.php
plugins
themes
upgrade
uploads

Install rsync using apk add, then exit the shell session:

/ # apk add --no-cache rsync
fetch https://dl-cdn.alpinelinux.org/alpine/v3.17/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.17/community/x86_64/APKINDEX.tar.gz
(1/5) Installing libacl (2.3.1-r1)
(2/5) Installing lz4-libs (1.9.4-r1)
(3/5) Installing popt (1.19-r0)
(4/5) Installing zstd-libs (1.5.2-r9)
(5/5) Installing rsync (3.2.7-r0)
Executing busybox-1.35.0-r29.trigger
OK: 8 MiB in 20 packages
/ # exit

Verify rsync is also installed on the host machine, then install krsync as described in Recover Files from a Kubernetes Persistent Volume.
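
For reference, krsync is a small wrapper that makes rsync tunnel through kubectl exec so that pod@namespace targets work. The commonly shared version looks roughly like this sketch; save it as krsync and chmod +x it:

#!/bin/bash
# krsync: run rsync with "kubectl exec" as its remote shell
if [ -z "$KRSYNC_STARTED" ]; then
    export KRSYNC_STARTED=true
    exec rsync --blocking-io --rsh "$0" "$@"
fi

# Running as the remote shell; rsync invokes us as: krsync -l <pod> <namespace> <command...>
namespace=''
pod=$1
shift
if [ "X$pod" = "X-l" ]; then
    pod=$1
    shift
    namespace="-n $1"
    shift
fi

exec kubectl exec -i $namespace "$pod" -- "$@"

With that in place, a target like dataaccess-wp-content@wordpress below resolves to kubectl exec -n wordpress dataaccess-wp-content.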

Start by syncing the uploads directory from your backup:

./krsync -av --progress --stats uploads/ \
    dataaccess-wp-content@wordpress:/data/uploads/

If you’re on a MacBook, use Amphetamine to prevent the system from going to sleep during the transfer. 20GB of data for Chicago Gang History finished syncing in just under an hour. If your sync fails part way through, re-run krsync and the transfer will attempt to resume itself – something kubectl cp cannot do.

Once the sync is finished you’ll see output like:

Number of files: 100886
Number of files transferred: 100829
Total file size: 20919306278 bytes
Total transferred file size: 20919306278 bytes
Literal data: 20919306216 bytes
Matched data: 62 bytes
File list size: 5772920
File list generation time: 3.955 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 20932067720
Total bytes received: 2218612

sent 20932067720 bytes  received 2218612 bytes  6325513.32 bytes/sec
total size is 20919306278  speedup is 1.00

Confirm the uploads were synced to the correct location by listing the contents of the /data/uploads directory mounted to the Persistent Volume:

kubectl exec -n wordpress -it dataaccess-wp-content -- ls -1 /data/uploads

You should see output like:

2016
2017
2018
2019
2020
2021
2022
2023

If you see an uploads folder nested inside of another uploads folder, shell into the data access container and move the contents out of the nested folder.
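
A sketch of that fix from inside the data access container (verify the paths first; hidden files, if any, need a separate mv):

kubectl exec -n wordpress -it dataaccess-wp-content -- sh
/ # mv /data/uploads/uploads/* /data/uploads/
/ # rmdir /data/uploads/uploads
/ # exit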

Then give WordPress permission to access the files:

kubectl exec -n wordpress -it dataaccess-wp-content -- \
    chown -R 33:33 /data/uploads

Repeat the sync process for any other files desired. WordPress child themes, for example, can be uploaded to the /data/themes directory. Do note, however, that public themes installable via the WordPress admin can also be installed using the Helm chart.

When you’re finished remove the data access Pod from the cluster:

kubectl delete -f dataaccess-wp-content.yml
Note: Command assumes dataaccess-wp-content.yml exists in the current directory.

Now you’re ready to restore the database from backup.

Restore database

With your WordPress content restored it’s time to restore the database. The instructions in the Stackspin repo README are sufficient for this task, but they will likely need to be modified to point to the correct Pod name.

Restore the database using a command like:

export HISTCONTROL=ignorespace && \
 kubectl exec -n wordpress -i wordpress-production-database-primary-0 -- \
    mysql -uwordpress -p<your password> --database=wordpress_db < dump.sql

This reads a local file called dump.sql and redirects it to mysql using the username (-u) and password (-p) specified. The -i flag passes stdin to the container, which is why the command can read from the local filesystem. Exporting HISTCONTROL=ignorespace tells the shell to skip saving commands that begin with a space, which is why the kubectl line is indented with one; it helps keep the database password out of your shell history. Your database password was specified inside the Helm chart deployed earlier.

Tip: It’s preferable to restore from a DB dump in order to preserve previous user accounts but you can also restore from a SQL file created by WP Migrate.

The import should complete within a few minutes.

Verify the results by visiting the site and regression testing. If you run into any problems, inspect the database Pod in Lens and leverage the logs for debugging. You can delete the database Pod and wait for it to come back in order to get at the logs. Similarly, you can delete the WordPress Pod to restart it once the database Pod is running again.

Once your site is tested, redeploy the Helm chart with Redis activated.

Activate Redis

Enable Redis by setting redis.enabled to true in the Helm chart and redeploying:

./install.sh -n wordpress

Now, when the WordPress init container runs, the Ansible script will additionally install and configure the Redis Object Cache plugin. Here’s a graph showing cluster CPU usage shortly after Redis is enabled:

Lens showing cluster CPU usage after Redis enabled.

Redis should automatically deploy to one of your two worker nodes. Once enabled, Redis Object Cache will query Redis and serve faster, cached responses for requests that might otherwise have hit the database and slowed things down.

As noted in the comments of values-local.yaml, you can also configure Redis Object Cache as a MU (“Must Use”) plugin. MU plugins are ideal to ensure specific plug-ins are not deactivated or removed inadvertently.

Hardening

Enable the Block Bad Queries plugin for WordPress, otherwise known as the BBQ Firewall. It is already configured as a MU ("Must Use") plug-in in the values-local.yaml provided earlier. To use it, ensure mu_plugins_enabled is set to true and run ./install.sh -n wordpress to update your WordPress deployment.

Then take inventory of your cluster’s nodes:

export SERVER1=159.223.158.116
export AGENT1=159.223.158.112
export AGENT2=159.223.158.104

For each node, shell in as the deployer user, access the root account and then run visudo to view the user privilege specifications:

ssh deployer@"$(echo $SERVER1)"
deployer@k3s-server-1:~$ su -
Password: ********************
root@k3s-server-1:~# visudo

Disable the NOPASSWD privilege given to deployer while provisioning nodes and save the visudo settings. The deployer user will then need to authenticate with their own password in order to run sudo commands.
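
If you used the /etc/sudoers.d/deployer drop-in sketched earlier, removing it is enough, since deployer already belongs to the sudo group:

root@k3s-server-1:~# rm /etc/sudoers.d/deployer
root@k3s-server-1:~# visudo -c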

Exit out of the root user and verify sudo commands prompt for a password:

root@k3s-server-1:~# exit
deployer@k3s-server-1:~$ sudo fail2ban-client status sshd
[sudo] password for deployer: ********************

If you’re running Ubuntu, verify the unattended-upgrades service is active:

sudo systemctl status unattended-upgrades
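
If the service is not installed or active, the usual way to set it up on Ubuntu is:

sudo apt install unattended-upgrades -y
sudo dpkg-reconfigure -plow unattended-upgrades   # answer Yes to enable periodic upgrades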

Finally, consider opting into the Lens Pro trial, which gives you access to tools that can scan your cluster for vulnerabilities.

Handling crashes

If your WordPress site crashes, Kubernetes will attempt to restart the WordPress Pod, rebuild the site and redeploy it to the cluster automatically. This typically results in a few minutes of downtime while Ansible does its thing. Unlike a traditional WordPress install, this recovery process is automatic.

There are cases, however, that will lead to a node running out of memory, and this may require the entire node to be rebooted. If it does, consider increasing the RAM on the node as described in Moving from Pantheon.io to Kubernetes. When a node is rebooted, K3s will automatically reconnect it to the cluster and restart the Pods it was running. Restart Fail2ban on the node after the upgrade, if needed.

Making backups

Always keep backups of your site. You can create backups from the WordPress admin, but there are limitations to doing it that way, and there’s a good chance you’ll lose the users table in your database. For that reason, consider instead backing up files with krsync and the database with kubectl exec and mysqldump, mirroring the restore commands shown earlier but in the reverse direction. As long as you have the content of the site and a copy of the database you can recover from just about anything.
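
A sketch of both, assuming the database Pod and credentials from the restore step, that mysqldump is available in the database container image, and that the dataaccess-wp-content Pod has been re-applied for the file sync:

export HISTCONTROL=ignorespace && \
 kubectl exec -n wordpress wordpress-production-database-primary-0 -- \
    mysqldump -uwordpress -p<your password> wordpress_db > "backup-$(date +%F).sql"

./krsync -av --progress --stats \
    dataaccess-wp-content@wordpress:/data/uploads/ uploads-backup/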

Keeping secrets

Before you pat yourself on the back, make sure you have kept track of all the passwords generated and used during the setup. These are easy to track during setup using a Logseq journal for installation notes; just make sure you clear those out after you transfer them to your password manager. You may also wish to keep a copy of your KUBECONFIG file in your password manager. I tend to base64-encode these and drop them, along with all the passwords, in a single plain text file which is then encrypted at rest using my GPG key.

Summary

In this tutorial you learned how to deploy a hardened WordPress site to Kubernetes using Ansible, capable of handling up to 80K users per month. As I mentioned earlier, I’ve been using this method to host Chicago Gang History for several years, and I expect to continue using it for the foreseeable future. I’m sharing this in the hope it may help you as much as it’s helped me.