Migrate WordPress from Shared Hosting to AWS+Docker

Goal: Migrate WordPress sites from GoDaddy shared hosting (cPanel) to AWS+Docker.

Motivation

I’m at the tail end of a five-year hosting contract with GoDaddy; the renewal fees have gone up, and the Linux kernel and PHP interpreter are quite outdated – a rebuild is needed anyway. I’ll use this opportunity to migrate my WordPress sites to AWS cloud hosting to save money, save resources, gain fine-grained server control, and squeeze out every drop of performance.

My sites are fortunately simple: I have many presentation sites with mostly statically-generated pages and minimal dynamic content. They all run WordPress, but mainly static HTML is served to visitors for utmost responsiveness. Together with DNS and reverse-proxy (CDN) provider Cloudflare, these sites will be snappy, secure, and inexpensive to operate.

AWS, Docker, Cloudflare

I’ve separated my steps into the following sections:

Part 1 – Setting up a minimal staging site


  1. Calculate the monthly cost with AWS
  2. Back everything up from the shared host
    2.1 Important web folders
    2.2 Generate a full site backup
    2.3 Crontab, DNS settings, site aliases, and TLS certs
    2.4 WordPress databases
    2.5 Email files and settings
    2.6 Search the site archive for other settings

  3. Setup an AWS EC2 instance for Docker
    3.1 Add root and web storage
    3.2 Open ports and set firewall rules
    3.3 Preview settings before launch
    3.4 Create an SSH key pair
    3.5 Allocate a static IP for DNS
    3.6 Test out the SSH connection and EC2 instance
    3.7 Set the timezone

  4. Install Docker and Docker Compose
    4.1 Install Docker Compose
    4.2 Confirm the web server is publicly reachable

  5. Design considerations
    5.1 DNS provider: AWS Route 53 or Cloudflare
    5.2 WordPress database storage: RDS or Docker containers
    5.3 Automatic database backups
    5.4 TLS certificate provider for HTTPS
    5.5 Mail service provider
    5.6 Identify the development workflow

  6. Recreate the shared hosting structure in Docker
    6.1 Setup an automated Nginx reverse proxy
    6.2 Setup a minimal test site
    6.3 Migrate a real staging site
    6.4 Gotcha: Infinite redirects
    6.5 Gotcha: WordPress is unable to write files
    6.6 Setup health checks
    6.7 Gotcha: Docker Alpine images missing usermod, groupmod, and curl
    6.8 Set a unique project name
    6.9 Understand the security layers
    6.10 Add basic authentication

  7. Common Docker and Docker Compose commands

Part 2 – Setting up a tight-running production site (TBD)


  1. Next steps

Step 1. Calculate the monthly cost with AWS

My GoDaddy shared hosting for the past five years has had 2 vCPUs and 1GB of RAM, billed as the top-tier Ultimate Linux hosting plan. Renewal is CAD$20/mo. With AWS, the first year is free with the AWS Free Tier, and hosting is estimated to cost as little as $5/mo from the second year. See below.

AWS Free Tier features

I’ll be using the AWS Free Tier with a single EC2 t2.micro (1 vCPU and 1GB RAM) instance, a 5GB provisioned SSD EBS volume (root file system), a 5GB S3 bucket for web assets, and a modest SES for email. From the AWS pricing calculator, this should cost under US$10/mo when the Free Tier expires.

AWS micro EC2 instance monthly cost

This is the on-demand pricing. Did you know that if you reserve your instance for a longer period of time you get a significant savings? For example, with a three-year no-upfront reservation the above infrastructure comes down to just under $5/mo. First I’d like to evaluate the t2.micro instance before I commit to anything more.


Step 2. Back everything up

I’ll be moving away from GoDaddy[1]. My first step is to back everything up, so I’ll first calculate how much space is needed for the backup.

Public html folder size

I have about 5.6GiB in my GoDaddy home folder, but only 3.4GiB in my web folders. Everything can be rsync’d in about 10 minutes, but I will let cPanel do the heavy lifting shortly.
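
For reference, a one-shot pull of the web folders over SSH might look something like this (the host, user, and paths are placeholders; the cPanel backup below is still the authoritative copy):

  rsync -avz -e ssh myuser@shared-host.example:~/public_html/ ./godaddy-backup/public_html/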

Important web folders

These are the most important folders in my shared hosting account. However I back up my hosting account, I need to ensure these folders are present.

  • ~/.htpasswds – Holds the Apache basic auth passwords
  • ~/.mysql_backup – Holds weekly MySQL backups for shared WordPress sites
  • ~/.ssh – Holds the public keys used to SSH into the shared hosting account
  • ~/access-logs (symlink) – Holds the plain-text website access logs for the current day
  • ~/error-logs – If specified in the root php.ini file, this folder holds a running PHP error log. Check the ini file for the actual error log location if it has moved.
  • ~/logs – Holds the archived website access logs going far back in time
  • ~/mail – Holds all mail messages and attachments.
  • ~/public_html – This is the most important folder. It holds all the shared web hosting files for each website, or just the main website if there is only one.
  • ~/ssl – Holds the TLS keys to make HTTPS connections. This may or may not have any certificates present. I currently use Let’s Encrypt and Cloudflare instead.

Generate a full site backup

cPanel can perform a full and complete backup of the site and all settings, including DNS entries and mail forwarding rules.

CPanel backup of entire website

Through cPanel I’ll be downloading all my files and settings to a single tarball archive. I’ve chosen the home directory backup location and started the backup process.

CPanel full site backup to a single archive

After about ten or so minutes a single archive containing more data and settings than I imagined appears in my home folder. Here are the contents of that archive.

Shared hosting backup archive contents
The tarball has some very deep folder structures, and as such Windows may not show all the folders. A 2017 version of 7-Zip hides the deeply nested folders, but a newer version correctly shows all the archived folders and their sizes.

Crontab, DNS settings, site aliases, and TLS certs

The shared hosting SSH home folder ~ is archived to homedir. It contains all the important folders outlined above. The other folders in the archive conveniently contain TLS certificates, crontab information, DNS settings, domain aliases, MySQL dumps, current logs, and cPanel configuration files. This is fantastic because it saves a lot of time rummaging through cPanel to note such settings. I’ll briefly explain what information each special folder contains.

Shared hosting backup archive special folders
  • apache_tls – Holds a copy of the TLS keys to make HTTPS connections. This may or may not have any certificates present. I currently use Let’s Encrypt and Cloudflare so I will disregard this folder.
  • cp – Holds a single file containing all the TLDs and SLDs associated with my account
  • cron – Holds a copy of the crontab file. The cron jobs can also be listed by running crontab -l via SSH (the crontab binary lives at /usr/bin/crontab or /usr/bin/crontab.cagefs), and edited with crontab -e (:q quits).
  • dnszones – Holds the DNS settings for each TLD associated with my account. These can largely be ignored as Cloudflare will be the DNS provider from now on.
  • homedir – This is a copy of the home folder at ~.
  • mysql – Holds an uncompressed dump of each MySQL database made at the time the archive is generated. When inflated, these files can become quite large.
  • userdata – Contains information about the document root and domain aliases of each site associated with the shared account
The cp folder has a single file with all the domain, subdomain and document root information. If I didn’t create a full backup I could find this information in the cPanel, but I’d have to look in two panels: Addon Domains and Subdomains.

Manual locations to look for the domain and subdomain document root mappings.

CPanel add-on domains panel
CPanel subdomains panel

WordPress databases

If enabled, GoDaddy creates weekly backups of the WordPress databases, but just for the previous week. They can be found in ~/.mysql_backup. I won’t be using this folder because the backups are outdated.

Instead, uncompressed database dumps are exported along with the entire hosting account into the mysql folder of the full backup archive above.

Exported mysql dumps in site archive

These are great when initially migrating sites, but I won’t be archiving my entire account each time I need database dumps. The databases can also be exported individually or in bulk from the phpMyAdmin panel in hosting. Here I can customize the dump file and specify where I’d like it saved.

PhpMyAdmin database export screen

The easiest method to export individual, compressed database dumps is through cPanel again.

Export and download database dumps with cPanel

Email settings

If I had used GoDaddy as my email solution provider then I would have stored email messages there. In that case I would enable POP3 or IMAP access to all my email accounts and import my mail into Gmail or Outlook. Those settings can be found in the cPanel email panel.

CPanel email settings panel

What I actually do is use a catch-all setting to forward mail on a per-domain basis to Gmail accounts. I do this because Gmail has much better spam filters than I, or any shared hosting provider, could ever set up manually. That being said, I need to record all my catch-all settings. They must be in the archive, so I grepped the archive and found the location of the email catch-all rules: va/.

Grep the archive for a target string

These are also in cPanel, but it requires a lot of clicking and refreshing to see every catch-all.

CPanel email catch-all settings

Search the site archive for other settings

To find important settings in the special folders I like to untar the archive so I can grep for strings, as in the email example above. I’ll untar the archive on the shared hosting side and use their resources to perform the search. If you have a huge archive, I’d even recommend starting a screen session so the search can be resumed later.
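
A sketch of that workflow on the shared host, where the archive name and the forwarding address are placeholders:

  screen -S backup                              # resumable session in case SSH drops
  tar -xzf backup-*_myaccount.tar.gz            # extract the full cPanel archive
  grep -rl "me@gmail.com" backup-*_myaccount/   # find the files holding the catch-all rules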

The entire account is now backed up, and all the settings needed to recreate the shared hosting in a local Docker environment are present or can be searched for quickly.


Step 3. Setup an AWS EC2 instance for Docker

To get started I created an AWS account and selected the Free Tier to work with.

AWS Free Tier signup

With the account created, I needed to choose a Docker solution. Kubernetes is amazing for large-scale clusters with auto-scaling and high availability, but it is also an expensive solution in terms of resources and, yes, cost. Fargate is less complicated than Kubernetes, but it is still designed around scaling workloads. A plain Amazon EC2 instance is the perfect choice in this case because it is just a “Linux box” with fixed resources, and it is the easiest to run locally for testing as well as in AWS for production.

Running on Docker on AWS – EC2

Going with EC2, I need to launch a new AMI[2]. The default location is Ohio (US East), but I changed it to Oregon (US West) because I’m in Vancouver; in production it won’t matter where the EC2 instance is as Cloudflare will act as my global CDN. The next question is, which EC2 AMI should I choose?

Select a free-tier Linux 2 AMI

I’ve chosen the Amazon Linux 2 AMI (no out-of-the-box Docker support yet) because it is already optimized for AWS virtualization, it is based on CentOS 7 with systemd, and it can even be run on-prem as a virtual machine. I’ve run Docker on CentOS before and like a systemd environment, but really all I need is an optimized EC2 that runs the Docker daemon reliably, plus a firewall. All the fun things happen inside the Docker containers.

Choose the t2.micro instance type

Next I configured the instance details with the defaults and no IAM role yet. I also made sure to protect against accidental termination – there will be an extra step if I really do want to terminate my one and only Docker EC2 instance.

Configure the EC2 instance

Add root and web storage

Next I added storage. I gave the instance a root drive of 15GiB because Docker images built from scratch can grow more than you would believe with the intermediate images unless they are trimmed right, or pre-built images are used. The minimum size is 8GiB anyway. I’ve also added an sdb drive, which is where my web files will reside. I’m afforded 30GiB of storage but I’m only using 20GiB today. Later I can create snapshots or add more drives.

Add EBS storage to the EC2 instance

AWS suggested a good tagging convention so I went with that.

AWS tagging

Open ports and set firewall rules

Let’s open the standard web server ports for now with a restriction on the SSH source IP which can be updated through the AWS console if it changes. Root login is disabled already, and the ec2-user can only log in with a private key, so SSH attacks are mitigated.

Simple firewall rules for now

Preview settings before launch

Next I previewed my settings, firewall rules, instance details, storage, and tags. Everything looked correct, so I clicked “Launch”.

Preview the instance settings and launch the EC2

I took the time while the instance was launching to set up some billing alerts. I’ll know if I create too many snapshots or if I leave multiple instances running by accident.

Setup billing alerts

Create an SSH key pair

This is a quick step. Just before any instance starts to launch, AWS prompts us to reuse or create an SSH key pair which will be used to remote into the EC2 instance. The default “ec2-user” has no usable password, so the only way to remote into the instance is with a private key. AWS warns us that once the key pair is created, it cannot be downloaded again, so it must be kept safe.

Generate an AWS key pair

Before I can connect to the EC2 instance I need to convert the SSH PEM file into a PPK (PuTTY Private Key) format to use with PuTTY in Windows using PuTTYgen. Linux SSH is straightforward, but I have some pretty awesome software that only runs on Windows, so my primary development environment is on Windows (in case anyone is wondering).

Convert AWS PEM file into a PPK key for PuTTY

Allocate a static IP for DNS

Next, I’ll need an Elastic IP. It’s a static, public IP that I can set in my DNS A records in Cloudflare for my websites. It doesn’t change. Internally in AWS I can associate the public Elastic IP to the Webserver EC2 instance. The EC2 instance, when running, starts out with a public IPv4 address, but this is mutable and changes when the instance is destroyed or new instances are swapped in. The Elastic IP is fixed and is reliable for DNS records.

Create and associate an Elastic IP

Test out the SSH connection and EC2 instance

Let’s test out the EC2 instance. Using a PuTTY connection profile with “ec2-user” (not root) and the Elastic IP, the EC2 instance is owned.

Successful EC2 SSH connection via the Elastic IP

How much space does a new Amazon Linux 2 EC2 instance occupy? The Management Console doesn’t say; it only shows the EBS volumes I created. In the terminal I ran the same du command from earlier and found that the new instance occupies only 1.1GiB. The command is below.

Amazon Linux 2 EC2 instance disk usage
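
The exact command is in the screenshot; a rough equivalent, staying on the root filesystem, is:

  sudo du -xsh /    # -x skips /proc, /sys, and other mounted filesystems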

Set the timezone

Let’s set the timezone of the AMI2 EC2 instance to keep the log files meaningful:

Set the AWS AMI2 EC2 instance timezone
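
Amazon Linux 2 uses systemd, so setting the timezone is one command (the zone here is an example):

  sudo timedatectl set-timezone America/Vancouver
  timedatectl    # confirm the change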

The EC2 instance is good to go. Next I’ll install Docker and get ready for the real fun to begin.


Step 4. Install Docker and Docker Compose

I’ll now install Docker and Docker Compose. Amazon has a partial guide on how to install Docker on the Amazon Linux 2 EC2 instance. Here are the complete steps with details in the comments.
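
A typical sequence on Amazon Linux 2, using the amazon-linux-extras Docker package (adjust to the current AWS guide):

  sudo yum update -y                          # apply the latest security patches
  sudo amazon-linux-extras install -y docker  # install the Docker engine
  sudo systemctl enable docker                # start Docker on boot
  sudo systemctl start docker
  sudo usermod -aG docker ec2-user            # run docker without sudo (log out and back in)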

Let’s confirm the Docker “hello world” container runs.

Confirm that the Docker “hello world” container runs

Install Docker Compose

Next I’ll install Docker Compose for simple container orchestration. The version at this time is 1.23.2, but if I have to perform this procedure again I’ll check the latest release. The following steps not only install Docker Compose, but also enable command completion, which you will really appreciate.
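
A sketch of those steps for release 1.23.2 (substitute the latest version number as needed):

  # Download the docker-compose binary and make it executable
  sudo curl -L "https://github.com/docker/compose/releases/download/1.23.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
  sudo chmod +x /usr/local/bin/docker-compose

  # Enable bash command completion for docker-compose
  sudo curl -L "https://raw.githubusercontent.com/docker/compose/1.23.2/contrib/completion/bash/docker-compose" -o /etc/bash_completion.d/docker-compose

  docker-compose --version    # confirm the install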

At this point I can control Docker via the terminal (e.g. docker run ... or docker-compose up).

In development I use PhpStorm to communicate with Docker because of its rich reporting, plus I use this IDE to develop my WordPress sites. If you are interested in using PhpStorm to connect to the Docker daemon on AWS, my walk-through is here.

Launch and inspect Docker containers from PhpStorm

Confirm the web server is publicly reachable

Just to confirm that the EC2 instance is alive and running, I’ll launch the nginx:alpine container in the foreground (press ctrl+c to quit). ICMP (ping) is disabled by default on a new EC2 instance, so to test if the server is alive I just open up a browser or curl the headers (both shown below).

Confirm the EC2 instance is publicly reachable
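
Assuming the Elastic IP is 203.0.113.10 (a placeholder), the test looks like this:

  # On the EC2 instance: run a throwaway web server on port 80 (ctrl+c to stop)
  docker run --rm -p 80:80 nginx:alpine

  # From a local machine: fetch only the response headers
  curl -I http://203.0.113.10/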

Step 5. Design considerations

Before cowboy coding right into this, there are several AWS+Docker hosting design considerations that need to be addressed.

From professional experience, discussing and weighing the pros and cons, the benefits and the ROI, and estimating timelines from the available resources takes as much time, if not more, than implementing solid solutions. The following solutions have served me faithfully and are free, minimal, and work well for an individual as well as a small team. Budget-minded growth hackers will appreciate the simplicity.

DNS provider: AWS Route 53 or Cloudflare

AWS has something fantastic called Route 53[3] which is a DNS solution to route domain requests to internal AWS EC2 instances and even S3 buckets. While this is convenient, for five reasons I will not be taking advantage of this service:

  1. I will only have one EC2 instance, so there is only one internal IP
  2. Docker containers are assigned internal IPs randomly or they change often
  3. Route 53 has a usage-based cost which adds up for 10+ sites plus bot traffic
  4. Cloudflare is already my DNS and reverse-proxy cache provider
  5. Cloudflare has DDoS, WAF[4], and bad-bot filters in place already.

In production my Cloudflare A records will point to a static AWS IP. On my local Windows machine I use something called Acrylic DNS Proxy, which is similar to, but much better than, editing the hosts file. It’s better because you can use wildcards and regex in your domain name mappings, whereas in the hosts file you have to list each domain and subdomain.
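
As a hypothetical example, an AcrylicHosts.txt entry mapping a domain and all of its subdomains to the Elastic IP (203.0.113.10 is a placeholder) could look like:

  203.0.113.10 mysite.example *.mysite.example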

Cloudflare is a powerful, free DNS provider.

WordPress database storage: RDS or Docker containers

AWS has dedicated RDS services for hosting MySQL databases. I could load up a single RDS instance with all the databases for all my sites, staging and production, in one convenient place. However, the lowest hourly compute rate in most zones is US$0.017 per hour.

With RDS and the lowest hourly compute rate, after one month that is about US$12.44, or US$149 per year[5]. There is also an increasing cost for data transfer. This is an expensive solution.

Instead, I’ll operate several MariaDB Docker containers – one for each site – with persistent storage and backups, but without any additional expense. They can still be accessed remotely with phpMyAdmin or MySQL Workbench. Remember, most of each site will be cached or served statically, so DB load will typically be low.

Individual Docker database containers are reliable and free.

Automatic database backups

With GoDaddy shared hosting I would have to pay between CAD$84 (25GB) and CAD$132 (50GB) per year from year two for automatic backups.

Shared hosting automatic backup pricing

For literally just a few dollars a month I can have 50GB of dedicated S3 storage, and with a free cron job I can back up my entire infrastructure semi-daily if warranted. Docker makes it easy to dump one or more databases on a schedule and sideload those dumps to S3, or upload them to Google Drive, Dropbox, or the like.
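
A sketch of such a cron-driven backup, where the container name, credentials, and bucket are all placeholders and the AWS CLI is assumed to be installed on the host:

  #!/bin/bash
  # Dump one site's database from its container and push the dump to S3
  STAMP=$(date +%Y%m%d-%H%M)
  docker exec mysite_db_1 sh -c 'exec mysqldump -uroot -p"$MYSQL_ROOT_PASSWORD" --all-databases' \
    | gzip > "$HOME/backups/mysite-$STAMP.sql.gz"
  aws s3 cp "$HOME/backups/mysite-$STAMP.sql.gz" s3://my-backup-bucket/mysite/

A host crontab entry such as 0 3 * * * /home/ec2-user/bin/db-backup.sh would run it nightly.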

Cron jobs with Docker and bash scripts are reliable and free.

TLS certificate provider for HTTPS

With GoDaddy I would have to pay a minimum of CAD$100 per year from year two for something that is free from Let’s Encrypt, or even automatically free and included with Cloudflare.

GoDaddy SSL certificate costs

For my purposes, with no credit card or sensitive personal information being collected, I’m perfectly happy with the TLS certificates that Cloudflare transparently provides. In the future I can add a Let’s Encrypt script to rotate certificates in conjunction with Cloudflare’s strict TLS support, but for now this is sufficient.

Cloudflare Universal SSL (TLS) certificate: select Full mode
Cloudflare already uses TLS certificates for public-facing sites.

Mail service provider: Amazon SES or G Suite?

With 10+ sites hosted in AWS there are some interesting mail service considerations. Let’s see how much it would cost to handle email through G Suite (Gmail).

G Suite (Gmail)

If there are ten sites and each requires a custom @domain email address, then G Suite would cost US$6 times ten domains, or US$60/mo!

G Suite pricing for custom email

Amazon SES (Simple Email Service)

With Amazon SES it costs $0 to receive the first 1000 emails, US$0.10 for the next 1000 emails, and US$0.10 to send 1000 emails[6]. There are per-256KB chunk costs of fractions of a penny, and some minor cost for very large attachments. It does not matter what the recipient domain is, so this is aggregate across 10+ domains.

If you’re not sending and receiving thousands of emails a month, nor forwarding joke emails with large images, email costs are likely less than a dollar a month with SES.

Amazon SES should cost less than a dollar per month for handling the email for 10+ sites.

Identify the development workflow

With shared hosting, PHP files usually run via FCGI and all sites unfortunately share a common php.ini and root .htaccess. With AWS+Docker each site is independent and self-contained. In other words, we can mix and match PHP versions for testing, as well as slowly migrate from Apache to Nginx.

I’ve identified that as a contract developer I’d like to perform theme development and content updates in AWS directly. That is, I’ll edit theme files locally in PhpStorm and they will be synced in real-time to AWS (as well as kept under version control). All my databases will be on a remote AWS server, not on my local machine(s). This way all my databases are in one place (and constantly backed-up) eliminating the need for slow synchronization between my local machine(s) and AWS.

Why have development and staging environments on AWS too? Remember: my production and staging LEMP stacks are completely isolated, plus I can add more LEMP stacks for experimentation and UAT and clone the production database into them. Pretty neat.
Developing WordPress themes, plugins, and content simultaneously is a cardinal sin in the CMS world. With a single developer or a very small team it is manageable. In my larger jobs, however, I made use of database locks, WordPress maintenance mode, git push/pull scripts, Docker Compose bash scripts, and complicated media offload and synchronization on S3. Stay small and agile if you go this route.

Step 6. Recreate the shared hosting structure

Next I’ll recreate the shared hosting environment and structure in Docker using isolated LAMP stacks and Docker Compose.

This could be implemented with a simple LAMP[7] stack and a fancy .htaccess file to perform the routing like the shared host does, but then all sites would share the same parent .htaccess file (and maybe the same php.ini unless custom fcgi settings are added also). You really don’t want to do this.

I prefer using Docker and Nginx (LEMP[8] stack) over Apache, plus Docker is specifically designed for this purpose, so I’ll start with an upstream reverse proxy to isolate and forward requests to the 10+ downstream websites. They can then be migrated and updated in isolation later.

Setup an automated Nginx reverse proxy

With shared hosting we get one IP address (which may or may not change), and some cPanel-configured routing to send traffic on a per-domain basis to individual document root folders where in each an .htaccess file is waiting to interpret the request.

With AWS+Docker we get a static (Elastic) IP at which traffic is delivered to a single black-box EC2 instance, so it is up to us to implement the “document root” routing logic. This is completely out of Amazon’s hands. The EC2 instance just receives TCP traffic with a host header destined for port 80 or 443. We can have some fun here.

Let’s use an Nginx reverse proxy. I use a gently modified version of Jason Wilder’s Automated Nginx Reverse Proxy for Docker to route my domain traffic to the individual (and randomly assigned) Docker LEMP subnets. When a new stack is raised, the reverse proxy is automatically updated. This has made my development life so much easier.

Automated Nginx Reverse Proxy for Docker
All server communication takes place over HTTPS and HTTP traffic is 301-redirected to HTTPS by the reverse proxy. For now I’ve generated snakeoil TLS certs for each site I want to migrate, but in the future I will use Let’s Encrypt and enable Cloudflare’s Full Strict TLS mode.
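
A minimal sketch of the proxy’s docker-compose.yml, following the jwilder/nginx-proxy documentation (the shared external network name is an assumption):

  version: '3'
  services:
    nginx-proxy:
      image: jwilder/nginx-proxy
      container_name: nginx-proxy
      ports:
        - "80:80"
        - "443:443"
      volumes:
        - /var/run/docker.sock:/tmp/docker.sock:ro   # lets the proxy react to container start/stop events
        - ./certs:/etc/nginx/certs:ro                # per-site TLS certificates (VIRTUAL_HOST.crt/.key)
        - ./htpasswd:/etc/nginx/htpasswd:ro          # per-site basic auth files (used in a later step)
  networks:
    default:
      external:
        name: nginx-proxy                            # downstream web stacks join this network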

Setup a minimal test site

With the automated reverse-proxy running, the next step is to create a minimal Nginx+PHP site to ensure the configuration and automated routing are correct. I’ve created a site called minimal.example with the TLS key-pair minimal.example.crt and minimal.example.key.

Minimal server example key pair
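
Generating a snakeoil (self-signed) pair for the test host is a one-liner with openssl:

  openssl req -x509 -nodes -newkey rsa:2048 -days 365 \
    -keyout certs/minimal.example.key \
    -out certs/minimal.example.crt \
    -subj "/CN=minimal.example"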

I’ve updated my hosts file[9] so that minimal.example resolves to my AWS Elastic IP on my development machine and a proper host header is sent with the request. Here is the docker-compose.yml file for the minimal setup to use a PHP-FPM interpreter behind another Nginx server. My up.sh script essentially runs docker-compose up -d, and VIRTUAL_HOST=minimal.example is in the .env file.

Docker minimal site docker-compose.yml
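
My actual file is in the screenshot above; a stripped-down sketch of the same idea (image tags and paths are assumptions) looks like this:

  version: '3'
  services:
    web:
      image: nginx:alpine
      environment:
        - VIRTUAL_HOST=${VIRTUAL_HOST}                        # read from .env, picked up by the nginx-proxy
      volumes:
        - ./www:/var/www/html:ro
        - ./default.conf:/etc/nginx/conf.d/default.conf:ro    # contains fastcgi_pass php:9000
    php:
      image: php:7.3-fpm-alpine
      volumes:
        - ./www:/var/www/html
  networks:
    default:
      external:
        name: nginx-proxy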

With the minimal example site up and running, let’s perform a quick latency test. I have a simple PHP file on my AWS EC2 and my production GoDaddy server that echoes “Hello”. With DevTools open for each tab, I’ll simply reload the page a handful of times and observe the load times.

Minimal PHP load times between AWS EC2 and GoDaddy

Remember, I’m running the most expensive shared hosting (Ultimate Linux) plan on GoDaddy, and a micro EC2 instance on AWS. This is encouraging.

Migrate a real staging site

Now I will migrate a real staging site. One of my ancient sites that is no longer actively updated is a good candidate. It’s a single-landing-page for-rent WordPress site. First I’ll recreate the LAMP stack used on my GoDaddy shared server in a docker-compose.yml configuration file. With Docker I can even recreate MySQL 5.6 and PHP 5.6 (that’s just a coincidence). This configuration file is still in progress, but it does what I need for now.
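
A rough sketch of that stack, assuming the official wordpress and mysql images (names, paths, and credentials are placeholders):

  version: '3'
  services:
    db:
      image: mysql:5.6
      environment:
        MYSQL_ROOT_PASSWORD: change-me
        MYSQL_DATABASE: wordpress
      volumes:
        - ./db-data:/var/lib/mysql               # persistent database storage
    wordpress:
      image: wordpress:php5.6-apache
      environment:
        - WORDPRESS_DB_HOST=db
        - WORDPRESS_DB_NAME=wordpress
        - WORDPRESS_DB_PASSWORD=change-me
        - VIRTUAL_HOST=${VIRTUAL_HOST}           # routed by the nginx-proxy
      volumes:
        - ./html:/var/www/html                   # the migrated site files
  networks:
    default:
      external:
        name: nginx-proxy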

Tip: When you migrate servers the domain and/or HTTP/S may change. This must be updated in the WordPress DB to prevent redirects back to the old URL. You cannot edit the SQL dump directly because many strings are serialized (with fragile string-length information). You could perform a replace in a SQL editor, or better yet, use WP-CLI. Simply run
wp search-replace 'OLD' 'NEW' [--dry-run]

Gotcha: Infinite redirects

The first gotcha is that traffic reaching the Apache daemon from the Nginx reverse-proxy is always HTTP traffic[10], but the website’s canonical URL starts with https:// (as all modern sites should). This results in an endless redirect loop to send HTTP traffic to the HTTPS URL.

Tip: Behind the reverse proxy only HTTP traffic exists, so WordPress’s is_ssl() function will always report false and infinite redirect loops may ensue. One trick I use is to inspect the HTTP_X_FORWARDED_PROTO header for “https” in wp-config.php and set $_SERVER['HTTPS']='on'; to fix this.
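
That trick, written out near the top of wp-config.php, is just a few lines:

  if ( isset( $_SERVER['HTTP_X_FORWARDED_PROTO'] )
       && 'https' === $_SERVER['HTTP_X_FORWARDED_PROTO'] ) {
      $_SERVER['HTTPS'] = 'on';   // make is_ssl() return true behind the proxy
  }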

Gotcha: WordPress is unable to write files

The next gotcha is that the Apache daemon is running as www-data inside a container with a volume link to the EC2 host web folder owned by ec2-user, and the two users and groups do not match. This means that Apache is unable to write to the EC2 host machine (no uploading photos). From inside the Docker container, Apache sees that the web files belong to unknown user 1000 (which is the ec2-user on the host).

Container file ownership is the EC2 filesystem
Tip: WP needs to write to the host, so the content folder needs to be writable on the host. This folder resides on EC2 (ec2-user is the owner) but Apache resides in a Docker container (the container runs as root, but Apache runs as www-data). My trick is to create a user/group pair (www-data) on the EC2, chown the host web folder, and update the UID and GID used by Apache (www-data). The crux is that the host owner and group should have the same ids as the Apache process, but not be root.

On the host let’s add a new do-nothing user and group called www-data with UID/GID 9999, then chown the web folder.
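
A sketch of those host-side commands (the web folder path is a placeholder):

  sudo groupadd -g 9999 www-data
  sudo useradd -u 9999 -g 9999 -M -s /sbin/nologin www-data   # no home dir, no login shell
  sudo chown -R www-data:www-data /home/ec2-user/web/staging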

Tip: Remember to log out and back in for changes to take effect. For AWS just close the terminal and reconnect. Don’t forget to close open SFTP connections for the same reason.

Next, the www-data user and group need to be updated in the Dockerfile or the docker-compose.yml file. Here is the latter:
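
One way to express this in docker-compose.yml is to remap the IDs at container start-up before handing control back to Apache (the image tag and IDs follow the example above, but this is a sketch rather than my exact file):

  services:
    wordpress:
      image: wordpress:php5.6-apache
      command: >
        bash -c "groupmod -g 9999 www-data &&
                 usermod -u 9999 www-data &&
                 apache2-foreground"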

Below we can see that the user and group ids have changed and are picked up by the Apache daemon.

Phpinfo: The user and group ids have changed

The Docker container correctly sees the www-data ownership of the host web files.

Container file ownership is the shared www-data user and group

Now the first WordPress site is up and running with no permissions issues, and without resorting to chown’ing everything to root (never do that).

Successfully working WordPress site on Docker+AWS

Setup health checks

We’re getting into the weeds here, but one important feature of Docker is the ability to perform health checks. By adding a few more lines to the docker-compose.yml, Docker can periodically test the server. Here I want a command to curl the site every minute and a half. The curl context is from inside the container, so my target is localhost, and I need to indicate that the request ‘originated’ from HTTPS to prevent a 302 redirect (and a failed health check).
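
In docker-compose.yml this amounts to a healthcheck block such as:

  # Under the wordpress service in docker-compose.yml
  healthcheck:
    test: ["CMD", "curl", "-f", "-H", "X-Forwarded-Proto: https", "http://localhost/"]
    interval: 90s
    timeout: 10s
    retries: 3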

Here we can see the status is “healthy”.

Docker container health check

You can test your health check function from the host with something like this:
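
For example (the container name is a placeholder, and curl must exist inside the container – see the Alpine gotcha below):

  docker exec staging_wordpress_1 curl -fsS -H "X-Forwarded-Proto: https" http://localhost/ > /dev/null && echo healthy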

Tip: You can see all health check results with this command: docker inspect -f '{{json .State.Health}}' <container> | jq.

Gotcha: Docker Alpine images missing usermod, groupmod, and curl

For the file permissions trick to work with Alpine Docker images (they are the most lightweight mainstream Linux images), the usermod and groupmod commands need to be installed. To this end I extend my Alpine images in Dockerfiles. For the health checks to work, curl needs to be added as well. Below is a sample Dockerfile for an Nginx image based on Alpine.
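
A minimal sketch of such a Dockerfile (on Alpine, the usermod and groupmod binaries come from the shadow package):

  FROM nginx:alpine
  RUN apk add --no-cache shadow curl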

Then in my docker-compose.yml files I can build and name them like so:
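
For example (the file and image names are placeholders):

  services:
    web:
      build:
        context: .
        dockerfile: Dockerfile-nginx
      image: mysites/nginx-alpine:latest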

Set a unique project name

One last touch is to specify a unique project name. In the above screenshots the containers are prefixed with staging_ because that is the folder the docker-compose.yml file is in. We can change that easily (docs). Simply make sure there is a .env file and add the key COMPOSE_PROJECT_NAME=.... That’s it. Here is my .env file so far:
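
Something along these lines, where the values are placeholders:

  COMPOSE_PROJECT_NAME=rental-staging
  VIRTUAL_HOST=staging.rental.example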

Now my containers are prefixed with a nice name.

Pretty Docker container names

Know the security layers so far

Docker

Docker containers expose ports in the course of their duties. For example, the MySQL container in the previous section exposes port 3306, and the PHP-FPM container in the minimal example exposes port 9000. Containers can expose any ports they like, which can be accessed only on their stack’s subnet. For the outside world to reach these containers the ports need to be bound, not just exposed, to the interface. This is the first layer of security with Docker.
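
The distinction in docker-compose.yml terms, as a sketch:

  services:
    db:
      image: mariadb:10.3
      expose:
        - "3306"        # reachable only by containers on this stack's network
    nginx-proxy:
      image: jwilder/nginx-proxy
      ports:
        - "80:80"       # published (bound) on the EC2 host's public interface
        - "443:443"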

AWS

Next, with the exception of some advanced services, I only have ports 80 and 443 bound to my AWS public interface. The Nginx reverse proxy handles delegating traffic to the various web stacks deeper in the network. The AWS firewall handles blocking non-HTTP/S traffic at this level. This is the second layer of security with AWS.

Cloudflare

Finally, by leveraging Cloudflare’s resources, we can achieve DDoS mitigation with their traffic pattern heuristics and WAF protection (e.g. detecting common SQL-injection attacks), as well as some measure of bot protection. This is an additional layer of security with Cloudflare. It’s like our website stack is hosted behind two panic rooms.

Add basic authentication

Now, to protect my staging stack from casual visitors I’ll add a basic authentication scheme using htpasswd. This is very simple to implement with Nginx. The nginx-proxy container allows .htpasswd files to be dropped in for any site you wish to protect with a basic password challenge. I’ve made a helper script to make generating these hashed password files simple.

Generate per-site htpasswd files for the nginx-proxy
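
The proxy looks for a file named exactly after the VIRTUAL_HOST in its htpasswd folder, so generating one boils down to this (the host name and user are placeholders; htpasswd comes from the httpd-tools/apache2-utils package):

  htpasswd -c ./htpasswd/staging.rental.example admin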

A sample server block in the generated Nginx reverse-proxy looks like this:
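
An abridged sketch of what the generated block contains (the host name is a placeholder):

  server {
      server_name staging.rental.example;
      listen 443 ssl http2;
      ssl_certificate /etc/nginx/certs/staging.rental.example.crt;
      ssl_certificate_key /etc/nginx/certs/staging.rental.example.key;

      auth_basic "Restricted staging.rental.example";
      auth_basic_user_file /etc/nginx/htpasswd/staging.rental.example;

      location / {
          proxy_pass http://staging.rental.example;   # upstream generated for the downstream stack
      }
  }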

Each time a new .htpasswd file is created, remember to reload the nginx-proxy stack: docker-compose restart.


Step 7. Common Docker and Docker Compose commands

These are the most common Docker commands I use. I’ve turned many of them into scripts, but these are the building blocks of Docker workflows.

Docker ps command to see container names and ids
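
A representative sample (container and stack names are placeholders):

  docker ps                          # list running containers, names, and ids
  docker logs -f <container>         # follow a container's logs
  docker exec -it <container> sh     # open a shell inside a container
  docker-compose up -d               # build and raise the stack in the background
  docker-compose down                # stop and remove the stack's containers
  docker-compose restart             # restart the stack, e.g. after a config change
  docker image prune                 # remove dangling images to reclaim disk space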

Step 8. Next steps

This was just the essentials of getting a staging site migrated to AWS on a single EC2 instance using Docker containers to build up a web stack. To sum up the results so far, I set up an EC2 instance on Amazon AWS, installed Docker, set up a reverse-proxy to automatically route requests to various web stacks, recreated a WordPress staging site and the exact database used on GoDaddy, imported the database into a Docker MySQL container, added basic authentication, and raised the web stack.

There are more steps and considerations to explore in a future post, such as setting up:

  • Email on Amazon SES (TBD)
  • Fail2ban behind the reverse-proxy (where TLS terminates) (TBD)
  • Automatic rotation of container logs (TBD)
  • A tight Nginx+PHP-FPM production web stack (TBD)
  • IAM policies for AWS account access (TBD)
  • Xdebug settings for debugging PHP
  • Automatic Let’s Encrypt TLS certificates for each web stack
  • Automatic database backups with cron and a SQL dump script
  • Periodic automated git commits as a backup mechanism
  • Cloudflare DNS entries
A staging site is now running on AWS under Docker. The next steps will be to set up a production site with stronger security and email (TBD).

Notes:

  1. I have a five-year premium hosting contract which expires this month, but the renewal fees have increased, and GoDaddy is known to inject surreptitious opt-out code into web pages, so it’s time to move on.
  2. Amazon Machine Image
  3. In case you are wondering, “53” comes from a DNS service typically being bound to port 53, just as a web service is typically bound to ports 80 and 443.
  4. WAF = Web Application Firewall
  5. With no long-term lock-in
  6. $0 for the first 62,000/mo if emails are sent via software within the AWS ecosystem, e.g. alerts or WordPress forms.
  7. LAMP = Linux, Apache, MySQL/MariaDB, PHP
  8. E = Nginx – the first syllable sounds like an “en”.
  9. I actually use Acrylic DNS Proxy. See my section on DNS provider.
  10. TLS communication terminates at the reverse proxy because it has the TLS certificates.