Leave GoDaddy and Go Completely Serverless Using Just Cloudflare and S3 Buckets: Part One

Goal: Completely leave GoDaddy, move email services to Cloudflare, run WordPress offline and serve static HTML pages from Amazon S3, only pay a fraction of the ever-rising GoDaddy hosting fees, and finally move off GoDaddy’s underpowered, EOL’d shared server.

Results

Here is my serverless WordPress website now:

Ericdraken.com serverless website performance
Brief Overview: I show you how to back up your GoDaddy WordPress sites including databases, move your domain registrar to Cloudflare, set up a catch-all email to Gmail, host your WordPress site locally with Docker, repeatably create static HTML pages, easily host your HTML files in AWS S3, set up 301/302 redirects, load dynamic content with Cloudflare Workers, and satisfyingly close your GoDaddy account.

Some advanced topics include: page-paint boosting with multiple domain aliases, cache and edge-cache tuning, WAF and bot protection, SEO boosts, S3 protection, hosting Google fonts, analyzing site speed, and adding comments and a search bar.

Checklist: If you would like to skip to the end, here is a checklist of steps I take when I move a site from GoDaddy.

Background Story

While overseas, I needed to make some WordPress websites, so I got a 5-year Ultimate Web Hosting Linux plan for $450 and installed WordPress myself. About $90 per year on good hosting is not bad, right?

The hosting worked okay, but several times a year I’d have to call GoDaddy due to my websites not loading and find out another tenant is abusing his shared resources. I’d always have to call – they wouldn’t just detect it. Also, the server was on HDDs, not SSDs – those cost extra.

Also, in order to get HTTPS on a few domains, I had to hand over $287.96 to get Standard UCC SSL (have you ever heard of Starfield Technologies, Inc. which is owned by GoDaddy and issues these certs? I didn’t think so).

The contract is over. To keep this same hosting, it costs $18/mo – hosting is now $216 per year (plus tax). That is a far cry from the $90 per year they initially offered – oh, and still on HDDs. Also, the HTTPS certificates need renewal!

But first, here is what $18/mo gets me with GoDaddy:

In sum, my shared GoDaddy server is using a retired, custom RHEL 6 Linux called CloudLinux 6 (the end-of-life was in 2020), has a 6-core CPU from 2013 that is three times slower than my laptop, and has 32 GB of shared RAM at 98% usage [1]: no power for burst visitors.

My laptop CPU is three times more powerful than my shared GoDaddy server

Friends, for $18/mo I get one vCPU on a 6-core CPU shared by hundreds of accounts [2], and a 512MB slice of that shared RAM. It’s time to break up with GoDaddy.



More Reasons to Leave GoDaddy

  • GoDaddy secretly injected JavaScript into web pages.
  • GoDaddy supported the broadly-worded SOPA bill.
  • GoDaddy engages in unreasonable price jumps. For example, a domain renews at $26 on GoDaddy, but only $10 at Cloudflare or Namecheap. Shameful.
    Shameful GoDaddy domain pricing
  • GoDaddy’s pricing is garbage. You get free TLS (TLS, by the way, not SSL) with a CDN, Cloudflare, or even Let’s Encrypt. Hint: You need Deluxe to get the “lock icon” in the address bar. These certificates are issued by Starfield Technologies, Inc. which is owned by GoDaddy. Ever heard of them?
    Garbage GoDaddy TLS certificate pricing
  • GoDaddy always has affiliate link coupon codes for the first year, but never has coupons for renewals – you are stuck at those extreme renewal prices.
  • GoDaddy’s cPanel is so slow I often forget what I was going to do.
    Incredibly slow GoDaddy cPanel
    GoDaddy’s cPanel often just crashes
  • GoDaddy plays with your environment without notice. Look at my logs, suddenly:
    Where did Memcached go, GoDaddy?



Imagine you have a GoDaddy Deluxe shared hosting plan with one vCPU and 512MB of RAM. And, you keep getting emails like this:

GoDaddy price increase email

Here is more fuel for my resolve to leave GoDaddy.

GoDaddy expensive prices for underperforming shared servers

The following pricing is also garbage and misleading.

For one, it’s in GoDaddy’s best interest to use a WAF (Web Application Firewall) automatically to prevent their data centers from turning into a bot farm. This is available by default and for free with Cloudflare along with effective DDoS protection. Besides, who puts “CDN accelerator” on their “Website Security” cash grab page? Avoid GoDaddy.

GoDaddy garbage security snakeoil

Also, suddenly SSH is not supported on my account anymore. I needed to upload a PHP web shell to force shell access. We have to end this terrible relationship.

GoDaddy's SSH suddenly stopped

I’ve already moved all my domains away from GoDaddy.



GoDaddy is Indirectly Vulnerable to Hacks

Try a Google search for the custom Linux kernel and patch level my GoDaddy shared server uses:
2.6.32-954.3.5.lve1.4.87.el6.3

Hacked GoDaddy websites with web shells visible

Shared hosts are like modern airplanes: more seats, less room, squeezing out profit.

Shared hosts are not necessarily hackable directly: they use something like CageFS (from CloudLinux) to isolate tenants on the physical server. However, WordPress is notorious for being hackable through dodgy plugins.

What does this mean?

If one tenant’s WordPress or Joomla site gets hacked and joins a botnet or spam campaign, CPU and RAM and disk and network usage go up and the whole server grinds. This is more common than you think. I’ve had to call Tech Support many times.



Many Domains Point to the GoDaddy Shared Server

How many domains are hosted on my shared GoDaddy server?

I’m in a jailed filesystem, so I need to be creative to find this out.

My GoDaddy sites’ DNS records point to 107.180.51.231. That IP unhelpfully resolves to:
ip-107-180-51-231.ip.secureserver.net.

Running hostname returns a2plcpnl0490.prod.iad2.secureserver.net which resolves to 198.71.235.88. When I inspect /etc/hosts I get:

Now, ip, ifconfig, traceroute, and similar IP tools are not installed. But, I can curl outside and discover the server’s external IP reported as 198.71.235.88. However, when I run a traceroute from my own machine to a domain on this box, the final IP is 107.180.51.231. Then, what is the true IP of my server?

New tactic using the host header:
curl -H "host: my-site.net" http://107.180.51.231/ reaches my site, but
curl -H "host: my-site.net" http://198.71.235.88/ results in a generic 404 page.

That means 107.180.51.231 is my shared server’s IP.
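The Host-header comparison can be wrapped in a small helper so it is easy to repeat against other candidate IPs. This is a minimal sketch, assuming curl is installed; `probe_vhost` is a hypothetical helper name, and my-site.net stands in for a real domain on the account.

```shell
# probe_vhost HOST IP: request http://IP/ while presenting HOST as the
# virtual host, and print only the HTTP status code of the response.
probe_vhost() (
  host="$1"
  ip="$2"
  curl --silent --output /dev/null --max-time 10 \
       --write-out '%{http_code}\n' \
       --header "Host: ${host}" "http://${ip}/"
)

# Example (uncomment to run against your own server):
# probe_vhost my-site.net 107.180.51.231   # the IP that serves the site
# probe_vhost my-site.net 198.71.235.88    # the IP that returns a generic 404
```

The IP that answers with the real site for a given Host header is the one the web server is actually listening on.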

Now, let’s see how many domains we can find that point to the same IP address.

Too many domains point to the same server
Are They Joking? Over 700 domains point to this 6-core, 32GB server?

Is this IP a Load Balancer?

Could this IP address belong to a load balancer? Let’s find out. I ran an nmap scan in “polite” -T2 mode and the WAF did not stop me.

If this is a load balancer, then it routes ports FTP (21) and SFTP (22) to “somewhere” as we can see. The alternative is that GoDaddy left FTP open on this “load balancer”, which would be about the worst thing a hosting provider could do.

If it were a load balancer, where would it load-balance an FTP connection? That doesn’t make any sense: people need to upload files to their accounts, not to random machines. This must be a server.

Let’s look at port 443:

The most interesting part is that Apache is serving a “Coming soon” page on port 443. On my shared server, I do see these default index.shtml and 404.shtml in the root folder. I’d expect a load balancer to be Nginx, unless GoDaddy uses a transparent proxy. But, again, given that I gave nmap an IP address with no hostname, did nmap scan some random machine the load balancer routed it to or is this a server?

One way to tell the difference is to interrogate SMTP on port 465. Notice the server announces itself as a2plcpnl0490.prod.iad2.secureserver.net.

Let’s run nmap again.

Same machine.

Are you not convinced yet? Well, we can interrogate the MySQL info, next.

Let’s run nmap again.

The salt always changes, but it is interesting that all other mysql-info information is the same, except for the thread id which has been incremented and thus makes sense.

Same machine.

Need more convincing that GoDaddy is cramming their servers with as many domains as they can? Let’s use a simple curl command.

I’ll leave the why as an exercise for the reader, but: the same machine. I’ll repeat an earlier statement:

Are They Joking? Over 700 domains point to this 6-core, 32GB server?



Motivation to Leave

While overseas years back, I needed a hosting solution. I got stuck in the GoDaddy ecosystem, especially with email.

For Context: It now costs CAD $20.15/mo ($17.99/mo+tax) to host static HTML files on a “Deluxe” plan with one shared CPU and 512MB of RAM. Read that again: half a gigabyte of RAM and one shared CPU, all for the bargain price of $242/year.

Why is 512MB of shared RAM painful?

If I want to walk my site to generate static pages, that takes RAM. If I want to resize images that already exist, that takes RAM. If I want more than two visitors at a time on a WordPress site, that takes RAM.

My presentation sites (such as this one) are fortunately simple: I have several themed sites with 98% statically-generated pages. However, only CDN-provider Cloudflare with their edge cache makes the sites load quickly; the static HTML pages still take seconds to load, and eons to load dynamically. It’s now a matter of criticality to leave GoDaddy.



Let’s Get Organized

Let’s perform some thought experiments.

  • Should I render static HTML pages offline and upload them to AWS (or even GitHub)?
    • If so, then where are the media assets served from?
      • GitHub is free, but there may be too many assets for it (e.g. 4 sizes for each image).
      • AWS S3 is a safe bet for a lot of media at a small recurring cost.
      • How to maintain MIME headers for S3 files?
    • Can I make a VM to host several WordPress sites?
      • If so, then editing websites is restricted to a physical computer.
      • There can be no dynamic content.
        • Dynamic content can come from Lambdas or Edge Workers.
    • Can I make a Docker Compose file to build the infrastructure on demand?
      • If so, then I still need a VM.
        • I can install exact versions to match GoDaddy’s PHP and MariaDB.
  • Should I host the whole WordPress infrastructure in AWS?
    • If so, do I create a Linux EC2 and MariaDB instance for an extra cost?
      • This is prone to high-maintenance security updates and evolving AWS stack.
    • If so, can I Dockerize PHP and MariaDB somehow?
      • This minimizes hands-on maintenance, but some EC2 needs to run still.
      • Can I use Terraform with AWS to make this easier?
        • This is not the time nor place for Terraform/Chef/Ansible.
  • Where to host catch-all email servers for multiple domains?
    • Purchase an email hosting plan?
      • Requires another hosting plan, like GoDaddy.
    • Purchase a Google Workspace account?
      • Gmail only allows one email address per CAD $7.80/mo.
    • Use AWS for email somehow?
      • AWS + SES is not managed and is complex.
    • Use the free email forwarder service ImprovMX?
      • ImprovMX does everything I need for one domain.
    • Use the free Cloudflare Email Routing private beta?
      • This could be perfect.
Remember: Being with GoDaddy is like being in a bad relationship: get out, take the essentials, set up mail forwarding, and rebuild to get up and running as soon as possible.



High-Level Plan of Action

AWS is constantly evolving, the Console and CLI are ever-updating, and settings, security, IAM, roles, backups, etc. are moving and adapting. We shouldn’t be slaves to maintaining the infrastructure behind static HTML pages. Also, I don’t want to be a slave to updating WordPress or core PHP after every security update, nor the plugins and the sieve of security holes they open (remember TimThumb?).

Offline WordPress: WordPress and MariaDB infrastructure will be held offline and have regular backups. A modern laptop is far more performant than a shared host and renders static HTML WordPress pages much faster. This also removes the exploit surface of WordPress and its plugins.

I presently hold terabytes of data in S3 Glacier for a few dollars a month. According to cPanel, with all the raw uploads and multiple backup chains per site, I’m sitting just under 14 GB.

GoDaddy statistics

Publish to AWS S3: I’ll keep all the raw media, generated WordPress images, and static HTML in a VM that can be backed up and versioned, and sync static HTML and assets to S3 on publish.

Will I have an Ubuntu VM with LAMP (Linux, Apache, MariaDB, PHP) to run WordPress to mirror my GoDaddy environment? Yes. With Docker and Compose we can quickly recreate the GoDaddy Linux environment down to PHP 5.6 and MySQL 5.6.

WordPress in a Local VM: I’ll use a Cloud-Init version of an Ubuntu server and install all the WordPress plumbing and try to recreate the GoDaddy environment to make the offline transition frictionless.

HTML text files, PNGs, JPGs, MP4s and the like are self-described by the calling HTML5 tags and extensions, so there shouldn’t be a problem without web server MIME headers, except when loaded in their own tabs (e.g. open an image in a new tab). There are some files for download on my site(s), and hopefully, a visitor will click “Save As”.

What about 404 pages?

Serve 404 Pages: AWS S3 has the option to create custom 404 pages per bucket. My 404 pages have useful information, so on static HTML generation the 404.html needs to be uploaded to S3 as well.
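The publish step and the custom 404 page could be scripted roughly as follows. This is only a sketch under assumptions: it presumes the AWS CLI is installed and configured, `publish_site` is a hypothetical helper, and `./static` and `my-bucket` are placeholder names. By default it only prints the commands so that nothing is uploaded by accident.

```shell
# publish_site SRC_DIR BUCKET: print (or, with RUN=1, execute) the AWS CLI
# calls that upload a static site and register its custom 404 page.
publish_site() (
  src="$1"
  bucket="$2"
  # Helper: echo the command unless RUN=1 is set in the environment.
  run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
  }
  # Mirror the local folder to the bucket; --delete removes stale objects.
  run aws s3 sync "$src" "s3://${bucket}" --delete
  # Enable static website hosting with an index page and a custom 404 page.
  run aws s3 website "s3://${bucket}" \
      --index-document index.html --error-document 404.html
)

publish_site ./static my-bucket
```

Running it with `RUN=1 publish_site ./static my-bucket` would perform the upload. As far as I know, the AWS CLI guesses each object's Content-Type from its file extension during upload, which should cover the MIME header question for common file types.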

What about HTTPS? S3 static website endpoints are by design HTTP-only.

Cloudflare for HTTPS: I’m in a good relationship with Cloudflare, so I’ll use their relaxed HTTPS to serve my static assets to visitors over HTTPS while they are cached via an HTTP trawl.

There is some dynamic content using PHP. It is very limited in scope and usage. What can AWS or Cloudflare do here?

Cloudflare workers

Cloudflare Workers: My company and I have been burned in the past with AWS Lambdas and their slow cold-start time and expensive warm-starts, so for this, I’ll try Cloudflare Workers for the infrequent dynamic content.

Finally, what about all the catch-all email domains from various sites that are forwarded to various Gmail accounts? This is the problem that held me back until now from changing hosting providers: it looked like a hosting account was needed to catch and forward emails, and AWS SES is inexpensive but quite complicated. However, many people online rave about ImprovMX.

ImprovMX banner

Cloudflare Email Routing

Email Forwarding: I’ll take a hybrid approach: For my MX 10 high-priority DNS record, I’ll use Cloudflare Email Routing. For my backup MX 20 record, I’ll experiment with ImprovMX.



Step 1. Calculate the Monthly Cost with AWS S3

With AWS S3, the first year is free with the AWS Free Tier, but S3 is estimated to cost only $0.023 per GB of storage and $0.0004 per 1000 GET requests from the second year. Happily, “data transferred out to the internet for the first 100GB per month” is free.

Cloudflare statistics over the past 30 days

I can see that in a one-month period I would incur:

  • $0.0004 (per 1000) * 293 (thousand requests) = $0.12 for the GET requests.
  • $0.023 (per GB) * 14GB = $0.32 for S3.
  • $0 for the 6.8GB bandwidth.

For AWS SES (Simple Email Service),

  • $0 for the first 62,000 emails forwarded from Lambda.
  • $0.12 per GB of forwarded attachments.
  • $0 for the first 1000 emails received.
  • $0.10 for every 1000 emails received thereafter.
  • $0.09 for every 1000 email chunks (256KB including attachments).

Looking at Gmail, I’m using about 2GB of storage and have only a few thousand emails (including spam). I rarely get attachments and can even set an SES filter to reject large attachments as well. A typical email from Amazon showing an AWS invoice or something clocks in under 50KB with the CSS and HTML included. According to the SES pricing calculator, such an email would not incur a chunk cost. Then I expect to pay less than $0.12 per month.

Unless I am missing something, on the AWS side given my current monthly usage, I expect to incur a fee of just 56 cents per month.
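The arithmetic behind that figure can be reproduced with a few lines of shell. This is a rough sketch using only the numbers quoted above; the variable names are my own, and the 293 is read as thousands of GET requests.

```shell
# Monthly usage figures taken from the estimate above.
get_requests_k=293     # GET requests, in thousands
storage_gb=14          # data stored in S3, in GB
ses_monthly=0.12       # rough upper bound for SES, in USD

# Unit prices quoted above, in USD.
price_per_k_get=0.0004
price_per_gb=0.023

# awk handles the floating-point math; LC_ALL=C keeps the decimal point.
total=$(LC_ALL=C awk -v r="$get_requests_k" -v s="$storage_gb" \
            -v e="$ses_monthly" -v pr="$price_per_k_get" -v ps="$price_per_gb" \
            'BEGIN { printf "%.2f", r * pr + s * ps + e }')
echo "Estimated monthly AWS cost: \$${total}"
```

Changing the three usage variables gives a quick estimate for a busier site.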



Step 2. Back Up Everything

I’m excited to leave GoDaddy – a host that is known to inject surreptitious opt-out code into your web pages. The first step is to back up everything. First I’ll calculate how much space is needed for the backup with some SSH commands.

Public html folder size

I have about 5.6GiB in my GoDaddy home folder, but only 3.4GiB in my web folders. Everything can be rsync’d in about 10 minutes, but I will let cPanel do the heavy lifting shortly.
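Sizes like these can be checked over SSH with du. A minimal sketch, where `folder_size` is a hypothetical helper name and the paths in the comments follow the usual cPanel layout:

```shell
# folder_size DIR: print the total size of DIR in human-readable units.
folder_size() (
  du -sh "$1" 2>/dev/null | cut -f1
)

# Example: compare the whole home folder with just the web root.
# folder_size "$HOME"
# folder_size "$HOME/public_html"
```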

Important Web Folders

These are the most important folders in my shared hosting account. However I choose to back up my hosting account, I must ensure these folders are saved.

  • ~/.htpasswds – Holds the Apache basic auth passwords
  • ~/.mysql_backup – Holds weekly MySQL backups for shared WordPress sites
  • ~/.ssh – Holds the public keys used to SSH into the shared hosting account
  • ~/access-logs (symlink) – Holds the plain-text website access logs for the current day
  • ~/error-logs – If specified in the root php.ini file, this folder holds a running PHP error log. Check the ini file for the actual error logs location if it has moved.
  • ~/logs – Holds the archived website access logs going far back in time
  • ~/mail – Holds all mail messages and attachments.
  • ~/public_html – This is the most important folder. It holds all the shared web hosting files for each website, or just the main website if there is only one.
  • ~/ssl – Holds the TLS keys to make HTTPS connections. This may or may not have any certificates present. I currently use Let’s Encrypt and Cloudflare instead.

Generate a Full Account Backup

cPanel has the functionality to perform a full and complete backup of the sites and all settings including DNS entries and mail-forwarding rules.

cPanel backup of entire website

Through cPanel, I’ll be downloading all my files and settings to a single tarball archive. I’ve chosen the home directory backup location and started the backup process.

cPanel full account backup to a single archive

After about ten or so minutes, to GoDaddy’s credit, a single archive containing more data and settings than I imagined appears in my home folder. Here are the contents of that archive.

Shared hosting backup archive contents
Long Paths: The tarball has some very deep folder structures and as such Windows may not show all the folders. A version of 7-Zip from 2017 hides long folders, but a newer version correctly shows all the archived folders and folder sizes.

Crontab, DNS settings, Site Aliases, and TLS Certs

The shared hosting SSH home folder ~ is archived to homedir. It contains all the important folders outlined above. The other folders in the archive conveniently contain TLS certificates, crontab information, DNS settings, domain aliases, MySQL dumps, current logs, and cPanel configuration files. This is fantastic because it saves a lot of time rummaging through cPanel to note such settings. I’ll briefly explain what information each special folder contains.

Shared hosting backup archive special folders
  • apache_tls – Holds a copy of the TLS keys to make HTTPS connections. This may or may not have any certificates present. I currently use Let’s Encrypt and Cloudflare so I will disregard this folder.
  • cp – Holds a single file containing all the TLDs and SLDs associated with my account
  • cron – Holds a copy of the crontab file. The cron jobs can also be seen by running crontab -l via SSH, and the live crontab file found in either /usr/bin/crontab or /usr/bin/crontab.cagefs. You can edit the crontab file with crontab -e (:q quits).
  • dnszones – Holds the DNS settings for each TLD associated with my account. These can largely be ignored as Cloudflare will be the DNS provider from now on.
  • homedir – This is a copy of the home folder at ~.
  • mysql – Holds an uncompressed dump of each MySQL database made at the time the archive is generated. When inflated, these files can become quite large.
  • userdata – Contains information about the document root and domain aliases of each site associated with the shared account
The cp folder has a single file with all the domain, subdomain and document root information. If I didn’t create a full backup I could find this information in the cPanel, but I’d have to look in two panels: Addon Domains and Subdomains.

These are the manual locations to look for the domain and subdomain document root mappings:

cPanel add-on domains panel
cPanel subdomains panel

WordPress Databases

If enabled, GoDaddy creates weekly backups of the WordPress databases, but just for the previous week. They can be found in ~/.mysql_backup. I won’t be using this folder because the backups are outdated.

Instead, uncompressed database dumps are exported along with the entire hosting account in the full backup archive above in the mysql folder.

Exported mysql dumps in site archive

These are great when initially migrating sites, but I won’t be archiving my entire account each time I need database dumps. The databases can also be exported individually or in bulk from the phpMyAdmin panel in hosting. Here I can customize the dump file and specify where I’d like it saved.

phpMyAdmin database export screen

The easiest method to export individual, compressed database dumps is through cPanel again.

Export and download database dumps with cPanel

Email Settings

If I had used GoDaddy as my email solution provider, then I would have saved email messages. In that case, I would enable POP3 or IMAP access to all my email accounts and import my mail to Gmail or Outlook. Those settings can be found in the cPanel email panel.

cPanel email settings panel

What I actually do is use a catch-all setting to forward mail on a per-domain basis to Gmail accounts. I do this because Gmail has much better spam filters than I could ever set up manually; even if I were a shared hosting provider, I could not match the spam filtering of an email industry leader. That being said, I need to record all my catch-all settings. They must be in the archive, so I performed a grep of the archive and found the location of the email catch-all rules: va/.

Grep the archive for a target string

These are also in cPanel, but it requires a lot of clicking and refreshing to see every catch-all.

cPanel email catch-all settings

Search the Site Archive for Other Settings

To find important settings in the special folders I like to untar the archive so I can grep for strings like in the email example above. I’ll untar the archive on the shared hosting side and use their resources to perform the search. If you have a huge archive, I’d even recommend starting with the screen command to resume sessions later.
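The untar-and-grep routine might look like the sketch below. It assumes the backup is a gzip-compressed tarball; `search_backup` is a hypothetical helper, and the archive name in the example is a placeholder.

```shell
# search_backup ARCHIVE PATTERN: unpack a gzip-compressed tarball into a
# scratch folder and list every file that contains PATTERN.
search_backup() (
  archive="$1"
  pattern="$2"
  scratch=$(mktemp -d) || exit 1
  tar -xzf "$archive" -C "$scratch" || exit 1
  # -r: recurse into folders, -l: print matching file names only
  grep -rl -- "$pattern" "$scratch"
)

# Example (run inside `screen` for a huge archive):
# search_backup ~/backup.tar.gz '@gmail.com'
```

The scratch folder is left in place so that follow-up searches can grep it directly instead of unpacking again; delete it when finished.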

The entire account is now backed up, and all the settings to recreate the shared hosting in a local Docker environment are present or can be searched for quickly.



Step 3. Test the Cloudflare Email Routing Service

Cloudflare Email Routing only works if Cloudflare is your DNS provider as well. Check. I left GoDaddy’s $26 .COM domain names a long time ago.

Cloudflare email

Fun Fact: In the early 2000s, dot-com domains were USD $100/year.

The setup is quick and their screens are bound to change, so let’s skip to the Catch-All screen which is happily just a toggle switch. Cloudflare does UI right.

Cloudflare Email Routing catch-all screen

Then, the MX records are updated to use mx.cloudflare.net as the email forwarding provider. After I experiment with ImprovMX, I may adjust the priority levels.

Reply-To: Cloudflare does not add a reply-to header when forwarding email, so if all your domains point to a single Gmail account, you must be creative when replying.

Here is an example of the DNS records you would have to add in Cloudflare.

Cloudflare MX DNS records updated

Let’s give this a try by sending myself an email.

Cloudflare email caught as suspect phishing attempt

This could be Gmail using heuristics to add value to their service. Let’s look at the security information for DMARC (verification of originating server).

DMARC, DKIM, and SPF all passed

Now, let’s send a test email from the Gmail account that is the recipient of the routed email.

Cloudflare rejected email

Except for there being no reply-to header, Cloudflare Email Routing works great, and they give me three email server records for redundancy. Very comforting.

Gmail Plus Trick: The neat Gmail “plus trick” does not work as a forwarding email address. For example, normally, if your email address is me@gmail.com, you can safely give away me+company@gmail.com to get an email tagged with “company” at me@gmail.com to catch the b@stards that sell your email address. This trick does not work with Cloudflare, probably due to DMARC.

Subdomains

Does Cloudflare forward email to subdomains?

Yes. Cloudflare easily supports email routing for subdomains. Create as many subdomains as you like with MX records. You must also set a TXT SPF record for each subdomain, but it is just copy-and-paste for each subdomain and the root/apex domain.

Add subdomains as MX records
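To see what "copy-and-paste for each subdomain" amounts to, the sketch below prints zone-file style records for a list of host names. The MX host names, priorities, and SPF include are assumptions based on what Cloudflare Email Routing typically shows; copy the real values from your own dashboard. `email_records` and the example.com names are hypothetical.

```shell
# email_records NAME...: print zone-file style MX and SPF TXT records for
# each host name given. Host names and priorities are placeholders; use
# the values shown in the Cloudflare Email Routing dashboard.
email_records() {
  for name in "$@"; do
    printf '%s. MX 10 route1.mx.cloudflare.net.\n' "$name"
    printf '%s. MX 20 route2.mx.cloudflare.net.\n' "$name"
    printf '%s. MX 30 route3.mx.cloudflare.net.\n' "$name"
    printf '%s. TXT "v=spf1 include:_spf.mx.cloudflare.net ~all"\n' "$name"
  done
}

email_records example.com blog.example.com
```

Each name, apex or subdomain, ends up with the same four records.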

At this point, Cloudflare is my email forwarding provider, and GoDaddy’s MX records are still in place (with extremely low priority) as a failsafe while in testing.



Step 4. (Optional) Test a Third-Party Email Forwarder

I’d still like the reply-to headers with emails to one domain I use to catch b@stard email resellers and data breaches. For example, I began to receive phishing emails from my throwaway email address I gave to H&R Block after they suffered a 3rd-party vendor breach.

Email received on Jan 5, 2022:
On December 23, 2021, starting at 4:05 PM EST our account on Amazon’s AWS servers was compromised … After working further with Amazon to understand what happened, we learned a certain set of data, including personal information of some customers was accessed and downloaded including:

  • first and last names,
  • email addresses, and
  • phone numbers.

I used to use the MyFitnessPal app before it was bought by Under Armour. Now, I get emails to my myfitnesspal@****.com catch-all email so I know they were either hacked or those b@stards sold my personal info.

To: myfitnesspal@my-domain.com
Subject: Louis Vuitton Bags Up To 90% Off! Top Quality Low Cost! Shop Online Now!

The point is, no one can be trusted with your email address, so why not make a catch-all email-forwarding domain to Gmail and call out shady companies to the Privacy Commissioner of Canada as I do, or at least delete spam emails to compromised addresses?

Set Up ImprovMX

It’s straightforward to set up your one and only email forwarder with ImprovMX, so please visit their site. Remember to create a password on the Security page, and enable 2FA.

Also, ImprovMX gets grumpy if the priorities don’t match in the DNS records, but the service still works.

ImprovMX would like your MX record priorities to match

Now, I have a temporary abundance of MX records for one subdomain.

Too many MX records

Reminder: You must update the TXT record that holds the SPF information to include both Cloudflare and ImprovMX.
Update the DNS TXT record to include both email forwarders
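A domain should publish only one SPF policy, so the two forwarders have to share a single TXT record. The sketch below checks a candidate record before it is pasted into DNS. The include names `_spf.mx.cloudflare.net` and `spf.improvmx.com` are my assumptions about what each provider asks for; confirm them in each provider's setup screen. `spf_covers` is a hypothetical helper.

```shell
# spf_covers RECORD INCLUDE...: succeed only if RECORD starts with v=spf1
# and contains an include: mechanism for every INCLUDE listed.
spf_covers() (
  record="$1"
  shift
  case "$record" in
    "v=spf1 "*) ;;
    *) exit 1 ;;
  esac
  for inc in "$@"; do
    case " $record " in
      *" include:${inc} "*) ;;
      *) exit 1 ;;
    esac
  done
)

merged='v=spf1 include:_spf.mx.cloudflare.net include:spf.improvmx.com ~all'
if spf_covers "$merged" _spf.mx.cloudflare.net spf.improvmx.com; then
  echo "SPF record covers both forwarders"
fi
```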

I can confirm the MX priorities work as intended, and ImprovMX supersedes Cloudflare at this moment.

Replies: Using an email sent to the catch-all, I do not see the reply-to header nor does replying from Gmail honour the original email address. Emailing a declared alias in ImprovMX similarly has no effect. Bummer.

I did not know this at the time, but it seems this is a premium feature.

[Sending emails using Gmail SMTP] feature is a … custom SMTP solution with all our premium plans where you don’t need to rely on Google. (ref)

Since ImprovMX does not support the reply-to feature on the free plan (and to be fair, it is more complicated than adding a ‘reply-to’ header), for my use case, using only Cloudflare Email Routing is preferred. If you do not have DNS hosted with Cloudflare, then certainly ImprovMX is beneficial.

Note: You can completely skip using ImprovMX if Cloudflare is your DNS provider.



Step 5. Create a Virtual Machine Running Docker

Let’s extract the tarball backup from Step 2 into an Ubuntu VM. Notice how you cannot restore a full backup in case your sites are hacked – backups with GoDaddy are best for leaving GoDaddy.

Download the full backup tarball

Create the VM Ubuntu Cloud-Init Image

Before this exercise, I knew nothing about Cloud-Init because I always have some Ubuntu or Mint image handy. However, this was a fun opportunity to set up an automated install of a VM image for WordPress.

I was thinking about making a graphical Linux VM to hold GIMP (Photoshop for Linux), but really, all I need is a halfway decent minimal Ubuntu install as a base. Actually, let’s borrow one of the Ubuntu cloud images. Here is a useful guide. Let’s try the new Impish release.

Ubuntu Minimal 21.10 (Impish Indri) release

Let’s check out this image.

Qemu info

I’ll expand the virtual image size to 30GB from its current 2.2GB. That should be enough to hold raw WordPress files and all those images of various sizes.

Next, we can convert the IMG to a VMDK file for use with VMware Workstation.
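The inspect, resize, and convert steps can be done with qemu-img. A minimal sketch, assuming qemu-img is installed; `prepare_vmdk` is a hypothetical helper and the source image name in the example is a placeholder for whichever cloud image was downloaded.

```shell
# prepare_vmdk SRC_IMG OUT_VMDK [SIZE]: inspect a cloud image, grow its
# virtual disk, and convert it to VMDK for VMware Workstation.
prepare_vmdk() (
  src="$1"
  out="$2"
  size="${3:-30G}"
  qemu-img info "$src" || exit 1            # show format and virtual size
  qemu-img resize "$src" "$size" || exit 1  # grow the virtual disk
  qemu-img convert -O vmdk "$src" "$out"    # write the VMDK copy
)

# Example (the image file name is a placeholder):
# prepare_vmdk ubuntu-cloudimg-amd64.img ByeDaddy.vmdk 30G
```

Resizing only enlarges the virtual disk; Ubuntu cloud images normally grow the root filesystem to fill it on first boot.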

Excellent.

VMware VMDK of an Ubuntu cloud image

Continuing with this guide, I’ll create an ISO file with initial setup information holding the absolute, bare minimum information of just a plaintext password to get the ball rolling.

If all goes well…

Create the seed.iso for the Ubuntu cloud image
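A bare-minimum seed could be produced along these lines. This is a sketch, not necessarily the exact files used here: the password matches the one used later at the login prompt, the meta-data values are placeholders, and cloud-localds (from Ubuntu's cloud-image-utils package) is only called if it is installed.

```shell
# Write the smallest useful cloud-init configuration: a plaintext password
# for the default user (ubuntu on Ubuntu cloud images).
cat > user-data <<'EOF'
#cloud-config
password: byedaddy!
chpasswd: { expire: False }
ssh_pwauth: True
EOF

# Instance ID and hostname; both values are placeholders.
cat > meta-data <<'EOF'
instance-id: byedaddy-001
local-hostname: byedaddy
EOF

# Pack both files into a NoCloud seed image if cloud-localds is available.
if command -v cloud-localds >/dev/null 2>&1; then
  cloud-localds seed.iso user-data meta-data
fi
```

A plaintext password is fine for a throwaway local VM, but it should not be reused anywhere that is reachable from the internet.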

After moving all the files to a dedicated folder, let’s launch a new VM pointing to the VMDK and ISO.

Files for cloud Ubuntu

I am so excited to get this working and leave GoDaddy.

VM ready for first launch

initrdless boot failed

This is anticlimactic.

Whatever. How about downloading a non-minimal VMDK image? That should work, right? No.

Problems with VMDK

Better done than perfect. I’ll download a non-minimal Cloud-Init Ubuntu release of the IMG and run the same conversion steps.

Before my first run, I’ll make a backup of the VMDK so I can go back and re-cloud-init the image over and over until it runs perfectly.

Files for first launch

After replacing the old ByeDaddy.vmdk with the new one, the VM runs and I’m dropped into the login prompt using the password in the seed.iso.

First login to ByeDaddy
How Much VM RAM? Give the VM 0.5 GB for system processes, and 2 GB of RAM per website. MySQL is the memory hog, followed by Apache. If you get “exited with code 137” error messages from Docker, that means you ran out of memory.

How much vm RAM?


Step 6. Install Docker with Cloud-Init

We can configure the Ubuntu server with more options like adding PHP, MariaDB, and Nginx (but let’s just add Docker), and we can disable IPv6 and perform a full update on the first boot. To get Docker installed, recreate the user-data file and then rebuild the seed.iso as in the previous step. We can also take this opportunity to increase the console resolution.

Why Docker? GoDaddy is using PHP 5.6 (EOL: 2018) and MariaDB 5.6 (EOL: 2021), so it is getting harder to find random people that maintain a PPA repo of dead, abandoned packages. However, Docker images of those same packages built in their heyday are easily found.

Here is the (WIP) user-data script to install Docker and Compose. Feel free to think to yourself how this can be improved; it works, so let’s step on the gas.
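As a rough idea of the shape, here is a minimal user-data sketch. It makes assumptions: Docker and Compose come from the Ubuntu archive packages docker.io and docker-compose rather than Docker's own repository, and the console-resolution tweak is left out.

```shell
# Sketch of a cloud-config that updates the system, installs Docker, and reboots.
cat > user-data <<'EOF'
#cloud-config
password: byedaddy!
chpasswd:
  expire: false
ssh_pwauth: true

package_update: true
package_upgrade: true
packages:
  - docker.io
  - docker-compose

write_files:
  # Disable IPv6 at boot.
  - path: /etc/sysctl.d/60-disable-ipv6.conf
    content: |
      net.ipv6.conf.all.disable_ipv6 = 1
      net.ipv6.conf.default.disable_ipv6 = 1

runcmd:
  # Let the default user run docker without sudo.
  - usermod -aG docker ubuntu
  - systemctl enable --now docker

# Reboot once everything has been applied.
power_state:
  mode: reboot
  message: cloud-init finished, rebooting
EOF

# Rebuild the seed ISO if the tool and a meta-data file are present.
if command -v cloud-localds >/dev/null 2>&1 && [ -f meta-data ]; then
  cloud-localds seed.iso user-data meta-data
fi
```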

Restore the backup of the VMDK you made in step 5 (ByeDaddy.clean.vmdk). You made a clean backup, right? If not, rebuild it from step 5 or else log in and manually clean the cloud-init files. Boot up the new VM.

When you first log in under glorious 640×480 resolution, you can use the username ubuntu and password byedaddy!. The VM will automatically restart when the cloud-init scripts have finished, and the resolution in VMware should jump to 1024×768.

Cloud-Init ran successfully
Tip: When you log in again, run cloud-init analyze show or cloud-init analyze blame to show how long each section took and if there were errors.

After running a few commands, we see that Docker and networking are available now.

ByeDaddy Docker VM ready


Step 7. Replicate GoDaddy’s WordPress Environment

This (one?) GoDaddy server is such a mess. There is PHP 5.6, a single php.ini for all sites, custom PHP extensions I may have to go hunting for, .htaccess files all over the place, dozens and dozens of dotfiles (cPanel artifacts from various upgrades they made), and so on.

Let’s migrate over a real staging site. One of my ancient sites that is no longer updated is a good candidate. It’s a landing page for a rental property. Inside the VM, I’ll untar the full GoDaddy backup to a known location.

This could take a long time since I have nearly 200,000 files to uncompress. Next, I’ll recreate the environment used in my GoDaddy shared server in a docker-compose.yml configuration file. Here is a WIP Compose configuration.

We’ll need a .env file as well.

Notice that we need a mount point to the hard-coded homedir path that various config files expect. Also, we need to point php.ini in the GoDaddy account folder to the right location in the Docker WordPress image. I’ve modified mine because my time zone and extensions have changed.
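To make the moving parts concrete, here is a sketch of what such a pair of files could look like. Everything in it is an assumption for illustration: the image tags (check Docker Hub for tags that still exist), the service names, and every path and password in .env are placeholders, not my real values.

```shell
# .env: placeholder values only.
cat > .env <<'EOF'
SITE_DIR=/srv/godaddy/public_html/staging
GODADDY_HOMEDIR=/home/exampleuser/public_html/staging
PHP_INI=/srv/godaddy/php.ini
DB_NAME=wordpress
DB_USER=wordpress
DB_PASSWORD=change-me
DB_ROOT_PASSWORD=change-me-too
EOF

# docker-compose.yml: Compose substitutes the ${...} values from .env.
cat > docker-compose.yml <<'EOF'
version: "3"
services:
  db:
    image: mysql:5.6                 # assumed tag; match your old database version
    environment:
      MYSQL_DATABASE: ${DB_NAME}
      MYSQL_USER: ${DB_USER}
      MYSQL_PASSWORD: ${DB_PASSWORD}
      MYSQL_ROOT_PASSWORD: ${DB_ROOT_PASSWORD}
    volumes:
      - ./db-data:/var/lib/mysql
      - ./sql-dump:/docker-entrypoint-initdb.d:ro   # *.sql files here load on first start
  wordpress:
    image: wordpress:php5.6-apache   # assumed tag
    depends_on:
      - db
    ports:
      - "80:80"
    volumes:
      - ${SITE_DIR}:/var/www/html
      - ${SITE_DIR}:${GODADDY_HOMEDIR}              # the hard-coded homedir path
      - ${PHP_INI}:/usr/local/etc/php/php.ini:ro    # php.ini location in the official image
  wpcli:
    image: wordpress:cli
    depends_on:
      - db
    volumes:
      - ${SITE_DIR}:/var/www/html
EOF
```

With this layout, the DB_HOST in the restored wp-config.php would need to be db (the service name), and docker-compose config is a quick way to check that the substitutions resolve.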

Visit the Local Website

Commercial Plugins: Do you have paid plugins or themes? They will lock you into a specific domain, so if you visit your local WordPress site now, the plugins may cripple themselves. To prevent that, add this code to wp-settings.php. If there are problems, you can add more allowed hosts below.
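The idea can be sketched with two standard WordPress constants that block outbound requests made through the WordPress HTTP API. The allowed hosts below are example values, and the scratch filename is hypothetical; paste the two define() lines into place rather than including the file.

```shell
# Write the PHP lines to a scratch file for copy and paste.
cat > block-external-http.php <<'EOF'
<?php
// Block outbound requests made through the WordPress HTTP API.
define( 'WP_HTTP_BLOCK_EXTERNAL', true );
// Comma-separated hosts that are still allowed (example values).
define( 'WP_ACCESSIBLE_HOSTS', 'api.wordpress.org,*.github.com' );
EOF

# Optional syntax check if the PHP CLI happens to be installed.
if command -v php >/dev/null 2>&1; then
  php -l block-external-http.php
fi
```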

At worst, you should now only get warnings like:

Warning: The URL //api.wpwave.com/hide_my_wp.json?checking_for_updates=1&installed_version&li=xxxx does not point to a valid plugin metadata file. WP HTTP error: User has blocked requests through HTTP. in /var/www/html/…/plugin-update-checker.php on line 245

So far, we have a Docker container with port 80 open in a VM on some machine. I’ve given the VM a static IP (192.168.10.110) and hostname on a LAN, so if I visit http://byedaddy, I will be redirected to the URL of my staging site back on GoDaddy. Not good.

On one Linux machine, I’ll modify the /etc/hosts file so the website hostname resolves to the VM instead of GoDaddy.

When I visit http://staging.innisfailapartments.com, it takes a very long time to load, but the page loads, kind of. A compression plugin broke. The page is also glitchy. You have to be passionately in love with WordPress to put up with this. I, on the other hand, will cut losses and remove plugins.

content decoding failed

Careful: Moving a WordPress site may cause file and folder permissions to change, ownership to change, commercial plugins that phone home to change, 3rd-party APIs to stop working, and just calamity in general. With a keyboard machete, I started chopping out plugins until the page loaded performantly so I could perform a post mortem afterward.

Finally, with warning notices turned off and some plugins disabled, the WordPress site looks good.

Innisfail site working locally

For Now: Make the staging site and all links HTTP. You cannot edit the SQL dump directly because many strings are serialized with fragile string-length information. I’ve included WP-CLI in the Docker Compose file. Simply run:

wp search-replace 'https://...' 'http://...' [--dry-run].


Step 8. Generate Static HTML Offline

For WordPress, many years ago I modified a plugin called Simply Static that walks the site, generates HTML files, and copies the JS, CSS, images, etc. to another folder. I modified this plugin to include asset load balancing (it’s an SEO thing), as well as an automated purge of the Cloudflare cache.

WordPress Simply Static plugin

For most people, out of the box, this is good enough. There are many static HTML plugins for WordPress, but a heavily-customized Simply Static plugin works well for me.

Successful export of static HTML and assets

Alternative: Wget for Website Mirroring

The GNU program wget is amazing. Most people use it interchangeably with curl, but did you know it can recursively mirror a website and make in-situ changes to links in the HTML pages?

404 Page: You may need the --content-on-error flag if you want to capture the 404 page. Even better, make sure there is a /not-found/ path that renders a 404 page, but with status code 200. In that case, you may need to run wget twice, the second time to specify the normally-not-discoverable 404 page URL.
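A mirroring run along these lines should work as a starting point. The two function names and the ./production output folder are made up for this sketch; the wget flags themselves are standard.

```shell
mirror_site() {
  # $1 = site URL, $2 = output folder
  wget --mirror \
       --page-requisites \
       --convert-links \
       --adjust-extension \
       --no-parent \
       --directory-prefix="$2" \
       "$1"
}

fetch_404_page() {
  # $1 = URL of the 404 page, $2 = output folder
  # --content-on-error keeps the body even when the server answers with an error status.
  wget --content-on-error \
       --page-requisites \
       --convert-links \
       --adjust-extension \
       --directory-prefix="$2" \
       "$1"
}

# Example (not run here):
#   mirror_site    http://staging.innisfailapartments.com/ ./production
#   fetch_404_page http://staging.innisfailapartments.com/not-found/ ./production
```

Note that wget still exits with a non-zero status when the server returned an error, even though the page body was saved.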

Alternative: HTTrack Website Copier

HTTrack Website Copier is old, but it’s free, works in Linux and Windows, and it does the job. I happen to have a copy from 2003, but it is still maintained.

HTTrack Website Copier

The Static 404 Page: Make it Generic

Be sure to remove any unique information from the static /not-found/index.html such as “Nothing found for foo” and replace it with the generic “Nothing found”. You can either do this manually with grep or sed, or modify your theme’s 404.php beforehand.
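For the sed route, a one-liner wrapped in a small helper may be all you need. This is a sketch that assumes your theme prints the phrase “Nothing found for …” inside a single HTML element; adjust the pattern to your theme’s wording. The function name is hypothetical.

```shell
genericize_404() {
  # $1 = path to the static 404 page
  # Replace "Nothing found for <anything up to the next tag>" with the generic text.
  # A .bak copy of the original is kept next to the file.
  sed -i.bak 's/Nothing found for [^<]*/Nothing found/g' "$1"
}

# Example (not run here):
#   genericize_404 ./production/not-found/index.html
```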

Automatically Update the Copyright Year

You can use a script and HTML snippet like these. Simply edit your footer.php and confirm your staging site looks good.

Note: If you use my style of IIFE and addEventListener, you can place this script in the header.

Side Note: What is Load Balancing?

Ever heard of SEO-hurting browser blocking? [4] Let’s use Google Chrome as an example of this.

Chrome has a limit of 6 connections per hostname, and a max of 10 connections. This essentially means that it can handle 6 requests at a time coming from the same host, and will handle 4 more coming from another host at the same time. This is important when realizing how many requests are firing off from different hosts on the same page. (ref)

I made a custom algorithm to check the size of each CSS, JS, WOFF, etc. that loads in the header, sort them, and then alternatingly make them load from either ericdraken.com or static.ericdraken.com. This ensures I get my ten downloads, not six, to get the page styling finished faster than other websites would.
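My plugin code is not shown here, but the core idea (sort by size, then alternate hosts) fits in a few lines. Treat this as a toy sketch: the function name and the sample asset sizes are invented for illustration.

```shell
assign_hosts() {
  # stdin:  "<size-in-bytes> <path>" per line
  # stdout: one absolute URL per line, largest asset first,
  #         alternating between the two hostnames
  sort -rn -k1,1 | awk -v a="ericdraken.com" -v b="static.ericdraken.com" '
    { host = (NR % 2 == 1) ? a : b
      print "https://" host $2 }'
}

printf '%s\n' \
  '18000 /css/style.css' \
  '95000 /js/app.js' \
  '42000 /fonts/body.woff' | assign_hosts
```

With the sample input, the largest file (app.js) lands on ericdraken.com, the next (body.woff) on static.ericdraken.com, and the smallest (style.css) back on ericdraken.com, so the big downloads are split across both connection pools.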

Exceed Chrome's limit of 6 downloads per domain with load balancing

All the static HTML and assets should now be under /var/www/production in the Docker container which is mounted inside the ByeDaddy VM.


Step 9. Sync Static Production Files to S3

By all accounts, we should be able to do:

aws s3 sync . s3://mybucket --delete

Let’s see how easy it is to sync my production/ folder. First, we need an S3 bucket.

Use the same name as the website for the S3 bucket

What should your bucket be named?

Your bucket name must be the same as the CNAME. (ref)

The host header in a request is supposed to match the bucket name for routing reasons. We can create a bucket named innisfailapartments.com with defaults like ACLs disabled, no tags, no versioning, and no locking. However, we must set the bucket to public: uncheck everything under “Block public access” and ignore the warnings.

Enable static website hosting with a click

I’ll set the index document to index.html, and the error document to a custom not-found/index.html page. Be sure to not have any leading slashes: “paths” in S3 are keys in a map. My empty bucket is now public.

With no public-access JSON policy set, S3 returns 403 errors

We need to add a JSON policy to actually allow public GET access. My JSON policy looks like this:
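For reference, the usual public-read policy for a static website bucket has this shape. It is a sketch of the common pattern rather than a copy of my policy; only the bucket name is taken from this article, and policy.json is an arbitrary filename.

```shell
BUCKET="innisfailapartments.com"

# Allow anyone to GET objects in the bucket (and nothing else).
cat > policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::${BUCKET}/*"
    }
  ]
}
EOF

# Paste policy.json into the bucket's Permissions tab, or apply it with the CLI:
#   aws s3api put-bucket-policy --bucket "$BUCKET" --policy file://policy.json
```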

After adding the JSON policy, the error changes.

New Cloudflare error

Back in the Cloudflare DNS console, I’ll update the CNAMEs for the domain and some subdomains.

Update Cloudflare DNS CNAMEs for the TLD and SLDs

Configure the AWS CLI

We’ll want to use the CLI to work with S3 because the AWS UI makes my head and eyes hurt. First, create a new user to partition access for websites away from any other neat AWS projects you have going.

Create a new IAM user in AWS for the websites

Next, create a keypair, note the secrets, and run aws configure in the ByeDaddy VM:

JSON: In the above configuration, be sure to use ‘json’, not ‘JSON’. If you make a mistake, you can use sed 's/JSON/json/g' -i ~/.aws/config.

Sync Local Files to S3

We can now sync a local folder to the S3 bucket.

Excellent. The static HTML files have been uploaded to S3.

Successful upload of production HTML files to S3

And, here they are.

Successful upload of static files to S3


Step 10. Verify the 404 Page Works

Let’s check that the 404 page works.

The 404 page doesn't work

Objects in S3

Make sure the index.html and error page do not have a leading slash as tempting as it may be to add them. “Paths” or keys in S3 do not start with a leading slash.

No leading slash


Step 11. Allow Cross-Origin Resource Sharing (CORS)

Now, in every company I have ever contributed to, inevitably someone would yell out, “Silly, mother-sillying CORS!” (the language here has been toned down).

S3 CORS violation errors

Here is a simple, permissive CORS policy that works well for a static site.
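A permissive rule set of this kind could look like the sketch below, written in the form the AWS CLI expects. The specific values (GET and HEAD from any origin, a 3000-second preflight cache) are assumptions for a read-only static site, and cors.json is an arbitrary filename. The S3 console editor takes only the array inside CORSRules.

```shell
# Read-only CORS rules for any origin.
cat > cors.json <<'EOF'
{
  "CORSRules": [
    {
      "AllowedOrigins": ["*"],
      "AllowedMethods": ["GET", "HEAD"],
      "AllowedHeaders": ["*"],
      "MaxAgeSeconds": 3000
    }
  ]
}
EOF

# CLI alternative to the console:
#   aws s3api put-bucket-cors --bucket innisfailapartments.com --cors-configuration file://cors.json
```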

We can add the CORS declaration manually through the AWS UI in the S3 CORS section.


Step 12. Allow Subdomain Virtual Hosts with S3

Remember how I said that I have a load-balancing algorithm to rewrite asset URLs to use subdomains to get around Chrome’s simultaneous-download limit? How will subdomains work when the bucket name is also supposed to be the domain name? Duplicate the bucket using the subdomains? That’s one suggestion, but that comes with a cost.

No such bucket

Ninja Solution: Spoof the Host Header

Let’s be ninjas and trick S3 into thinking the hostname is always innisfailapartments.com. Back in Cloudflare, we can set Transform Rules, one of which is to modify the headers sent back to the origin. See where I am going with this?

Cloudflare request header transform to always use the root domain

Looks good, right? Wrong. Any attempt to modify the host header requires a paid subscription ($20/mo at this time). That’s fine: pay Cloudflare – they’re awesome. But, can we get around this limitation? Yes. For one, 302-redirects will work as Page Rules, but that’s ugly.

Fragile Solution: Proxy Subdomain Requests Through a Worker

Piping or proxying the request through a JavaScript Worker is also ugly but could work (if you figure out timing, memory limits, and lots of edge cases). Here is how it would work just for completeness.

Ugly Solution: S3 Bucket-to-Bucket Redirect

S3 buckets have the option to redirect to another bucket. Let’s explore that.

Redirect one S3 bucket to another
S3 redirects are not cloaked from the user

Dang it. An S3 redirect to another bucket is not cloaked from the user.

Crazy, Effective Solution: Duplicate the Bucket [5]

I really, really don’t want to write a Cloudflare Worker to be an HTTP proxy just to avoid paying Cloudflare $20/mo for a host header rewrite. However, I also really, really don’t want the SEO hit of all the 301-redirects back to my root domain. Out of curiosity, how much extra does it cost to duplicate an S3 bucket?

Recall the S3 pricing:

  • $0.0004 (per 1000) * 293 = $0.12 for the GET requests.
  • $0.023 (per GB) * 14GB = $0.32 for S3.
  • $0 for the 6.8GB bandwidth.

The GET requests do not increase. The bandwidth does not increase. Only the storage cost doubles from 32 cents to 64 cents. Done! Let’s just duplicate the bucket and call it a day.
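The arithmetic is easy to check with awk using the same figures as the list above. The function name is made up, and the bucket-to-bucket sync in the trailing comment uses a hypothetical subdomain bucket name.

```shell
bucket_cost() {
  awk 'BEGIN {
    gets    = 0.0004 * 293   # GET requests: unchanged by the duplicate
    storage = 0.023 * 14     # storage for one 14 GB bucket
    printf "one bucket:  $%.2f\n", gets + storage
    printf "two buckets: $%.2f\n", gets + 2 * storage
  }'
}

bucket_cost

# Duplicating is then a bucket-to-bucket sync (hypothetical target name):
#   aws s3 sync s3://innisfailapartments.com s3://static.innisfailapartments.com --delete
```

This prints $0.44 for one bucket and $0.76 for two, so the duplicate costs about 32 cents more.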

Did You Know: Not all crazy ideas are bad ideas.

Bucket Replication Results

Are you seeing this page-load speed? Amazing.

S3 page load speed is amazing


Step 13. Query Strings and S3 Bucket Keys

We’re kicking butt now, but there are some gotchas. One is how to deal with query strings (e.g. ?ver=1.4.1) in GET requests for S3 objects. S3 is an object map store, so an exact “pathname” (as a key in the map) is required and is even case sensitive. Apache and NGINX helpfully ignore the case and try to figure out what you are requesting, but S3 is unforgiving. Just add a Cloudflare Request Transform to drop the query string on requests to the origin (S3).

Update: It seems that S3 now ignores the query string in bucket requests. Awesome.

To bust the Cloudflare cache, query strings still do the trick if you set the cache level appropriately.


Step 14. Enable HTTPS Websites with HTTP-Only S3

S3 website endpoints don’t respond to HTTPS requests, but websites use HTTPS.

We can use Flexible TLS from Cloudflare so website visitors access “the website” via HTTPS (TLS), and then Cloudflare proxies the request to S3 via HTTP.

Enable Flexible TLS in Cloudflare

If Cloudflare is set to use Flexible TLS, then Cloudflare will rewrite requests to S3 using HTTP (no S).

Setting your encryption mode to Flexible makes your site partially secure. Cloudflare allows HTTPS connections between your visitor and Cloudflare, but all connections between Cloudflare and your origin are made through HTTP. (ref)

Yoast SEO Plugin: Some social links like og:url, og:image, and twitter:image will use the protocol of the staging site: HTTP. This can negatively affect SEO.

We know the staging site can be either HTTP or HTTPS, so if we simply change the bound WordPress port to 443, change the staging URL protocol, and perform a simple trick, then our static HTML will have all (or mainly) HTTPS URLs.

Protocol of the staging site

We could run an NGINX Docker container to forward various local sites to various WordPress staging stacks all essentially vhosted on port 443. I mean, do we want to have the best Docker infrastructure on the block, or break up with GoDaddy and get on with life? But, I digress. Instead, let’s perform a simple trick and make one site respond via HTTPS:

  • Bind ports 443:443 in the Compose file.
  • Add a snake oil TLS cert via apt.
  • That’s it (also, docker-compose down and up).
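One way to make those steps survive a docker-compose down and up is to bake them into a small image. This is a sketch under assumptions: the base tag is a guess, and apt may need archived package sources on an image this old. Debian’s ssl-cert package generates the self-signed snake oil certificate, and Apache’s stock default-ssl site already points at it.

```shell
# Dockerfile: WordPress image plus a self-signed ("snake oil") TLS certificate.
cat > Dockerfile <<'EOF'
# Assumed base tag; use the same one as your WordPress service.
FROM wordpress:php5.6-apache
# ssl-cert creates the snake oil cert; default-ssl serves it on port 443.
RUN apt-get update \
 && apt-get install -y ssl-cert \
 && a2enmod ssl \
 && a2ensite default-ssl
EXPOSE 443
EOF

# In docker-compose.yml, point the WordPress service at this Dockerfile
# (build: .) and add "443:443" under ports, then:
#   docker-compose down && docker-compose up -d --build
```

Browsers will warn about the self-signed certificate, which is fine for a staging site that only the static-HTML generator visits.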
HTTPS is now working in the WordPress staging site

Finally, rebuild the static HTML files and sync them to (both of) the S3 buckets again.

Gotcha: If you customized your wp-config.php, check on your definitions regarding TLS:

Gotcha: If you have hard-coded HTTP URLs in your WordPress posts, you may need to run:
wp search-replace 'http://...' 'https://...' [--dry-run].

Alternative: Write a Replacement Plugin

You could write a plugin that rewrites the URLs via regex on static HTML creation. However, try not to miss escaped URLs in scripts like: https:\/\/staging..

Alternative: Replace URLs with SED:

You can manually run sed on all the static *.html files to replace http:// with https://. You will have the same problem as above, however.
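If you go the sed route, a helper like this handles both plain URLs and the JSON-escaped http:\/\/ form for one hostname. It is a sketch: the function name is hypothetical, it only touches *.html files, and it leaves a .bak copy beside each file it edits.

```shell
force_https() {
  # $1 = folder of static files, $2 = hostname whose links should become HTTPS
  local dir="$1" host="$2" esc
  # Escape dots so the hostname is matched literally.
  esc=$(printf '%s' "$host" | sed 's/\./\\./g')
  find "$dir" -type f -name '*.html' -exec sed -i.bak \
    -e 's#http://'"$esc"'#https://'"$host"'#g' \
    -e 's#http:\\/\\/'"$esc"'#https:\\/\\/'"$host"'#g' {} +
}

# Example (not run here):
#   force_https ./production staging.innisfailapartments.com
#   find ./production -name '*.bak' -delete   # once you are happy with the result
```

Restricting the rewrite to one hostname avoids touching third-party links that really are HTTP-only.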

Alternative: Use Protocol-Relative URLs

You can use protocol-relative // URLs everywhere (except in OpenGraph meta tags) so both HTTP and HTTPS work.


Summary of AWS S3 Bucket CLI commands
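As a recap, the bucket work in this part maps onto a handful of AWS CLI commands. This is a sketch, not a transcript of my session: policy.json and cors.json are assumed to hold the bucket policy and CORS rules, and the run helper (a name invented here) only prints each command unless DRY_RUN=0 is set.

```shell
BUCKET="innisfailapartments.com"
SITE_DIR="/var/www/production"

# Print each command by default; set DRY_RUN=0 to execute for real.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

run aws s3 mb "s3://$BUCKET"                                   # create the bucket
run aws s3api delete-public-access-block --bucket "$BUCKET"    # allow public policies
run aws s3 website "s3://$BUCKET/" \
    --index-document index.html \
    --error-document not-found/index.html                      # no leading slashes
run aws s3api put-bucket-policy --bucket "$BUCKET" --policy file://policy.json
run aws s3api put-bucket-cors --bucket "$BUCKET" --cors-configuration file://cors.json
run aws s3 sync "$SITE_DIR" "s3://$BUCKET" --delete            # upload and prune
run aws s3 ls "s3://$BUCKET" --recursive --summarize           # verify the upload
```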


Please see Part Two for the rest of this article; I’m splitting this into two parts because the DOM reached almost thirteen thousand nodes which hurts your browsing experience. Sorry about that.

Too many DOM nodes for SEO and more

Next: Part Two


Notes:

  1. RAM usage changes from time to time, but 98% usage is dreadful at any time.
  2. I’m a tenant in CageFS so there is no way to count how many real users are on my shared server, but I found over 700 domains point to the same server.
  3. To be fair, this patch level is for Feb 2022, which is only two months out of date.
  4. SEO is hurt because Google takes into account when the page first paints, and when it first becomes interactive.
  5. Complain if you want about me using the word ‘crazy’, but according to https://dictionary.cambridge.org/dictionary/english/crazy, the first definition is “stupid or not reasonable” so please don’t be crazy and go out of your way to suggest an alternative word.