Block YouTube Ads on Apple TV by Decrypting and Stripping Ads from Protobuf

So many ads

In a Nutshell

I discovered that putting a man-in-the-middle proxy between my Apple TV and the world lets me decrypt HTTPS traffic. From there, I can read the Protocol Buffer data Google uses to populate YouTube with ads. It is too CPU-intensive to decode Protobuf on the fly, so instead I found a flaw in the YouTube implementation of a Protobuf feature that lets us reliably change one byte to obliterate ads.

What follows is a reference guide for setting up a bare-metal network router to block malicious ads, obnoxious ads, tracking, clickbait, cryptojackers, scam pop-ups, Windows spying on you, and more, using blocklists to protect all networked devices.


Goal: Let’s build a cryptographically strong router with FreeBSD and pfSense to eliminate YouTube ads by exploiting a feature in the Google Protocol Buffer format—blocking pre-roll, mid-roll, and end-roll ads on Apple TVs and iPhones network-wide.
Disclaimer: I want to support content creators, so after a few months of blocking YouTube ads I started paying for YouTube Premium; just because I can break something doesn’t mean I should.

Sections

Part 1 – Set Up pfSense on Bare Metal

  1. Why Block Ads and Behavior Tracking?
  2. Required Router Hardware
  3. Unboxing the Hardware
  4. Install pfSense on Bare Metal
  5. First pfSense Boot
  6. Enable the AES-NI Cryptographic Instruction
  7. Enable RAM-Disk Caching
  8. Dashboard Widgets
  9. Ad Blocking with pfBlockerNG
  10. Isolate LANs for Security
  11. Class B IPv4 172.31.1.0/24 Network for Untrusted Devices
  12. Add Firewall Rules

Part 2 – Isolate Network LANs

  1. Set Up the Untrusted Wi-Fi AP
  2. Automatic pfSense Configuration Backups
  3. Unable to Reach 172.31.1.x from 192.168.10.x
  4. Replace Stock Firmware on the AC1200 Wi-Fi Access Point
  5. Archer C5 v2 to the Trash, R7000 as the New Wi-Fi AP
  6. Set Up the Trusted Wireless Network
  7. Network-Device Interconnectivity Check
  8. Windows File-Sharing Gotchas
  9. Public Service Announcement: Microsoft Edge Browser

Part 3 – Set Up DNS Ad Blocking

  1. Block Clickbait, Incessant Ads, and Dangerous Sites
  2. Intercept All DNS Requests, Even to Hard-coded DNS Servers

Part 4 – Trick the YouTube Ad Algorithm

  1. How to Restrict YouTube Ads on Apple TV?
  2. Trick the YouTube Ad Algorithm Instead
  3. Research Into YouTube Advertising Spend
  4. New Goal: Convince YouTube I’m 70 and in Italy
  5. Selectively Route Apple TV Through the VPN
  6. Selectively Route Apple TV YouTube Traffic Through the VPN
  7. Gotcha: DNS Race Condition
  8. Gotcha: Authentication Failure — 403 Forbidden Error
  9. Gotcha: YouTube Is Now Showing UK Ads, Not Italian Ads
  10. Find a VPN Exit Node Without an ASN Leak
  11. Hijack Google Video DNS Queries
  12. New Goal: Programmatically Add IPs to the Firewall Policy Rule
  13. Research Python Methods to Hijack DNS Queries
    i. Perform an Rsync Disk Backup
    ii. Install the pfSense REST API
    iii. Explore the Unbound Python Module
  14. Smoke Test: A Python DNS-Hijacking Script

Part 5 – Decrypt HTTPS Traffic

  1. New Goal: Research and Install a Squid-Like Proxy
    i. Fun Fact: Jailbreaking iPhones in Japan
  2. Install a Fake-but-Trusted CA Certificate on Apple TV and iPhone
  3. Experiment With Squid and SquidGuard
  4. Self-Host the CA Certificate
  5. Abandoning Squid: Too Slow, Too Heavy
    i. Run an Rsync Diff of Changes
  6. Install MITMProxy in a FreeBSD Jail
  7. Exploring MITMProxy
  8. Patch MITMProxy Source Code for Server SNI Interrogation

Part 6 – Intercept Apple TV and iOS YouTube Ads

  1. Smoke Test: Intercept YouTube Ads With MITMProxy
  2. Examine uBlock Origin Regex Patterns for Inspiration
  3. Surgically Alter the JSON Response to Remove Ads
  4. The iOS YouTube App Uses Protobuf, Not JSON
  5. Timing Analysis to Detect Ad Videos
  6. Decode the YouTube Protobuf Responses
  7. Ad-URL Polymorphism
  8. Smoke Test: Intercept and Decode Protobuf in Python
    i. Pure-Python Benchmarks
    ii. Pure C++ Benchmarks
  9. Fuzzing the YouTube Video-Ad Responses
  10. Use Burp Suite Tools for Penetration Testing
  11. Exfiltrate the Proto Schemas From the App, Cleanly

Part 7 – Reverse-Engineer Protobuf Messages

  1. Hardcore Deep Dive into Protobuf and Wire Format
  2. Exploit a Protobuf Feature to Easily Remove All Ads by Changing One Byte
  3. Smoke Test: Remove Ads from Protobuf in O(n) time
  4. Analysis of This Successful Adblocking Technique
    i. Summary
    ii. Timing Analysis
    iii. Knock-On Benefits
    iv. Future-Proof
    v. Should Google Be Worried?
  5. The MITMProxy YouTube Adblocking Script

Part 8 – Summary

  1. YouTube Premium
    i. Experiment in Ad Viewing
    ii. $0.15 as a Ballpark CPV
    iii. CPV from U.S. Advertising Spend Divided by Total Views
    iv. Is YouTube Premium Worth It?
  2. DMCA, Sony, Viacom
  3. Summary of Accomplishments

Why Block Malicious Ads and Behavior Tracking?

You are a valuable commodity, bought and sold without your knowledge or consent. You will be tricked with clickbait, distracted by intrusive ads, and enticed to leave the site you are on at every opportunity. Plus, everything you do online is monitored so your habits and searches can be remarketed and resold for years.

clickbait

Privacy — Knowing what you watch and read, which phone you own, what you stream on Netflix, what you shop for, what you ask Alexa, your taste in music, and more is unbelievably valuable to advertisers. Spying on people became such a problem that Europe passed the GDPR, forcing every site to ask if you accept cookies (and we blindly click “OK” just to hide the banner). We need to wrestle privacy back ourselves.

Bandwidth — If privacy doesn’t concern you, consider this: between 25 % and 40 % of network traffic is ads, tracking scripts, and JavaScript loaders for trackers like fingerprint.js, googletagmanager.js, or real-time analytics such as Hotjar. Have a 100 Mbps connection? Functionally, it may run at 60 Mbps.

Clickbait — “You won’t believe what Tom Cruise did next—he…” You may click, and then you’re caught in the spider’s web. Fake news, “sponsored” posts disguised as articles, or “underscored” content can route you to pages with a dozen shady ads that bypass Google’s filters. Clickbait is incredibly profitable to scammers.

Cryptojacking — Some sites load crypto-mining JavaScript (e.g., CoinHive.js) that overheats and abuses your computer to earn a few pennies. Others inject scripts that try to drain your crypto wallet or trick you into sending cryptocurrency.

Takeaway: Tracking and tricking you is highly lucrative—and only you can stop it.


Required Router Hardware

Virtual machines, Docker images, and Raspberry Pis are not performant enough to protect an entire SMB network. Instead, we need dedicated hardware with a cryptographic instruction set whose only job is to route, decrypt, and monitor packets. Here’s what I used:

  • A mini PC with the AES-NI instruction set (e.g., J4125)
  • Several gigabytes of DDR4 RAM (e.g., 32 GiB)
  • A decent mSATA SSD (e.g., 128 GiB)
  • A USB drive to flash pfSense


Unboxing the Hardware

I ordered a J4125 mini PC from AliExpress, 32 GB of DDR4 RAM, and a 128 GB mSATA SSD from Amazon, and I’m about to assemble them for the first time.

Warning: I searched diligently for a barebones mini PC that shipped without RAM or an SSD; nothing stops an overseas seller from including generic components while charging Samsung prices.
Tip: 128 GB of storage on a router? Yes. That’s plenty for logs, will reduce wear on the SSD, and leaves room for packet captures—or even an edge cache for NPM and Docker.

A beautiful box, isn’t it? It has only three LAN ports, but you can expand those with network switches.

The J4125 AES-NI quad-core fanless mini PC
The pfSense router built from a J4125 fanless mini PC


Install pfSense on Bare Metal

I’ve never used pfSense before, so let’s explore it together. The compressed image is about 360 MB and can be flashed to a USB drive with the Etcher AppImage—very cool. VGA install or serial? I thought about serial, but:

Let's not use serial on the router
Serial looks painful—let’s skip it

Serial access would be a hassle in an emergency: the port is internal, and there’s no RS-232 or JTAG connector—just narrow header pins. Yikes. Let’s use VGA and plug in a USB keyboard—get ready to navigate with arrows and tabbing.

J4125 mini PC BIOS over VGA

I’m following this guide on YouTube. I’ll pass on encrypting the disk since I would like to avoid entering a passphrase each time the mini PC reboots. A stripe disk is fine since there is only one disk. I have no idea what to expect yet, so I will pass on dropping to a shell for a more advanced configuration.


First pfSense Boot

I ejected the USB drive that contained the boot image (important) and rebooted the little box. It played a melody on the internal speaker—there’s a buzzer inside, and thankfully it isn’t very loud.

Do I need to have a LAN cable connected already, or can I just power it on? I’ll start pfSense and let it complain if it wants… and, according to the YouTube tutorial, I should guess which port is LAN 1. I’ll do that now.

I figured out that I should set LAN 1 to a static IP address outside my existing router’s DHCP range, so I chose 192.168.1.3. Now I can access the admin web portal (admin/pfsense). Hooray.

Yikes—the mini PC beeped at me and informed me that “admin” has logged in. That startled me a bit, but hey, that’s pretty neat.

First time logging into pfSense admin UI


Enable the AES-NI Cryptographic Instruction

I played around with the setup wizard, used the defaults, and reached the web configurator. The first thing that caught my eye was AES-NI CPU Crypto: Yes (inactive). I went out of my way to buy a mini PC with AES-NI—what gives?

Ah—AES-NI must be enabled under System › Advanced › Miscellaneous. Why doesn’t it auto-detect this and choose the best option? I’m glad I spotted that; otherwise, this mini PC might as well be a Celeron J1900 from yesteryear.


Enable RAM Disk

With 32 GiB of RAM, let’s use a generous amount for /var and /tmp. Since this 128 GiB SSD hopefully has wear-leveling, let’s also back up the RAM disk to it every hour.

Let’s take advantage of RAM disk

Reboot! AES-NI is now active.


Dashboard Widgets

This dashboard is pretty slick. I’m just discovering that there are widgets that can be added to the Dashboard, including S.M.A.R.T. to alert us if the SSD is going bad. Nice.

Final pfSense dashboard after all setup

Hang on—when I added the Services Status widget, something called PC/SC Smart Card Daemon shows up. What is that? Research shows it’s a daemon for hardware smart keys that we can probably do without. It can be disabled in the /etc/rc.bootup file like so:

Wait. After some time went by, I noticed the router slowed down—fatally.

IPsec without the Smart Card service will cripple the router
Warning: Do NOT disable the Smart Card service; IPsec needs pcscd. If you start experimenting with an IPsec VPN tunnel and the daemon is disabled, your hard disk will fill up with logs and your CPU will run hot.


Adblocking with pfBlockerNG

This unboxing and setup has been fun, but I’d like to block all the bad traffic on my network. I’ve been using a workhorse of a DNS-level adblocker called Pi-Hole on a—yes—Pi, but it would be nice if I could reclaim that wee bit of hardware for something else and use a comparable add-on module in pfSense. Let’s explore that now.

pfBlockerNG is a very powerful package for pfSense® that provides advertisement and malicious-content blocking along with geo-blocking capabilities.

Question: Do I install the plain pfBlockerNG package or the pfBlockerNG-devel package that looks like a developer version? I’m a software developer, so this is for me, but am I a pfSense developer? No. Maybe it will show me advanced logs or let me mess about with Lua? Let’s Google this.

From here, random people say to install the development version. Another blogger advocates using the dev version as well. Meh, I guess we can install jq, rsync, and Python 3.8. It doesn’t feel like a development version since it has exciting dependencies.

Install pfBlockerNG-devel, not the other one

That was painless and added only about 20 MiB. It seems many dependencies are already part of pfSense. The knight at the end of Raiders would say I have chosen wisely (though, why did Indy age like a normal person up to Indy 4 if he drank the immortality water that the thousand-year-old knight also drank?).

Wizard time.

The pfBlockerNG wizard had four steps but step three is like 50 steps in one

There are a lot of options in step three. This is not like Pi-hole at all. I’m going to come back to this and set up my network first so I can retire my Nighthawk R7000, or give it new life as a Wi-Fi AP.

Fix: If the pfb_dnsbl service won’t start or the status tab shows [ Missing CRON task ], try deleting the empty file /var/run/booting (ref).


Isolate LANs for Security

An opportunity presents itself: I can create real networks on each of the three router Gigabit ports (not VLANs). Should I do so? Yes—yes, I should. I’d like a dedicated hardware network for all my phoning-home spy devices (Alexas and Apple TV) so they don’t flood my main network with metrics and “sure I’m muted and not listening to you” audio payloads.

I can see it now: a Wi-Fi AP on a hardware LAN that is isolated from everything else, dedicated to these gadgets, routed through the adblocker, and able to trap hard-coded DNS queries to 1.1.1.1, 9.9.9.9, and others (I’ll have to explore this) so YouTube on my TV doesn’t sneakily bypass any DNS-level blocker. It’s such a utopian outcome I may not be able to sleep.

I’ve decided that my bottom-shelf TP-Link router—so old that “AC1200” might as well be “A.D. 1200”—will become the Wi-Fi AP for those IoT spy devices.

In sum, there will be three dedicated hardware LANs:

  • one with a wireless AP (AC1200) for Amazon/Apple gadgets and the TV,
  • one with a wired switch for all the beefy computers and clusters in my lab,
  • one with another wireless AP (R7000) just for iPhones and watches.

As an aside, since doing an Offensive Security hacking course, I rare-earth-magnet-strongly suggest isolating Wi-Fi devices from any critical LAN segments connected to machines used for daily banking, stock trading, or crypto wallets (aside: don’t trade crypto).


Class B IPv4 172.31.1.0/24 Network for Untrusted Devices

The IPv4 range 172.16.0.0/12 (172.16.0.0 through 172.31.255.255, carved out of the old Class B space) is a valid block of private IP addresses. I’m not comfortable with Alexa and Apple TV being on the same network class as my main LAN segment, so I will banish them to this Class B private range at the hardware level, and my more-trusted LANs will stay on the traditional Class C range (192.168.0.0/16). This naturally mitigates any misconfigured pf firewall rules because there are no routes between the two networks.
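As a quick sanity check on the addressing, here is a minimal Python sketch using only the standard `ipaddress` module. The two /24 subnets are the ones used in this article. The RFC 1918 blocks are general facts, not something read from the pfSense configuration, and the variable names are only illustrative.

```python
import ipaddress

# RFC 1918 private blocks (general facts, not taken from the router config).
PRIVATE_172 = ipaddress.ip_network("172.16.0.0/12")   # 172.16.0.0 - 172.31.255.255
PRIVATE_192 = ipaddress.ip_network("192.168.0.0/16")

# Subnets named in this article.
untrusted = ipaddress.ip_network("172.31.1.0/24")
trusted = ipaddress.ip_network("192.168.10.0/24")

# The untrusted LAN sits inside the /12 private block...
print(untrusted.subnet_of(PRIVATE_172))                            # True
# ...but not inside the narrower 172.16.0.0/16.
print(untrusted.subnet_of(ipaddress.ip_network("172.16.0.0/16")))  # False
# The trusted LAN lives in the 192.168.0.0/16 block.
print(trusted.subnet_of(PRIVATE_192))                              # True
# The two LANs share no addresses.
print(untrusted.overlaps(trusted))                                 # False
```

The second check shows why the prefix length matters: 172.31.1.0/24 is private only because the reserved block is a /12.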

Set up a physical network for the untrusted smart devices

Be sure to enable the DHCP server on the physical NIC that will connect smart devices (which mainly just tell me the weather and creepily listen to me sleep).

From this point, DHCP works on this new network, but by default pfSense only hands out IP addresses. A new interface starts with no firewall rules, so all traffic is blocked.


Add Firewall Rules

We need to add rules manually so traffic on the physical NICs goes somewhere.

Our first rule to allow eth3 to access the Internet

There’s a logging message; let me reproduce it:

Hint: the firewall has limited local log space. Don’t turn on logging for everything.

I read that as: “Congratulations on not cheaping out on your SSD. Now go forth and log everything, my son.”

I’m not a new-age, fancy-jazz, smart-plug–everything guy who forgot how to turn on a light without his phone, so I do not need “smart devices” on the same network as my phone (why create dozens of wireless attack vectors into your home?). I’m classically trained to biomechanically actuate an electromechanical current interrupter on the wall—and light, let there be.


Set Up the Untrusted Wi-Fi AP

How do I reach the admin UI of the AC1200 Wi-Fi AP now? I factory-reset it and plugged the WAN NIC into the ETH3 NIC on the pfSense router, but both devices just blink at me.

I suppose I can Wi-Fi into the factory-reset AC1200. Yikes—2016 was a bad year for responsive web UIs. This is horrible; I’ll pull out a netbook for this. One sec.

It seems the Archer C5 has no AP mode. This is my problem, not yours, but I’m still going to vent.

Oh, and the “refresh” icon at the top of the DHCP Leases page in pfSense is not “refresh”; it’s “reload service.” Whoops.

Well, I bricked the AC1200 router. I will have to run an Ethernet cable manually… but wait, my thin notebook PC has no Ethernet port and needs a USB-NIC adapter. Happy Friday (sarcasm).

Tip: Connect LAN to LAN, not the AP’s WAN to pfSense’s LAN, unless you want double NAT (you don’t).

There were shenanigans, but I set the LAN IP of the AC1200 to 172.31.1.100, the ETH3 NIC IP of the pfSense router to 172.31.1.1/24, and configured pfSense’s DHCP service on ETH3 to assign addresses 172.31.1.101–150. What failed was setting the AC1200 to 172.31.1.2; it was unreachable (reason unknown). Oh yes—I had to turn off firewall-y things and NAT Boost, basically dropping this TP-Link router’s power to that of a potato battery. The settings above let me access the AC1200 remotely now.
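The addressing plan above can be double-checked with a short Python sketch. The addresses and the DHCP range come from the paragraph above. The helper names (`in_pool`, `check_static`) are hypothetical and only for illustration.

```python
import ipaddress

subnet = ipaddress.ip_network("172.31.1.0/24")
gateway = ipaddress.ip_address("172.31.1.1")       # pfSense ETH3
access_point = ipaddress.ip_address("172.31.1.100")  # AC1200 LAN IP
pool_start = ipaddress.ip_address("172.31.1.101")  # first DHCP lease
pool_end = ipaddress.ip_address("172.31.1.150")    # last DHCP lease

def in_pool(ip):
    """True if the address could be handed out by DHCP."""
    return pool_start <= ip <= pool_end

def check_static(ip):
    """A static address must be inside the subnet but outside the DHCP pool."""
    return ip in subnet and not in_pool(ip)

pool_size = int(pool_end) - int(pool_start) + 1

print(check_static(gateway))        # True
print(check_static(access_point))   # True
print(check_static(ipaddress.ip_address("172.31.1.120")))  # False
print(pool_size)                    # 50
```

Keeping static addresses out of the pool avoids a lease colliding with the AP or the router.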

The other video ended, so I started following this YouTube tutorial (set playback speed to 1.5x).

There are good tutorials on advanced pfSense—if you can sit through the ads

One more thing: I installed the nmap package for pfSense, scanned the AC1200 router, and found some sneaky ports open.

Port 20005/tcp is a print-server port that I’ve now closed. However, the Archer C5 AC1200 is vulnerable to all kinds of Kali mischief, so it was wise to put it on its own network. I’m not sure how to close port 22 and the sshd service on the AC1200 because the stock firmware is ancient and crippled, so I’ll just block port 22 for the whole LAN segment.

I’ve also disallowed private networks from ingressing on the WAN (see the next section for setting up a DMZ).


Unable to Reach 172.31.1.x from 192.168.10.x

Ping and Traceroute are aiding my efforts to reach the AC1200 Wi-Fi AP from my Trusted LAN. I went ahead and added the subnet to the Symantec firewall rules just in case (Symantec has its place now and then—and yes, I have spare PC CPU horsepower).

Configure Symantec to allow the Untrusted subnet

Now, ICMP packets are no longer blocked between networks, but I still can’t reach the AP’s web UI—even though I see the pings in the traffic logs.

I’ve even added an “any to any” firewall rule on the Untrusted network. No change.

Warning: If you run nmap as I did, software firewalls may detect the port scan and suspend your network connection for an hour by default.

Disable Symantec port scan detection

Let’s try a stealth scan instead: sudo nmap -sS -v 172.31.1.*.

I think a port scan has still been detected

Nope, pfSense doesn’t like that at all. And the whole network stops working. Nice security! Also, dang.

The good news is that I’ve isolated the packet malaise to the TP-Link AC1200 box itself. I suspect I need to set net.ipv4.ip_forward=1 so it forwards packets that are not addressed to it, but I’d need root access to the AC1200. Let’s burn it to the ground and rebuild from its sprinkler-soaked ashes.


Replace Stock Firmware on the AC1200 Wi-Fi Access Point

Of course, I cannot actually stop Untrusted LAN devices from reaching the AC1200, as they all exist downstream from the pfSense box.

DD-WRT open-source router firmware, meet my ancient Archer C5 and do your thing.

DD-WRT supports the Archer C5

The Archer C5 doesn’t accept the DD-WRT firmware. Hmm… how about OpenWRT?

OpenWRT supports the Archer C5

The Archer C5 doesn’t accept the OpenWRT firmware either. What the actual facepalm (WTAF)?

Wait. My hardware is revision 2 using Broadcom chipsets, which are notoriously difficult networking chips.

Careful: Devices with Broadcom Wi-Fi chipsets have limited OpenWRT support (due to the lack of FLOSS drivers for Broadcom chips). (REF: OpenWRT.org)

Alright—OpenWRT, DD-WRT, and Tomato all have no firmware for this AC1200 with unpopular Broadcom chipsets. Into the refuse bin it goes.


Archer C5 v2 Into the Refuse Bin, R7000 as the New Wi-Fi AP

I’ve dismantled the AC1200 so I don’t forget why I threw it out. It’s too bad because it’s so pretty on the inside, and they always say, “It’s what’s inside that counts… except if you are a router with Broadcom chips.”

Inside the Archer C5 v2 with Broadcom chips

The R7000 is factory-reset, and here is the first problem:

Tip: On factory reset, the Nighthawk R7000 is picky about password format. One rule is that no more than two identical consecutive characters are allowed. Thanks, Netgear, for basically publishing a regex to password crackers. Let’s disable all those rules with a few keystrokes to remove the JavaScript “blocking” the form submission. Now my admin password doesn’t match the regex and is super long. Muahaha, Netgear password crackers.
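To make the "no more than two identical consecutive characters" rule concrete, here is a small Python sketch. Netgear's actual validation code is not shown in this article, so the regex and the function name are my own assumptions about what such a rule might look like.

```python
import re

# Assumed pattern: any character followed by two more copies of itself,
# i.e. three identical characters in a row.
TRIPLE_REPEAT = re.compile(r"(.)\1\1")

def violates_repeat_rule(password: str) -> bool:
    """True if the password contains three or more identical consecutive characters."""
    return TRIPLE_REPEAT.search(password) is not None

print(violates_repeat_rule("pass11word"))  # False: runs of two are allowed
print(violates_repeat_rule("passsword"))   # True: "sss" is a run of three
```

Every rule like this removes candidates from the space an attacker has to search, which is why publishing it is a small gift to password crackers.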

The R7000 is in AP mode, but I can still access the pfSense web management page from the Untrusted network. Let’s lock down the web UI in pfSense under Firewall Rules.

Untrusted network firewall rules


Set up the Trusted Wireless Network

The Untrusted network is now looking good. It’s time to make the other R7000 Nighthawk I have into a Wi-Fi AP as well, so my phone and watch have a safe place to connect—plus a laptop when I want to RDP into my wired machines from the kitchen. I was saving that for a honeypot AP, but I can come back to that later.

Let’s see if I can Wi-Fi into the Wireless LAN’s R7000…

Tip: Remember to physically unplug the pfSense upstream router from the R7000; the R7000 is too helpful and will switch into AP mode when it detects an upstream router, after which you can’t reach the web UI.

Since only my trusted devices should be on the Wireless LAN, I’ll turn off 2.4 GHz Wi-Fi because anything recent and wireless should support 5 GHz. The cheaper AliExpress Pineapple-style Wi-Fi password stealers only use 2.4 GHz, so a neighbor will have to put in some effort to snoop on my network. Plus, 5 GHz gets blocked more easily by walls and concrete, so I prefer it for averting medium-range snooping. But I am so going to set up a honeypot and brake-check my faith in humanity.

It’s normally straightforward to put a Wi-Fi router into AP mode by disabling WAN and DHCP.


Network Devices Interconnectivity Check

Do all my dozens of computers, laptops, Pis, clusters, NAS drives, and the like still connect as before? Most important is my web-scraping bot in a hardened, RAIDed, dedicated machine with its own UPS. But alas, I cannot SSH into it even though the SSH handshake packets reach the hefty box.

Could this be our old frenemy IPv4 forwarding being disabled? Possibly. I’m able to SSH into the machine from my iPhone (seriously) when on the same network.

Nope. Adding net.ipv4.ip_forward = 1 in the right place with a restart did not yield joy.

According to dmesg -w (to tail dmesg logs), UFW (Uncomplicated Firewall) is not blocking ICMP requests or TCP requests on port 22. When I do something nutty like try to SSH on, say, port 23, I do see UFW block logs in dmesg. Confirmed: packets can reach that machine.

Running tcpdump src 192.168.10.100—the IP from the Trusted network on the target machine—shows it is responding to pings. I’m even getting replies to SSH handshake requests. So now we know that return packets are being dropped. Interesting! Aside: tcpdump is awesome.

Let’s follow the trail. Digging a little deeper, I see replies to ICMP and SSH handshakes being sent to some IP over HTTPS that I don’t recognize. Bizarre. When I run the usual ipinfo tools I see that replies are going over a VPN that I completely forgot about. Ha—replies to a different subnet are egressing over the VPN but cannot return properly. Neat.

VPN causes ACK packets to return over the wrong adapter

Now that I remember what I did in 2019, I re-added NAT alias rules, and it’s showtime again.


Windows File Sharing Gotchas

Your path may be smoother, but I always seem to make the Trench Run—remote-piloting a handful of lead-filled X-Wings at light speed right through the Death Star’s reactor to make it go boom: the easy way.

I’ve added rules so static-DHCP Windows devices can talk to each other, but by default the Private Network profile in Windows Defender Firewall scopes rules to the local subnet. That isolates different subnets. We cannot simply relax the pfSense DHCP subnet mask to, say, 192.168.20.0/16; it conflicts with another subnet. Instead, just to get file sharing working, I relax the scope in Advanced Settings as shown below. Be sure to modify both Inbound and Outbound rules for SMB and ICMP.

Windows file sharing across subnets

Again, add whatever subnets you need instead of any.
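The /16 conflict mentioned above is easy to demonstrate with Python's `ipaddress` module. 192.168.10.0/24 is the Trusted LAN from this article. Treating 192.168.20.0/24 as the subnet being widened is my assumption for illustration.

```python
import ipaddress

# strict=False accepts the article's "192.168.20.0/16" as written and
# normalises it to the real network address.
widened = ipaddress.ip_network("192.168.20.0/16", strict=False)
trusted = ipaddress.ip_network("192.168.10.0/24")    # Trusted LAN in this article
wireless = ipaddress.ip_network("192.168.20.0/24")   # assumed subnet being widened

print(widened)                      # 192.168.0.0/16
print(widened.overlaps(trusted))    # True: the /16 swallows the Trusted LAN
print(wireless.overlaps(trusted))   # False: the /24s stay separate
```

A /16 mask covers every 192.168.x.y address, so it collides with any other 192.168 subnet on the router. Listing the specific subnets in the Windows firewall scope avoids that.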


Public Service Announcement: Edge Browser

Why does Microsoft Edge start automatically and keep running in the background, and why can’t I kill it with Ctrl + Alt + Del? If you’ve asked yourself this, you’re not alone. Edge launches at login and sticks around. Here’s the fix:

Prevent Microsoft Edge from starting or running in the background. Sneaky browser.

I suggest downloading Winaero Tweaker and applying its registry tweaks to tone down the Redmond Spy Machine.

Stop Microsoft from spying on you


Block Clickbait, Endless Ads, and Dangerous Sites

Thanks to web-browser and DNS-level adblockers (e.g., Pi-hole), it’s commonplace to block bad sites, crypto-miners, fingerprinters, trackers, remarketers, banners, pop-ups, fake tech-support alerts, and all manner of unscrupulousness designed to take advantage of you. Let’s take pfBlockerNG on pfSense for a spin.

pfBlockerNG blocking ad domains with graphs

The pie chart looks great. I followed this pfBlockerNG tutorial.

Tekgru.com pfBlockerNG tutorial blog

This is important: If you have multiple network interfaces (the mini PC has four), then you need to enable the Permit Firewall Rules option for multiple interfaces and select them.

DNSBL Permit Firewall Rules for multiple interfaces

Want discretion over blocklists? Let’s add a DNS blocklist related to gambling and reload pfBlockerNG to see whether a poker site is blocked on the Trusted LAN.

Some sketchy poker sites are now blocked

If you prefer the connection to close silently instead of rendering a PHP page, create a new PHP script with the following code and select it in the pfBlockerNG settings page:


Intercept All DNS Requests, Even to Hard-coded DNS Servers

Let’s make sure all clients behind the pfSense router use the local Unbound DNS server so pfBlockerNG can act on them. We do not want apps and home assistants to bypass our DNS server, so we have to add some NAT rules.

Trap all DNS and DNS+TLS requests

First, we have to block DNS over TLS (for now) and allow only local DNS requests (note the rule order):

Overarching DNS rules allowing only internal DNS queries
Note: DNS over TLS must be blocked (for now) for all clients behind the pfSense router in order for DNS query trapping to succeed. An iPhone may show a Privacy Warning that the network is blocking encrypted DNS traffic. That is okay because we are encrypting upstream DNS requests to Cloudflare.

We can ignore the Privacy Warning in devices behind the pfSense router

Here is a NAT rule for one interface. I started by making a rule for each interface except WAN (obviously) like this:

Example rule to trap DNS queries on a given interface
Tip: NAT reflection should be disabled so the wild Internet cannot access our DNS server.

To make life simpler, I created a firewall alias of all non-WAN interfaces called Non_WAN. Covering IPv4 and IPv6, the redirect rules that send local DNS queries on port 53 to localhost look like this:

Firewall DNS query redirect rules to localhost
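One way to check that the redirect really traps hard-coded resolvers is to send a raw DNS query straight at an outside server from a device behind the router and see who answers. The sketch below uses only the Python standard library. The function names are hypothetical, the parser is deliberately minimal (first A record only), and the hostname in the usage note is my assumption.

```python
import socket
import struct

def build_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Flags 0x0100 = recursion desired; one question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN

def skip_name(data: bytes, offset: int) -> int:
    """Return the offset just past a (possibly compressed) DNS name."""
    while True:
        length = data[offset]
        if length & 0xC0 == 0xC0:   # compression pointer: two bytes, ends the name
            return offset + 2
        if length == 0:             # root label ends the name
            return offset + 1
        offset += 1 + length

def first_a_record(response: bytes):
    """Return the first A record as dotted-quad text, or None if there is none."""
    qdcount, ancount = struct.unpack("!HH", response[4:8])
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(response, offset) + 4      # skip QTYPE and QCLASS
    for _ in range(ancount):
        offset = skip_name(response, offset)
        rtype, _rclass, _ttl, rdlength = struct.unpack(
            "!HHIH", response[offset:offset + 10])
        offset += 10
        if rtype == 1 and rdlength == 4:
            return socket.inet_ntoa(response[offset:offset + 4])
        offset += rdlength
    return None

def resolve_via(server: str, hostname: str, timeout: float = 3.0):
    """Ask one specific DNS server for a hostname over UDP port 53."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(hostname), (server, 53))
        response, _addr = sock.recvfrom(512)
    return first_a_record(response)
```

From a client behind the router, calling `resolve_via("1.1.1.1", "www.googletagmanager.com")` should return the sinkhole address (10.10.10.1 in this setup) if the query was trapped and that domain is on a blocklist. A real public IP would suggest the query escaped.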

Let’s also log trapped DNS requests. Head to the Services › DNS Resolver page, click Display Custom Options, and add:

Well, hello there, Microsoft Windows. What are you up to trying to reach Google Tag Manager? Naughty OS. That request is now black-holed to a non-existent IP at 10.10.10.1.

Windows is trying to reach Google Tag Manager

Let’s turn our attention to the TV and see how it fares under DNS interception.


How to Restrict Apple TV and iPhone YouTube Ads?

YouTube now shows two back-to-back ads (7 and 15 seconds) every few minutes. Why are the ads so incessant and so long? I don’t mind the occasional ad, similar to live TV, but these frequent interruptions would warrant FTC complaints if they were on broadcast television.

YouTube is tricky because ads are also videos that arrive from the same domain, so domain-name blockers like pfBlockerNG can’t filter them. The best pfBlockerNG or Pi-hole can do is block googleadservices.com—and only after you watch an ad video and click the ad.
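A toy Python model shows why hostname-based blocking cannot separate ads from videos. The blocklist entry comes from the paragraph above. The `video-host.example` URLs and their paths are made-up placeholders, not real YouTube endpoints.

```python
from urllib.parse import urlsplit

BLOCKLIST = {"googleadservices.com"}  # the ad domain named in this article

def is_blocked(hostname: str) -> bool:
    """DNS-level decision: only the hostname is visible, never the URL path."""
    labels = hostname.lower().rstrip(".").split(".")
    # Match the hostname itself or any parent domain on the blocklist.
    return any(".".join(labels[i:]) in BLOCKLIST for i in range(len(labels)))

# Hypothetical requests: a video and an ad served from the same host,
# plus a click-through to an ad domain.
requests = [
    "https://video-host.example/stream?id=cat-video",
    "https://video-host.example/stream?id=pre-roll-ad",
    "https://www.googleadservices.com/click?ad=123",
]

decisions = [(url, is_blocked(urlsplit(url).hostname)) for url in requests]
for url, blocked in decisions:
    print("BLOCK" if blocked else "ALLOW", url)
```

The first two requests get the same verdict because a DNS blocker never sees the path or query string. Blocking the ad would mean blocking the video host too.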

Many people use a web browser such as Firefox or Chrome with uBlock Origin, which acts on JavaScript. It may be enough to watch YouTube in a browser and cast it to a so-called Smart TV. However, we can’t restrict ads in the iPhone YouTube app (without jailbreaking and compromising the device).

What are our options? How can we safely restrict YouTube ads on all network devices?


Trick the YouTube Ad Algorithm Instead

Thought Experiment: Among friends, let’s say English-speaking countries get ads for the most ridiculous things because their residents are assumed to have disposable income. Can we instead make YouTube think we are an undesirable advertising target?

What do ads in other parts of the world look like? Are people living in Antarctica or low-Earth orbit getting lots of ads, too?

Xkcd.com: Mess with advertisers

What if we leverage this pfSense router to route YouTube location-tracking traffic through a VPN that terminates in some remote part of the world with fewer YouTube viewers per capita? In other words, let’s make ourselves undesirable to advertisers and see whether we get fewer ads.

Scotty from TNG episode ‘Relics’ understands the plan


Research into YouTube Advertising Spend

Let’s do some YouTube demographics research to find a part of the world avoided by advertisers.

Mobile advertiser spend by country in 2020 (REF: statista.com)

Let’s also check some YouTube statistics about viewers by country for insights. Thinking about following some Reddit advice and VPN’ing into India? Think again.

Total YouTube views by country in 2019 (REF: ChannelMeter)

That was 2019. This is 2020:

Top ten YouTube countries with population (REF: backlinko.com)

I’m not a digital advertiser, but I can see that people in the UK and Canada watch a large number of videos per sitting. If I were an advertiser, I’d pump those two countries with video ad after video ad because, statistically, those residents will take the eyeball kicking. All things being equal, I definitely need a VPN to terminate outside of Canada, the UK, and the United States (English-speaking countries) to enjoy YouTube more.

Does age play a factor? Who don’t advertisers want? I want to be that guy on paper.

YouTube age demographics as of 2020 (REF: backlinko.com)


New Goal: Let’s trick YouTube into believing I am a 70-year-old male living in Italy. Yes, that should definitely cut down on the Nespresso and Starbucks ads, at least.

How, then, to convince YouTube that I am a retired Sicilian living on a small chain of islands? I embellished that last part; seventy and in Italy is sufficient.

Let’s do this. In the YouTube account…

I am 71 years old
I am in Italy

It is doubtful this is all it takes for our goal. Let’s find a VPN exit point in Italy.

NordVPN has 60+ servers in Italy

Nice. NordVPN, for example, has about 60 servers in Italy.


Selectively Route Apple TV Over the VPN

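Let’s go through some tutorials to set up OpenVPN in pfSense. Just kidding! We’re going to use WireGuard, which is leaner and faster than OpenVPN. (Strictly speaking, WireGuard encrypts with ChaCha20-Poly1305, so it does not lean on the Intel AES-NI instruction set; AES-NI pays off for AES-based tunnels such as OpenVPN. Either way, we didn’t go cheap and buy a J1900 mini PC that sellers are trying to off-load.)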

I’ll now install the FreeBSD WireGuard package.

Install the WireGuard package in pfSense

Next, add a tunnel and enable it. According to this thread and this thread on Reddit, we need to grab some WireGuard and NordLynx details—specifically the private key—from a sacrificial Linux VM and transpose those settings to the pfSense router. No problem.

Connect to Italy over WireGuard
WireGuard config information via wg show

Run sudo wg showconf nordlynx on the VM to see the private key needed for the pfSense tunnel configuration.
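
To make it clearer which values get carried over to pfSense, here is a minimal Python sketch that splits the text printed by wg showconf into its [Interface] and [Peer] fields. The function name parse_wg_showconf, the placeholder keys, and the 192.0.2.10 endpoint are made up for illustration; they are not values from my VM.

```python
from typing import Dict, List, Tuple


def parse_wg_showconf(text: str) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Split `wg showconf` output into the [Interface] fields and a list of [Peer] fields."""
    interface: Dict[str, str] = {}
    peers: List[Dict[str, str]] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "[Interface]":
            current = interface
        elif line == "[Peer]":
            current = {}
            peers.append(current)
        elif "=" in line and current is not None:
            # Split on the first "=" only: base64 keys end in "=" padding.
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return interface, peers


# Placeholder config (fake keys, documentation-range endpoint).
SAMPLE = """\
[Interface]
ListenPort = 51820
PrivateKey = AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=

[Peer]
PublicKey = BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB=
AllowedIPs = 0.0.0.0/0
Endpoint = 192.0.2.10:51820
"""

interface, peers = parse_wg_showconf(SAMPLE)
print("Tunnel private key:", interface["PrivateKey"])
print("Peer endpoint:", peers[0]["Endpoint"])
```

The [Interface] private key goes into the pfSense tunnel, and the [Peer] public key, endpoint, and allowed IPs go into the pfSense peer.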

Here are various screenshots that show the steps in more detail.

VPN > WireGuard > Tunnels
VPN > WireGuard > Peers
Tip: Enter 1.0.0.0, then set the subnet mask to 0. Don’t choose 0.0.0.0; there’s a glitch or bug in the UI—or what-have-you. The result will still display as 0.0.0.0/0.
VPN > WireGuard > Settings
Interfaces > Interface Assignments

That should be enough to let Diagnostics curl to Italy.

Successfully connected to NordVPN through WireGuard on pfSense
Successfully connected to Italy and verified

Now that the easy part is out of the way, let’s set some policy rules to send Apple TV traffic over the VPN to Italy as a baseline test.

From Netgate, on the order of Firewall/NAT processing:

Traffic from LAN to WAN is processed as described in the following detailed example.
– Port forwards or 1:1 NAT on the LAN interface (e.g., proxy or DNS redirects)
– Firewall rules for the LAN interface:
    – Floating Inbound rules on LAN
    – Rules for interface groups including the LAN interface
    – LAN-tab rules
– 1:1 NAT or Outbound NAT rules on WAN
– Floating rules that match outbound on WAN

I’ll make an alias, for now, to hold some clients that have static-DHCP entries and hostnames I assigned in pfSense.

VPN clients in the Firewall > Aliases > IP page

Floating rules have high precedence, so I add new entries below the automatic pfBlockerNG rules and drop in a blue separator while I’m here.

Floating rule to route select clients over the VPN to Italy

And here’s the full rule as a tall screenshot:

Firewall > Rules > Floating rule to route select clients over the VPN

Apply. Wait. Time to test with a notebook on the Untrusted network.

Google is entirely in Italian now

Google appears in Italian—very cool. Now for the Apple TV.

Apple TV’s YouTube reports I am in Italy

Winner winner, chicken dinner. All my YouTube is in Italian. I still get some ads—fewer than before—and because Italians speak slowly and with a kind of charming accent, I don’t mind the Nutella spots at all.

With this technique I no longer feel manipulated by English-language ads. I have personalized ads off, but given my new status as a retired gentleman I should turn that back on to scare away advertising euros. I wonder whether Netflix and Amazon Prime behave differently…

Some Netflix assets are being blocked

Dang. Netflix is having problems. Amazon Prime is even worse. It looks like some CSS or font files are blocked, and the thumbnails aren’t loading. Time for Phase Two: tunnel only YouTube traffic over the VPN.

Warning: Do not try to send all Apple TV traffic over a VPN; Netflix, Prime, and others are wise to VPN providers and have gotten great at geofencing.


Selectively Route Apple TV YouTube Traffic Over the VPN

Let’s start by adding firewall-policy rules to send the most common YouTube domains over the VPN.

As I’m about to add the rules, my hands hover over the keyboard—I don’t yet know which domains to tunnel. They must be FQDNs (fully qualified domain names, no wildcards). Let’s open a Chromium-based browser and watch the traffic in DevTools.

Add the domains column to DevTools to see where YouTube calls

Here are some candidate FQDNs to add:

But wait, I hear you ask—why accounts.google.com and gstatic.com? This is a precaution in case one of those domains is geo-checked. I wouldn’t put it past Google engineers to geo-tag the fonts domain (fonts.googleapis.com), but in the interest of performance, I’ll assume they don’t.

Here are my new rules; I chain two of them with a tag so I can limit YouTube tunneling to the same untrusted machines (including Apple TV).

YouTube domains to tunnel
Use a match rule before the tunnel rule
The first rule matches VPN clients and tags them
The second rule tunnels tagged requests through the VPN

And with that, YouTube thinks I’m in Milan, while Netflix and Prime Video still think I’m in Canada. The ads—oh, the ads—are now few and far between, and when they do appear, they’re a delight in that gentle, hypnotic Italian.

YouTube, Italy


Time goes by…


Gotcha: DNS Race Condition

A day goes by, and I notice I get Nutella and Ferrero Rocher ads only mid-video, not at the start. Odd. Some digging turns up this:

Pertinent information about pfSense and hostname aliases

This means the hostnames in an alias are resolved to IP addresses on a timer, and it is those IPs, not the hostnames, that my VPN tunnelling policy rules match.

A hostname entry in a host or network-type alias is periodically resolved and updated by the firewall every few minutes. The default interval is 300 seconds (5 minutes) and can be changed by adjusting Aliases Hostnames Resolve Interval under System > Advanced, Firewall & NAT. — pfSense docs

Ah-ha—this looks like a DNS race condition:

  1. The Alias Daemon resolves the FQDNs and updates their IPs.
  2. Hours later I power up the Apple TV.
  3. Because the DNS TTL is 1,440 seconds (24 minutes), the cached YouTube entries expired.
  4. Fresh DNS queries run. The new IPs are from a large pool, not guaranteed to match what the Alias Daemon resolved.
  5. Five minutes later, the Alias Daemon runs again and may resolve yet another set of IPs.

If the policy and the client disagree about which IPs belong to YouTube, traffic can miss the tunnel.
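
The mismatch can be pictured with a tiny Python sketch: compare the set of IPs the alias holds with the set of IPs a client was actually handed, and list whatever falls outside the alias. The function name ips_missing_tunnel and the 203.0.113.x addresses (a documentation range) are illustrative assumptions, not real YouTube IPs.

```python
from typing import Iterable, List


def ips_missing_tunnel(alias_snapshot: Iterable[str], client_answers: Iterable[str]) -> List[str]:
    """Return the IPs a client was handed that the firewall alias does not contain."""
    return sorted(set(client_answers) - set(alias_snapshot))


# Hypothetical addresses from the documentation range 203.0.113.0/24.
alias_snapshot = ["203.0.113.10", "203.0.113.11"]   # what the alias daemon resolved
client_answers = ["203.0.113.11", "203.0.113.42"]   # what the Apple TV resolved later

print(ips_missing_tunnel(alias_snapshot, client_answers))
```

Any address in that list would leave through the regular WAN instead of the VPN.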

Mitigation: Force pfSense to ignore the target’s TTL and cache the Alias’ entries longer.

Override the minimum TTL of the target DNS entry

With that tweak, the Alias Daemon and the client stay in sync—no more DNS race condition.


Gotcha: Authentication Trouble, 403 Forbidden Error

Sometimes videos refuse to play. For security, YouTube embeds your IP in each googlevideo.com request. I wrote about this in 2016 in Download YouTube 4K Videos with PHP. The new snag is that various JavaScript and “are you human?” assets tunnel over the VPN, but mangled domains like r5---sn-hpa7kn76.googlevideo.com do not, so they emerge from the wrong IP. Cue the 403 Forbidden error.

YouTube authentication failure

Let’s fail fast with a quick experiment: grab the IP of that mangled subdomain, add it manually to the list of VPN-tunneled items, apply, and refresh YouTube.

Success. We need to route the mangled domains over the VPN as well.

Excellent. Now we just need a way to tunnel the wildcard *.googlevideo.com. Unfortunately, NAT and firewall rules work with IPs, not wildcard hostnames. Can we predict or enumerate these domains?

A Wireshark capture of DNS requests shows the mangled subdomains are hardly predictable:

Random subdomains of googlevideo.com

Let’s drop into a browser with adblocking disabled and inspect the HAR waterfall to find my interactions that triggered ads.

Waterfall showing ad interactions coming from www.youtube.com

What exactly are requests like
GET https://r7---sn-uxa0n-t8ge.googlevideo.com/generate_204
doing? I’ll give this problem some thought offline.


Gotcha: YouTube Is Now Showing UK Ads, Not Italian Ads

Before I can even solve the previous gotcha, British ads start showing up as frequently as if we’d done nothing at all. Ads from the UK are even more incessant than those from Canada, trailing only the USA and India in my earlier stats. It would be a complete failure if we end up with UK ads.

Why does this happen suddenly? I opened a fresh browser in a VM and tunnelled all traffic through Italy. The only leak I found appears when I query ipinfo.io over the Italian tunnel and see a UK address listed in the ASN. Could this small leak be the culprit?

It is possible the VPN is leaking unintended information

Even with the browser language set to en_US and location services off, this is the only leak I can spot. So the VPN must not only exit in Italy; its exit IP must also belong to an ASN (Autonomous System Number, the identifier of the network that announces the IP’s routes) that does not point to a different country. Dang, Google, you’re good. Time to bring my A-game.


Find a VPN Exit Node with No ASN Leak

By visiting https://nordvpn.com/servers/tools/, I can see the available VPN endpoint nodes in Italy. There are plenty of WireGuard endpoints, too. To move things forward, I add an OpenVPN tunnel in pfSense, connect to several Italian nodes, and inspect their ASNs. I want to eliminate ASN leakage as the remaining GeoIP clue. I used this guide.

Through trial and error, I found a node whose ASN is registered to an ISP in Italy.

Found an exit node with no ASN leaks

Beautiful. Bellissimo.

Italian content with Italian ads again


Hijack Google Video DNS Queries

To make any of this work, I need a technique to route the wildcard *.googlevideo.com domain through the VPN.

Thought Experiment: Suppose I write a plugin for pfSense that periodically greps the DNS query log, keeps track of the *.googlevideo.com queries, and adds them to a unique list of aliases for Google Video domains; if backed by an LRU-eviction policy, this could keep working indefinitely. However, if each video uses a unique, mangled domain, then this does not work unless I hit refresh on every single video.

On the other hand, if I “hold up” the DNS query for those *.googlevideo.com domains, add the IPs to some alias list, then allow the DNS response to finish the round-trip, we may be in business!
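
As a rough illustration of the LRU-backed list from the thought experiment, here is a small Python sketch of a bounded pool of IPs. The class name LruIpPool and its methods are hypothetical; nothing here talks to pfSense or Unbound yet, it only shows the eviction bookkeeping.

```python
from collections import OrderedDict
from typing import List


class LruIpPool:
    """Keep the most recently seen IPs, evicting the least recently seen first."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._ips: "OrderedDict[str, None]" = OrderedDict()

    def touch(self, ip: str) -> List[str]:
        """Record a sighting of `ip`; return any IPs evicted to make room."""
        if ip in self._ips:
            # Already known: just mark it as the most recently seen.
            self._ips.move_to_end(ip)
            return []
        self._ips[ip] = None
        evicted = []
        while len(self._ips) > self.capacity:
            oldest, _ = self._ips.popitem(last=False)
            evicted.append(oldest)
        return evicted

    def snapshot(self) -> List[str]:
        """Current pool contents, oldest sighting first."""
        return list(self._ips)


pool = LruIpPool(capacity=2)
pool.touch("203.0.113.1")
pool.touch("203.0.113.2")
pool.touch("203.0.113.1")            # refreshes .1, so .2 is now the oldest
print(pool.touch("203.0.113.3"))     # the evicted entries
print(pool.snapshot())
```

The evicted entries are the ones that would be removed from the firewall alias, and the snapshot is what the alias should contain.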

pfSense DNS resolver has user Python support

Where to even start? Here are some Python example scripts for inspiration. A quick mental reverse-engineering of a handful of scripts reveals that there are some event hooks available. Nice.

Among friends, let’s say that I can build up the pool of Google Video IPs in real time. How, then, do I add these IPs programmatically to the firewall alias list for YouTube without restarting the firewall? One person actually hacked the PHP scripts in pfSense—tempting, but I’ll do more research. Another person created a REST API for pfSense. Jackpot!


New Goal: We need to add IPs to the firewall-policy rule that tunnels YouTube videos over a VPN to avoid incessant, obnoxious North American ads. Because the IPs keep changing with those mangled googlevideo.com subdomains, we’ll use Python 3 and a REST API to monitor the relevant DNS queries, capture the response IP(s), hold the response, add the IP(s) to the VPN-tunneling rule, and then release the DNS reply.

Research Python Methods to Hijack DNS Requests

Why this approach? It’s future-proof, modular, elegant, maintainable, automated, and it lends itself to a future decision tree that could eventually block YouTube ads outright.

First, I’ll enable SSHd in pfSense and take a peek around.

Enable SSHd in pfSense
SSH into pfSense using the GUI credentials

Rsync Disk Backup

Let’s take this opportunity to make a disk backup. du -h shows that only 800 MiB is in use on the SSD. Let’s rsync the whole box from our local machine; it should take about four minutes.

Tip: To verify the ownership and permissions are set in the extended attributes locally, run
getfattr -d -m ^ -R -- ~/.pfsense-backup

Install pfSense REST API

Now that we have a pfSense backup (I’m told just backing up config.xml works too), let’s install the REST API.

This part had me confused. You see, I was looking at the bottom of the screen wondering how the heck I could copy a truncated hash as a token. After a few tries, I noticed the green message at the top that I had been trained to ignore. It has the token.

Tricky UI screen to get the API token

With the API credentials set up, let’s test the API:

Successful API test

Explore the Unbound Python Module

Running find / -name "py*" shows that the current Python version is 3.8.

As for the Unbound DNS Resolver, I had some luck tinkering in nano and writing simple Python 3.8 code to log DNS-query messages. We now have both parts needed to dynamically update the firewall aliases and tunnel all YouTube traffic once and for all.

If you are looking for Python module docs for Unbound, here they are:

There are no readily available Python module docs for Unbound

Run these commands to quickly build the documentation:

Warning: The example code is from Python 2.4, so be prepared to run Black and PyCharm code formatting, or run 2to3. Also, the most important part of this whole exercise (getting the IPs from the DNS reply) is missing, so here is the hint: import ipaddress. Don’t forget to manually hack the byte strings to pull out the proper IP addresses in binary form first.
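
To expand on that hint, here is a minimal sketch of the byte-string step. It assumes (this is my assumption, so verify it against your own Unbound build) that each answer's rdata arrives as a bytes object with a 2-byte big-endian length prefix followed by the raw address bytes. The helper name rdata_to_ip is made up.

```python
import ipaddress


def rdata_to_ip(rdata: bytes) -> str:
    """Convert one wire-format A/AAAA rdata blob (2-byte length + address) to text."""
    if len(rdata) < 2:
        raise ValueError("rdata too short")
    # Assumed layout: first two bytes hold the length of the address that follows.
    length = int.from_bytes(rdata[:2], "big")
    address = rdata[2:2 + length]
    if length not in (4, 16) or len(address) != length:
        raise ValueError("not an A or AAAA record")
    # ipaddress.ip_address() accepts packed 4-byte (IPv4) or 16-byte (IPv6) input.
    return str(ipaddress.ip_address(address))


print(rdata_to_ip(b"\x00\x04\x08\x08\x08\x08"))
```

Anything that is not 4 or 16 bytes long (a CNAME, for example) raises ValueError, so the caller can skip it.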

Now we have Python docs and access to all the capabilities. Excellent.

Successful generation of Unbound Python docs with Sphinx

Next, I take a backup of the OS/VM and install libtool and swig, then run ./configure --with-pythonmodule, make, fix a few errors in the Unbound code, and make again. That produces the generated Python module (unboundmodule.py), which removes all those missing-method red lines in PyCharm.

PyCharm can now find the missing methods we don’t actually need to worry about
First successful DNS-response logging script


Smoke Test: A Python DNS-Hijacking Script

Here is a smoke test of the ability to hijack *.google.com DNS requests with reply IPs the script caught in just a few minutes (the timestamps simply maintain a crude LRU cache):

Smoke test for collecting IP addresses of *.google.com

Duplicate IP addresses are possible, and that is fine. I let the smoke test run overnight. Here is the PoC (proof-of-concept) script I ran as the Unbound Python-module script.

When I woke up, the Unbound DNS Resolver had segfaulted. Here are the logs:

We can see a full FQDN alias re-process on each firewall-config update
Failure: Capturing all the IPs from the DNS queries to *.googlevideo.com and *.google.com slows pfSense to a crawl because all the rules are reloaded on each addition.


New Goal: Research and install a Squid-like proxy, create a fake-but-trusted CA certificate, host it, install it in a browser as a PoC, decode TLS traffic, and victory dance.

Actually, it is not illegal to jailbreak most Apple TV boxes, so we could break in, add a root certificate valid for the pfSense box, MITM traffic from the Apple TV, and then Microsoft Bob is your uncle. That works because the pfSense box as the gateway can decrypt Apple TV traffic, inspect the request headers for the offending ad hostname, block the request, and re-encrypt other valid requests to Mountain View, California.

But, then my iPhone would still show ads because it is harder to jailbreak, plus banking apps may detect this and not work anymore. Jailbreaking is too extreme, anyway.

Fun fact: I used a jailbroken iPhone all the time in Japan because of a quirky cellphone law. You see, because of icky perverts who like to take photos inappropriately on elevators and escalators, Japan passed a law that made the camera shutter sound mandatory on all photos.
 
Super unfortunate was that taking a screenshot of a web page also made the same loud, un-muteable shutter sound. Imagine you are on a train and you screenshot a Google map, it makes that loud shutter noise, and then you get dirty looks from the train riders. Yeah, I had to jailbreak and zero out the camera-sound file.

Let’s see what it takes to spy on the HTTPS traffic from the Apple TV and iPhone to see if we can block ad URLs that way.


Install a Fake-but-Trusted CA Cert on Apple TV and iPhone?

Not wanting to jailbreak and add self-signed certs to Apple TV and iPhone, I wonder: how hard would it be instead to add fake-but-trusted Certificate Authority (CA) certificates to each device?

The “A” in CA means there is no higher entity to vet such a certificate. The “A” is so powerful that, back in 2001, only a Windows patch could revoke some dangerous VeriSign certificates. As a thought experiment, new CAs must come into existence from time to time—Let’s Encrypt is relatively new, for example. There should, then, be an in-warranty way to get a fake, trusted CA cert into an Apple TV and iPhone. If that is possible, an entire world of MITM spycraft becomes available to decrypt TLS packets in the clear and use good ol’ URL blocking on requests like:

Let’s see how easy this would be.

We can add fake, trusted CA certs to iPhone too

In fact, there are many, many CAs. Here is a quick find / -name "*.pem" in pfSense:

Many CAs exist already


Experiment with Squid and SquidGuard

I’m aware of mitmproxy, but it would need to be side-loaded onto the pfSense router. Let’s see if the squid3 proxy that is available as a pfSense package can do what we need. First, I will take a bare-metal backup again so I can roll back in case mitmproxy is better.

Install squid3 and ancillary packages

I’ve installed those packages, and naturally, there are more buttons and options than in a space shuttle. I’ll find a guide.

I’ve followed the steps in the guide. However, since I have a large SSD and generous RAM, I’ve made a dedicated folder /squid_cache (and chown squid:proxy) with 8 GiB of cache and a juicy allowance on the per-item cache size, which should also help with Docker and NPM speed-up. Two birds, one stone. With Transparent HTTPS support, this should be pretty rad.

Tip: If web traffic slows down while using Squid, here are some System Tunables that can make Squid faster (ref):

vfs.read_max 128
kern.ipc.nmbclusters 32768

Also, for the local disk cache, aufs is Squid’s asynchronous ufs storage scheme (unrelated to Docker’s AUFS union filesystem); it uses POSIX threads to avoid blocking the main Squid process on disk I/O.

We can actually generate a CA cert in pfSense itself.

Generate a CA in pfSense

Now, how to get it into the Apple TV and iPhone? It should be hosted somewhere, right? How about on the router?


Self-Host the MITM CA Certificate

Self-hosting with a single command is ridiculously easy. From the SSH shell in pfSense, I can create a web folder and server like so:

When I visit http://pfsense:8000, I should get a blank page with “Hello.” From here, clients behind the pfSense router can temporarily access static documents.

To make life easier, here is a PHP script that forces the MITM certificate to download:

As another smoke test, I add the MITM CA to Chrome manually and enable SSL Filtering (TLS/SSL inspection). The defaults are fine in Squid. Here is the log file when I visit https://ericdraken.com:

Successful capture of TLS requests from a downstream client

Excellent.

However, on every other browser and machine there are HTTPS errors like so:

MITM certificate errors if the CA cert is missing
Locked out? If you get locked out of pfSense with a TLS error, you may have to disable Remote Cert Checks, as the pfSense web configurator uses a self-signed certificate. Alternatively, you can bypass the proxy for the pfSense UI under Bypass Proxy for These Destination IPs with pfsense; pfsense.localdomain.


Abandoning Squid: Too Slow, Too Heavy

After a day of painfully setting up Squid and SquidGuard, adding blacklists and manual regex patterns like .+?/pagead/.+, I’m having nothing but issues with Squid. Here are the top pain points:

  • It’s slow. It’s really slow.
  • The ACL (Access Control List) settings are cumbersome.
  • There is an issue with https://http/* (ref).
  • The SquidGuard URL filter takes eons to update a list.
  • The Squid UI is unbelievably lacking.

Squid makes me sad. I don’t get sad often, but Squid makes me sad with its promise and ultimate letdown. I’ve obliterated Squid and restored the router from the rsync backup I made earlier. Below is a handy script that shows a diff of what Squid and related packages added.

Rsync Diff of Changes

The output is something like this under the --dry-run option:


Install MITMProxy in a FreeBSD Jail

Even though it is written in Python, I’ll give mitmproxy a try next; at the very least it can be purpose-built to block YouTube ads with its rich API and Python-hook extensibility. It was a coin toss between mitmproxy and SSLsplit (a standalone TLS-interception tool popular with penetration testers) to achieve on-the-fly TLS interception, but the former can be scripted with Python and has a satisfying UI. Let’s go.

Careful: Please read the whole section before trying any commands because I backtracked a bit and want to explain why.

You’ll notice that there are only three binaries at about 24 MiB each. As I understand it, they include a self-contained Python 3 environment with frozen dependencies. I’d like to jail these binaries because—well, because. First, let’s see if there is a vulnerability report for mitmproxy at vuxml.freebsd.org. Nothing. How about at Exploit-DB? Nothing again. Good.

First, what version of FreeBSD is this pfSense install?

Now, according to this guide, I’ll need to set up jails myself because they are disabled in a default pfSense installation. Not knowing FreeBSD at all before today, I had to hack around to find a URL to download the ezjail package manually. After another bare-metal backup, here are the steps I took:

We need to do some hacking to get jail working on pfSense’s take on FreeBSD because jail is missing completely. What I’ve done is copy the jail binaries from a jail (via ezjail) back to the root system.

Let’s set up a jail for mitmproxy.

This is very important: We must enable raw sockets in this jail to allow transparent proxy mode to work. If not, MITMProxy will report errors such as “Transparent mode failure: FileNotFoundError(2, ‘No such file or directory’)” or “Cannot open connection, no hostname given.” This is because raw sockets are inaccessible and server information is unavailable. We can easily edit the ezjail config file per jail like so:

This is also very important: MITMProxy calls sudo -n /sbin/pfctl -s state, but there is no sudo in the jail. Run pkg install sudo inside the jail.

Sanity Check: If you run ping 1.1.1.1 inside the jail and you receive an error such as “ssend socket: Operation not permitted,” raw sockets are still blocked. If ping succeeds, raw-socket access is working as required.

Now we can copy over the mitmproxy binaries and take them for a spin.

Things get tricky at this point. Running any of the binaries above results in:

So, there is no /lib64 folder, nor any compatible dynamic linker that I can find. I tried this, however:

Apparently, pkg install compat6x can solve this for us (it is unavailable on pfSense), but this is getting ridiculous! Let’s try a new tactic. Since we are in a jail, we are not bound to the crippled (read: secured) pfSense environment. Maybe we can install the mitmproxy package normally in a jail?

pkg install mitmproxy

And Bingo was his name-o. After this, simply running mitmproxy in the jailed console opens the MITMProxy UI. Nice. Note: this version may be one or two minor versions behind the master branch. Let’s clean up with rm -rf ~/mitm* /lib64 and do another bare-metal backup.


Exploring MITMProxy

This is getting exciting. First, in pfSense, add a virtual IP for 127.0.1.1 attached to localhost. Then, add a NAT rule to temporarily forward [Private IPs]:8080 to 127.0.1.1:8080 so the proxy is reachable from the LANs.

If I’m not already in the jail console, I run:

Next, I add the proxy setting 192.168.20.1:8080 to my sacrificial notebook (auto-wiped daily). When the browser opens, I can already see colorful log entries in the MITMProxy UI.

First logs of MITMProxy

The next step is to fetch the auto-generated CA PEM file used by MITMProxy (~/.mitmproxy/mitmproxy-ca-cert.pem). Since any CA cert here is snake oil, I’ll use the provided one. TLS traffic from my devices is safe as long as I use my own proxies.

Let’s put our earlier self-hosting approach into action. Because there is no PHP in the jail, we spin up a Python 3 web server instead:
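
A minimal sketch of such a server in Python 3, using only the standard library. The helper name make_cert_server and the /tmp/ca-public folder are assumptions for illustration. I suggest copying only mitmproxy-ca-cert.pem into that folder, because ~/.mitmproxy also holds the CA's private key, which should never be served.

```python
import functools
import http.server


def make_cert_server(directory: str, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Build (but do not start) a static file server rooted at `directory`."""
    # SimpleHTTPRequestHandler accepts a `directory` keyword since Python 3.7.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Listen on all interfaces so LAN clients can reach the jail.
    return http.server.ThreadingHTTPServer(("0.0.0.0", port), handler)


# Example use (hypothetical folder holding only the public CA cert):
#   server = make_cert_server("/tmp/ca-public", port=8000)
#   server.serve_forever()   # stop with Ctrl+C once the devices have the cert
```

This is roughly what python3 -m http.server does from the command line, just pinned to one folder. Shut it down as soon as the certificate has been installed.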

Tip: MITMProxy conveniently offers the same CA cert at mitm.it; visiting that URL serves the file automatically.

After installing the CA in the Trusted Root Store on my clean notebook (and rebooting), I see this:

MITM TLS interception is working well

Time to add the cert on my iPhone.

Successfully added a root CA to the iPhone

This is incredibly exciting. Can I LoJack the Apple TV box next?

Successfully installed a root CA on the Apple TV

Excellent.

But wait, the router is slowing down. mitmproxy is burning up the CPU… on idle.

MITMProxy is burning up the CPU while on idle

Of course: CPython has the GIL (Global Interpreter Lock), which lets only one thread execute Python bytecode at a time, so threads do not truly run in parallel unless they are blocked on I/O, which may be the case here(?). Except that most of the CPU work is to generate TLS certs on the fly for each request. Yikes. Running mitmdump forgoes the UI and the extreme logging of all the headers and full responses that heavily slows down mitmproxy; by default, mitmdump logs entries like classic Apache logs, which is much kinder on the CPU.

Certificate Pinning: Some high-security apps and services have trouble with MITMProxy certificates because of certificate pinning, a technique where the client knows the expected certificate (or public-key) fingerprint in advance, so a forged certificate is rejected. A workaround is to use the --ignore-hosts option to let those hosts bypass the proxy.

For fun, I’ll go with this CLI command:

While on YouTube, we can see the page ads clear as day with their decrypted headers; can a simple regex now block them? They are exposed, and afraid, and their days have run out.

MITMProxy can see the YouTube ad URLs

We can even see details about each request. For example, all the SAN info is laid out for this wide-reaching certificate. There are curiously a lot of *-cn.com domains covered by this cert.

We can see rich request and response details

Shortly, I’ll write a Python script to block YouTube /pagead/ URLs.
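
As a first idea of what such a script could look like (a sketch, not the script I ended up running), here is a minimal mitmproxy script that relies on the module-level request hook and on flow.kill(), as found in mitmproxy 7.x. The only pattern taken from this article is the /pagead/ regex; the names BLOCK_PATTERNS and should_block are made up.

```python
import re

# Only the /pagead/ pattern comes from the article; extend the list as needed.
BLOCK_PATTERNS = [re.compile(r".+?/pagead/.+")]


def should_block(url: str) -> bool:
    """True when the full request URL matches any blocking pattern."""
    return any(pattern.match(url) for pattern in BLOCK_PATTERNS)


def request(flow) -> None:
    """mitmproxy event hook: called once the client request has been read."""
    if should_block(flow.request.pretty_url):
        # Drop the flow so the request never reaches the upstream server.
        flow.kill()
```

Saved as, say, block_pagead.py, it would be loaded with mitmdump -s block_pagead.py.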


Patch MITMProxy Source Code for Server SNI Interrogation

This step may be optional for most, but as a reminder to myself: to make --allow-hosts work better in Transparent Proxy Mode, the SNI of the server request needs to be checked against the list of regular expressions; otherwise, only the server’s IP is used for matching in many cases. Here is a quick patch I made that can be applied directly in the jail shell (or just type a few lines manually) for mitmproxy version 7.0.4:

With the above patch, I can now reliably intercept a few hosts and let all others pass through.
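
The idea behind the patch can be sketched in plain Python; this is an illustration of the matching logic, not the patch itself and not mitmproxy's internal code. The function name should_intercept, the host:port candidate format, and the example pattern are assumptions.

```python
import re
from typing import Iterable, Optional


def should_intercept(sni: Optional[str], ip: str, port: int,
                     allow_patterns: Iterable[str]) -> bool:
    """Decide whether to decrypt a connection: try the SNI name first, then the IP."""
    candidates = [f"{ip}:{port}"]
    if sni:
        # In transparent mode only the IP is known up front; the SNI from the
        # TLS ClientHello gives us the hostname the patterns were written for.
        candidates.insert(0, f"{sni}:{port}")
    patterns = list(allow_patterns)
    return any(
        re.search(pattern, candidate, re.IGNORECASE)
        for pattern in patterns
        for candidate in candidates
    )


# Hypothetical allow-list: intercept youtube.com and its subdomains on port 443.
ALLOW = [r"(^|\.)youtube\.com:443$"]
print(should_intercept("www.youtube.com", "203.0.113.5", 443, ALLOW))
print(should_intercept(None, "203.0.113.5", 443, ALLOW))
```

Without the SNI, the second call has only the bare IP to match against, which is why hostname patterns miss in transparent mode.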

Reliable server host interception in MITMProxy transparent-proxy mode


Smoke Test: Intercept YouTube Ads with MITMProxy

After reading the docs and navigating the mitmproxy source code in the PyCharm IDE, I’ve written a little script to block ads and tracking URLs coming from YouTube on my clean notebook. I won’t reproduce the code just yet because it didn’t succeed in blocking ads as hoped, so instead, I’ll spend time investigating why.

Here are the smoke-test filters I used: for a given domain, URLs containing any of the following substrings are blocked:

My initial results look good. Everything I want blocked is faithfully blocked. Note: the (failed) entries come from my script, and the 502 failures come from pfBlockerNG black-holing the request.
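A substring filter of this kind can be sketched in a few lines. This is a minimal illustration, not the smoke-test script itself: the table only uses the domains and path fragments named elsewhere in this write-up, and the names BLOCKED_SUBSTRINGS and should_block are hypothetical. The request hook relies on mitmproxy's flow.request.pretty_host, flow.request.path, and flow.kill().

```python
# Hypothetical filter table built from names mentioned in this article;
# the real smoke-test list is longer and is not reproduced here.
BLOCKED_SUBSTRINGS = {
    "youtube.com": ("/pagead/", "/error_204"),
    "googleadservices.com": ("/",),  # "/" matches every path on this domain
}


def should_block(host, path):
    """True when host is (a subdomain of) a listed domain and path contains a listed substring."""
    host = host.lower().rstrip(".")
    for domain, needles in BLOCKED_SUBSTRINGS.items():
        if host == domain or host.endswith("." + domain):
            return any(needle in path for needle in needles)
    return False


def request(flow):
    """mitmproxy request hook: drop the connection for blocked URLs."""
    if should_block(flow.request.pretty_host, flow.request.path):
        flow.kill()
```

Loaded with mitmdump -s, killed flows show up as failed requests in the flow list.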

MITMProxy blocking script is working

Even in the DevTools Network panel, the requests are truly blocked.

YouTube requests are truly blocked in DevTools network panel

Then why am I still seeing ads? I’ve disabled HTTP/2 so that subsequent requests on the same channel don’t slide by. Sometimes the ads skip on their own or fail to play, but they still appear. Interesting. Could YouTube be using WebSockets? I need inspiration, so I’ll look at uBlock Origin’s regex filters for ideas.

Tip: If you see the error OpenSSL Error([('SSL routines', 'ssl3_read_bytes', 'tlsv1 alert internal error')]), the DNS blocker (i.e., pfBlockerNG) is breaking the upstream TLS handshake for that domain. Either whitelist it in pfBlockerNG (so the request goes through) or intercept it and block the connection in mitmproxy. This error happens with black-holed domains when the upstream TLS cert cannot be sniffed. The cleanest strategy is to use transparent MITM mode.



Examine uBlock Origin Regex Patterns for Inspiration

Here are some of the regex patterns/strings that uBlock Origin uses on YouTube.

uBlock Origin YouTube regex/filters from a web browser

At first blush, it seems that a community of like-minded individuals is playing whack-a-mole with YouTube’s HTML and JavaScript. This has got me thinking: How does a video know to play an ad with JavaScript?

How does YouTube know if the ad converts? They must target ads for individuals, so a given video must receive some unique information about an ad—such as the click link and alt text. WebSockets would be a pain to maintain, especially with all the mobile clients. They must be using stateless JSON to relay that pertinent information in an innocuous URL request that has no telltale signs of ad-ness. Let’s hunt for this info in the JSON replies captured by mitmproxy.

Key advertisement information contained in a JSON response

Snap, Crackle, and Pop. We have a new plan: surgically alter the JSON response body to eliminate—or Byzantine-up—the ad information.



Surgically Alter the JSON Response to Remove Ads

After a bit more playful exploration, I found a trove of blockable URLs right there in the JSON payload. In fact, most of what I am trying to block shows up right here:

However, YouTube has booby-trapped their UI, and there is more than one way their obfuscated JavaScript code can pull down the ad details.

Let’s blow it all away right now.

After plenty of fun dissecting the YouTube UI and HTTP workflow—cookies, naughty service workers, and all—I am now able to strip away every pre-roll, post-roll, and mid-video ad. Here is a mitmdump screenshot showing select REST queries intercepted, decrypted, modified, then returned with updated headers (content length, etc.):

Success in removing YouTube ads via decrypted JSON responses
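The JSON surgery can be sketched as a recursive prune. This is a minimal illustration under stated assumptions: the key names in AD_KEYS are examples of the kind of ad sections found in player responses, not a confirmed list, and strip_ad_keys and rewrite_body are hypothetical helper names.

```python
import json

# Assumed key names for illustration only; inspect real responses to find
# the sections that actually carry ad details.
AD_KEYS = {"adPlacements", "playerAds", "adSlots"}


def strip_ad_keys(node):
    """Recursively drop every dict entry whose key is in AD_KEYS."""
    if isinstance(node, dict):
        return {k: strip_ad_keys(v) for k, v in node.items() if k not in AD_KEYS}
    if isinstance(node, list):
        return [strip_ad_keys(item) for item in node]
    return node


def rewrite_body(body):
    """Parse a JSON response body, remove ad sections, and re-serialize it."""
    cleaned = strip_ad_keys(json.loads(body))
    return json.dumps(cleaned, separators=(",", ":")).encode("utf-8")
```

In a mitmproxy response hook, assigning the result back to flow.response.content should also refresh the Content-Length header, which matches the "updated headers" seen in the screenshot.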

With this new capability, we could even inject JavaScript into the main YouTube page to subvert their code in an ECMAScript arms race—perhaps leveraging filters from uBlock Origin. For today, though, we can hang our hats on this accomplishment.

Success: We can strip out ads from the JSON payload for YouTube web ads using a router.



The iOS YouTube App Uses Protobuf, Not JSON

I can see very similar data in the Protocol Buffer (Protobuf) version of the same API calls as the web version in the YouTube iOS app. That complicates things somewhat: I cannot lean on JSONPath to hunt down advertisement sections, because with Protobuf the keys are just numbers that can even change.

The iOS version of the YouTube app uses Protobuf
Fun fact: YouTube compiles a large list of all the ads you are going to see and sends that to you in a sneaky payload. In fact, it is easier to visualize this when reading Protobuf. If you manage to exhaust that list, another large list will soon arrive.

I see strings like “Telus,” “Samsung TV,” “Boxing Week,” and “Buy now.” Remember when YouTube was a fun place? A fable about a golden goose comes to mind, Alphabet.

What is a Protocol Buffer? Here is an infographic from Data Science Blog.

Protobuf introduction (Credit: Data Science Blog)

As a consequence of seeing unencrypted traffic from my iPhone, I’m taken aback by the sheer amount of tracking information laid bare; it’s like I have electrodes on my head and chest while I’m running on a treadmill, and a line of scientists in white lab coats with clipboards is recording everything about my internals. In other words: yikes!

Privacy concern: Your apps are tracking you like crazy—what you do, how long you dwell, when you leave a given app, and much more. The URL https://play.googleapis.com/log/batch shows up a lot in my logs.

The next question is: Does the iOS app protocol behave like the web app?



Timing Analysis to Detect Ad Videos?

The iOS network traffic is not like the web traffic; Google has teams and teams of engineers dedicated to making sure blocking their ads isn’t computationally feasible. Daunted but undeterred, I was staring at network requests letting my mind zone out when I noticed a pattern I had not seen before.

For the web version of YouTube, I can eyeball which URLs are ads and which are the videos I want to watch. Take a look:

Which are ad videos and which are content videos?

How am I able to eyeball which video URLs are ads in this chaos?

Two ad videos between content videos

Take a look at the query parameter range. For the web version, a chunk of the video I want is fetched from byte 0, then immediately another video is fetched with a range starting again at byte 0. Both happen nearly simultaneously—faster than a human can click a new video. It turns out this, together with examining the clen parameter for the full video length (short videos are likely ads), can reasonably let us detect and doctor ad videos.
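The range-plus-clen heuristic could be sketched as follows. Everything numeric here is an assumption (the two-second window and the 5 MB size cut-off), AdStartDetector is a hypothetical name, and audio streams or short content videos would need extra handling in practice.

```python
from urllib.parse import parse_qs, urlsplit


class AdStartDetector:
    """Flags a byte-0 fetch of a short video that closely follows another byte-0 fetch."""

    def __init__(self, window_s=2.0, max_clen=5_000_000):
        self.window_s = window_s    # assumed: "faster than a human can click"
        self.max_clen = max_clen    # assumed: "short videos are likely ads"
        self.last_start = None      # timestamp of the previous byte-0 request

    def observe(self, url, now):
        query = parse_qs(urlsplit(url).query)
        byte_range = query.get("range", [""])[0]
        clen = query.get("clen", [""])[0]
        if not byte_range.startswith("0-") or not clen.isdigit():
            return False
        previous, self.last_start = self.last_start, now
        near_previous = previous is not None and now - previous <= self.window_s
        return near_previous and int(clen) <= self.max_clen
```

Feeding it each videoplayback URL with a monotonic timestamp returns True only for the second, short stream that starts at byte 0 right after another one.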

However, the iOS YouTube protocol does not use the range query parameter or even the Range header; video chunks use a counter like &nr=2 and &nr=3, etc. We must reverse-engineer the Protobuf responses.



Decode the YouTube Protobuf Responses

Here are some decoded Protobuf log files I created, then opened in the PyCharm IDE.

Let’s examine some Protobuf logs in the IDE

After logging decoded Protobuf messages to disk for offline analysis, I notice something that piques my interest.

I wonder what would happen if I were to, say, toggle those? This is tantalizing—but it feels like cheating, and hence no fun. Back to heuristics.

Thought Experiment: As with JSON, can I delete the Protobuf sections that serve up ads? Could I instead detect the ad videos in the payload, then dynamically modify their responses to be, say, a cached 0.01-second video file? Thirty- to three-hundred-second unskippable ads could vanish in a blink without blocking all those URLs.
Intercepted ad URLs from the Protobuf payload

Let’s start by blocking the ads as intended.



Ad URL Polymorphism

The Protobuf responses are a hot mess of bytes, but there are human-readable URLs that I can grep.

You’d think a simple LRU cache that blocks recently encountered ad URLs could be the way to go, but, alas, the ad URLs do not quite match the URLs sent over the wire. Also, who is to say that YouTube won’t randomize the position of query-string parameters one day? We need an O(1) lookup of flagged ad URLs that are polymorphic (and group homomorphic) to live ad URLs.

Detected ad URLs vs intercepted ad URLs

It might be tempting to split a query string into a sorted dictionary and reassemble it, but I have no way of knowing where the query-string boundary is. Plus, a live ad URL could add a key and disrupt the sorting.

Additionally, I’ve encountered URLs like this that purposely obfuscate the query params:

https://r4---sn-vgqsrns6.googlevideo.com/videoplayback
/expire/1640607416
/ei/WFrJYdWnFfyTsfIP4s2BsAk
/ip/121.35.98.26
/id/o-AE7swWOPOwXu3GyRght
/source/youtube
/requiressl/yes
/mh/wU/
/mm/31,26/…

Notice how /ip/121.35.98.26/ is just &ip=121.35.98.26?

I propose heuristically scanning for query and path parameters of ad URLs with high entropy and using those as keys (fingerprints). For example, in

https://rr6---sn-uxa0n-t8gz.googlevideo.com/initplayback?source=youtube
&orc=1&oeis=1&c=IOS&oss=1&oda=1&oad=5500&ovd=5500&oaad=11000&oavd=11000
&ocs=700&oputc=1&oses=1&ofpcc=1&osbr=1&osnz=1&msp=1&odeak=1&odepv=1
&osfc=1&id=58cc678216d6aaca&ip=121.35.98.26&initcwndbps=2125000
&mt=1640373902

One could note the following candidates in descending order of length:

  • rr6---sn-uxa0n-t8gz
  • 58cc678216d6aaca
  • 121.35.98.26
  • 1640373902
  • 2125000

Any or all of them could be lookup keys, each pointing to the same dictionary of deconstructed query parameters. A lookup of a live URL would involve the same process—find the highest-entropy parameters and check the URL dictionary for a match. The cache data structure could even be multi-level, with the root keys being just the length of the high-entropy strings.
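The fingerprint idea can be sketched like this. It is one possible reading of the proposal, with several assumptions: tokens must be at least MIN_LEN characters and contain a digit, "entropy" is character-level Shannon entropy summed over the token, and a live URL must share at least MIN_MATCHES fingerprints with a flagged URL (a single shared token such as the client IP would match everything). The name AdUrlIndex is hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from urllib.parse import urlsplit

MIN_LEN = 7        # assumed: shorter tokens are too common to be useful
MIN_MATCHES = 2    # assumed: one shared token (e.g. the client IP) is not enough


def total_entropy_bits(token):
    """Character-level Shannon entropy of the token, summed over its length."""
    counts = Counter(token)
    return -sum(c * math.log2(c / len(token)) for c in counts.values())


def fingerprints(url, top_k=5):
    """Return up to top_k distinctive tokens from the host, path, and query."""
    parts = urlsplit(url)
    tokens = [(parts.hostname or "").split(".")[0]]
    # Splitting on / ? & = treats /ip/1.2.3.4/ and &ip=1.2.3.4 the same way.
    tokens += re.split(r"[/?&=]", parts.path + "?" + parts.query)
    candidates = {
        t for t in tokens
        if len(t) >= MIN_LEN and any(ch.isdigit() for ch in t)
    }
    return sorted(candidates, key=lambda t: (-total_entropy_bits(t), t))[:top_k]


class AdUrlIndex:
    """Maps fingerprints to flagged ad URLs; each lookup is a dict access."""

    def __init__(self):
        self._by_token = defaultdict(set)  # fingerprint -> ids of flagged URLs
        self._next_id = 0

    def add(self, url):
        for token in fingerprints(url):
            self._by_token[token].add(self._next_id)
        self._next_id += 1

    def matches(self, url):
        hits = Counter()
        for token in fingerprints(url):
            for url_id in self._by_token.get(token, ()):
                hits[url_id] += 1
        return any(n >= MIN_MATCHES for n in hits.values())
```

Because path-style and query-style parameters are tokenized identically, a flagged URL and its obfuscated live twin end up with overlapping fingerprints regardless of parameter order.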

Failure: Even with the ability to block polymorphic URLs, the video ads are still indistinguishable from content video without context from the Protobuf structure.



Smoke Test: Intercept and Decode Protobuf in Python

Python is Slow: Decoding ~500 KiB of raw Protobuf in pure Python is painfully slow.

Decoding ~500 KiB of Protobuf in pure Python—especially the step that expands it to over 1 MiB of human-readable text so I can parse the ad URLs—takes longer than the connection timeout most of the time. I’ll run some benchmarks using pure Python versus the native C++ library.

Pure Python Benchmarks

Pure C++ Benchmarks

If you caught that, it takes about 23 s in Python and 100 ms in C++! In this never-ending story, I need a way to parse the raw Protobuf payloads in Python using the C++ library libprotobuf.so. In the interest of time, I’ll use subprocess.Popen and communicate with the C++ protoc binary directly (since raw decoding isn’t supported in Python anyway).
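Shelling out to protoc might look like the sketch below. It assumes a protoc binary on the PATH and uses its --decode_raw mode, which reads a binary message on stdin and prints a schema-less text dump; the helper name decode_raw and the ten-second timeout are assumptions.

```python
import subprocess


def decode_raw(payload, protoc="protoc", timeout_s=10.0):
    """Pipe raw Protobuf bytes through `protoc --decode_raw` and return its text output."""
    proc = subprocess.Popen(
        [protoc, "--decode_raw"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    try:
        out, err = proc.communicate(payload, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()          # do not leave a stuck decoder behind
        proc.communicate()
        raise
    if proc.returncode != 0:
        raise ValueError(err.decode("utf-8", "replace").strip() or "protoc failed")
    return out.decode("utf-8", "replace")
```

Spawning one process per response is crude, but it keeps the heavy decoding in native code while the proxy logic stays in Python.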



Fuzzing the YouTube Video Ad Responses

How about fuzzing the ad-video responses? Now that I can isolate ad videos, as a smoke test I send back 200 responses with empty bodies, and the iOS app goes bananas—it enters an infinite loop with no delay, just hammering YouTube’s servers while trying to fetch the next part of the video in panic mode. I feel bad for their servers, so I stop. Then I wonder: what would a happy-path response payload look like?

Infinite spin-lock loop of YouTube trying to get the next bytes of the ad video

Try as I might, when I send back empty 200s, 404s, or 503s, truncate response bodies, or just null-out part of the ad video, the iOS app crawls and then crashes spectacularly—with the dying breath of a messed-up iOS UI. I now block an error-reporting endpoint at /error_204/ that indicates a “dev assertion failed,” so I don’t make some overworked QA engineer pull out their hair.

Failure: We’ve learned that blocking ad URLs causes the app to deploy countermeasures, and even when defeated, the app hangs forever on the ad screen. We’ve also learned that fuzzing ad videos often causes the app to crash—there is even session metadata in the video-response chunks.

Let’s go back to what worked with JSON and obliterate the section of the Protobuf responses that contains the array of ad details.



Enter Burp Suite Tools for Penetration Testing

There is a library for Burp Suite called blackboxprotobuf (get the original Burp Suite version, not the PyPI fork, unless you like infinite-recursion bugs). It lets us decode raw Protobuf wire messages, inject something naughty, then re-encode them to see how a Protobuf endpoint behaves.

We are going to have so much fun together in this next section.

You may encounter a small world of pain, because some forks of blackboxprotobuf cause a stack overflow from deep recursion. You can spot this by adding sys.setrecursionlimit(200).

Compiling the original library source for Burp Suite and using the C++ bindings lets us transcode roughly 500 KiB of raw Protobuf in just a few seconds.

Tip: At the top of your import chain before you import protobuf, add

to use the C++ libprotobuf.so implementation whenever possible.
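The protobuf Python package selects its backend from the PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION environment variable, so a minimal sketch of the tip above, assuming the C++ extension is actually installed, is:

```python
import os

# Must run before the first `import google.protobuf` anywhere in the process.
# setdefault keeps any value already chosen in the shell environment.
os.environ.setdefault("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION", "cpp")
```

Afterwards, google.protobuf.internal.api_implementation.Type() reports which backend was actually loaded.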

It is now possible to generate a best-guess .proto schema with a single function:

The schema isn’t perfect—it’s huge, deeply nested, and slow to pretty-print—but it’s good enough to pull out the ad details, as in this Protobuf-to-JSON sample:

Sample Protobuf-to-JSON showing a section of ads

The Python schema dump starts like this—and continues for about 250,000 more characters:

Reverse-engineering the full YouTube Protobuf schema sounds good on paper, but the target is spectacularly complex—and always moving.



Exfil the Proto Schemas from the App, Cleanly?

As fun as it is to reverse Protobuf and generate a best-guess schema, wouldn’t it be more ninja-like to exfil the actual, working .proto or schema files from the smartphone app? Let’s pull out the Protobuf schemas from the Android version of the YouTube app and see whether the schemas are the same—or at least compatible.

This is what I tried at first, but it went nowhere with the Protobuf Toolkit (PBTK). I reproduce it here so I remember what I tried:

After installing the Qt dependencies (pronounced “cute”), I was treated to a GUI.

PBTK – The Protobuf Toolkit

Next, I grabbed the most recent release of a 100-MiB Android APK file from apkpure.com.

My excitement was in vain: the most PBTK could extract was a 59-byte proto file. Another tool called Apktool looked promising, but the best it can do is disassemble bytecode, not decompile it. That may be good enough for pentesters, though.

What ended up working for APK decompilation is a combination of a dedicated person’s dex2jar tool and a Java Decompiler. A helpful guide can be found here.

You can see that Google went out of its way to complicate reverse-engineering.

YouTube APK reversed into obfuscated Java classes

Google thoughtfully left a few hints.

All the Protobuf classes laid bare and human-readable

Upon deeper inspection, the Protobuf classes are right here, in Java, decorated with getters and setters. Since we are using Python and cannot get the true schema files, I will pause this approach for now.



Hardcore Deep-Dive into Protobuf and Wire Format

After gazing into a sea of decrypted network traffic again, then triggering errors and assertion fails on my iPhone with Protobuf fuzzing, and taking a peek at the error logs being phoned home, I notice that ads register for “slots” in a given video. They can register for pre-roll, mid-roll, end-roll, full-page, and ad pods (back-to-back ads). Blocking an ad URL causes an error along the lines of “some ad that doesn’t exist booked a slot,” and UI panic sets in.

I’m going to Sun Tzu the Protobuf Wire Format and come back in a bit…

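I’m back. The Wire Format is surprisingly elegant, except for ZigZag encoding. Trial and error shows that cutting chunks out of Protobuf with a hex editor is a no-go, because every enclosing length-delimited field carries a byte count that no longer matches once bytes are removed.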

Decoding, editing, and re-encoding without the original schema is computationally expensive, and it also produces bytes that differ from the original encoding. This is likely because, without a schema, we cannot tell whether a varint is ZigZag-encoded (sint32/sint64) or a plain int32, int64, or unsigned value, and the order in which fields are serialized is not guaranteed. Here is some Protobuf trivia on the matter:

Protobuf serialization gotchas
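The ZigZag ambiguity is easy to demonstrate. The sketch below uses standard varint and ZigZag decoding; the helper names are made up for illustration. The same wire bytes yield different numbers depending on a type that only the schema knows.

```python
def read_varint(buf, pos=0):
    """Decode one base-128 varint; return (value, position after it)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def zigzag_decode(n):
    """Interpret a decoded varint as sint32/sint64."""
    return (n >> 1) ^ -(n & 1)


def as_int64(n):
    """Interpret a decoded varint as a two's-complement int64."""
    return n - (1 << 64) if n >= 1 << 63 else n


raw, _ = read_varint(bytes([0x03]))
print(raw, zigzag_decode(raw))  # 3 if the field is int32, -2 if it is sint32
```

A schema-less decoder has to guess one interpretation, and a wrong guess for negative numbers changes the re-encoded length (ten bytes for -1 as int64 versus one byte as sint64).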



Exploit a Protobuf Feature to Easily Remove All Ads by Changing One Byte

Feature or flaw? Well, “flaw” is a bit harsh. It is a design feature, actually, to make Protobuf robust. Let’s say among friends that the implementation of ads is flawed in YouTube’s Protobuf implementation. Yes, I like that better—Protobuf is quite elegant.

As I casually pore over the C++ source code, an interesting comment in the Protobuf code catches my eye:

UnknownFieldSet is used to keep track of fields that were seen when parsing a protocol message but whose field numbers or types are unrecognized. This most frequently occurs when new fields are added to a message type and then messages containing those fields are read by old software that was compiled before the new types were added. (ref)

Yes, what to do with unknown fields? What to do indeed. And how easy would it be to change a 49399797 field key to, say, 49399796, making an entire sub-structure of advertisement and tracking information suddenly unavailable? Tantalizing.

If we can calculate the field tags in bytes with a little bit-twiddling, then we can use a simple regex to AMF [1] the ad section in O(n) time.

As a motivating example, I’d like to find the field key 49399797, which is not as simple as searching for 2F1C7F5. Here is an implementation of a tag-scanning algorithm so you can see the bit-twiddling:

We know the wire type is 2 (length-delimited nested string/message), and one target field key is 49399797. When bit-twiddled, we get the target tag

AA FF B8 BC 01

where the wire type 2 sits in the low three bits of the first byte (0xAA & 0x07 = 2) and the final 01 is the most significant 7-bit group of the varint. In binary, this is:

10101010 11111111 10111000 10111100 00000001

Let’s lose the MSB from each byte as per the var-length wire format:

.0101010 .1111111 .0111000 .0111100 .0000001

Then we shift and add all five 7-bit groups, with the least-significant group first:

0x2A + (0x7F << 7) + (0x38 << 14) + (0x3C << 21) + (0x01 << 28) = 395198378

Finally, we shift out the number of wire type bits (3) to get back the field key:

395198378 >> 3 = 49399797

And that, folks, is a taste of how Wire Format works.
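The arithmetic above can be checked with a few lines of Python. This is a sketch of standard varint tag encoding, not the original tag-scanning implementation; the function names are made up for illustration.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint, low group first."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more groups follow: set the MSB
        else:
            out.append(low7)
            return bytes(out)


def encode_tag(field_number, wire_type):
    """A tag is the varint of (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


def decode_tag(tag_bytes):
    """Inverse of encode_tag: return (field_number, wire_type)."""
    value = 0
    for i, byte in enumerate(tag_bytes):
        value |= (byte & 0x7F) << (7 * i)
    return value >> 3, value & 0x07


ad_tag = encode_tag(49399797, 2)
decoy_tag = encode_tag(49399796, 2)
print(ad_tag.hex(" "))     # aa ff b8 bc 01
print(decoy_tag.hex(" "))  # a2 ff b8 bc 01
print(decode_tag(ad_tag))  # (49399797, 2)
```

The two tags differ only in their first byte (AA versus A2), which is why a one-byte edit is enough to turn the ad field into an unknown field.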

Fantastic. Now, all we have to do is scan the Protobuf bytes for classic ad URL signatures like /pagead/ to bound our field search, then move backward from there until we find the target field tag(s), and thus the field key(s), that we would like to denature (e.g. 49399797 -> 49399796).

Notice how the Protobuf response payload is 1.87 MiB?

It would be computationally expensive to decode, alter, and re-encode without the original .proto files, but a quick linear byte scan takes almost no effort!

Let me repeat that: Ordinarily, a ~1.8 MiB payload arrives and must be decoded in memory with the Protobuf schema; then the structure is walked, the ad nodes are altered, and the result is re-encoded, compressed, and passed on to the YouTube app. That is expensive work for the pfSense device!
Let me repeat the other thing: We just have to walk the raw Protobuf bytes received from YouTube and change one ad byte. Muhahaha.
Walking backward from the ad marker

A quick note: more than one matching field tag appears, but not all represent ads. That’s why I backtrack from the /pagead/ markers.

Multiple identical field tags may be present
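The backtracking scan can be sketched as follows. This is an illustration under assumptions, not the full add-on script: the 4096-byte window is a guess (the real distance was around 600 bytes in one sample), and the extra check that the marker falls inside the field's declared length is an added safeguard against the look-alike tags mentioned above.

```python
AD_MARKER = b"/pagead/"
AD_TAG = bytes.fromhex("aaffb8bc01")  # field 49399797, wire type 2
DECOY_FIRST_BYTE = 0xA2               # turns the tag into field 49399796
MAX_BACKTRACK = 4096                  # assumed search window before the marker


def read_varint(buf, pos):
    """Decode one varint starting at pos; return (value, position after it)."""
    result = shift = 0
    while pos < len(buf):
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
    raise ValueError("truncated varint")


def denature_ads(payload):
    """Return (patched_payload, number_of_tags_patched)."""
    buf = bytearray(payload)
    patched = 0
    search_from = 0
    while True:
        marker = buf.find(AD_MARKER, search_from)
        if marker == -1:
            break
        search_from = marker + len(AD_MARKER)
        tag_at = buf.rfind(AD_TAG, max(0, marker - MAX_BACKTRACK), marker)
        if tag_at == -1:
            continue
        length, body_start = read_varint(buf, tag_at + len(AD_TAG))
        body_end = body_start + length
        if not body_start <= marker < body_end:
            continue  # this field does not contain the marker; leave it alone
        buf[tag_at] = DECOY_FIRST_BYTE
        patched += 1
        search_from = max(search_from, body_end)  # skip markers inside the patched field
    return bytes(buf), patched
```

Since only one byte per ad field changes and the payload length stays the same, none of the enclosing length prefixes need to be recomputed.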



Smoke Test: Remove Ads from Protobuf in O(n)-Time

It works! In one pass, with no additional memory, I scan a 1.8 MiB chunk of gibberish-looking Protobuf data. Only at the 30,593rd byte (of 1.8 MiB) is the target found, and backtracking ~600 bytes yields the field key to denature. Not only is this amazing, but I no longer need to block *.googleadservices.com or URLs that contain /pagead/; those requests are never made in the first place anymore.

Successfully able to remove ads from the Protobuf response



Analysis of This Successful Adblocking Technique

Summary

By taking advantage of a feature in Protobuf that lets it stay backward-compatible with schema changes—and noting that Protobuf is extremely sensitive to single-byte edits because of its compact format—we can change one byte in a critical spot and tell Protobuf that a deeply nested section belongs to a future schema version, so it is ignored. We can edit out ads elegantly.

Timing Analysis

Google returns huge Protobuf responses (for example, 1.8 MiB) that even include the iOS-app layout, so only native code (C++ / Swift) is fast enough to parse everything before the connection times out. I’ve shown that pure Python is more than two orders of magnitude slower than C++ at decoding these payloads (about 23 s versus 100 ms for ~500 KiB), so connections time out if Python decodes them. With web-based JSON, the whole payload has to be parsed, edited, and re-serialized; with this Protobuf technique, the job is one linear scan plus a quick back-track, so it works for real-time adblocking with no blocklist. So neat.

Knock-On Benefits

Every *.googleadservices.com and /pagead/* URL on Apple devices comes from the Protobuf payload itself. Once that payload no longer contains ad data, those requests vanish for free. The YouTube app feels snappier because it never tries to fetch the ad URLs, so I avoid the endless block-list whack-a-mole. Ads never register for video “slots,” and the content just plays.

Future-Proof

This is a heuristic technique that looks for two byte strings, /pagead/ and a calculated field tag nearby, so it depends on neither a blocklist nor the full schema, which makes the approach reasonably future-proof.

Walking backward from the ad marker to find the field key

Even if Google changes the field tag (and breaks millions of apps and Apple TVs before they upgrade), it’s an academic exercise to enhance the script and discover the new tag(s) automatically.

Should Google Be Worried?

No, not at all.

This is a highly specialized technique to block Apple-device YouTube ads (or Instagram, WhatsApp, Facebook, etc. tracker traffic). The CPU requirement to decrypt and re-encrypt HTTPS traffic greatly exceeds what a Raspberry Pi can deliver. Even if some company repackaged my script into a NIC dongle, it likely wouldn’t be powerful enough. An NVIDIA Shield could handle it, but Android users can already patch binaries directly. My technique targets Apple-device owners who don’t want to compromise the OS, which further narrows the audience.


The MITMProxy YouTube Adblocking Script

Here is the MITMProxy add-on script that serves as a proof of concept to block YouTube ads on networked Apple devices. The script can be run as follows (note the prerequisites in the script and install them first). Name the file youtube.py, then run:

mitmdump --listen-port 8080 --listen-host 127.0.0.1 -s "youtube.py"

Here is the script, including a fairness function to allow ads 5% of the time:

This script works in Python for a TLS-decrypting man-in-the-middle proxy that is also written in Python. As a working proof-of-concept, it’s pretty rad. Of course, it can be rewritten in Rust, Go, or any language other than single-threaded Python, but, as an intellectual exercise to defeat ads served from the same domain as content, it’s elegant.
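The fairness gate on its own could be as small as the sketch below. The 5% rate comes from the text above; the names FAIRNESS_RATE and allow_ads are hypothetical, and where the gate is consulted (per response, per video, or per session) is left open.

```python
import random

FAIRNESS_RATE = 0.05  # let ads through roughly 5% of the time


def allow_ads(rng=random):
    """Return True when this response should be passed through unmodified."""
    return rng.random() < FAIRNESS_RATE
```

Passing a seeded random.Random instance as rng makes the behavior reproducible in tests.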



YouTube Premium

It’s unknown if CAD $11.99/mo (formerly $9.99/mo; about $13.43/mo with tax) is even reasonable: Do I personally incur CAD $11.99 of cost to advertisers each month?

How much does YouTube advertising cost?
Source

Since ads are auctioned, the CPV (cost-per-view) varies. Also, many ad campaigns have a capped daily budget, so theoretically there should be fewer ads in the evening as budgets run out during the day.

Experiment in Ad Viewing

I watched YouTube on and off for a day on a clean notebook computer with private browsing. My history shows that I only “watched” ten videos:

  • I fast-forwarded through a few to skip the “like and subscribe” padding.
  • I jumped to the end of one just to get to the “top three” in a “top twenty” list.
  • Two were low-quality, so I left early.
  • The rest were music videos.

In all, for watching parts of ten videos, I was exposed to eight ads, and only two were skippable (which I skipped).

$0.15 as a Ballpark CPV

Let’s use USD $0.15 as a CPV. In one day, let’s say I incurred 8 x $0.15, or $1.20 to advertisers. Extrapolated to one month, that is roughly USD $36/mo. Do I really cost advertisers USD $36/mo for very casual YouTube viewing? That sounds terrible for advertisers.

CPV from U.S. Advertising Spend Divided by Total Views

From Statista, U.S. advertisers spent $15.1 billion on YouTube in 2019, while U.S. residents watched 916 billion videos (ref). That averages to $15.1B / 916B, or USD $0.0165 per view. For me, that’s only about USD $0.13 per day.

Extrapolated to one month, I theoretically caused advertisers to spend roughly USD $3.96. Wait, that’s nowhere near USD $10 for Premium. Hmmm…
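For anyone who wants to rerun the two estimates with their own numbers, here is the same arithmetic as a short script. The eight ads per day and the 30-day month are the assumptions used above.

```python
ADS_PER_DAY = 8   # from the one-day viewing experiment
DAYS = 30         # assumed length of a month

ballpark_cpv = 0.15              # USD, ballpark figure
statista_cpv = 15.1e9 / 916e9    # USD, 2019 U.S. ad spend / videos watched

ballpark_month = ADS_PER_DAY * ballpark_cpv * DAYS
statista_month = ADS_PER_DAY * statista_cpv * DAYS

print(f"{statista_cpv:.4f}")     # 0.0165
print(f"{ballpark_month:.2f}")   # 36.00
print(f"{statista_month:.2f}")   # 3.96
```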

Is YouTube Premium Worth It?

YouTube Premium subscription fee as of Jan, 2022

During my ad-viewing experiment I muted the hardware and often looked away, so ad spend was wasted on me—sorry about that. Yet I still want to support creators. At CAD $13.48 per month, Premium costs more than the ads I am personally served, and more than a Netflix subscription. The only way to justify Premium is to run YouTube constantly in the background.

However, I truly enjoy a handful of creators, so I may let their videos loop in the background. I’ll try the three-month Premium trial while still monitoring what Google tracks about me.

YouTube Premium network traffic



DMCA, Sony, Viacom

Recently I learned that because of abuses of the Digital Millennium Copyright Act of 1998, YouTube creators who make reaction videos or “easter-egg” breakdowns can have their videos claimed by companies like Sony or Viacom. From the moment a claim is filed, all ad revenue flows to the claimant, not to the creator—so I may unknowingly be giving nothing to my favorite channels.

Did you know? Many fair-use and game-commentary videos receive automated copyright claims, sending ad revenue to large companies with deep legal pockets while creators get nothing. No wonder so many move to Patreon.



Summary of Accomplishments

I rarely give up, so this is an instance of going into an extreme problem-solving mode to solve a fun problem loosely using cryptography and reverse-engineering. In the end, a single byte turned it all around, so it was all worth it to come to an elegant and satisfying solution.

Success: We were able to set up a hardware router from scratch, segment LANs into trusted and untrusted zones, set up traditional DNS adblocking, add a transparent MITM proxy, and ultimately block YouTube ads on networked Apple devices, performantly.

Note: Now that the hard part is done, I’ll consider paying for YouTube Premium—trackers are still heavily blocked.



Notes:

  1. Adios, My Friend