Puppet Several Headless Chrome Instances Behind Different VPNs in Docker, No IP Leaks

For my machine learning projects, I need data you just can’t buy. This requires SPA (Single-Page App) web data extraction involving multiple clicks and page scrolling that curl can’t handle. Headless Chrome puppeted by RDP (Remote Debug Protocol) is a brilliant solution for this. Here is how I orchestrated several headless Chrome instances across several VPNs in Docker.

Attempts at Supplying Efficient Logic-Level Power to Sensitive Electronics

A power supply, when suddenly turned off, bleeds voltage slowly. Attached electronics experience a gradual voltage decline from 5V to 3.3V and eventually to zero. The problem is that microcontrollers and microprocessors don’t know how to behave with under-voltage. Their behavior and flash memory integrity is not defined. Flash memory can even be erased. Here I outline my attempts to achieve an efficient logic-level power supply.

Big Data Backup to S3 Glacier via Java SDK with Spend Evaluation

Why use AWS Glacier for big data backup? It’s exceedingly inexpensive to archive data for disaster recovery on Glacier. AWS Glacier is only US$0.004 per GB/mo, and their SDK is beautiful. Here I outline a pricing matrix for cloud storage providers, and I take a look at the Java SDK for working with AWS Glacier to effectively archive 200GB a week.

Block Malicious and Obnoxious Ads and Scripts at the DNS Level

Ads are getting more and more aggressive. Some ads are even malicious. Some sites even load crypto-currency mining scripts in the background in JavaScript. Users have discovered that a lot of traffic is advertisement or tracking scripts, and putting a damaging strain on our mobile device data plans and batteries. Here I explain how to safely setup Pi-hole – a network-wide ad-traffic blocker – in a Docker container on an external Linux device as a hardware DNS server to block ads.

Profile Remote Java Apps with VisualVM or JMC

Sometimes remote Java apps leak memory or are killed by the OS. Let’s connect through an SSH tunnel to a remote JVM running on an embedded Ubuntu system and profile memory and CPU usage with free tools VisualVM and JStatD, or Java Mission Control. No firewall adjustments are needed. We’ll also set up JMX connections to allow remote heap dumps and garbage collection. Finally, I’ll explore the features of VisualVM.

Troubleshooting a Voltage Discrepancy

Breadboard power supplies cost less than a dollar on AliExpress. They are quite convenient for quickly powering and prototyping microprocessor circuits, Arduino projects with sketches, USB-powered prototypes, and on. The imagination is the limit. I spent the morning trying to figure out why my MB102 breadboard power supply was outputting only 3.5V, not the expected 5.0V.

Experience with Inexpensive MicroSD Cards

For the cluster computing project I’m working on, I need 28 microSD cards. There was an AliExpress sale with good reviews, so I ordered a batch of 30 microSD cards, and at a great price point at the time. As long as the cards are Class 10 and work then we should be good, right? Results: Half are fake or defective. The rest are painfully slow. No refunds.

Cleaning Raw Candle Data for Time-Series Analysis

Problem: How to clean the raw OHLCV candle data from the broker for time series analysis? Suppose we have an autonomous program that prioritizes and continually downloads the latest minute and day candles, as well as periodically gets new symbols from the broker. The problem is that the candles are not guaranteed to be full-period […]

Acquiring Candle Data for Quantitative Financial Analysis Research

This would make a good interview question: There are about 120,000 public North American securities, bonds, rights, and index symbols. You have a paid API that can access all of them in OHLCV format if they are quotable. There are two critical API constraints: 15,000 calls per hour 20 calls per second Napkin math Minute […]

Automated Web Testing with PHP and Selenium

Things break. Just the other day through a series of seemingly unrelated events, a new Microsoft x509 certificate made its way into a security handshake process which went unnoticed until current single sign-on sessions began to expire. Had we also had automated security testing, we would have caught this one-off. I’ll explain how I set […]

Algorithm: Optimized PDF Web Page Print Layout

I’d like to share my efforts to prevent page breaks in the middle of paragraphs and maximize the use of page space when printing web pages to PDF. I’ll outline how this PHP+NodeJS+Chrome tool and algorithm accomplish this. The motivation is to prevent pictures from being cut off, cut halfway through, or from being pushed […]

Running Xvfb on a RHEL Shared Host (without X)

This is how I compiled the Xorg Server for RHEL on a CentOS machine with modifications to create a portable Xvfb binary. Xvfb (X virtual framebuffer) is an in-memory display server for Linux and Unix-like OSes. It enables running graphical applications without a display such as running a headless browser (e.g. A full-blown Firefox instance […]

Mitigating AdWords Click Fraud

Every now and then there is an hours-long campaign of fraudulent AdWords-clicking from countries all over the world, ranging from Iran to Singapore, dedicated to clicking my cost-per-click Google ads in a vain attempt to exhaust a given daily budget early. My hat goes off to the chap for organizing the attack, or at least […]

Posted in: Problem StoriesTags: AdWords, Cloudflare, Google, GTM, Htaccess

Pokémon Go Scanner

The inspiration to make my own Pokémon Go scanner came from this great site FastPokeMap.se (and Twitter feed). Try this site first before venturing out to make your own scanner. It’s a neat site, but unfortunately each scan is slow takes upwards of 20 seconds, and the failure rate is high. It’s strength comes from […]