Here I outline an algorithm to parse and efficiently store gigabytes of financial snapshots of thousands of companies in order to graph fundamental changes in their health over time, and to perform machine-learning experiments on the fundamental value of those companies.
For my machine learning projects, I need data you just can’t buy. This requires SPA (Single-Page App) web data extraction involving multiple clicks and page scrolling that curl can’t handle. Headless Chrome puppeted by RDP (Remote Debug Protocol) is a brilliant solution for this. Here is how I orchestrated several headless Chrome instances across several VPNs in Docker.
A power supply, when suddenly turned off, bleeds voltage slowly. Attached electronics experience a gradual voltage decline from 5V to 3.3V and eventually to zero. The problem is that microcontrollers and microprocessors don’t know how to behave with under-voltage. Their behavior and flash memory integrity are not defined. Flash memory can even be erased. Here I outline my attempts to achieve an efficient logic-level power supply.
Why use AWS Glacier for big data backup? It’s exceedingly inexpensive to archive data for disaster recovery on Glacier. AWS Glacier is only US$0.004 per GB/mo, and their SDK is beautiful. Here I outline a pricing matrix for cloud storage providers, and I take a look at the Java SDK for working with AWS Glacier to effectively archive 200GB a week.
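As a back-of-the-envelope illustration of the figures above, this sketch projects the monthly storage bill when 200GB is archived every week at US$0.004 per GB/mo. It assumes nothing is ever deleted and ignores request, retrieval, and early-deletion fees; the function name is hypothetical and is not part of the AWS SDK.

```python
PRICE_PER_GB_MONTH = 0.004  # USD, storage price quoted in the excerpt
WEEKLY_ARCHIVE_GB = 200     # weekly archive volume quoted in the excerpt


def monthly_storage_cost(weeks_archived: int) -> float:
    """Monthly storage bill in USD after `weeks_archived` weekly uploads.

    Assumption: every archive is kept, so stored volume grows linearly.
    """
    stored_gb = WEEKLY_ARCHIVE_GB * weeks_archived
    return stored_gb * PRICE_PER_GB_MONTH


if __name__ == "__main__":
    for weeks in (1, 4, 52):
        cost = monthly_storage_cost(weeks)
        print(f"after {weeks:>2} weeks: ${cost:.2f} per month")
```

Under these assumptions, a full year of weekly archives (10,400GB) costs roughly US$41.60 per month to keep.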
This is a problem story about how I preferred Java to other languages to communicate with a troublesome financial REST API endpoint because Java is a strongly-typed and verbose language where it is easy to write unit tests and build up solid modules to make a complete, resilient project.
Normally data packets come and go on the same interface, but VPN routing causes response packets to return through the tunnel, where they are dropped as unsolicited traffic – the connection hangs until a timeout. This makes it difficult to SSH into a server with an active VPN connection, but I explain a way to do just that.
Sometimes remote Java apps leak memory or are killed by the OS. Let’s connect through an SSH tunnel to a remote JVM running on an embedded Ubuntu system and profile memory and CPU usage with free tools VisualVM and JStatD, or Java Mission Control. No firewall adjustments are needed. We’ll also set up JMX connections to allow remote heap dumps and garbage collection. Finally, I’ll explore the features of VisualVM.
Breadboard power supplies cost less than a dollar on AliExpress. They are quite convenient for quickly powering and prototyping microprocessor circuits, Arduino projects with sketches, USB-powered prototypes, and so on. The imagination is the limit. I spent the morning trying to figure out why my MB102 breadboard power supply was outputting only 3.5V, not the expected 5.0V.
My newer-model Panasonic microwave oven stopped working. To get it working I needed to get past anti-tamper screws and “special” fuses. I suspect Panasonic wants us to buy another microwave instead. Not this time!
For the cluster computing project I’m working on, I need 28 microSD cards. There was an AliExpress sale with good reviews, so I ordered a batch of 30 microSD cards at what was a great price at the time. As long as the cards are Class 10 and work, we should be good, right? Results: Half are fake or defective. The rest are painfully slow. No refunds.
Problem: How to clean the raw OHLCV candle data from the broker for time series analysis? Suppose we have an autonomous program that prioritizes and continually downloads the latest minute and day candles, as well as periodically gets new symbols from the broker. The problem is that the candles are not guaranteed to be full-period […]
Before acquiring financial time-series candles, I need to know the database schema, storage growth, and cost of maintaining the database. How large could financial data grow and cost?
This would make a good interview question: There are about 120,000 public North American securities, bonds, rights, and index symbols. You have a paid API that can access all of them in OHLCV format if they are quotable. There are two critical API constraints: 15,000 calls per hour and 20 calls per second. Napkin math: Minute […]
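To make the two rate limits concrete, here is a rough sketch of the arithmetic. It assumes one API call per symbol per refresh pass, which may not match the real call pattern, and the function names are hypothetical.

```python
SYMBOLS = 120_000         # approximate symbol count from the excerpt
CALLS_PER_HOUR = 15_000   # hourly API limit from the excerpt
CALLS_PER_SECOND = 20     # burst API limit from the excerpt


def hours_for_full_pass(symbols: int, per_hour: int, per_second: int) -> float:
    """Hours needed to touch every symbol once (assumes one call per symbol)."""
    # The sustained rate is whichever limit is tighter over a full hour.
    sustained_per_hour = min(per_hour, per_second * 3600)
    return symbols / sustained_per_hour


def seconds_to_exhaust_hourly_budget(per_hour: int, per_second: int) -> float:
    """Seconds of full-speed bursting before the hourly limit is used up."""
    return per_hour / per_second


if __name__ == "__main__":
    hours = hours_for_full_pass(SYMBOLS, CALLS_PER_HOUR, CALLS_PER_SECOND)
    burst = seconds_to_exhaust_hourly_budget(CALLS_PER_HOUR, CALLS_PER_SECOND)
    print(f"one call per symbol takes about {hours:.1f} hours")
    print(f"bursting at the per-second cap uses the hourly budget in {burst:.0f} s")
```

Under that assumption the hourly limit is the binding one: a single pass over all symbols takes about 8 hours, and bursting at 20 calls per second would use up an hour's budget in 750 seconds (12.5 minutes).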
Things break. Just the other day through a series of seemingly unrelated events, a new Microsoft x509 certificate made its way into a security handshake process which went unnoticed until current single sign-on sessions began to expire. Had we also had automated security testing, we would have caught this one-off. I’ll explain how I set […]
Here are a few PHP web shell scripts I found on a production server in late 2016. I’ll show some of them, sneaky as they be, and then my efforts for securing a production server.
I’d like to share my efforts to prevent page breaks in the middle of paragraphs and maximize the use of page space when printing web pages to PDF. I’ll outline how this PHP+NodeJS+Chrome tool and algorithm accomplish this. The motivation is to prevent pictures from being cut off, cut halfway through, or from being pushed […]
These are the steps I took to compile Firefox so it can run on a RHEL shared hosting server which doesn’t have D-Bus installed and only has glibc 2.12. Situation: You want to run headless Firefox on a shared host running RHEL. You don’t have privileged access (e.g. no root). Your shared host only has […]
This is how I compiled the Xorg Server for RHEL on a CentOS machine with modifications to create a portable Xvfb binary. Xvfb (X virtual framebuffer) is an in-memory display server for Linux and Unix-like OSes. It enables running graphical applications without a display, such as running a headless browser (e.g. a full-blown Firefox instance […]
Every now and then there is an hours-long campaign of fraudulent AdWords-clicking from countries all over the world, ranging from Iran to Singapore, dedicated to clicking my cost-per-click Google ads in a vain attempt to exhaust a given daily budget early. My hat goes off to the chap for organizing the attack, or at least […]
The inspiration to make my own Pokémon Go scanner came from this great site FastPokeMap.se (and Twitter feed). Try this site first before venturing out to make your own scanner. It’s a neat site, but unfortunately each scan is slow, taking upwards of 20 seconds, and the failure rate is high. Its strength comes from […]