Effects of Storing Financial Numbers as Integers over Decimals: Huge Data Savings
My broker’s API returns financial data as doubles (numbers like 9.999999997) so I’ve been storing them as 8-byte REALs in SQLite. With periodic vacuuming, the databases have grown in line with expectations. However, I’ve always wanted to explore storing the data as integers. There are three amazing benefits to storing financial numbers as integers:
- There are no rounding errors when storing integers.
- SQLite can store integers using between 1 byte and 8 bytes (ref).
- Analysis using integers is inherently faster than with floats and doubles.
After transforming the table schemas and data of about 212 GB, here are the results:
Before (212 GB)
Start | End | Open | High | Low | Close | Volume | Status |
---|---|---|---|---|---|---|---|
INT 4B | INT 4B | REAL 8B | REAL 8B | REAL 8B | REAL 8B | INT 4B | INT 1B |
After (142 GB)
Start | End | Open | High | Low | Close | Volume | Status |
---|---|---|---|---|---|---|---|
INT 4B | INT 4B | INT 1~8B | INT 1~8B | INT 1~8B | INT 1~8B | INT 4B | INT 1B |
Here are some specific reduction examples.
I took two databases and migrated them to all integers then vacuumed them. With active symbol ABX I achieved a 29% reduction, and with retired symbol CBW I achieved again a 29% reduction. The benefits are immediate and real.
Data representation with integers
Here is a before and after animation of ABX showing how the financial numbers are represented now. I multiplied all the REALs by 100,000, rounded them naturally, and cast them to INTEGERs. This maintains a five-decimal precision and supports huge financial numbers into the trillions and beyond.