About Eric Draken

My computers


Publications


Integrating Content and Structure into a Comprehensive Framework for XML Document Similarity Represented in 3D Space

  • Eric Draken
  • Tamer N. Jarada
  • Keivan Kianmehr
  • Reda Alhajj

Chapter: Learning Structure and Schemas from Documents
Volume 375 of the series Studies in Computational Intelligence pp 275-287


Abstract

XML is attractive for data exchange between different platforms, and the number of XML documents is rapidly increasing. This has raised the need for techniques capable of measuring the similarity between XML documents, so that they can be classified and organized for better utilization.

The idea of similarity between documents is not new. However, XML documents are richer and more informative than classical documents because they encapsulate both structure and content, whereas classical documents are characterized only by their content. Accordingly, using both the content and the structure of XML documents to assign a similarity metric is relatively new. Most of the recent algorithms proposed in the literature assign a similarity metric between 0.0 and 1.0 when comparing two XML documents. The similarity measures among multiple XML documents can be arranged in a matrix on which data mining can be performed to cluster closely related documents. In this chapter, we present a novel way to represent XML document similarity in 3D space.

Our approach exploits the characteristics of XML documents to produce a measure that can be used in clustering and classification techniques, as well as in information retrieval and search methods for XML documents. We derive a three-dimensional vector per document: two dimensions capture the document's structure and content, while the third combines both its structure and content characteristics. The outcome of our research allows users to visualize document similarity intuitively.
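The abstract does not spell out the chapter's actual similarity measures, so the following is only a minimal sketch of the general idea under stated assumptions. Each document is placed in 3D space by comparing it against a single reference document (an assumption), and each axis is a Jaccard similarity in [0.0, 1.0] over a different feature set: root-to-node tag paths for structure, lower-cased words for content, and (path, word) pairs for the combined dimension. Jaccard and these feature choices are stand-ins, not the published method. Documents that land close together in this space are similar; `math.dist` (Python 3.8+) gives the Euclidean distance between two points.

```python
import math
import re
import xml.etree.ElementTree as ET

# Assumption: Jaccard similarity over simple feature sets stands in for the
# chapter's actual structure/content/combined measures.


def _walk(elem, prefix=""):
    """Yield (root-to-node tag path, direct text) for every element."""
    path = f"{prefix}/{elem.tag}" if prefix else elem.tag
    yield path, elem.text or ""
    for child in elem:
        yield from _walk(child, path)


def features(xml_text):
    """Split a document into structure, content, and combined feature sets."""
    structure, content, combined = set(), set(), set()
    for path, text in _walk(ET.fromstring(xml_text)):
        structure.add(path)                      # structure: tag paths only
        for token in re.findall(r"\w+", text.lower()):
            content.add(token)                   # content: words only
            combined.add((path, token))          # combined: word in its context
    return structure, content, combined


def jaccard(a, b):
    """Set similarity in [0.0, 1.0]; two empty sets count as identical."""
    return 1.0 if not (a or b) else len(a & b) / len(a | b)


def to_3d(doc, reference):
    """Place `doc` in 3D space as (structure, content, combined) similarity to `reference`."""
    return tuple(jaccard(d, r) for d, r in zip(features(doc), features(reference)))


ref = "<book><title>XML mining</title><author>Smith</author></book>"
near = "<book><title>XML mining</title><author>Jones</author></book>"
far = "<car><model>Civic</model></car>"
print(to_3d(near, ref))  # (1.0, 0.5, 0.5)
print(to_3d(far, ref))   # (0.0, 0.0, 0.0)
print(math.dist(to_3d(near, ref), to_3d(far, ref)))
```

Because every coordinate lies in [0.0, 1.0], all documents fall inside the unit cube, which keeps a 3D plot easy to read. A fuller version would also need to handle attributes and tail text, which this sketch ignores.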

Keywords

  • Similarity measures
  • XML
  • 3D space
  • visualization
  • intuitive representation
  • document similarity
  • platform independence

Citation

Draken, E. et al., 2011. Integrating Content and Structure into a Comprehensive Framework for XML Document Similarity Represented in 3D Space. In Learning Structure and Schemas from Documents. Studies in Computational Intelligence, vol. 375, pp.275–287. Available at: http://dx.doi.org/10.1007/978-3-642-22913-8_13.


Making Query Coding in SQL Easier by Implementing the SQL Divide Keyword: An Experimental Query Rewriter in Java

  • Eric Draken
  • Shang Gao
  • Reda Alhajj

Book title: Advanced Database Query Systems: Techniques, Applications and Technologies
Chapter 12: Making Query Coding in SQL Easier by Implementing the SQL Divide Keyword: An Experimental Query Rewriter in Java pp 287-303


Abstract

Relational algebra (RA) and Structured Query Language (SQL) are supposed to have a bijective relationship by having the same expressive power; that is, each operation in SQL can be mapped to an RA equivalent and vice versa. This is essential because in commercial database management systems, every SQL query is translated into an equivalent RA expression, which is then optimized and executed to produce the required output.

However, RA has an explicit relational division symbol (÷), whereas SQL has no corresponding explicit division keyword. Division can instead be expressed as a combination of four core operations, namely cross product, difference, selection, and projection, and implementing relational division in SQL requires convoluted queries with multiple nested SELECT statements and set operations. Explicit division in relational algebra is possible when the divisor is static; a dynamic divisor, however, forces the query to be coded explicitly with the four core operators. SQL, on the other hand, provides no flexibility for expressing division even when the divisor is static. The work described in this chapter is therefore intended to provide an SQL expression equivalent to explicit relational algebra division (with a static divisor). In other words, the goal is to implement an SQL query rewriter in Java that takes a divide grammar as input and rewrites it into an efficient query using current SQL keywords. The developed approach could be adapted as a front end or wrapper to an existing SQL query system. Users will be able to express explicit division in SQL, which will be translated into an equivalent expression that involves only standard SQL keywords and structure. This makes SQL more attractive for specifying queries involving explicit division.
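For a dividend R(A, B) and a divisor S(B), relational algebra defines R ÷ S = π_A(R) − π_A((π_A(R) × S) − R), using only projection, cross product, and difference. The sketch below shows one common way a rewriter could turn an explicit division into standard SQL: the double NOT EXISTS idiom, which keeps every A for which no required B is missing. The `DIVIDE BY ... ON` syntax, the `rewrite_divide` helper, and the table names are hypothetical illustrations, not the chapter's actual grammar or Java implementation, which may well emit a different form (for example GROUP BY with HAVING COUNT).

```python
import sqlite3


def rewrite_divide(dividend, divisor, quotient_col, match_col):
    """Rewrite the hypothetical `dividend DIVIDE BY divisor ON match_col`
    into standard SQL using the double NOT EXISTS idiom: keep each
    quotient value for which no divisor row is missing from the dividend."""
    # Identifiers are interpolated directly, so they must be trusted names.
    return (
        f"SELECT DISTINCT r.{quotient_col} FROM {dividend} AS r "
        f"WHERE NOT EXISTS (SELECT 1 FROM {divisor} AS s "
        f"WHERE NOT EXISTS (SELECT 1 FROM {dividend} AS r2 "
        f"WHERE r2.{quotient_col} = r.{quotient_col} "
        f"AND r2.{match_col} = s.{match_col}))"
    )


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enrolled (student TEXT, course TEXT);
    CREATE TABLE required (course TEXT);
    INSERT INTO enrolled VALUES
        ('ann', 'db'), ('ann', 'ai'), ('bob', 'db'),
        ('cat', 'db'), ('cat', 'ai'), ('cat', 'os');
    INSERT INTO required VALUES ('db'), ('ai');
""")
# enrolled ÷ required: students enrolled in every required course
sql = rewrite_divide("enrolled", "required", "student", "course")
rows = sorted(student for (student,) in conn.execute(sql))
print(rows)  # ['ann', 'cat']
```

Like the RA definition, this form returns every quotient value when the divisor is empty, because the outer NOT EXISTS then has nothing to reject.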

Keywords

  • Relational algebra
  • SQL
  • optimized retrieval
  • divide keyword
  • relational division

Citation

Draken, E., Gao, S. & Alhajj, R., Making Query Coding in SQL Easier by Implementing the SQL Divide Keyword: An Experimental Query Rewriter in Java. In Advanced Database Query Systems: Techniques, Applications and Technologies, pp.287–303. Available at: http://dx.doi.org/10.4018/978-1-60960-475-2.ch012.


Top Portfolio


Aikido Hombu Timetable iPhone App

May 2013 – Jun 2016

English version

  • Hombu Timetable iOS app

A labour of love, the Hombu App, as it is affectionately known, uses heuristic¹ schedule parsing to retrieve aikido schedule data from the world headquarters of Aikikai aikido in Tokyo (where I also practiced aikido for 5 years).
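Note 1 lists the kinds of date strings the parser has to cope with. Below is a minimal sketch of heuristic date normalization, assuming every date is year-first (year, month, day) and that two-digit years belong to the 2000s; both are assumptions, and the app's real parser, which ran on the server side, is not shown here. `parse_schedule_date` is a hypothetical helper.

```python
import re
from datetime import date

# Assumption: dates are always year-first; day-first entries are not handled.
_PATTERNS = (
    re.compile(r"(\d{4}|\d{2})年(\d{1,2})月(\d{1,2})日?"),       # 2016年5月12
    re.compile(r"(\d{4}|\d{2})[/.\-](\d{1,2})[/.\-](\d{1,2})"),  # 2016/5/08, 16-01-5
)


def parse_schedule_date(raw, century=2000):
    """Normalize a hand-entered, year-first date string into a datetime.date."""
    text = raw.strip()
    for pattern in _PATTERNS:
        m = pattern.fullmatch(text)
        if m:
            year, month, day = (int(g) for g in m.groups())
            if year < 100:  # two-digit year: '16' -> 2016
                year += century
            return date(year, month, day)  # raises ValueError for e.g. month 13
    raise ValueError(f"unrecognised schedule date: {raw!r}")
```

Trying the stricter Japanese pattern first and falling back to the numeric one keeps each heuristic simple; new formats can be added as further patterns.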

This app was desperately needed because the official website at the time was still a 90s-era site that required up to 4 clicks to reach the day's schedule, and it wasn't mobile-friendly. Viewing the schedule required POST form submissions, so bookmarking was impossible. On top of that, the schedule would change at the drop of a hat, so we were often surprised by a different teacher.

This app solved several problems by using push notifications and a pleasing chime to alert users to the frequent schedule changes. With pictures of the teachers added, scanning the schedule at a glance is quick and convenient, unlike the aforementioned 4-click method. Additionally, the back end remembers teacher changes, so one can visually see what changes have been made far into the future: up to 60 days, whereas the official schedule is limited to only 14 days.
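Detecting the schedule changes that trigger a notification comes down to comparing two snapshots of the timetable. A minimal sketch, assuming a hypothetical data model in which each snapshot maps a (date, class start time) slot to a teacher name; the `Change` and `diff_schedules` names and the placeholder teachers are illustrative only, and delivering the push notification itself is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical slot key: (ISO date, class start time), e.g. ("2016-05-12", "06:30")
Slot = Tuple[str, str]


@dataclass(frozen=True)
class Change:
    day: str
    time: str
    old_teacher: Optional[str]  # None: the class was newly added
    new_teacher: Optional[str]  # None: the class was removed


def diff_schedules(old: Dict[Slot, str], new: Dict[Slot, str]) -> List[Change]:
    """Return every slot whose teacher differs between two schedule snapshots."""
    changes = []
    for day, time in sorted(old.keys() | new.keys()):
        before, after = old.get((day, time)), new.get((day, time))
        if before != after:
            changes.append(Change(day, time, before, after))
    return changes
```

Storing each returned `Change` is also one way a back end could keep a history of teacher changes covering a longer window than the official schedule.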

The main language of users is Japanese, but many visitors speak English. Although I was told it was impossible to change the app language on the fly, I figured out how to do just that: seamlessly switch the app language without leaving the app. As time went on, I added video support with animated video thumbnails and video pop-outs, so again one doesn't have to leave the app. Videos can also be saved from YouTube for offline viewing in the app. This technique has worked flawlessly for over three years despite numerous YouTube API changes.

Technology Used

  • Objective-C (iOS app)
  • Core Java 7, PHP (servers)
  • MySQL DB backend
  • Static JSON generation
  • Automatic app updater tool
  • Weather API integration
  • YouTube downloader and inline player
  • Database failover to an alternate server

Real-Time Multi-Threaded Financial Exchange Data Collector

Dec 2017 – Present

A massive project resulting in over 32,000 LOC² and over 400 JUnit5 and parameterized tests, designed to consume real-time financial data as fast as possible from an unforgiving API, store it efficiently, perform ETL in place, and continuously back up the data off-site. The result is a battle-tested, lean, multi-threaded Java application designed to use 8 cores efficiently, store data on RAID-1 HDDs, transform data for downstream use by Apache Spark, and provide real-time monitoring via Slack updates.

Motivation: Purchasing a subset of this data costs thousands of dollars a month in subscription fees, and that data is incomplete, lags, and suffers from survivorship bias. Minute-resolution data is even more expensive. Collecting the raw financial data directly from the exchange provides complete and actionable data for ML modeling.

Result: This project saves thousands of dollars a month and is superior to using 3rd-party data sources.
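Collecting raw data makes it possible to build minute-resolution data in-house instead of buying it. A minimal sketch of that ETL step, assuming the feed delivers trades as time-ordered (epoch seconds, price, size) tuples; the real API, storage schema, and Java code in this project are not described here, so `Bar` and `to_minute_bars` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Bar:
    minute: int    # epoch seconds, floored to the start of the minute
    open: float
    high: float
    low: float
    close: float
    volume: float


def to_minute_bars(trades: Iterable[Tuple[float, float, float]]) -> List[Bar]:
    """Roll time-ordered (epoch_seconds, price, size) trades into 1-minute OHLCV bars."""
    bars: Dict[int, Bar] = {}
    for ts, price, size in trades:
        minute = int(ts) - int(ts) % 60
        bar = bars.get(minute)
        if bar is None:  # first trade of the minute sets open/high/low/close
            bars[minute] = Bar(minute, price, price, price, price, size)
        else:
            bar.high = max(bar.high, price)
            bar.low = min(bar.low, price)
            bar.close = price  # input is time-ordered, so the latest trade closes the bar
            bar.volume += size
    return [bars[m] for m in sorted(bars)]
```

Minutes with no trades produce no bar, so a downstream consumer such as a Spark job may need to forward-fill gaps.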

Technology Used

  • Core Java 9 with Maven
  • REST API with OAuth
  • AWS S3 Java SDK
  • Spring Framework
  • Logback Framework
  • JUnit5, Mockito, Hamcrest
  • Websockets (streaming data)
  • JMX status monitoring
  • Slack status reporting
  • MySQL and SQLite (over 200GB)
  • Embedded hardware and custom Linux
  • RAID-1, continuous backups to S3

Competitive Intelligence Sites Monitor

Dec 2016 – Dec 2018

A fun project designed to monitor competitors for promotions, sales, and new products. Built with Spring, this software regularly scrapes other websites, RSS feeds, sitemaps, and various other web touch points looking for changes. Combined with automatic git diffs and headless-Chrome screenshots, significant changes are recorded and sent via Slack to the relevant channel for each competitor.

Result: All competitor intelligence is delivered to Slack channels daily, eliminating the need to remember to check those sites and feeds manually.
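The monitor relies on automatic git diffs to decide what counts as a significant change. As a stand-in for git, the sketch below uses Python's `difflib` to diff two text snapshots of a page and reports only when at least `min_changed_lines` lines were added or removed. That threshold, its default of 3, and the `significant_diff` name are assumptions, and posting the report to Slack is not shown.

```python
import difflib
from typing import Optional


def significant_diff(old: str, new: str, min_changed_lines: int = 3) -> Optional[str]:
    """Return a unified diff of two page snapshots, or None when the change is
    too small to report (fewer than `min_changed_lines` added/removed lines)."""
    diff = list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="previous", tofile="current", lineterm="",
    ))
    # Skip the two '---'/'+++' header lines; '@@' hunk headers start with '@'.
    changed = sum(1 for line in diff[2:] if line[:1] in ("+", "-"))
    return "\n".join(diff) if changed >= min_changed_lines else None
```

A threshold like this filters out trivial churn such as a rotating timestamp, at the cost of occasionally missing a genuinely important one-line change.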

Technology Used

  • Core Java 8 with Maven
  • Slack API SDK
  • Spring Framework
  • Logback Framework
  • JUnit, Mockito
  • Hibernate ORM
  • MySQL and JDBC
  • Headless Chrome

Websites


Canada Vacations

A massive undertaking involving months of project planning, user journey plotting, data architecting, data warehousing, gigabytes of digital asset management, real-time API communication, and analytics to create a completely new and fast website to disrupt the luxury train travel space in Canada. I was involved in each stage of planning and designed and built 100% of the backend CMS.

CanadaVacations.com trip page

Some of the key challenges I overcame were:

  • Managing 17,000+ photos on AWS
  • Making WordPress faster with Memcache and Redis
  • Using in-memory SQLite for fast site searches
  • Rapid development using Docker
  • Synchronizing the production and development databases
  • Extensive customization with ACF³
  • Reliable communication with the product and pricing API (which I also built)
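One way in-memory SQLite can back a fast site search: load the searchable fields into a `:memory:` database once, then answer queries with parameterized `LIKE` lookups. This is a minimal Python sketch, assuming a hypothetical (slug, title, description) trip schema; the site itself runs on WordPress, and how its search actually loads and queries SQLite is not described here.

```python
import sqlite3
from typing import Iterable, List, Tuple


def build_search_index(trips: Iterable[Tuple[str, str, str]]) -> sqlite3.Connection:
    """Load (slug, title, description) rows into a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trips (slug TEXT PRIMARY KEY, title TEXT, description TEXT)")
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", trips)
    return conn


def search(conn: sqlite3.Connection, term: str) -> List[str]:
    """Substring search over titles and descriptions (case-insensitive for ASCII)."""
    # Escape LIKE wildcards so a user typing '%' or '_' gets a literal match.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    pattern = f"%{escaped}%"
    rows = conn.execute(
        "SELECT slug FROM trips "
        "WHERE title LIKE ? ESCAPE '\\' OR description LIKE ? ESCAPE '\\' "
        "ORDER BY slug",
        (pattern, pattern),
    )
    return [slug for (slug,) in rows]
```

SQLite's `LIKE` is case-insensitive only for ASCII letters by default; accented text or ranked results would call for a full-text index instead.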

Worldwide Brands Canada

This site makes use of 100% static HTML with lazy-loaded image assets, minified CSS and JS, and parallel loading for faster asset downloads. This site is responsive and mobile-friendly.

Worldwidebrands.ca dropshipping marketing site

Here are some exciting performance insights into the above site, courtesy of tools.pingdom.com. This site is hosted on a $6/mo shared host with basic HDDs, but its performance comes from parallel loading and from using Cloudflare's caching like a CDN. Google Analytics is the bottleneck.

Performance insight results for worldwidebrands.ca

Pronunciation Power

The site below is in Japanese and makes use of minified CSS and JS, parallel loading, and rich media such as a responsive HD YouTube video embedded in the frame of a modern MacBook with an animated preview GIF. This way, the YouTube clip is only loaded when the play button is pressed, yet the animation draws the visitor in to press it. In addition, MP3/OGG audio clips are embedded with HTML5 into a listening quiz that presents a final score and advice to the quiz-taker. The main CTA (call-to-action) buttons stand out in a pleasing way.

Pronunciation-power.com English language software site

Here are even better performance insights into the above site, again courtesy of tools.pingdom.com. This site is hosted on a cheaper $4/mo shared host with basic spinning HDDs, but the performance gain comes from parallel loading and from using Cloudflare's caching capability. Google Analytics is again the bottleneck.

Performance insight results for pronunciation-power.com

Masakokoro Aikido Dojo

This was one of my first designs, made nearly 7 years ago! The owner doesn't update it much, but it's still online and the graphic elements have held up over time.

Masakokoro Aikido dojo

Ericdraken.com

This site was hosted on an expensive shared host with abysmal performance, so I moved it to serverless hosting. Now it is effortless to maintain, and thanks to its speed and quick TTFB, the SEO bump has been remarkable for such a niche website. Also, thank you for visiting!

Ericdraken.com performance report


Notes:

  1. Heuristics are needed because the different people who oversee the official schedule enter dates in different, and sometimes wildly different, formats (e.g. 2016/5/08, 16-01-5, 2016年5月12).
  2. 32 KLOC as counted by find . -name '*.java' | xargs wc -l
  3. Advanced Custom Fields
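The pipeline in note 2 has two known quirks: `xargs` splits its input on whitespace, so paths containing spaces break it, and with many files `xargs` may run `wc` several times, printing several partial totals. Below is a minimal Python equivalent that counts newline characters (the quantity `wc -l` reports) in every `.java` file under a directory; `count_java_lines` is a hypothetical helper.

```python
from pathlib import Path


def count_java_lines(root: str = ".") -> int:
    """Count newline characters in every *.java file under `root`,
    the same quantity `wc -l` reports, as a single total."""
    return sum(
        path.read_bytes().count(b"\n")
        for path in Path(root).rglob("*.java")
        if path.is_file()  # skip any directory whose name happens to end in .java
    )


print(count_java_lines("."))
```

On the shell side, `find . -name '*.java' -print0 | xargs -0 cat | wc -l` avoids both quirks by producing one number.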