Realtime sales data webscraper

A program to export and analyse sales data from a third-party site.

This was a project I was contracted to do by a small local jewelry engraving business. They use a third-party site to manage their online sales, which unfortunately did not have any features for exporting sales data.

The problem

The business wanted to keep a list of the most popular initials people had engraved on specific items. This would allow them to reduce downtime for their employees and meet demand more quickly.

The solution

I developed a web scraper in Python using the Selenium library. It browses to the sales site, logs in and extracts data from the sales history page. The program was designed to export CSV files that make it easy to see the popularity of each initial.
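The counting and export step could look roughly like the sketch below. It is not the original code: it assumes the scraper has already produced a list of (item, initials) pairs, and the function names and CSV column names are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_initials(sales):
    """Count how often each set of initials was engraved on each item.

    `sales` is assumed to be an iterable of (item, initials) pairs
    already extracted from the sales history page.
    """
    counts = Counter()
    for item, initials in sales:
        # Normalise whitespace and case so "ab" and " AB " count together.
        counts[(item.strip(), initials.strip().upper())] += 1
    return counts


def write_counts_csv(counts, out_file):
    """Write the counts as CSV rows, most popular first."""
    writer = csv.writer(out_file)
    writer.writerow(["item", "initials", "count"])
    # Sort by descending count, then alphabetically for a stable order.
    for (item, initials), n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([item, initials, n])


if __name__ == "__main__":
    # Hypothetical sample data standing in for scraped sales records.
    sample = [("ring", "ab"), ("ring", "AB"), ("pendant", "JS"), ("ring", "JS")]
    buffer = io.StringIO()
    write_counts_csv(count_initials(sample), buffer)
    print(buffer.getvalue())
```

Keeping the counting separate from the scraping means the same export works no matter how the sales records are collected.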

The implementation

Using Selenium made a lot of sense at the time: because the website was very dynamic, driving a real browser avoided the incompatibilities that a headless scraper might have. If I were to redesign the bot now, I would likely look into using much lower-level libraries and potentially reverse engineering the API used by their website, if it is in a usable format. This would avoid the many edge cases that you encounter with Selenium and reduce processing time, allowing the application to run in real time on a headless device like a Raspberry Pi.

Future plans

It would be brilliant to expand this application into a more advanced inventory prediction and management tool. By analysing their other products and accounting for the time of year, it would be possible to estimate which products they should be stocking. For example, certain items will likely become more popular near Valentine's Day, Mother's Day and Christmas. This could also allow for more creative advertising by marketing the items that are currently popular.
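As a very rough starting point for the seasonal idea, sales could be tallied per calendar month to see which products peak when. This is only a sketch under assumptions: the input is a hypothetical list of (sale date, product) pairs, the function names are invented here, and a real prediction tool would need far more than simple counts.

```python
from collections import Counter, defaultdict
from datetime import date


def monthly_product_counts(sales):
    """Group (sale_date, product) pairs into per-month product tallies."""
    by_month = defaultdict(Counter)
    for sale_date, product in sales:
        by_month[sale_date.month][product] += 1
    return by_month


def top_products(by_month, month, n=3):
    """Return up to `n` best-selling products for a month number (1-12)."""
    # .get avoids creating an empty entry for months with no sales.
    return [product for product, _ in by_month.get(month, Counter()).most_common(n)]


if __name__ == "__main__":
    # Hypothetical sample data; dates and product names are made up.
    sample = [
        (date(2023, 2, 10), "locket"),
        (date(2023, 2, 12), "locket"),
        (date(2023, 2, 13), "ring"),
        (date(2023, 12, 20), "bracelet"),
    ]
    print(top_products(monthly_product_counts(sample), 2))
```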