I created Hayfevr.ly (GH repo to come) to solve a problem of my own. Two local allergy clinics and a TV news station posted daily pollen readings on their websites, so every morning I had to check all three sites to find out what was making me sneeze.

I wrote a web scraper in Python with Selenium that checks these websites for new readings, updates the daily pollen counts as new readings come in, and summarizes the data in one convenient place on Hayfevr.ly.
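The update-and-summarize step can be illustrated with a minimal sketch. It assumes each scraped reading has already been reduced to a source name, a date, a pollen type, and a count. The `Reading` class, the `merge_readings` and `daily_summary` functions, and the source names are hypothetical illustrations, not the project's actual code, and the Selenium fetching is left out so the logic runs on its own.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Reading:
    """One pollen reading scraped from one site (hypothetical record type)."""
    source: str       # placeholder site name, e.g. "clinic_a"
    day: date
    pollen_type: str  # e.g. "tree", "grass", "weed"
    count: int


def merge_readings(store, new_readings):
    """Insert or overwrite readings keyed by (source, day, pollen_type).

    If a site revises its number later in the day, the newer value replaces
    the older one, so running the scraper repeatedly does not create duplicates.
    """
    for r in new_readings:
        store[(r.source, r.day, r.pollen_type)] = r
    return store


def daily_summary(store, day):
    """Group one day's counts by pollen type, then by source."""
    summary = {}
    for (source, d, pollen_type), r in store.items():
        if d == day:
            summary.setdefault(pollen_type, {})[source] = r.count
    return summary


if __name__ == "__main__":
    store = {}
    today = date(2024, 4, 15)
    merge_readings(store, [Reading("clinic_a", today, "tree", 120),
                           Reading("tv_station", today, "tree", 95)])
    # A later check finds that clinic_a revised its tree count.
    merge_readings(store, [Reading("clinic_a", today, "tree", 140)])
    print(daily_summary(store, today))
    # {'tree': {'clinic_a': 140, 'tv_station': 95}}
```

Keying on (source, day, pollen type) keeps each site's numbers separate, so a summary built this way can show where the clinics and the TV station disagree instead of averaging the difference away.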

Tech Stack

Python
Selenium for the scraper
Tesseract for OCR to extract pollen counts published only as graphics
AWS S3 for hosting the website
MySQL for storing current and historical readings
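For sites that publish counts only as a graphic, Tesseract returns plain text that still has to be turned into numbers. The sketch below shows one way to do that, assuming the OCR output contains lines like `Tree: 1,240` or `Grass - 15`. That line format and the `parse_pollen_text` and `ocr_image` names are hypothetical. `pytesseract.image_to_string` is the real entry point of the pytesseract wrapper, but whether the project calls Tesseract through pytesseract is an assumption.

```python
import re

# Hypothetical OCR line format, e.g. "Tree: 1,240" or "Grass - 15".
# Real pollen graphics differ, so this pattern is an assumption.
LINE_RE = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?)\s*[:\-]\s*(\d[\d,]*)\s*$")


def parse_pollen_text(text):
    """Turn OCR text into {pollen_type: count}, skipping lines that don't match."""
    counts = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            pollen_type = m.group(1).strip().lower()
            counts[pollen_type] = int(m.group(2).replace(",", ""))
    return counts


def ocr_image(path):
    """Run Tesseract on a downloaded pollen graphic (needs pytesseract and Pillow)."""
    # Imported lazily so the parser above works without these packages installed.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(path))
```

Lines such as headings or qualitative labels ("Weed: Low") simply don't match and are skipped, which keeps stray OCR text from being stored as a count.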