Lessons from a side project: score scraping

Yesterday I started (and kind of finished) a project to scrape scores for a local league. About an hour in I almost quit, feeling both "this is impossible" and "this is useless." It turns out it wasn't impossible, and it may not be useless yet. So what did I learn?

  1. I'm not sure how to word this concisely, but: don't throw away structure you can use. I started by stripping out the HTML, which was a mistake. The scores are in a table, and the row tags tell you exactly where each game ends. Plus, every row has a fixed number of columns (a mix of th and td), so keeping the HTML made parsing much easier.
  2. If you're going to use a library, especially for something big enough that it would take a while to rip out if you later decide NOT to use it, check the docs first and see (1) whether there are any docs, (2) whether they're any good, and (3) whether the library does what you want.
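To make point 1 concrete, here's a minimal sketch of the keep-the-HTML approach using Python's standard-library `html.parser` (the post doesn't say what the real scraper was written in). The five-column layout, the `COLUMNS` names, `TableRowParser`, `parse_games`, and the sample markup are all hypothetical. The idea is that `<tr>` boundaries split the games for you, `<th>` and `<td>` are treated the same, and the fixed column count plus numeric score cells filters out header rows.

```python
from html.parser import HTMLParser

# Made-up sample markup; the real league page's layout is an assumption.
SAMPLE = """
<table>
  <tr><th>Date</th><th>Home</th><th>Pts</th><th>Away</th><th>Pts</th></tr>
  <tr><th>Sep 7</th><td>Hawks</td><td>21</td><td>Bears</td><td>14</td></tr>
  <tr><th>Sep 7</th><td>Lions</td><td>10</td><td>Wolves</td><td>17</td></tr>
</table>
"""

# Hypothetical column layout for one game row.
COLUMNS = ("date", "home", "home_pts", "away", "away_pts")


class TableRowParser(HTMLParser):
    """Collects every <tr> as a list of cell strings; <th> and <td> count alike."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows
        self._row = None   # cells of the row currently open
        self._cell = None  # text fragments of the cell currently open

    def _close_cell(self):
        if self._cell is not None:
            # Collapse newlines and runs of spaces inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._close_cell()  # tolerate a missing </td> or </th>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def parse_games(html):
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    games = []
    for row in parser.rows:
        # A game row has the fixed column count and numeric scores;
        # the header row ("Pts") fails the digit check and is skipped.
        if len(row) != len(COLUMNS) or not (row[2].isdigit() and row[4].isdigit()):
            continue
        game = dict(zip(COLUMNS, row))
        game["home_pts"] = int(game["home_pts"])
        game["away_pts"] = int(game["away_pts"])
        games.append(game)
    return games


if __name__ == "__main__":
    for game in parse_games(SAMPLE):
        print(game)
```

With the text stripped out first, you'd be left guessing where one game ends and the next begins; here the parser never has to guess.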

I switched from LokiJS to NeDB, and now I'll probably switch to AlaSQL, IF it supports what I need. Thankfully, switching from LokiJS to NeDB was pretty easy, since they use identical insert methods.

So now that I've more or less done what I set out to do (scrape the scores and sort teams by record), I'm interested in things like points for, points against, points at home and on the road, point differential, and so on. Because why not. And those seem to be impossible (or at least very difficult?) with NoSQL.
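For a sense of why SQL appeals here, this is a sketch of those stats as a single query. It uses Python's built-in `sqlite3` as a stand-in for SQL in general; it is not AlaSQL, and I haven't checked which of these features AlaSQL supports. The `games` schema, `STANDINGS_SQL`, `standings`, and the game data are hypothetical. The trick is to list every game twice, once from each team's side, then `GROUP BY` team. Sorting by wins, then differential, is just one possible tiebreak.

```python
import sqlite3

# Made-up results: (home, home_pts, away, away_pts)
GAMES = [
    ("Hawks", 21, "Bears", 14),
    ("Lions", 10, "Wolves", 17),
    ("Bears", 24, "Lions", 20),
    ("Wolves", 13, "Hawks", 13),
]

STANDINGS_SQL = """
WITH team_games AS (
    -- each game from the home team's side...
    SELECT home AS team, home_pts AS pf, away_pts AS pa, 1 AS at_home FROM games
    UNION ALL
    -- ...and from the away team's side
    SELECT away, away_pts, home_pts, 0 FROM games
)
SELECT team,
       SUM(pf > pa) AS wins,          -- comparisons evaluate to 0 or 1 in SQLite
       SUM(pf < pa) AS losses,
       SUM(pf = pa) AS ties,
       SUM(pf) AS points_for,
       SUM(pa) AS points_against,
       SUM(CASE WHEN at_home = 1 THEN pf ELSE 0 END) AS home_pf,
       SUM(CASE WHEN at_home = 0 THEN pf ELSE 0 END) AS road_pf,
       SUM(pf) - SUM(pa) AS differential
FROM team_games
GROUP BY team
ORDER BY wins DESC, differential DESC, team
"""


def standings(games):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE games (home TEXT, home_pts INTEGER, away TEXT, away_pts INTEGER)"
    )
    con.executemany("INSERT INTO games VALUES (?, ?, ?, ?)", games)
    rows = con.execute(STANDINGS_SQL).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in standings(GAMES):
        print(row)
```

For this made-up data, the first row printed should be `('Hawks', 1, 0, 1, 34, 27, 21, 13, 7)`: one win, no losses, one tie, 34 points for, 27 against, 21 at home, 13 on the road, and a +7 differential.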
