GoScrapr is a web scraping tool written in Go.
- compile
make
- run with
./bin/scraper <url> [rule set]
The rule set is a JSON file; see ruleSet.json for the structure.
- this is the core tool; there should be an API to use it, so all prints should be replaced by error codes, except for the scraped text, which should be written to stdout
- rule priority functionality
- write an xlsx writer, to convert a simple HTML table to an xlsx table
- Python ML for image recognition, then image to xlsx table
- [[MAYBE]] model to recognise patterns and predict tables