A job scheduler and analysis tool for web scraping (and other) tasks.
Currently the following datasources are implemented:
"
- facebook posts and reactions: scrapes Facebook posts, comments and reactions (like, heart, etc.)
- gab (nazi-twitter): crawls posts for a given user
- google dorking: finds interesting files and downloads them
- json to csv: converts a JSON array into CSV
- mail: sends mails and files; mostly useful in pipelines
- masscan: UDP-based port scanner (requires Docker)
- motiondetection: runs motion analysis on a directory of video files
- onionlist: downloads the Tor catalogue from onionlist.org
- onions.danwin1210.de: downloads the Tor catalogue from danwin1210.de and creates a screenshot of each website in the result
- tiktok: gets video metadata per hashtag, downloads the videos and analyses their text using EasyOCR
- url: generic HTTP scraper
- urlscreenshotter: scrapes a comma-separated list of URLs and creates a screenshot of each of them
To add a new datasource:
- copy the template directory in ./jobs
- define the fields needed to start the job in fields.js (see the sketch after this list)
- a job can output one or multiple files
- do not create output directories; use archives instead
- name output files job_id.ext (e.g. job_id.json)
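The exact shape of fields.js is not documented here, so the following is only a minimal sketch of what a field definition for a new datasource could look like; the property names (name, label, type, required) and the export shape are assumptions for illustration, not the project's actual schema — check the template directory for the real one.

```js
// jobs/my_datasource/fields.js -- hypothetical sketch, not the project's real schema.
// Each entry describes one input that must be provided before the job can start.
module.exports = [
  { name: 'url',   label: 'Start URL',   type: 'text',   required: true  },
  { name: 'depth', label: 'Crawl depth', type: 'number', required: false },
];
```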
Features:
- simple configuration of actions/datasources, also from 3rd-party modules/repos
- job monitoring and scheduling
- schedule jobs
- SQLite, CSV and JSON browser
- separation of datasets/artifacts (one archive per crawl)
- scalable number of workers (also on other machines)
- GUI to create and schedule jobs
- displays pending, running and done jobs
- displays CSV and SQLite datasets
- can be distributed (workers and C&C on different locations/servers)
- jobs are managed through JSON files and can be distributed with an adapter like PouchDB (a minimal sketch follows below)
- multithreaded
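As a rough illustration of the "jobs as JSON files" idea, the snippet below writes a hypothetical job document named after its job_id. Every field name in it is an assumption made for illustration, not the project's actual job format.

```js
// Writes a hypothetical job document -- field names are assumptions, not the real schema.
const fs = require('fs');

const job = {
  job_id: '2a9f3c1e',                      // also used for output filenames, e.g. 2a9f3c1e.json
  datasource: 'url',                       // which ./jobs/<datasource> module should run
  status: 'pending',                       // e.g. pending -> running -> done
  fields: { url: 'https://example.com' },  // values for the datasource's fields.js inputs
  created: new Date().toISOString(),
};

// Such files could live on a shared disk or be synced between workers and the
// C&C server through an adapter like PouchDB.
fs.writeFileSync(`${job.job_id}.json`, JSON.stringify(job, null, 2));
```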
To install and run:
- `npm i`
- `npm run all`