Key topics
- Software Engineering
- Infrastructure
- Amazon Web Services
Objective
We currently pull tweets on an ad-hoc basis. We need as much training data as possible to build good models, but the Twitter API only returns tweets from the past 7 days. We therefore want to build a service that pulls and saves data from Twitter on a regular schedule, so that nothing ages out before we capture it.
First steps
We will want to spin up a server on one of the following, listed in order of increasing complexity: glitch.me, Heroku, or AWS EC2. The simplest implementation of the tweet puller would be a job that hits the Twitter API and dumps the response to a file. We can schedule this job using cron.
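A minimal sketch of such a job, using only the Python standard library, might look like the following. The endpoint URL, the bearer-token authentication, the environment variable names (TWITTER_BEARER_TOKEN, TWEET_QUERY, TWEET_OUT_DIR), and the output file naming are all assumptions made for illustration; check the Twitter API docs for the endpoint and auth scheme that apply to our account.

```python
import json
import os
import urllib.parse
import urllib.request
from datetime import datetime, timezone

# Assumption: standard search endpoint; confirm against the Twitter API docs.
SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"


def fetch_tweets(query, bearer_token, url=SEARCH_URL):
    """Request one page of search results and return the parsed JSON."""
    params = urllib.parse.urlencode({"q": query, "count": 100})
    request = urllib.request.Request(
        f"{url}?{params}",
        headers={"Authorization": f"Bearer {bearer_token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def dump_response(payload, out_dir, now=None):
    """Write the payload to a UTC-timestamped JSON file and return its path."""
    now = now or datetime.now(timezone.utc)
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, now.strftime("tweets-%Y%m%dT%H%M%SZ.json"))
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f)
    return path


def main():
    # Hypothetical environment variable names; rename to suit the deployment.
    token = os.environ.get("TWITTER_BEARER_TOKEN")
    if not token:
        print("TWITTER_BEARER_TOKEN is not set; skipping pull.")
        return
    payload = fetch_tweets(os.environ.get("TWEET_QUERY", "python"), token)
    print(dump_response(payload, os.environ.get("TWEET_OUT_DIR", "tweets")))


if __name__ == "__main__":
    main()
```

Each run writes a separate timestamped file, so repeated pulls never overwrite earlier ones. To run the script at the top of every hour, a crontab entry along these lines could be used (the paths are placeholders):

    0 * * * * /usr/bin/python3 /home/user/pull_tweets.py >> /home/user/pull_tweets.log 2>&1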
Useful tools
Crontab Man Pages - man7.org
Twitter API Docs