Scrapes indeed job posts using advanced search feature of indeed.com

mughees936/Indeed-Job-Scraper

Indeed-Job-Scraper

Scrapes indeed job posts using advanced search feature of indeed.com

Description

indeed.py contains the whole source code for this automation, and result-.json contains the scraped results for the query the user entered. The script uses proxies from proxy_list.txt.

When a user searches on Indeed, the site returns a web page listing the matching jobs, but each job post is loaded in an iframe when clicked. To work around this, I used selenium instead of requests or urllib: it drives a real browser instance, so the page behaves as it would for a normal visitor.
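As a rough illustration of the proxy step, the sketch below picks a random proxy and turns it into Chrome command-line switches. It is not taken from indeed.py: the function names are hypothetical, and it assumes proxy_list.txt holds one host:port entry per line, which the description above does not confirm.

```python
import random


def load_proxies(path="proxy_list.txt"):
    """Read one proxy per line, skipping blank lines and '#' comments.

    The one-entry-per-line format is an assumption for this sketch.
    """
    with open(path, encoding="utf-8") as handle:
        lines = (line.strip() for line in handle)
        return [line for line in lines if line and not line.startswith("#")]


def chrome_arguments(proxies, user_agent=None, rng=random):
    """Build Chrome switches for a randomly chosen proxy and optional user agent."""
    if not proxies:
        raise ValueError("proxy list is empty")
    args = ["--proxy-server=" + rng.choice(proxies)]
    if user_agent:
        args.append("--user-agent=" + user_agent)
    return args
```

Each returned string can be passed to selenium's Chrome Options.add_argument before the driver is created. The user agent string would come from random_user_agent in the real script.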

Usage

Input to indeed.py is passed using command line arguments. Each argument is a switch followed by its value, much like a key and value in a dictionary. There are five switches for which values can be set:

1. -ttl sets the job title
2. -jt sets the job type
3. -rad sets the search radius around the location
4. -age sets the maximum age of the job post
5. -loc sets the location

An example command to invoke the script:

$ python3 indeed.py -ttl "Python Developer" -jt all -rad 10 -age any -loc "New York"

The above command searches for Python Developer jobs of all types within a 10 mile radius of New York, posted at any time. The arguments can be passed in any order, but each value must follow its switch (-ttl, -jt, -rad, -age, -loc); otherwise the script won't recognize it and a default value will be used.
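The order independence and the fallback to defaults can be sketched with argparse. This is only an illustration: indeed.py may parse its arguments differently, parse_query is a hypothetical name, and the default values below are assumptions chosen for the example.

```python
import argparse

# Default values are assumptions for illustration; indeed.py may use others.
DEFAULTS = {"ttl": "", "jt": "all", "rad": 25, "age": "any", "loc": ""}


def parse_query(argv):
    """Return the five search options as a dictionary."""
    parser = argparse.ArgumentParser(description="Indeed advanced search options")
    parser.add_argument("-ttl", default=DEFAULTS["ttl"], help="job title")
    parser.add_argument("-jt", default=DEFAULTS["jt"], help="job type")
    parser.add_argument("-rad", type=int, default=DEFAULTS["rad"], help="radius in miles")
    parser.add_argument("-age", default=DEFAULTS["age"], help="age of the job post")
    parser.add_argument("-loc", default=DEFAULTS["loc"], help="location")
    # parse_known_args ignores tokens it does not recognize, so the
    # corresponding options simply keep their defaults.
    known, _unknown = parser.parse_known_args(argv)
    return vars(known)
```

Calling parse_query(sys.argv[1:]) with the example command yields a dictionary with the keys ttl, jt, rad, age and loc, regardless of the order in which the switches were given.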

-ttl switch expects a string value to be used as the job title in the search query. Note: if the value contains a space, enclose it in double or single quotes.

-jt switch expects one of the following values: [all, commisions, contract, intenship, fulltime, parttime, temporary]

-rad switch expects a value of type int, following the conventions below

0 = only in the given location
5 = 5mi
10 = 10mi
15 = 15mi
25 = 25mi
50 = 50mi
100 = 100mi

-age switch expects a value following the conventions below

any = anytime
15 = within 15 days 
7 = within 7 days
3 = within 3 days
1 = since yesterday
last = since my last visit

-loc switch expects a string value for the location. As with -ttl, enclose the value in quotes if it contains a space.
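To show how the five values could end up in a search request, here is a minimal sketch that validates -rad and -age and encodes everything as a query string. The function name is hypothetical, and the base URL and parameter names (as_ttl, jt, radius, fromage, l) are assumptions about Indeed's advanced search. The script itself drives the site through selenium and may do this differently.

```python
from urllib.parse import urlencode

# Radius values in miles; 0 means "only in" the given location.
RADIUS_VALUES = (0, 5, 10, 15, 25, 50, 100)
# Age values: "any", a number of days, or "last" (since my last visit).
AGE_VALUES = ("any", "15", "7", "3", "1", "last")


def build_search_url(ttl, jt, rad, age, loc, base="https://www.indeed.com/jobs"):
    """Validate -rad and -age, then encode the options as a query string.

    The base URL and parameter names are assumptions for this sketch.
    """
    if rad not in RADIUS_VALUES:
        raise ValueError(f"-rad must be one of {RADIUS_VALUES}, got {rad!r}")
    age = str(age)
    if age not in AGE_VALUES:
        raise ValueError(f"-age must be one of {AGE_VALUES}, got {age!r}")
    params = {"as_ttl": ttl, "jt": jt, "radius": rad, "fromage": age, "l": loc}
    return base + "?" + urlencode(params)
```

Rejecting unsupported values up front avoids sending a query the site would silently reinterpret. The job type is passed through unchanged here, since only the script knows which spellings it accepts.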

Requirements

python 3.x

selenium

random_user_agent

chromedriver should be placed in the same directory as indeed.py
