Static website, generated daily, that tracks the use of essay questions within job applications.
Dashboard (updated daily): https://usajobsloyaltytests.netlify.app/
The system:
- Scrapes questionnaires daily from USAStaffing and Monster Government
- Identifies jobs asking "How would you help advance the President's Executive Orders and policy priorities in this role?"
- Shows trends by agency, location, grade level, and time
- Updates automatically via GitHub Actions
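The identification step in the list above can be sketched as a normalized substring check. This is a minimal illustration, not the project's actual matching logic: the function names `normalize` and `asks_target_question` are hypothetical, and the assumption is that the scraped questionnaire is available as plain text.

```python
import re

# The essay question quoted in this README.
TARGET_QUESTION = (
    "How would you help advance the President's Executive Orders "
    "and policy priorities in this role?"
)


def normalize(text: str) -> str:
    """Lowercase, unify curly apostrophes, and collapse whitespace."""
    text = text.replace("\u2019", "'").replace("\u2018", "'")
    return re.sub(r"\s+", " ", text).strip().lower()


def asks_target_question(questionnaire_text: str) -> bool:
    """Return True if the scraped text contains the target question."""
    return normalize(TARGET_QUESTION) in normalize(questionnaire_text)
```

Normalizing both sides first means that line breaks, extra spaces, and typographic apostrophes in the scraped page do not cause a missed match.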
This site uses data from the USAJobs API but is not an official USAJobs project.
- Install Git LFS (Large File Storage): This project uses Git LFS for parquet files. Install it before cloning:
# macOS
brew install git-lfs
# Ubuntu/Debian
apt-get install git-lfs
# Windows
choco install git-lfs
After cloning the repository:
git lfs install
git lfs pull
- Create virtual environment:
python3 -m venv venv
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Create .env file (only needed for current jobs collection):
# .env
USAJOBS_API_TOKEN=your_api_token_here # Get from https://developer.usajobs.gov/
Note: The API key is only required for collecting current jobs. Historical data collection does not require authentication.
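As a rough sketch of how the token might be consumed, the snippet below reads `USAJOBS_API_TOKEN` from the environment or from `.env` and builds request headers. The helper names `read_env_file` and `usajobs_headers` are hypothetical and the project may load its configuration differently. The header names (`Host`, `User-Agent` set to a contact email, `Authorization-Key`) follow the USAJobs developer documentation as best understood here, so check them there before relying on them.

```python
import os
from pathlib import Path


def read_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comment lines."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as " # Get from ..."
        value = value.split(" #", 1)[0].strip().strip("'\"")
        values[key.strip()] = value
    return values


def usajobs_headers(contact_email: str, env_path: str = ".env") -> dict[str, str]:
    """Build request headers; the environment variable wins over the .env file."""
    token = os.environ.get("USAJOBS_API_TOKEN") or read_env_file(env_path).get(
        "USAJOBS_API_TOKEN"
    )
    if not token:
        raise RuntimeError("USAJOBS_API_TOKEN is not set (needed for current jobs only)")
    return {
        "Host": "data.usajobs.gov",      # assumed from the USAJobs API docs
        "User-Agent": contact_email,     # the email registered with the API key
        "Authorization-Key": token,
    }
```

Failing early with a clear error when the token is missing keeps the historical collection path, which needs no authentication, separate from the current-jobs path.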
Workflow for data updates:
# Collect current jobs
cd src/generate_data
python ./run_data_pipeline.py
Workflow for questionnaire site updates:
# Update the questionnaire site
cd src/generate_site
python ./run_questionnaire_pipeline.py
Historical data collection (if needed):
- Single year: src/generate_data/run_single.sh
- Multiple years: src/generate_data/run_parallel.sh
# Single year:
src/generate_data/run_single.sh range 2024-01-01 2024-12-31
# Multiple years:
src/generate_data/run_parallel.sh 2020 2021 2022
- Parquet Files: Storage format
  - historical_jobs_YEAR.parquet: Historical job announcements by year
  - current_jobs_YEAR.parquet: Current job postings by year
- Logs: Stored in the logs/ directory with aggressive data gap detection
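Because the parquet files are tracked with Git LFS, a clone made without `git lfs pull` contains small text pointer files in their place. The sketch below flags those before a pipeline run. The function names and the `data_dir` argument are hypothetical, since the location of the parquet files is not stated here; the pointer prefix is the first line of the Git LFS pointer format.

```python
from pathlib import Path

# Git LFS pointer files are small text files that begin with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if the file starts with the Git LFS pointer header."""
    with path.open("rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def unpulled_parquet_files(data_dir: str) -> list[Path]:
    """List .parquet files under data_dir that are still LFS pointers."""
    return sorted(
        path for path in Path(data_dir).rglob("*.parquet") if is_lfs_pointer(path)
    )
```

If `unpulled_parquet_files` returns anything, running `git lfs pull` should replace the pointers with the real files.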
See docs/CONTRIBUTING.md.
Licensed under the LGPL 3.0; see LICENSE.txt for details.