Estimate how frequently Python packages are imported across public GitHub repositories.
We determine package popularity by:
- Randomly sampling GitHub repositories with Python as the main language
- Analyzing Python import statements in these repositories
- Extrapolating findings based on the total Python repository count (~18M repositories)
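
The extrapolation step scales a sample proportion up to the whole population of Python repositories. Below is a minimal sketch. It assumes a library's count is the number of sampled repositories that import it; the README does not state this explicitly. The helper name `extrapolate` is hypothetical.

```python
def extrapolate(sample_hits: int, sample_size: int, total_repos: int) -> float:
    """Scale a sample proportion up to the estimated population of Python repos.

    sample_hits:  sampled repositories that import the package (assumed meaning)
    sample_size:  number of repositories sampled so far
    total_repos:  estimated total number of Python repositories on GitHub
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    # Multiply before dividing so integer inputs stay exact where possible.
    return sample_hits * total_repos / sample_size
```

For example, with hypothetical figures of 50 hits in a 1,000-repository sample and ~18M repositories in total, `extrapolate(50, 1000, 18_000_000)` estimates 900,000 repositories.
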

The system continually refines its estimates by sampling additional repositories every 6 hours via GitHub Actions.

Note: Standard-library modules are no longer counted, but some previously collected data that includes them has not yet been removed.

| Script | Purpose |
|---|---|
| find_repos.py | Queries the GitHub API for random Python repositories |
| analyze_imports.py | Extracts import statements from repository files |
| count_libs.py | Aggregates and calculates package usage statistics |
| update_readme.py | Refreshes this README with the latest data |
| total_python_repos.ipynb | Estimates the total Python repository count on GitHub |

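
Import extraction can be done without executing any code by parsing each file with the standard-library `ast` module. The sketch below shows one way to do it, though analyze_imports.py may work differently. The function name `top_level_imports` is hypothetical. Relative imports are skipped, and files that fail to parse (for example, Python 2 code) yield no imports.

```python
import ast


def top_level_imports(source: str) -> set[str]:
    """Return the top-level package names imported by a Python source string."""
    try:
        tree = ast.parse(source)
    except (SyntaxError, ValueError):
        # Unparseable files (e.g. Python 2 syntax, null bytes) contribute nothing.
        return set()
    names: set[str] = set()
    for node in ast.walk(tree):  # also finds imports nested inside functions
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])  # "os.path" -> "os"
        elif isinstance(node, ast.ImportFrom):
            # level > 0 means a relative import such as "from . import x".
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return names
```
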

| File | Description | Format |
|---|---|---|
| repos.jsonl | Details of processed repositories | JSONL |
| imports.jsonl | Raw import statements extracted from repos | JSONL |
| library_counts.csv | Aggregated package usage statistics | CSV |

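
Aggregating imports.jsonl into library_counts.csv could look roughly like the sketch below. The record layout (`{"repo": ..., "imports": [...]}`), the CSV column names, and the function name `count_libraries` are all hypothetical. Counting each package once per repository is also an assumption about what the counts mean.

```python
import csv
import io
import json
from collections import Counter
from typing import Iterable, TextIO


def count_libraries(jsonl_lines: Iterable[str], csv_out: TextIO) -> Counter:
    """Aggregate per-repository imports into library counts and write them as CSV.

    Assumes (hypothetically) one JSON record per line of the form
    {"repo": "owner/name", "imports": ["numpy", "requests", ...]}.
    """
    counts: Counter = Counter()
    for line in jsonl_lines:
        if not line.strip():
            continue  # tolerate blank lines in the JSONL file
        record = json.loads(line)
        # Count each package at most once per repository.
        counts.update(set(record["imports"]))

    writer = csv.writer(csv_out)
    writer.writerow(["library", "count"])
    for library, n in counts.most_common():  # highest counts first
        writer.writerow([library, n])
    return counts


if __name__ == "__main__":
    demo = [
        '{"repo": "a/x", "imports": ["numpy", "requests", "numpy"]}',
        '{"repo": "b/y", "imports": ["numpy"]}',
    ]
    buf = io.StringIO()
    count_libraries(demo, buf)
    print(buf.getvalue())
```

In practice, `jsonl_lines` could be an open file handle for imports.jsonl and `csv_out` a file opened with `newline=""`, as the `csv` module recommends.
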
Our GitHub Actions workflow orchestrates the entire process:

Find Random Repos → Analyze Imports → Count Package Usage → Update Statistics → Refresh README

| Rank | Library | Count |
|---|---|---|
| 1 | numpy | 661 |
| 2 | requests | 321 |
| 3 | pandas | 273 |
| 4 | matplotlib | 230 |
| 5 | torch | 221 |
| 6 | django | 170 |
| 7 | utils | 128 |
| 8 | sklearn | 128 |
| 9 | cv2 | 121 |
| 10 | scipy | 108 |

Last updated: 2025-03-27 18:35:18 UTC
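
The ranking table and timestamp above are regenerated automatically. The sketch below shows one way update_readme.py could render them, assuming the counts are available as a plain mapping. The function name `render_top_table` is hypothetical. In this sketch, tied libraries keep their input order; the real table may order ties (such as utils and sklearn at 128) by a different rule.

```python
from datetime import datetime, timezone
from typing import Mapping, Optional


def render_top_table(counts: Mapping[str, int], n: int = 10,
                     now: Optional[datetime] = None) -> str:
    """Render the n most-imported libraries as a Markdown table plus a timestamp.

    `now` is assumed to be a UTC datetime; it defaults to the current UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # sorted() is stable, so libraries with equal counts keep their input order.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]
    lines = ["| Rank | Library | Count |", "|---|---|---|"]
    for rank, (library, count) in enumerate(ranked, start=1):
        lines.append(f"| {rank} | {library} | {count} |")
    lines.append("")  # blank line so the timestamp is not parsed as a table row
    lines.append(f"Last updated: {now:%Y-%m-%d %H:%M:%S} UTC")
    return "\n".join(lines)
```
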