forked from openzim/wikimedia_wp1_hitcounter
kelson42/wikimedia_wp1_hitcounter
Note: This requires a lot of disk space. The raw data uses about 1 GB per day, and the output for 40 days of data takes about 16 GB.

1. Download the raw data.

   If you're running on Tool Labs, call link-month.sh with each year and month you want to index. This will create links in source/ for each of the files in that month.

   Otherwise, download the hourly pagecounts-*.gz files from http://dumps.wikimedia.org/other/pagecounts-raw/ (the projectcounts files aren't needed).

2. Make the list of average daily hitcounts, which will be written to the file hitcounts.raw.gz:

       sh make-raw.sh

   On Tool Labs, the grid engine should be used instead:

       jsub -cwd -j y ./make-raw.sh

   This sends the output of the command to ~/make-raw.out; you can monitor this file with:

       tail -f ~/make-raw.out
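If you aren't on Tool Labs, step 1 means fetching one pagecounts-*.gz file per hour. A minimal sketch of how those URLs are built, assuming the example month 2014-01 and day 2014-01-01 (these dates are placeholders, not anything the repo prescribes):

```shell
#!/bin/sh
# Hypothetical sketch: print the 24 hourly pagecounts URLs for one example
# day (2014-01-01). Filenames on dumps.wikimedia.org follow the pattern
# pagecounts-YYYYMMDD-HH0000.gz, grouped into per-month directories.
BASE="http://dumps.wikimedia.org/other/pagecounts-raw/2014/2014-01"

# seq -w pads to equal width, giving hours 00..23.
for h in $(seq -w 0 23); do
  echo "$BASE/pagecounts-20140101-${h}0000.gz"
done
```

Piping the output to `wget -i - -P source/` would download the files into source/, which is the directory the make-raw.sh step reads from.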
About
[ARCHIVED] Scripts to get the aggregated page views for all articles of a WM project (before pageview API was released)