Releases: Leader-board/Wikimedia-statistics

discord_overall_oct23: Update readme.md

23 Oct 14:36
bf2d0df

You'll need to contact me at leader-board at outlook.com for the full data (i.e., the messages themselves), since the mods have, for whatever reason, forced me to keep it off public view. You'll need to ask them why; I got weird responses citing the ToS without proof. Alternatively, contact me on Meta/Wikibooks (username: Leaderboard). I've basically been coerced by them, so this is not by choice (they threatened consequences should I not comply).

Wikipedia Discord pickle

15 Sep 19:51
011f184

Pickle data for each channel; each file is a pandas DataFrame. Useful if one wants to do further analysis in Python using pandas.
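
As a rough sketch of how the per-channel pickles could be loaded, assuming they have been downloaded into one local directory and carry a `.pkl` extension (both the directory layout and the extension are assumptions, and `load_channel_frames` is a hypothetical helper, not part of this repository):

```python
from pathlib import Path

import pandas as pd


def load_channel_frames(directory: str) -> dict[str, pd.DataFrame]:
    """Load every pickle in `directory` into a dict keyed by file stem."""
    frames = {}
    # ".pkl" is an assumed extension; adjust the pattern to the actual files.
    for path in sorted(Path(directory).glob("*.pkl")):
        frames[path.stem] = pd.read_pickle(path)
    return frames
```

Pickle files can execute arbitrary code when loaded, so only unpickle files obtained from a source you trust.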

Wikipedia Discord overall

15 Sep 19:50
011f184

This includes:

  • CSV and SPSS files, including some basic analysis. A PDF version of this analysis is also available.
  • User frequency data for the overall Discord.

Wikipedia Discord user frequency

15 Sep 12:52
4649383

This section contains the user frequency (i.e., the number of messages sent per user) for every channel in Wikipedia's Discord.

Note the use of a rather uncommon separator (╡). This is to prevent conflicts with usernames that may contain common separators.
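
A minimal sketch of parsing such a file in Python follows. It assumes each line has the form `<username>╡<message count>` with no header row and UTF-8 encoding; the column layout is an assumption, and both function names are hypothetical.

```python
SEPARATOR = "╡"


def parse_frequency_line(line: str) -> tuple[str, int]:
    """Split one line into (username, message count)."""
    # Assumed layout: "<username>╡<count>". Splitting from the right keeps
    # commas, pipes, or tabs inside the username intact.
    username, count = line.rstrip("\n").rsplit(SEPARATOR, 1)
    return username, int(count)


def read_frequencies(path: str, encoding: str = "utf-8") -> list[tuple[str, int]]:
    """Read a whole frequency file, skipping blank lines."""
    with open(path, encoding=encoding) as handle:
        return [parse_frequency_line(line) for line in handle if line.strip()]
```

If the files do turn out to have a header row, it would need to be skipped before the counts are converted to integers.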

Wikipedia Discord JSON raw

15 Sep 12:47
4649383

This section contains the raw JSON data for every publicly-viewable channel at Wikipedia's Discord, as of a sample taken on September 13th (~7:30 - 10:30 PM BST). DiscordChatExporter was used for this.

Converting the JSON

Use the script in this repository to work with the JSON. Alternatively, Excel and pickle formats are available per channel, with SPSS and CSV formats available for the combined data (i.e., over all channels).
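
For a quick look at one export without the repository script, something like the following may be enough. It is only a sketch: it assumes the export has a top-level `messages` list whose items carry an `author` object with a `name` field, which should be checked against the actual files, and `count_messages_by_author` is a hypothetical helper rather than the script mentioned above.

```python
import json
from collections import Counter


def count_messages_by_author(path: str) -> Counter:
    """Count messages per author name in one exported channel file."""
    with open(path, encoding="utf-8") as handle:
        export = json.load(handle)

    counts = Counter()
    # Assumed structure: {"messages": [{"author": {"name": ...}, ...}, ...]}
    for message in export.get("messages", []):
        author = message.get("author", {}).get("name", "<unknown>")
        counts[author] += 1
    return counts
```

Calling `count_messages_by_author(path).most_common(10)` would then list the ten most active authors in that channel.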

Cross-wiki user data

14 Apr 19:54

This contains each user's edit count on every wiki (along with their global edit count), with fields separated by the pipe character ("|"). This CSV was created using the Java program in this repository.

Note that the CSV itself is about 128 GB, which is too large to store as a single file. Hence I've split it into 70 CSV files, each of them with a million rows. These CSVs compress really well (a test showed that the original CSV file could be compressed to as low as 822 MB).

Review the notes on handling this file - it's large and not amenable to common analysis tools such as Excel.
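
Because the data does not fit in memory or in Excel, one option is to stream the split files row by row. The sketch below assumes the parts are UTF-8 text with unquoted, pipe-separated fields; it makes no assumption about column names or about whether each part repeats a header row, and `iter_rows` is a hypothetical helper.

```python
import csv
from typing import Iterable, Iterator


def iter_rows(paths: Iterable[str]) -> Iterator[list[str]]:
    """Yield rows from several pipe-separated files, one row at a time."""
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            # QUOTE_NONE treats quote characters as ordinary text
            # (assumes fields are not quoted).
            yield from csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
```

Since the function is a generator, only one row is held in memory at a time, so aggregates such as `sum(1 for _ in iter_rows(paths))` can be computed over all parts without loading any of them in full.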

Raw user account data

29 Mar 21:11
499b219

Contains the list of all accounts and the number of edits each account made on each Wikimedia project.

Note that due to their size, the larger files cannot be loaded fully into Excel. SQL or tools like SPSS would be required.

Processed rank and percentile user data

29 Mar 21:31
d8618fd

Contains, for selected wikis, the rank and the percentile of every user in terms of edit count.
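
To illustrate what rank and percentile by edit count can mean, here is a small sketch. The definitions are assumptions and may differ from how the released files were computed: ties share the best rank (competition ranking), and the percentile is the share of users with at most as many edits as the given user. `rank_and_percentile` is a hypothetical helper.

```python
def rank_and_percentile(edit_counts: dict[str, int]) -> dict[str, tuple[int, float]]:
    """Map each user to (rank, percentile) based on edit count."""
    ordered = sorted(edit_counts.items(), key=lambda item: item[1], reverse=True)
    total = len(ordered)
    result = {}
    rank = 0
    previous = None
    for position, (user, edits) in enumerate(ordered, start=1):
        if edits != previous:
            # A new, lower edit count starts a new rank; ties keep the old one.
            rank = position
            previous = edits
        # Assumed definition: percentage of users with <= this many edits.
        percentile = 100 * (total - rank + 1) / total
        result[user] = (rank, percentile)
    return result
```

Under these definitions, the user with the most edits always has rank 1 and percentile 100.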