Releases: Leader-board/Wikimedia-statistics
discord_overall_oct23: Update readme.md
You'll need to contact me at leader-board at outlook.com for the full data (i.e., the messages themselves), since the mods have, for whatever reason, forced me to keep it off public view. You'll need to ask them why; I got weird responses citing the ToS without proof. Alternatively, contact me on Meta/Wikibooks (username: Leaderboard). I've basically been coerced by them, so this is not by choice (they threatened consequences should I not comply).
Wikipedia Discord pickle
Pickle data for each channel; each file is a Pandas dataframe. Useful if one wants to do further analysis in Python using Pandas.
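A minimal sketch of loading one channel's pickle with Pandas. The file name `general.pkl` and the column names `author` and `content` are assumptions made for illustration; check the actual dataframe's columns after loading. Pickle files can execute arbitrary code when loaded, so only open files from a source you trust.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_channel(path):
    """Load one channel's pickled dataframe from disk."""
    return pd.read_pickle(path)


# Build a tiny stand-in file so the sketch runs without the real data.
# The columns here are hypothetical, not the release's actual schema.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "general.pkl"
    pd.DataFrame(
        {
            "author": ["alice", "bob", "alice"],
            "content": ["hi", "hello", "bye"],
        }
    ).to_pickle(demo_path)
    frame = load_channel(demo_path)

# Messages per author, most active first.
top_authors = frame["author"].value_counts()
print(top_authors)
```

With the real files, replace the temporary-file section with `frame = load_channel("path/to/channel.pkl")` and inspect `frame.columns` first.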
Wikipedia Discord overall
This includes
- CSV and SPSS files, including some basic analysis. A PDF version of this analysis is also available.
- User frequency data for the overall Discord.
Wikipedia Discord user frequency
This section contains the user frequency (i.e., the number of messages sent by each user) for every channel in Wikipedia's Discord.
Note the use of a rather uncommon separator (╡); this is to prevent conflicts with usernames that may contain common separators.
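As a rough sketch, a ╡-separated file can be read with Python's standard `csv` module by passing the separator as the delimiter. The two-column layout (username, then message count) is an assumption; adjust it to match the actual files.

```python
import csv
import io

SEPARATOR = "╡"


def read_user_frequency(handle):
    """Return (username, message_count) tuples from a ╡-separated file.

    Assumes two columns per line: username, then message count.
    """
    # QUOTE_NONE keeps quote characters inside usernames as literal text.
    reader = csv.reader(handle, delimiter=SEPARATOR, quoting=csv.QUOTE_NONE)
    rows = []
    for record in reader:
        if len(record) != 2:
            continue  # skip blank or malformed lines
        username, count = record
        try:
            rows.append((username, int(count)))
        except ValueError:
            continue  # skip a header or any non-numeric count
    return rows


# Usernames containing commas or pipes survive because ╡ is the only separator.
sample = io.StringIO("user,one╡42\nuser|two╡7\n")
frequencies = read_user_frequency(sample)
print(frequencies)
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the handle to `read_user_frequency`.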
Wikipedia Discord JSON raw
This section contains the raw JSON data for every publicly-viewable channel at Wikipedia's Discord, as of a sample taken on September 13th (~7:30 - 10:30 PM BST). DiscordChatExporter was used for this.
Converting the JSON
Use the script in this repository to work with the JSON. Alternatively, Excel and pickle formats are available, with SPSS and CSV formats available for the combined data (i.e., across all channels).
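For a quick look at the raw JSON without the repository script, the sketch below counts messages per author in one channel export. It is not the script from this repository, and it assumes each export has a top-level `messages` list whose entries carry an `author` object with a `name` field; verify this against an actual file.

```python
import json
from collections import Counter


def message_counts(export):
    """Count messages per author name in one parsed channel export."""
    counts = Counter()
    for message in export.get("messages", []):
        author = message.get("author") or {}
        counts[author.get("name", "<unknown>")] += 1
    return counts


# A tiny stand-in for one channel's export; the field names are assumptions.
sample_export = json.loads(
    """
    {
      "messages": [
        {"author": {"name": "alice"}, "content": "hi"},
        {"author": {"name": "bob"}, "content": "hello"},
        {"author": {"name": "alice"}, "content": "bye"}
      ]
    }
    """
)
counts = message_counts(sample_export)
print(counts.most_common())
```

With a real export, load it with `json.load(open(path, encoding="utf-8"))` in place of the inline sample.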
Cross-wiki user data
This contains the edit count of every user on every wiki (along with the user's global edit count), with fields separated by the pipe character ("|"). This CSV was created using the Java program in this repository.
Note that the CSV itself is about 128 GB, which is too large to store as a single file. Hence I've split it into 70 CSV files, each of them with a million rows. These CSVs compress really well (a test showed that the original CSV file could be compressed to as low as 822 MB).
Review the notes on handling this file: it is large and not amenable to common analysis tools such as Excel.
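One way to work with the split files without loading them into memory is to stream them row by row. This is a sketch under assumptions: the part file names, the presence of a header row in every part, and the column names `username`, `global_edits` and `enwiki` are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def iter_rows(paths, delimiter="|"):
    """Yield one dict per data row, reading each part file lazily."""
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            yield from csv.DictReader(
                handle, delimiter=delimiter, quoting=csv.QUOTE_NONE
            )


def count_users_at_least(paths, column, threshold):
    """Count rows whose integer value in `column` is at least `threshold`."""
    total = 0
    for row in iter_rows(paths):
        value = row.get(column) or "0"  # treat an empty cell as zero
        if int(value) >= threshold:
            total += 1
    return total


# Two tiny stand-in part files so the sketch runs without the real data.
with tempfile.TemporaryDirectory() as tmp:
    parts = []
    for index, body in enumerate(["alice|120|100\nbob|3|3\n", "carol|57|\n"]):
        part = Path(tmp) / f"part_{index:02d}.csv"
        part.write_text("username|global_edits|enwiki\n" + body, encoding="utf-8")
        parts.append(part)
    active = count_users_at_least(sorted(parts), "global_edits", 50)

print(active)
```

Because only one row is held at a time, memory use stays flat regardless of how many part files are processed.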
Raw user account data
Contains the list of all accounts and the number of edits that account made on each Wikimedia project.
Note that, due to their size, the larger files cannot be loaded fully into Excel. SQL or a tool like SPSS would be required.
Processed rank and percentile user data
Contains, for selected wikis, the rank and the percentile of every user in terms of edit count.
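To illustrate what rank and percentile can mean here, the sketch below uses one possible definition, which may differ from the one used to produce these files: rank 1 is the highest edit count, tied users share the best rank, and a user's percentile is the percentage of users with strictly fewer edits.

```python
def rank_and_percentile(edit_counts):
    """Map each username to (rank, percentile) by edit count.

    Assumed definitions: rank 1 is the most edits, ties share the best
    rank, and percentile is the share of users with strictly fewer edits.
    """
    total = len(edit_counts)
    ordered = sorted(edit_counts.values(), reverse=True)
    first_position = {}
    last_position = {}
    for position, count in enumerate(ordered, start=1):
        first_position.setdefault(count, position)  # best rank for this count
        last_position[count] = position  # last slot taken by this count
    result = {}
    for user, count in edit_counts.items():
        below = total - last_position[count]  # users with strictly fewer edits
        result[user] = (first_position[count], 100.0 * below / total)
    return result


# Hypothetical edit counts for four users.
ranks = rank_and_percentile({"A": 100, "B": 50, "C": 50, "D": 1})
print(ranks)
```

Here `A` gets rank 1 at the 75th percentile, `B` and `C` share rank 2 at the 25th, and `D` is rank 4 at the 0th.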