Curated list of awesome resources and links about Software Analytics.
This list is an open community project. Feel free to contribute your ideas to it.
What is "Software Analytics"?
Software analytics is analytics on software data for managers and software engineers with the aim of empowering software development individuals and teams to gain and share insight from their data to make better decisions.
-- Tim Menzies, Thomas Zimmermann
Contents (ordered "from theory to practice"):
- Influential Papers
- Systematic Literature Reviews
- Academic Courses
- Books
- Blog Posts
- Podcasts
- Talks
- Microsites
- Hands-Ons
- Lists of Tools
- Related Awesome Lists
This section lists important papers from which Software Analytics has emerged.
- Thomas Zimmermann, Peter Weisgerber, Stephan Diehl, and Andreas Zeller: Mining Version Histories to Guide Software Changes (2004) - A real classic about using the history stored in version control systems to guide software changes (awarded the ICSE 10-Year Most Influential Paper Award in 2014).
- Tim Menzies, Christian Bird, Thomas Zimmermann, Wolfram Schulte, and Ekrem Kocaguneli: The inductive software engineering manifesto: principles for industrial data mining (2011) - Makes you aware of the various aspects you have to consider if you want to implement Software Analytics in industry.
- Andrew Begel and Thomas Zimmermann: Analyze this! 145 questions for data scientists in software engineering (2013) - a kind of meta-level paper about the questions that arise during software development and that Software Analytics may help answer.
- Ahmed E. Hassan and Tao Xie: Software intelligence: the future of mining software engineering data (2010) - discusses (among other topics) which types of software data can be used with existing data mining techniques.
- Tim Menzies and Thomas Zimmermann: Software Analytics: So What? (2013) - contains a critical discussion of how far Software Analytics has come in recent years. Also includes a short overview of influential papers.
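The core idea of the Zimmermann et al. paper, mining version histories for files that tend to change together, can be illustrated with a small sketch. The paper mines association rules rated by support and confidence; the code below is a simplified, pairwise version of that idea, not the authors' tool. The commit data and the function name `co_change_rules` are hypothetical.

```python
from collections import Counter
from itertools import permutations

# Hypothetical commit history: each entry is the set of files changed in one commit.
commits = [
    {"Parser.java", "Lexer.java"},
    {"Parser.java", "Lexer.java", "README.md"},
    {"Parser.java", "AST.java"},
    {"Lexer.java"},
]


def co_change_rules(transactions, min_support=2, min_confidence=0.5):
    """Mine simple one-to-one rules "if A changes, B changes too".

    support(A -> B)    = number of commits that change both A and B
    confidence(A -> B) = support(A -> B) / number of commits that change A
    """
    file_counts = Counter()
    pair_counts = Counter()
    for files in transactions:
        file_counts.update(files)
        # Count every ordered pair of files that changed in the same commit.
        pair_counts.update(permutations(sorted(files), 2))

    rules = []
    for (a, b), support in pair_counts.items():
        confidence = support / file_counts[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    # Strongest rules first: by confidence, then support, then file names.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


for a, b, support, confidence in co_change_rules(commits):
    print(f"{a} -> {b}: support={support}, confidence={confidence:.2f}")
```

With this sample history, only the `Lexer.java`/`Parser.java` pair passes the thresholds, in both directions, with a confidence of 2/3. In a real analysis the transactions would come from the version control system, and the thresholds would need tuning for the project at hand.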
These meta-research papers give an overview of existing papers and studies in the area of Software Analytics.
- Tamer Mohamed Abdellatif, Luiz Fernando Capretz, Danny Ho: Software Analytics to Software Practice - A Systematic Literature Review (2015) - looks at past papers that applied Software Analytics in practice.
- João Caldeira, Fernando Brito e Abreu, Jorge Cardoso, Toacy Oliveira: Software Development Analytics in Practice - A Systematic Literature Review (2020) - provides an aggregate view of Software Development Analytics studies from 2010 to 2019.
Academic courses that teach Software Analytics in depth.
- Canadian Summer School on Practical Analyses of Software Engineering Data (2011) - a collection of talks about early adoptions of Software Analytics in practice. Note (February 2022): unfortunately, the original site is no longer available, so the link leads to archive.org.
- Prof. Dr. Jürgen Döllner: Automated Visual Software Analytics (2015) - Very detailed and precise explanations with a focus on the visualization of software analyses.
- Christian Bird, Tim Menzies, Thomas Zimmermann: The Art and Science of Analyzing Software Data. Morgan Kaufmann (2015) - a comprehensive work by some Software Analytics luminaries that provides a good foundation for Software Analytics.
- Tim Menzies, Laurie Williams, Thomas Zimmermann: Perspectives on Data Science for Software Engineering. Morgan Kaufmann (2016) - a collection of short articles in the area of Software Analytics. Good and neutral discussion of the advances in the field and the limits of data-driven approaches.
- Adam Tornhill: Software Design X-Rays. Pragmatic Programmers (2018) - a book full of great software analyses of real code bases.
- Greg Wilson: Using Data Science to Explore Software Development (2017) - discusses the application of Data Science to software data to answer questions that arise in software development projects.
- Prof. Dr. Rainer Koschke: Software Analytics in komplexen Software-Projekten (2019) - introductory article on Software Analytics (written in German).
- Thomas Zimmermann: The productive software engineer (2019) - an interview with Thomas Zimmermann about his current work on software analysis at Microsoft.
Experiences shared by people who have applied Software Analytics in practice.
- Justine Gehring: Code Graveyards: Resurrecting Legacy Systems with OpenRewrite (2024) - introduction to automated code improvement with the deterministic OpenRewrite refactoring engine in the era of Large Language Models.
- Elmar Juergens: Mining Repository Data to Debug Software Development Teams (2016) - shows how version control systems can reveal communication problems in development teams.
- Stephan LaRocca: Dependency Analysis of Legacy Applications with Oracle Graph (2021) - demonstrates the use of graph technology to analyze legacy systems in order to find bounded contexts, support effective testing, and identify the most important business processes within an application.
- Dirk Mahler: Yes We Scan! Software Analysis Using jQAssistant (2015) - software analysis of industry software projects using a graph database.
- Nicolas Mervaillie: Fix Your Microservice Architecture Using Graph Analysis (2019) - applies graph-based software analysis to a distributed system.
- Margaret-Anne Storey: Lies, Damned Lies and Software Analytics (2015) - an overview of the history and goals of Software Analytics. I especially like the part where Margaret talks about the risks of Software Analytics, such as data quality issues, untrustworthy results, and ethical concerns.
- Oliver Tigges, Philipp Haußleiter: Software Dependency Analysis with Graph Databases (2015) - dependency analysis of software libraries using a graph database.
- Adam Tornhill: Prioritizing Technical Debt as if Time and Money Matters (2019) - a fresh look at version control data mining to uncover behavioral patterns of development organizations.
- Nicki Watt: Explore your Microservices Architecture with Graph Theory & Network Science (2020) - usage of graph-based algorithms to spot problems in large distributed software systems.
- Thomas Zimmermann: Software Productivity Decoded: How Data Science helps to Achieve More (2018) - talk with an introduction to Software Analytics and its application at Microsoft.
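Several of the talks above analyze dependency graphs, typically stored in a graph database. As a minimal in-memory sketch of the same idea, the code below computes the fan-in of each module and finds groups of modules that depend on each other cyclically. For the latter it uses Tarjan's strongly connected components algorithm, swapped in here as a plain-Python stand-in for a graph database query. The module names and the edge list are hypothetical.

```python
# Hypothetical module dependencies: (A, B) means "A depends on B".
dependencies = [
    ("orders", "customers"),
    ("customers", "billing"),
    ("billing", "orders"),
    ("orders", "logging"),
    ("billing", "logging"),
]


def build_graph(edges):
    """Adjacency sets; every module appears as a key, even without outgoing edges."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set())
    return graph


def fan_in(graph):
    """Number of modules that depend directly on each module."""
    counts = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts


def dependency_cycles(graph):
    """Strongly connected components with more than one module (Tarjan's algorithm)."""
    index, lowlink, stack, on_stack, cycles = {}, {}, [], set(), []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for succ in sorted(graph[node]):
            if succ not in index:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index[succ])
        if lowlink[node] == index[node]:
            # `node` is the root of a component: pop all of its members.
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            if len(component) > 1:
                cycles.append(sorted(component))

    for node in sorted(graph):
        if node not in index:
            visit(node)
    return cycles


graph = build_graph(dependencies)
print("fan-in:", fan_in(graph))
print("cyclic groups:", dependency_cycles(graph))
```

Here `orders`, `customers`, and `billing` form one cyclic group, which is a typical candidate for refactoring, while `logging` has the highest fan-in. A module that depends only on itself is not reported by this sketch, since it keeps only components with more than one member.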
Sites that collect various resources around Software Analytics topics:
- The Softvis Collection: A collection of beautiful and useful software visualizations - an older website that provides some ideas for software visualization including real-world examples.
Activities that let you experience Software Analytics for yourself.
- Matt Eland: Visualizing Code in Jupyter Notebooks with Pandas and Plotly (2022) - in this data analytics demo, Matt uses Jupyter Notebook, Python, Plotly, and Pandas to analyze an open source game project. He also provides the dataset so that you can start your own analysis.
- Markus Harrer: Software Analytics Workshop (2019) - a repository with a guided tour through a first software analysis using Data Science approaches and tools.
- Markus Harrer: Software Analytics Katas (2021) - small challenges designed to train analytical thinking and the use of data-driven software analysis techniques.
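A common first exercise in this style is to count how often each file has changed, one ingredient of the "hotspot" analyses popularized by Adam Tornhill, who combines change frequency with code complexity. The sketch below assumes input in the shape printed by `git log --format=format: --name-only` (changed file paths per commit, with commits separated by blank lines); the sample paths and the function name `change_frequencies` are hypothetical.

```python
from collections import Counter

# Sample output in the shape of `git log --format=format: --name-only`.
# The file paths are hypothetical.
GIT_LOG_OUTPUT = """\
src/game/Player.cs
src/game/World.cs

src/game/Player.cs

src/ui/Menu.cs
src/game/Player.cs

README.md
"""


def change_frequencies(log_output):
    """Count in how many commits each file appears (blank separator lines are skipped)."""
    counts = Counter()
    for line in log_output.splitlines():
        path = line.strip()
        if path:
            counts[path] += 1
    return counts


for path, changes in change_frequencies(GIT_LOG_OUTPUT).most_common(3):
    print(f"{changes:3d}  {path}")
```

Inside a real repository, you could pass the `stdout` of `subprocess.run(["git", "log", "--format=format:", "--name-only"], capture_output=True, text=True)` instead of the sample string. The most frequently changed files are then good starting points for a closer look.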
There are plenty of tools out there that can help you answer your questions in a data-driven way. This section collects existing lists of tools so that you can find the ones that fit your specific needs:
- Analysis-Tools.dev - static analysis tools for many programming languages, build tools, config files and more.
- Awesome Open Source - Profilers - tools for in-depth runtime analysis.
- OpenAPM - lists various Application Performance Management tools that can help you to find performance problems.
This section lists other awesome lists in the area of data analysis in software development.
- Awesome Empirical Software Engineering - a curated repository of software engineering repository mining data sets.
- Awesome Network Analysis - a curated list of awesome network analysis resources. If you want to dig deeper into graph-based analysis of software, you can find plenty of resources in this list.
- Awesome Machine Learning On Source Code - cool links & research papers related to Machine Learning applied to source code (MLonCode).
Did you like this list? Contributions are very welcome! Read the contribution guidelines first and add your ideas!