- About
- Learning Goals
- Curriculum Overview
- How to Use This Curriculum
- Extra Bibliography
- References
- Notes and Clarifications
This Self-Taught Data Science Curriculum is a structured roadmap that I created to guide myself in learning this field independently and for free. My motivation for developing this material came from my desire to deepen my knowledge in data science and analytics, making the most of the available online resources.
The program covers everything from fundamentals to advanced topics, including programming, mathematics, statistics, machine learning, deep learning, and big data. To achieve this, I selected high-quality courses and learning materials that are freely accessible.
If you also want to learn data science on your own or expand your knowledge in the field, this roadmap can serve as a solid foundation for your journey.
- Python: Data manipulation, visualization, and machine learning.
- R: Statistical modeling and advanced data analysis.
- Linear Algebra, Calculus, Probability, and Inferential Statistics.
- Bayesian Methods, Regression, and Machine Learning Theory.
- SQL and NoSQL Databases.
- Data lakes and cloud computing solutions.
- Big Data processing with Spark and Hadoop.
- Supervised and Unsupervised Learning.
- Neural Networks and Natural Language Processing (NLP).
- Reinforcement Learning and AI Ethics.
The curriculum is divided into well-structured sections, each covering essential areas of data science:
- Fundamentals - Basic concepts and data literacy (~40h).
- Mathematics & Statistics - Essential mathematical foundations (~90h).
- Programming - Python & R for data science (~215h).
- Data Mining - Extracting insights and patterns (~120h).
- Databases - SQL and database management (~80h).
- Big Data - Processing large-scale datasets (~85h).
- Machine Learning - Core ML concepts and models (~120h).
- Deep Learning - Advanced AI techniques (~125h).
- Data Warehousing - Data integration and storage (~300h).
- Cloud Computing - Cloud solutions for data science (~120h).
A detailed breakdown of each section, including recommended courses, can be found in the repository.
This roadmap is flexible and can be adapted based on your learning pace and background:
- Follow it sequentially if you're starting from scratch.
- Skip sections if you already have knowledge in a particular area.
- Combine different resources, projects, and additional readings.
Each module contains curated courses with estimated effort and certification options when available.
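As a rough planning aid, the section estimates from the Curriculum Overview can be turned into a study-time calculation. The sketch below is only an illustration: the function names (`remaining_hours`, `weeks_needed`) and the 10 hours/week pace are hypothetical, and the hours are the approximate per-section figures listed above, not exact totals.

```python
import math

# Approximate effort per section, copied from the Curriculum Overview.
SECTION_HOURS = {
    "Fundamentals": 40,
    "Mathematics & Statistics": 90,
    "Programming": 215,
    "Data Mining": 120,
    "Databases": 80,
    "Big Data": 85,
    "Machine Learning": 120,
    "Deep Learning": 125,
    "Data Warehousing": 300,
    "Cloud Computing": 120,
}


def remaining_hours(skip=()):
    """Sum the estimated hours of every section not listed in `skip`."""
    unknown = set(skip) - set(SECTION_HOURS)
    if unknown:
        raise ValueError(f"Unknown sections: {sorted(unknown)}")
    return sum(h for name, h in SECTION_HOURS.items() if name not in skip)


def weeks_needed(hours_per_week, skip=()):
    """Estimate study weeks, rounded up, at a hypothetical weekly pace."""
    if hours_per_week <= 0:
        raise ValueError("hours_per_week must be positive")
    return math.ceil(remaining_hours(skip) / hours_per_week)


if __name__ == "__main__":
    print(remaining_hours())                         # full curriculum
    print(weeks_needed(10))                          # assumed pace: 10 h/week
    print(weeks_needed(10, skip=("Programming",)))   # skipping a known area
```

Passing the sections you already know in `skip` gives a quick view of how much the plan shortens when you adapt it to your background.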
In this first section, my goal is to establish a solid foundation in data science by understanding the role of data in decision-making, the fundamentals of the field, and the key tools used by professionals. Additionally, I aim to develop a clear understanding of what it means to be a data scientist, the essential skills required, and how to apply this knowledge in practice.
The main skills I want to acquire in this stage include:
- Understanding what data is and how it can be used
- Fundamental concepts of data science and its impact on various industries
- Familiarity with essential tools for data analysis and manipulation
This course provides a clear introduction to what data is, how it is generated, and how it can be used to answer questions and support decision-making. I chose this course to build a conceptual foundation before moving on to more complex techniques.
Skills developed:
- Understanding the concept of data and its different forms
- Practical applications of data usage in problem-solving
- Introduction to data collection, organization, and interpretation
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Data - What It Is, What We Can Do With It | Johns Hopkins University | ~11h | Certificate of Completion | ✅ |
This course offers an overview of the field of data science, exploring the responsibilities of a data scientist, the stages of the data analysis process, and its applications. It helps to better understand the career and the importance of data science in the modern world.
Skills developed:
- Understanding what data science is and its applications
- Insights into the data science lifecycle
- Knowledge of the key tools and technologies used in the field
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
What is Data Science? | IBM Skills Network | ~11h | Certificate of Completion | ✅ |
This course builds familiarity with the fundamental tools used by data scientists. It introduces basic programming concepts, version control, and project organization, all of which are needed to work with data in a structured and efficient way.
Skills developed:
- Introduction to R and RStudio
- Basic concepts of Git and GitHub for version control
- Insights into data science workflows
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
The Data Scientist's Toolbox | Johns Hopkins University | ~18h | Certificate of Completion | ✅ |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Linear Algebra for Machine Learning and Data Science | DeepLearning.AI | ~34h | -- | -- |
Calculus for Machine Learning and Data Science | DeepLearning.AI | ~25h | -- | -- |
Probability and Statistics for Machine Learning and Data Science | DeepLearning.AI | ~33h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Introduction to Data Science in Python | University of Michigan | ~34h | -- | -- |
Applied Plotting, Charting & Data Representation in Python | University of Michigan | ~24h | -- | -- |
Applied Machine Learning in Python | University of Michigan | ~31h | -- | -- |
Applied Text Mining in Python | University of Michigan | ~25h | -- | -- |
Applied Social Network Analysis in Python | University of Michigan | ~26h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
R Programming | Johns Hopkins University | ~27h | -- | -- |
Advanced R Programming | Johns Hopkins University | ~18h | -- | -- |
Building R Packages | Johns Hopkins University | ~20h | -- | -- |
Building Data Visualization Tools | Johns Hopkins University | ~12h | -- | -- |
Mastering Software Development in R | Johns Hopkins University | ~3h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Data Visualization | University of Illinois Urbana-Champaign | ~15h | -- | -- |
Text Retrieval and Search Engines | University of Illinois Urbana-Champaign | ~30h | -- | -- |
Text Mining and Analysis | University of Illinois Urbana-Champaign | ~33h | -- | -- |
Pattern Discovery in Data Mining | University of Illinois Urbana-Champaign | ~17h | -- | -- |
Cluster Analysis in Data Mining | University of Illinois Urbana-Champaign | ~16h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Relational Database Design | University of Colorado | ~34h | -- | -- |
The Structured Query Language (SQL) | University of Colorado | ~26h | -- | -- |
Advanced Topics and Future Trends in Database Technologies | University of Colorado | ~16h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Introduction to Big Data | University of California San Diego | ~17h | -- | -- |
Big Data Modeling and Management Systems | University of California San Diego | ~13h | -- | -- |
Big Data Integration and Processing | University of California San Diego | ~17h | -- | -- |
Machine Learning with Big Data | University of California San Diego | ~23h | -- | -- |
Graph Analytics for Big Data | University of California San Diego | ~13h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Supervised Machine Learning: Regression and Classification | DeepLearning.AI | ~33h | -- | -- |
Advanced Machine Learning Algorithms | DeepLearning.AI | ~34h | -- | -- |
Unsupervised Learning, Recommenders, Reinforcement Learning | DeepLearning.AI | ~37h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Neural Networks and Deep Learning | DeepLearning.AI | ~24h | -- | -- |
Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization | DeepLearning.AI | ~23h | -- | -- |
Structuring Machine Learning Projects | DeepLearning.AI | ~6h | -- | -- |
Convolutional Neural Networks | DeepLearning.AI | ~35h | -- | -- |
Sequence Models | DeepLearning.AI | ~37h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Database Management Essentials | Colorado Boulder | ~122h | -- | -- |
Data Warehouse Concepts, Design, and Data Integration | Colorado Boulder | ~62h | -- | -- |
Relational Database Support for Data Warehouses | Colorado Boulder | ~71h | -- | -- |
Business Intelligence Concepts, Tools, and Applications | Colorado Boulder | ~21h | -- | -- |
Design and Build a Data Warehouse for Business Intelligence Implementation | Colorado Boulder | ~31h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Cloud Concepts 1 | University of Illinois Urbana-Champaign | ~24h | -- | -- |
Cloud Concepts 2 | University of Illinois Urbana-Champaign | ~19h | -- | -- |
Cloud Applications 1 | University of Illinois Urbana-Champaign | ~15h | -- | -- |
Cloud Applications 2 | University of Illinois Urbana-Champaign | ~19h | -- | -- |
Cloud Networks | University of Illinois Urbana-Champaign | ~22h | -- | -- |
Cloud Computing Project | University of Illinois Urbana-Champaign | ~21h | -- | -- |
If you're looking for deeper insights, consider these additional resources:
- The Elements of Statistical Learning - Hastie, Tibshirani, Friedman.
- An Introduction to Statistical Learning - James, Witten, Hastie, Tibshirani.
- Bayesian Statistics - Peter M. Lee.
- Artificial Intelligence: A Modern Approach - Stuart Russell, Peter Norvig.
- Deep Learning Papers Reading Roadmap - Collection of AI research papers.
- SQL for Smarties - Joe Celko.
- The Missing Semester of Your CS Education - MIT.
These resources cover a wide range of topics from foundational mathematics and statistical theory to advanced machine learning and artificial intelligence.
- Course durations are approximate and based on platform estimates.
- Some books were accessed through university partnerships, but if you don't have access... well, explore alternative ways. If possible, support authors by purchasing them.
- The curriculum is continuously evolving as new resources become available.
Sources used to structure this curriculum:
- OSSU Data Science - Open-source university model.
- AI Expert Roadmap - AI & Data Science roadmap.
- Roadmap SH - Learning paths for various tech disciplines.
- USP Statistics Course - Inspiration for course selection.