
A curated list of free courses from reputable universities that meet the requirements of an undergraduate curriculum in Data Science, excluding general education. It includes projects and supporting materials in an organized structure.


marcoshsq/The_Self-taught_Data_Scientist


Developer Roadmap

Data Science and Analytics Self-Taught Program


📌 Summary


🧠 About

This Self-Taught Data Science Curriculum is a structured roadmap that I created to guide myself in learning this field independently and for free. My motivation for developing this material came from my desire to deepen my knowledge in data science and analytics, making the most of the available online resources.

The program covers everything from fundamentals to advanced topics, including programming, mathematics, statistics, machine learning, deep learning, and big data. To achieve this, I selected high-quality courses and learning materials that are freely accessible.

If you also want to learn data science on your own or expand your knowledge in the field, this roadmap can serve as a solid foundation for your journey.


🎯 Learning Goals

1️⃣ Programming for Data Science

  • Python: Data manipulation, visualization, and machine learning.
  • R: Statistical modeling and advanced data analysis.

2️⃣ Mathematics & Statistics for Data Science

  • Linear Algebra, Calculus, Probability, and Inferential Statistics.
  • Bayesian Methods, Regression, and Machine Learning Theory.

3️⃣ Databases, Data Warehousing, and Big Data

  • SQL and NoSQL Databases.
  • Data lakes and cloud computing solutions.
  • Big Data processing with Spark and Hadoop.

4️⃣ Machine Learning & Deep Learning

  • Supervised and Unsupervised Learning.
  • Neural Networks and Natural Language Processing (NLP).
  • Reinforcement Learning and AI Ethics.

📚 Curriculum Overview

The curriculum is divided into well-structured sections, each covering essential areas of data science:

  1. Fundamentals - Basic concepts and data literacy (~40h).
  2. Mathematics & Statistics - Essential mathematical foundations (~90h).
  3. Programming - Python & R for data science (~215h).
  4. Data Mining - Extracting insights and patterns (~120h).
  5. Databases - SQL and database management (~80h).
  6. Big Data - Processing large-scale datasets (~85h).
  7. Machine Learning - Core ML concepts and models (~120h).
  8. Deep Learning - Advanced AI techniques (~125h).
  9. Data Warehousing - Data integration and storage (~300h).
  10. Cloud Computing - Cloud solutions for data science (~120h).

A detailed breakdown of each section, including recommended courses, can be found in the repository.
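
As a rough planning aid, the per-section estimates in the list above can be added up and converted into a study timeline. The sketch below is only an illustration: the hours are copied from the overview list, while the function names and the 10 hours/week pace are illustrative assumptions, not part of the curriculum.

```python
import math

# Section estimates (in hours) copied from the Curriculum Overview list.
SECTION_HOURS = {
    "Fundamentals": 40,
    "Mathematics & Statistics": 90,
    "Programming": 215,
    "Data Mining": 120,
    "Databases": 80,
    "Big Data": 85,
    "Machine Learning": 120,
    "Deep Learning": 125,
    "Data Warehousing": 300,
    "Cloud Computing": 120,
}


def total_hours(sections):
    """Sum the estimated hours of every section in the mapping."""
    return sum(sections.values())


def weeks_needed(hours, hours_per_week):
    """Round up to whole weeks for a given (hypothetical) weekly pace."""
    if hours_per_week <= 0:
        raise ValueError("hours_per_week must be positive")
    return math.ceil(hours / hours_per_week)


if __name__ == "__main__":
    total = total_hours(SECTION_HOURS)
    pace = 10  # assumed study pace in hours per week; adjust to your schedule
    print(f"Total estimated effort: {total} h")
    print(f"At {pace} h/week: about {weeks_needed(total, pace)} weeks")
```

With the figures above, the total comes to 1,295 hours, or about 130 weeks at 10 hours per week. Removing an entry from the dictionary models skipping a section you already know.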


📌 How to Use This Curriculum

This roadmap is flexible and can be adapted based on your learning pace and background:

  • ✅ Follow it sequentially if you're starting from scratch.
  • ✅ Skip sections if you already have knowledge in a particular area.
  • ✅ Combine different resources, projects, and additional readings.

Each module contains curated courses with estimated effort and certification options when available.

Section 01 - Fundamentals (~40h)

In this first section, my goal is to establish a solid foundation in data science by understanding the role of data in decision-making, the fundamentals of the field, and the key tools used by professionals. Additionally, I aim to develop a clear understanding of what it means to be a data scientist, the essential skills required, and how to apply this knowledge in practice.

The main skills I want to acquire in this stage include:

  • ✅ Understanding what data is and how it can be used
  • ✅ Fundamental concepts of data science and its impact on various industries
  • ✅ Familiarity with essential tools for data analysis and manipulation

Courses

📌 Data – What It Is, What We Can Do With It (Johns Hopkins University)

This course provides a clear introduction to what data is, how it is generated, and how it can be used to answer questions and support decision-making. I chose this course to build a conceptual foundation before moving on to more complex techniques.

Skills developed:

  • Understanding the concept of data and its different forms
  • Practical applications of data usage in problem-solving
  • Introduction to data collection, organization, and interpretation

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Data – What It Is, What We Can Do With It | Johns Hopkins University | ~11h | Certificate of Completion | ✓ |

📌 What is Data Science? (IBM Skills Network)

This course offers an overview of the field of data science, exploring the responsibilities of a data scientist, the stages of the data analysis process, and its applications. It helps to better understand the career and the importance of data science in the modern world.

Skills developed:

  • Understanding what data science is and its applications
  • Insights into the data science lifecycle
  • Knowledge of the key tools and technologies used in the field

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| What is Data Science? | IBM Skills Network | ~11h | Certificate of Completion | ✓ |

📌 The Data Scientist's Toolbox (Johns Hopkins University)

This course builds familiarity with the fundamental tools used by data scientists. It introduces basic programming concepts, version control, and project organization, all of which are essential for working with data in a structured and efficient way.

Skills developed:

  • Introduction to R and RStudio
  • Basic concepts of Git and GitHub for version control
  • Insights into data science workflows

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| The Data Scientist's Toolbox | Johns Hopkins University | ~18h | Certificate of Completion | ✓ |

Section 02 - Mathematics and Statistics for Data Science (~90h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Linear Algebra for Machine Learning and Data Science | DeepLearning.AI | ~34h | -- | -- |
| Calculus for Machine Learning and Data Science | DeepLearning.AI | ~25h | -- | -- |
| Probability and Statistics for Machine Learning and Data Science | DeepLearning.AI | ~33h | -- | -- |

Section 03 - Programming for Data Science

Section 03-A - Python Language for Data Analysis (~140h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Introduction to Data Science in Python | University of Michigan | ~34h | -- | -- |
| Applied Plotting, Charting & Data Representation in Python | University of Michigan | ~24h | -- | -- |
| Applied Machine Learning in Python | University of Michigan | ~31h | -- | -- |
| Applied Text Mining in Python | University of Michigan | ~25h | -- | -- |
| Applied Social Network Analysis in Python | University of Michigan | ~26h | -- | -- |

Section 03-B - R Language for Statistical Analysis and Modeling (~75h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| R Programming | Johns Hopkins University | ~27h | -- | -- |
| Advanced R Programming | Johns Hopkins University | ~18h | -- | -- |
| Building R Packages | Johns Hopkins University | ~20h | -- | -- |
| Building Data Visualization Tools | Johns Hopkins University | ~12h | -- | -- |
| Mastering Software Development in R | Johns Hopkins University | ~3h | -- | -- |

Section 04 - Data Mining (~120h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Data Visualization | University of Illinois Urbana-Champaign | ~15h | -- | -- |
| Text Retrieval and Search Engines | University of Illinois Urbana-Champaign | ~30h | -- | -- |
| Text Mining and Analysis | University of Illinois Urbana-Champaign | ~33h | -- | -- |
| Pattern Discovery in Data Mining | University of Illinois Urbana-Champaign | ~17h | -- | -- |
| Cluster Analysis in Data Mining | University of Illinois Urbana-Champaign | ~16h | -- | -- |

Section 05 - Databases and SQL (~80h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Relational Database Design | University of Colorado | ~34h | -- | -- |
| The Structured Query Language (SQL) | University of Colorado | ~26h | -- | -- |
| Advanced Topics and Future Trends in Database Technologies | University of Colorado | ~16h | -- | -- |

Section 06 - Big Data (~85h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Introduction to Big Data | University of California | ~17h | -- | -- |
| Big Data Modeling and Management Systems | University of California | ~13h | -- | -- |
| Big Data Integration and Processing | University of California | ~17h | -- | -- |
| Machine Learning with Big Data | University of California | ~23h | -- | -- |
| Graph Analytics for Big Data | University of California | ~13h | -- | -- |

Section 07 - Machine Learning (~120h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Supervised Machine Learning: Regression and Classification | DeepLearning.AI | ~33h | -- | -- |
| Advanced Machine Learning Algorithms | DeepLearning.AI | ~34h | -- | -- |
| Unsupervised Learning, Recommenders, Reinforcement Learning | DeepLearning.AI | ~37h | -- | -- |

Section 08 - Deep Learning (~125h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Neural Networks and Deep Learning | DeepLearning.AI | ~24h | -- | -- |
| Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization | DeepLearning.AI | ~23h | -- | -- |
| Structuring Machine Learning Projects | DeepLearning.AI | ~6h | -- | -- |
| Convolutional Neural Networks | DeepLearning.AI | ~35h | -- | -- |
| Sequence Models | DeepLearning.AI | ~37h | -- | -- |

Section 09 - Data Warehousing (~300h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Database Management Essentials | Colorado Boulder | ~122h | -- | -- |
| Data Warehouse Concepts, Design, and Data Integration | Colorado Boulder | ~62h | -- | -- |
| Relational Database Support for Data Warehouses | Colorado Boulder | ~71h | -- | -- |
| Business Intelligence Concepts, Tools, and Applications | Colorado Boulder | ~21h | -- | -- |
| Design and Build a Data Warehouse for Business Intelligence Implementation | Colorado Boulder | ~31h | -- | -- |

Section 10 - Cloud Computing (~120h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Cloud Concepts 1 | University of Illinois Urbana-Champaign | ~24h | -- | -- |
| Cloud Concepts 2 | University of Illinois Urbana-Champaign | ~19h | -- | -- |
| Cloud Applications 1 | University of Illinois Urbana-Champaign | ~15h | -- | -- |
| Cloud Applications 2 | University of Illinois Urbana-Champaign | ~19h | -- | -- |
| Cloud Networks | University of Illinois Urbana-Champaign | ~22h | -- | -- |
| Cloud Computing Project | University of Illinois Urbana-Champaign | ~21h | -- | -- |

📖 Extra Bibliography

If you're looking for deeper insights, consider these additional resources:

Mathematics

Machine Learning & AI

Programming & Databases

These resources cover a wide range of topics from foundational mathematics and statistical theory to advanced machine learning and artificial intelligence.

📝 Notes and Clarifications

  • Course durations are approximate and based on platform estimates.
  • Some books were accessed through university partnerships, but if you don't have access... well, explore alternative ways. If possible, support authors by purchasing them.
  • The curriculum is continuously evolving as new resources become available.

🔗 References

Sources used to structure this curriculum:


Developer Roadmap

