
Applying classroom knowledge from Foundations of Data Science class


haydensiebers/DataScienceAlgorithmsClass

Data Science Algorithms: Implementation and Applications

This repository contains implementations of various data science algorithms based on a course that introduces both the theoretical foundations and practical applications of key concepts in the field. The course provides an in-depth exploration of different algorithms in data science, with a focus on clustering, regression, dimension reduction, and manifold learning.

Current Implementations:

This repository currently includes implementations of the K-Means Clustering algorithm and the Kernel Density Estimation (KDE) algorithm.

### K-Means Clustering

I wrote a K-Means clustering algorithm in Python to simplify an image by reducing it to a set of k dominant colors. Below are some examples:

[Image: example images reduced to k dominant colors]

I also used the K-Means clustering algorithm to classify animals by taxon.
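The color-reduction idea above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in this repository: the function names `kmeans_colors` and `quantize`, the random initialisation from distinct pixel colors, and the fixed iteration cap are all assumptions made for the example. Pixels are assumed to be `(r, g, b)` tuples.

```python
import random


def kmeans_colors(pixels, k, iterations=20, seed=0):
    """Cluster (r, g, b) tuples with Lloyd's algorithm; return (centroids, labels)."""
    rng = random.Random(seed)
    # Assumed initialisation: k distinct colors drawn at random from the image.
    # Raises ValueError if the image has fewer than k distinct colors.
    centroids = [tuple(float(c) for c in p)
                 for p in rng.sample(sorted(set(pixels)), k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in pixels
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned pixels.
        for j in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return centroids, labels


def quantize(pixels, k, **kwargs):
    """Replace every pixel with the (rounded) centroid of its cluster."""
    centroids, labels = kmeans_colors(pixels, k, **kwargs)
    palette = [tuple(round(c) for c in centroid) for centroid in centroids]
    return [palette[lab] for lab in labels]
```

Calling `quantize(pixels, k=8)` on a flattened list of image pixels returns a list of the same length that uses at most 8 distinct colors. A real image would usually be handled with an array library for speed; the pure-Python loops here are only meant to make the two K-Means steps visible.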

### Kernel Density Estimation

I implemented a Kernel Density Estimation (KDE) algorithm for various kernels in OpenGL Shading Language on ShaderToy. Using KDE, I created a generative artwork:

[Image: generative artwork created with KDE]

Check out the Kernel Density Estimation shader on ShaderToy.
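For readers who want to see the estimator itself, here is a one-dimensional Python sketch of KDE with two interchangeable kernels. It is not a port of the shader: the function names, the choice of Gaussian and Epanechnikov kernels, and the 1D setting are assumptions made for illustration. The estimate is the standard form f(x) = (1 / (n h)) Σ K((x − xᵢ) / h), where h is the bandwidth.

```python
import math


def gaussian_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def epanechnikov_kernel(u):
    """Parabolic kernel supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kde(x, samples, bandwidth, kernel=gaussian_kernel):
    """Estimate the density at x from 1D samples using the given kernel."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if not samples:
        raise ValueError("samples must not be empty")
    n = len(samples)
    # Each sample contributes one kernel bump centred on it, scaled by the bandwidth.
    total = sum(kernel((x - xi) / bandwidth) for xi in samples)
    return total / (n * bandwidth)
```

Swapping the `kernel` argument changes the shape of each bump without touching the rest of the estimator, which is the same separation that makes it easy to compare kernels. A smaller `bandwidth` gives a spikier estimate and a larger one a smoother estimate.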

Course Overview

The course consists of 26 lectures covering a wide range of topics in data science algorithms. Below is an outline of the major topics:

  1. Statistical Learning Theory:

    • VC Dimension
    • PAC Learning
  2. Classification and Clustering Algorithms (both linear and nonlinear):

    • Support Vector Machines (SVM)
    • K-Means Clustering
    • Spectral Clustering
  3. Regression and Density Estimation Algorithms:

    • Kernel Ridge Regression
    • Kernel Density Estimation
  4. Dimension Reduction:

    • Principal Component Analysis (PCA)
    • Random Projection
    • Johnson-Lindenstrauss Lemma
    • Multidimensional Scaling
    • Distance Matrices & Schoenberg Transform
  5. Nonnegative Matrix Factorization (NMF)

  6. Nonlinear Dimension Reduction and Manifold Learning:

    • Background of Nonlinear Techniques
    • Revisiting PCA
    • Locally Linear Embedding (LLE)
    • Diffusion Maps
