This repository contains implementations of data science algorithms from a course that covers both the theoretical foundations and the practical applications of key concepts in the field. The course explores data science algorithms in depth, with a focus on clustering, regression, dimension reduction, and manifold learning.
This repository currently includes implementations of the K-Means Clustering algorithm and the Kernel Density Estimation (KDE) algorithm.
### K-Means Clustering
I implemented K-Means clustering in Python to simplify an image by reducing it to its k dominant colors. Below are some examples:
I also used K-Means clustering to group animals by taxon.
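The color reduction can be sketched in NumPy as follows. This is a minimal sketch, assuming Lloyd's algorithm with random initialization and a fixed seed, which may differ from the repository's own implementation. `kmeans`, `quantize_image`, and the `_assign` helper are hypothetical names, and the image is assumed to be already loaded as an `(H, W, 3)` array.

```python
import numpy as np


def _assign(points, centroids):
    """Return the index of the nearest centroid (squared Euclidean distance) for each point."""
    dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, n_iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps until stable."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        labels = _assign(points, centroids)
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its previous centroid
                new_centroids[j] = members.mean(axis=0)
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Recompute labels so they match the returned centroids.
    return centroids, _assign(points, centroids)


def quantize_image(image, k):
    """Replace each pixel of an (H, W, 3) image with the nearest of its k dominant colors."""
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(float)
    centroids, labels = kmeans(pixels, k)
    return centroids[labels].reshape(h, w, c)


# Usage: a tiny synthetic 4x4 image with a black half and a red half.
img = np.zeros((4, 4, 3))
img[:, 2:] = [255.0, 0.0, 0.0]
print(quantize_image(img, k=2))
```

Each pixel is treated as a point in RGB space, so each final centroid is one of the k dominant colors. The intermediate distance array has shape `(H*W, k, 3)`, so for large images it may be worth fitting the centroids on a random subsample of pixels first.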
### Kernel Density Estimation
I implemented Kernel Density Estimation (KDE) with several different kernels in the OpenGL Shading Language (GLSL) on ShaderToy. Using KDE, I created a generative artwork:
Check out the Kernel Density Estimation shader on ShaderToy.
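The shader itself is written in GLSL. As an illustration of the estimator it evaluates, here is a minimal one-dimensional NumPy sketch of `f_h(x) = 1/(n*h) * sum_i K((x - x_i) / h)`. The three kernels (Gaussian, Epanechnikov, uniform), the `kde` function, and the bandwidth value are illustrative assumptions, not taken from the shader.

```python
import numpy as np

# Each kernel K(u) is symmetric, non-negative, and integrates to 1 over the real line.
KERNELS = {
    "gaussian": lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi),
    "epanechnikov": lambda u: np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0),
    "uniform": lambda u: np.where(np.abs(u) <= 1.0, 0.5, 0.0),
}


def kde(x, samples, bandwidth, kernel="gaussian"):
    """Evaluate f_h(x) = 1 / (n h) * sum_i K((x - x_i) / h) at every query point in x."""
    K = KERNELS[kernel]
    x = np.asarray(x, dtype=float)
    samples = np.asarray(samples, dtype=float)
    # u[a, i] is the bandwidth-scaled distance from query point a to sample i.
    u = (x[:, None] - samples[None, :]) / bandwidth
    return K(u).sum(axis=1) / (len(samples) * bandwidth)


# Usage: estimate a density from 200 standard-normal draws with each kernel.
rng = np.random.default_rng(0)
samples = rng.normal(size=200)
grid = np.linspace(-8.0, 8.0, 1601)
dx = grid[1] - grid[0]
for name in KERNELS:
    density = kde(grid, samples, bandwidth=0.4, kernel=name)
    # A Riemann sum over the grid should come out close to 1 for a valid density.
    print(name, density.sum() * dx)
```

The bandwidth `h` controls the trade-off: smaller values give spikier estimates that follow individual samples, while larger values give smoother ones.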
The course consists of 26 lectures covering a wide range of topics in data science algorithms. Below is an outline of the major topics:
- Statistical Learning Theory:
  - VC Dimension
  - PAC Learning
- Classification and Clustering Algorithms (linear and nonlinear):
  - Support Vector Machines (SVM)
  - K-Means Clustering
  - Spectral Clustering
- Regression and Density Estimation Algorithms:
  - Kernel Ridge Regression
  - Kernel Density Estimation
- Dimension Reduction:
  - Principal Component Analysis (PCA)
  - Random Projection
  - Johnson-Lindenstrauss Lemma
  - Multidimensional Scaling
  - Distance Matrices & Schoenberg Transform
- Nonnegative Matrix Factorization (NMF)
- Nonlinear Dimension Reduction and Manifold Learning:
  - Background of Nonlinear Techniques
  - Revisiting PCA
  - Locally Linear Embedding (LLE)
  - Diffusion Maps

