This repository contains the Fall 2024 COMP5331 Group 6 project: Resilient k-Clustering.
The GitHub repository is at: https://github.com/yoannalhc/comp5331-grp6
- Download the repository to a local environment.
- Download the datasets and place them in the correct folder. (Refer to Datasets)
- Install the dependencies with Python 3.9 using pip install -r requirements.txt
- Run COMP5331_Project.ipynb in order (skip the section Preprocess datasets if the processed datasets are downloaded).
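Before installing the dependencies, it can help to confirm that the interpreter matches the Python 3.9 requirement. The snippet below is a minimal sketch of such a check; the function name `python_version_matches` is hypothetical and not part of the repository.

```python
import sys


def python_version_matches(required=(3, 9), actual=None):
    """Return True if the major.minor version of `actual` equals `required`.

    `actual` defaults to the running interpreter's version.
    Hypothetical helper for illustration; it is not part of the repository.
    """
    if actual is None:
        actual = sys.version_info
    return tuple(actual[:2]) == tuple(required)


if __name__ == "__main__":
    if python_version_matches():
        print("Python 3.9 detected.")
    else:
        found = ".".join(str(part) for part in sys.version_info[:3])
        print(f"Warning: the project targets Python 3.9, found {found}.")
```

Other Python versions may also work, but only 3.9 is stated as the project environment.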
Raw datasets are downloaded from:
- BIRCH, HIGH-DIM(low): https://cs.joensuu.fi/sipu/datasets/
- Uber: https://www.kaggle.com/datasets/fivethirtyeight/uber-pickups-in-new-york-city
- Brightkite, Gowalla: https://snap.stanford.edu/data/index.html#locnet
Process the raw datasets by following the section Preprocess datasets in COMP5331_Project.ipynb or download the processed datasets.
- Processed datasets can be found at: https://hkustconnect-my.sharepoint.com/:f:/g/personal/hcloaf_connect_ust_hk/Ene4W-vYgMZIvV-Si5BI1HIBGO4pm0OZMmArLiOTuj3upA?e=DW9ZeS
- Download them and put them into ./dataset
project
📂dataset
└───📂birch
│ │ 📜shrink_birch1_epsilon.csv
│ │ 📜shrink_birch2_epsilon.csv
│ │ 📜shrink_birch3_epsilon.csv
└───📂high_dim
│ │ 📜dim032_epsilon.csv
│ │ 📜dim064_epsilon.csv
│ │ 📜dim128_epsilon.csv
└───📂snap_standford
│ │ 📜Brightkite_epsilon.csv
│ │ 📜Gowalla_epsilon.csv
└───📂uber
│ 📜uber_epsilon.csv
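To confirm that the processed datasets are in the expected place before running the notebook, a small check like the following can be used. It is a sketch under stated assumptions: the folder and file names are copied verbatim from the tree above (if `epsilon` in a file name stands for a numeric value in your copy, adjust the names accordingly), and the helper `find_missing_files` is hypothetical, not part of the repository.

```python
from pathlib import Path

# Expected layout, copied from the directory tree above.
# Assumption: the file names are literal; edit them if your copies differ.
EXPECTED_FILES = {
    "birch": [
        "shrink_birch1_epsilon.csv",
        "shrink_birch2_epsilon.csv",
        "shrink_birch3_epsilon.csv",
    ],
    "high_dim": [
        "dim032_epsilon.csv",
        "dim064_epsilon.csv",
        "dim128_epsilon.csv",
    ],
    "snap_standford": [
        "Brightkite_epsilon.csv",
        "Gowalla_epsilon.csv",
    ],
    "uber": [
        "uber_epsilon.csv",
    ],
}


def find_missing_files(dataset_root="./dataset"):
    """Return the expected dataset paths that do not exist under `dataset_root`."""
    root = Path(dataset_root)
    missing = []
    for folder, names in EXPECTED_FILES.items():
        for name in names:
            path = root / folder / name
            if not path.is_file():
                missing.append(path)
    return missing


if __name__ == "__main__":
    missing = find_missing_files()
    if missing:
        print("Missing dataset files:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All expected dataset files are present.")
```

Run it from the project root so that the relative path ./dataset resolves correctly.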
- COMP5331_Project.ipynb: The entry point of the program; use it to run the project
- src/datasets.py: Contains the dataset classes
- src/resilient_k.py: Contains all the resilient algorithm-related classes
- src/plot_helper.py: Contains the function to plot the data
- src/evaluation.py: Contains all the evaluation-related classes
- src/preprocess/helper.py: Contains helper functions to process the datasets
- src/preprocess/process_birch.py: Contains the function to process the Birch datasets
- src/preprocess/process_geo.py: Contains the function to process the geographic datasets
- src/preprocess/process_high_dim.py: Contains the function to process the high-dimensional datasets
- src/preprocess/process_uber.py: Contains the function to process the Uber dataset
See the demo in COMP5331_Project.ipynb.
Our project environment is Python 3.9 on Windows.