Large Language Models are Interpreters of Text Clustering Results

This is a course project for IS 557 Applied Machine Learning: Team Project.

The dataset used for this project is based on ClusterLLM (Zhang et al., 2023), which preprocessed the MTOP dataset (Li et al., 2021) by removing intents with only a few instances and keeping only the English data. Only the music domain is used for the experiments; it consists of 1341 samples and 24 intents.
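As a rough illustration of the domain selection described above, the sketch below keeps the records of one domain and counts the samples per intent. The field names (`text`, `domain`, `intent`) and the toy records are hypothetical and chosen only for this example; the actual files in the data folder may be organized differently.

```python
from collections import Counter


def summarize_intents(records, domain="music"):
    """Keep the records of one domain and count the samples per intent."""
    selected = [r for r in records if r["domain"] == domain]
    counts = Counter(r["intent"] for r in selected)
    return len(selected), counts


# Toy records for illustration only; they are NOT taken from the real dataset,
# and the field names are assumptions.
toy_records = [
    {"text": "play some jazz", "domain": "music", "intent": "play_music"},
    {"text": "skip this song", "domain": "music", "intent": "skip_track"},
    {"text": "play the next album", "domain": "music", "intent": "play_music"},
    {"text": "set an alarm for 7", "domain": "alarm", "intent": "create_alarm"},
]

n_samples, intent_counts = summarize_intents(toy_records)
print(n_samples, len(intent_counts))
```

Run against the real music subset, a check like this should report the 1341 samples and 24 intents stated above, provided the data is first loaded into a comparable record structure.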

The src folder contains four Python scripts:

Environment setup

Download Ollama and install pre-trained LLMs:

# Run the following commands in a terminal.
# Each one downloads the pre-trained model to your local machine on first use
# and then opens an interactive session (type /bye to exit).
ollama run llama3
ollama run mistral
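Once a model has been downloaded, it can also be queried from Python through the HTTP API that Ollama serves locally. The following is a minimal sketch, assuming the Ollama server is running on its default port 11434; the helper names `build_request` and `generate` are hypothetical, and this is not necessarily how the scripts in src call the models.

```python
import json
import urllib.request

# Default local endpoint of the Ollama server (assumes an unchanged install).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model, prompt):
    """Build a non-streaming generation request for a locally installed model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, timeout=120):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

For example, `generate("llama3", "Reply with the single word: ready")` should return a short string if the server is up and the model has been pulled. Setting `"stream": False` asks for the whole answer in a single JSON object, which keeps the parsing to one `json.loads` call.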

Install Homebrew
Install Miniconda (restart your terminal afterwards)

brew install --cask miniconda
conda init zsh # (or conda init bash)

Set up the Python virtual environment

conda env create -f environment.yml -n is577
conda activate is577

Run scripts

# Pick the script to run, for example:
python src/1_baselines.py

References

Yuwei Zhang, Zihan Wang, and Jingbo Shang. 2023. ClusterLLM: Large Language Models as a Guide for Text Clustering. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 13903–13920, Singapore. Association for Computational Linguistics.

Haoran Li, Abhinav Arora, Shuohui Chen, Anchit Gupta, Sonal Gupta, and Yashar Mehdad. 2021. MTOP: A comprehensive multilingual task-oriented semantic parsing benchmark. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 2950–2962, Online. Association for Computational Linguistics.
