This is a course project for IS 557 Applied Machine Learning: Team Project.
The dataset used for this project is based on ClusterLLM (Zhang et al., 2023), which preprocessed the MTOP dataset (Li et al., 2021) by removing intents with only a few instances and keeping only the English data. Only the music domain, which consists of 1341 samples and 24 intents, was selected for further experimentation.
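The rare-intent filtering described above can be illustrated with a minimal sketch. This is not the ClusterLLM preprocessing code: the function name `drop_rare_intents`, the `min_count` threshold, and the toy samples and labels are all assumptions made for illustration, and the actual cutoff used by ClusterLLM is not stated here.

```python
from collections import Counter


def drop_rare_intents(samples, min_count):
    """Keep only (text, intent) pairs whose intent occurs at least min_count times.

    Hypothetical helper: the real preprocessing threshold is not given in this README.
    """
    counts = Counter(intent for _, intent in samples)
    return [(text, intent) for text, intent in samples if counts[intent] >= min_count]


# Made-up toy data; these labels are placeholders, not real MTOP intents.
toy = [
    ("play jazz", "PLAY_MUSIC"),
    ("play rock", "PLAY_MUSIC"),
    ("skip this", "SKIP_TRACK"),
]

kept = drop_rare_intents(toy, min_count=2)
# Report the number of remaining samples and the number of distinct intents.
print(len(kept), len({intent for _, intent in kept}))
```

Applied to the real data, the same two counts would correspond to the sample and intent totals reported for the music domain.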
The src folder contains four Python scripts:
Environment setup
- Download Ollama and install pre-trained LLMs:
# Run these commands in a terminal. Each one downloads the pre-trained model
# on first use, then opens an interactive prompt (type /bye to exit).
ollama run llama3
ollama run mistral
- Install brew
- Install miniconda (restart your terminal afterwards)
brew install --cask miniconda
conda init zsh # (or conda init bash)
- Set up the Python virtual environment
conda env create -f environment.yml -n is577
conda activate is577
- Run scripts
# Pick one of the scripts in src/, for example:
python src/1_baselines.py
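Once the models from the first setup step are available, a quick way to check that the local Ollama server responds is to call its REST endpoint from Python. The sketch below is only an illustration and is not taken from the project scripts: the helper names `build_payload` and `query_ollama`, the prompt wording, and the timeout are assumptions. It relies on Ollama's default local address (`http://localhost:11434`) and its `/api/generate` endpoint.

```python
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(utterance, model="llama3"):
    """Build the JSON body for a single, non-streaming generation request."""
    # Illustrative prompt only; the project scripts may phrase this differently.
    prompt = f"Name the user's intent in a few words.\nUtterance: {utterance}\nIntent:"
    return {"model": model, "prompt": prompt, "stream": False}


def query_ollama(utterance, model="llama3", url=OLLAMA_URL, timeout=120):
    """Send the request to the local server and return the generated text."""
    data = json.dumps(build_payload(utterance, model)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read())["response"].strip()
```

With the server running, `query_ollama("play some jazz", model="mistral")` should return the model's free-text answer. If the call fails with a connection error, the Ollama application is most likely not running.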
References
Yuwei Zhang, Zihan Wang, and Jingbo Shang. 2023. ClusterLLM: Large Language Models as a Guide for Text Clustering. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 13903–13920, Singapore. Association for Computational Linguistics.
Haoran Li, Abhinav Arora, Shuohui Chen, Anchit Gupta, Sonal Gupta, and Yashar Mehdad. 2021. MTOP: A comprehensive multilingual task-oriented semantic parsing benchmark. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 2950–2962, Online. Association for Computational Linguistics.