The source code will be available in a few weeks; we greatly appreciate your patience and understanding.

This is the anonymous repository for the paper submission *Efficient Multi-task LLM Quantization and Serving for Multiple LoRA Adapters*.
The project's code base is organized in the following directory structure:

```
├── data
├── figures
├── lorainlaid
├── models
├── test
├── third_party
├── install.sh
├── LICENSE
└── README.md
```

To use this repository, please follow the steps below to install the required dependencies.
- Python (version 3.10.13)
- pip (version 24.0)
- CUDA (version 12.1)
Then complete the following steps:

- Clone the repository
- Install the modified GPTQ in `third_party`
- Install LoRA-Inlaid
- Install the acceleration library for exllama

The whole installation process is organized in `install.sh`; users only need to run `bash install.sh` to complete it.
Before running, we must prepare the models and datasets we need:

- `models/download_model.py`: downloads the models we need.
- `models/download_lora.py`: downloads the LoRA adapters we need.

Users can download all the models and LoRA adapters with the following commands:

```shell
cd models
bash download.sh
```

Note that for the Llama models, users should use their own access tokens.
- `data/download_data.py`: downloads the datasets we need.

Users can download all the datasets with the following commands:

```shell
cd data
bash download.sh
```

Once the preparation of data and models is finished, users can quickly start with the following commands:

```shell
cd test/performance
bash launch_server_7b_lorainlaid.sh
```

After the server has been launched, run the following command to send requests:

```shell
bash run_exp_test.sh
```

To run the main tests in the paper, please follow the steps below.
We pack all accuracy tests in one script; users should run the following commands:

```shell
cd test/accuracy
bash run_all_test.sh
```

Performance tests include tests for Throughput, Latency, JCT, and SLO.
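For reference, these metrics can be computed from per-request timing as in the sketch below (the request trace and the SLO threshold are illustrative values, not numbers from the paper):

```python
def metrics(requests, slo=1.0):
    """Compute throughput, average latency (JCT), and SLO attainment.

    requests: list of (arrival_time, finish_time, num_output_tokens);
    slo: per-request completion deadline in seconds (an assumed value).
    """
    jcts = [finish - arrival for arrival, finish, _ in requests]  # job completion times
    makespan = max(f for _, f, _ in requests) - min(a for a, _, _ in requests)
    throughput = sum(n for _, _, n in requests) / makespan  # output tokens per second
    avg_latency = sum(jcts) / len(jcts)
    slo_attainment = sum(j <= slo for j in jcts) / len(jcts)  # fraction meeting the SLO
    return throughput, avg_latency, slo_attainment

# Illustrative trace of three requests: (arrival, finish, tokens)
trace = [(0.0, 0.8, 64), (0.1, 1.2, 64), (0.2, 1.0, 64)]
tput, latency, slo_ok = metrics(trace)
```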
```shell
# launch lorainlaid-7b
cd test/performance
bash launch_server_7b_lorainlaid.sh

# launch lorainlaid-13b
cd test/performance
bash launch_server_13b_lorainlaid.sh

# launch slora-7b
cd test/performance
bash launch_server_7b_slora.sh

# launch slora-13b
cd test/performance
bash launch_server_13b_slora.sh

# launch vllm-7b
cd test/performance
bash launch_server_7b_vllm-1.sh
bash launch_server_7b_vllm-2.sh
bash launch_server_7b_vllm-4.sh

# launch vllm-13b
cd test/performance
bash launch_server_13b_vllm-1.sh
bash launch_server_13b_vllm-2.sh
bash launch_server_13b_vllm-4.sh
```

Once you have launched one of these servers, send requests with:

```shell
bash run_exp.sh
```
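Under the hood, such scripts drive the launched server over HTTP. A minimal hedged sketch of what a single request might look like (the endpoint path and the JSON field names below are assumptions, not LoRA-Inlaid's actual API):

```python
import json
from urllib import request

def build_payload(prompt: str, adapter: str, max_new_tokens: int = 128) -> dict:
    # Field names are hypothetical; adapt them to the server's real API.
    return {"prompt": prompt, "lora_adapter": adapter, "max_new_tokens": max_new_tokens}

def send(url: str, payload: dict) -> dict:
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:  # requires a running server
        return json.loads(resp.read())

payload = build_payload("Summarize this article.", "adapter-0")
# send("http://localhost:8000/generate", payload)  # endpoint is an assumption
```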