Create a virtual environment, for example with conda:
conda create -n AdvAgent python=3.12.2
conda activate AdvAgent
Clone this repository and enter it:
git clone https://github.com/AI-secure/AdvAgent.git
cd AdvAgent
Install dependencies:
pip install -r requirements.txt
Add your OpenAI API key and other required keys to the environment. Our pipeline supports attacking various large language models, such as GPT, Gemini, and Claude; here we use GPT as the example:
export OPENAI_API_KEY=<YOUR_KEY>
export HUGGING_FACE_HUB_TOKEN=<YOUR_KEY>
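A missing key usually only shows up once a model call fails, so it can help to check the environment first. The sketch below is a hypothetical helper (`check_keys` is not part of the repository) that reports every listed variable that is unset or empty in the exported environment and returns non-zero if any are missing:

```shell
# Hypothetical helper (not part of AdvAgent): report exported variables
# that are unset or empty, and return non-zero if any are missing.
check_keys() {
  local var missing=0
  for var in "$@"; do
    # printenv only sees exported variables, i.e. the ones that child
    # processes such as python or the together CLI will inherit.
    if [ -z "$(printenv "$var")" ]; then
      echo "missing environment variable: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_keys OPENAI_API_KEY HUGGING_FACE_HUB_TOKEN`. Add `TOGETHER_API_KEY` to the list before the fine-tuning step.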
We conduct experiments on the Mind2Web dataset and test our approach against the state-of-the-art web agent framework, SeeAct.
Download the Multimodal-Mind2Web source data from Hugging Face and store it in data/Multimodal-Mind2Web/data/.
Download the SeeAct source data and store it in data/seeact_source_data/.
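Before running data generation, you may want to confirm that both downloads landed where the pipeline expects them. This is a minimal sketch with a hypothetical helper name (`require_nonempty_dirs`); it only checks that each directory exists and is non-empty, not that its contents are complete:

```shell
# Hypothetical helper (not part of AdvAgent): check that each directory
# exists and is non-empty before running data generation.
require_nonempty_dirs() {
  local dir status=0
  for dir in "$@"; do
    # The -d test short-circuits, so ls never runs on a missing path.
    if [ ! -d "$dir" ] || [ -z "$(ls -A "$dir")" ]; then
      echo "missing or empty: $dir" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage: `require_nonempty_dirs data/Multimodal-Mind2Web/data data/seeact_source_data`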
Run the data_generation.ipynb notebook to filter the source data and construct the training and test sets.
Run training_data_generation.sh to check the quality of the training data and build the datasets for SFT and DPO.
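If you prefer to run both steps without opening Jupyter, the sketch below chains them and stops if the notebook fails. It assumes Jupyter's `nbconvert` is installed (it may not be listed in requirements.txt), that both files sit in the current directory, and that the notebook needs no manual edits; the helper name `run_data_generation` is hypothetical. If the notebook expects you to inspect or adjust intermediate results, run it interactively instead.

```shell
# Hypothetical helper (not part of AdvAgent): run both data-generation
# steps non-interactively from the directory that contains them.
run_data_generation() {
  # Execute every notebook cell and save the outputs back into the notebook.
  jupyter nbconvert --to notebook --execute --inplace data_generation.ipynb || return 1
  # Build the SFT and DPO datasets from the filtered data.
  bash training_data_generation.sh
}
```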
After completing data generation, your file structure should look like this:
data/task_demo_-1_aug
├── attack_dataset.json
├── subset_test_data_aug
│   ├── train.json
│   ├── test.json
│   ├── augmented_dataset.json
│   ├── predictions
│   │   ├── prediction-4api-augment-data.jsonl
│   │   ├── augmented_dataset_correct.json
│   │   └── prediction-4api-augment-data-correct.jsonl
│   └── imgs
│       └── f5da4b14-026d-4a10-ab89-f5720418f2b4_9016ffb6-7468-4495-ad07-756ac9f2af03.jpg
└── together
    └── data
        └── sft_train_data.jsonl
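To check that data generation produced every expected output, a small sketch like the following can compare the layout above against disk. The root `data/task_demo_-1_aug` is inferred from the model path used later in this README, the helper name is hypothetical, and the example image under imgs/ is skipped because its file name varies per sample:

```shell
# Hypothetical helper (not part of AdvAgent): verify that the files from
# the expected layout exist under the given root directory.
check_generated_files() {
  local root="${1:-data/task_demo_-1_aug}" rel status=0
  for rel in \
    attack_dataset.json \
    subset_test_data_aug/train.json \
    subset_test_data_aug/test.json \
    subset_test_data_aug/augmented_dataset.json \
    subset_test_data_aug/predictions/prediction-4api-augment-data.jsonl \
    subset_test_data_aug/predictions/augmented_dataset_correct.json \
    subset_test_data_aug/predictions/prediction-4api-augment-data-correct.jsonl \
    together/data/sft_train_data.jsonl; do
    if [ ! -f "$root/$rel" ]; then
      echo "missing: $root/$rel" >&2
      status=1
    fi
  done
  return "$status"
}
```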
We fine-tune the model through Together AI's API. The basic training process is as follows (see the Together AI documentation for more details):
Set up your Together AI API key:
export TOGETHER_API_KEY=<YOUR_KEY>
Upload the training dataset:
together files upload "xxx.jsonl"
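A malformed line is cheaper to catch locally than after an upload. The sketch below uses Python's built-in `json.tool` module with `--json-lines` (available since Python 3.8) to confirm that every line of the file parses as JSON. It checks syntax only, not the record schema Together AI expects, and it treats a blank line as an error; `validate_jsonl` is a hypothetical name:

```shell
# Hypothetical helper (not part of AdvAgent): fail if the file is missing,
# empty, or contains a line that is not valid JSON (blank lines included).
validate_jsonl() {
  local file="$1"
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  # json.tool exits non-zero and prints the parse error on the first bad line.
  python3 -m json.tool --json-lines "$file" > /dev/null
}
```

Usage: `validate_jsonl "xxx.jsonl"` before running the upload command.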
Train the SFT model:
together fine-tuning create \
  --training-file "file-xxx" \
  --model "mistralai/Mistral-7B-Instruct-v0.2" \
  --lora \
  --batch-size 16
Download the SFT model:
together fine-tuning download "ft-xxx"
You can store the SFT model in the path data/task_demo_-1_aug/together/new_models/.
Run dpo_training.sh to train the DPO model.
Select the best checkpoint based on the training curve, then run dpo_model_merge.sh to merge the model.
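This README does not specify how the training curve is recorded or what "best" means. As one common heuristic, and not necessarily the authors' criterion, you could pick the checkpoint with the lowest loss. The sketch assumes you have extracted a hypothetical text file with one `checkpoint_name loss` pair per line from wherever your training curve is logged:

```shell
# Hypothetical helper (not part of AdvAgent). Input: a text file with one
# "checkpoint_name loss" pair per line. Prints the checkpoint with the lowest loss.
best_checkpoint() {
  # -k2,2g sorts on the second column as a general number, ascending;
  # LC_ALL=C keeps "." as the decimal separator regardless of locale.
  LC_ALL=C sort -k2,2g "$1" | head -n 1 | awk '{ print $1 }'
}
```

Usage: `best_checkpoint dpo_losses.txt`, where `dpo_losses.txt` is a file you create yourself.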
Run evaluation.sh to evaluate the SFT and DPO models.
If you find this code useful, please cite our paper:
