Provide the ability to make agent running on Server side #53

Open. Wants to merge 6 commits into base: main
7 changes: 6 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -282,4 +282,9 @@ src/win-arena-container/client/.vscode/launch.json
# src/win-arena-container/client/evaluation_examples_windows/examples/chrome/12086550-11c0-466b-b367-1d9e75b3910e-wos.json
# src/win-arena-container/client/evaluation_examples_windows/examples/chrome/6c4c23a1-42a4-43cc-9db1-2f86ff3738cc-wos.json
# src/win-arena-container/client/evaluation_examples_windows/examples/chrome/7f52cab9-535c-4835-ac8c-391ee64dc930-wos.json
# src/win-arena-container/client/evaluation_examples_windows/examples/chrome/cabb3bae-cccb-41bd-9f5d-0f3a9fecd825-wos.json
# src/win-arena-container/client/evaluation_examples_windows/examples/chrome/cabb3bae-cccb-41bd-9f5d-0f3a9fecd825-wos.json

# Ignore the files when preparing agents.
src/win-arena-container/vm/setup/agents.json
src/win-arena-container/vm/setup/mm_agents/*
src/win-arena-container/vm/setup/Logs.txt
12 changes: 12 additions & 0 deletions AgentRepoConfig.json
@@ -0,0 +1,12 @@
{
  "repositories": [
    {
      "name": "UFO",
      "url": "https://github.com/PaulJiangMS/UFO",
      "runningmode": "server",
      "setupscript": "windows_arena/setup.ps1",
      "startuptype": "powershell",
      "startuppoint": "windows_arena/startup.ps1"
    }
  ]
}
40 changes: 38 additions & 2 deletions docs/Develop-Agent.md
Expand Up @@ -2,8 +2,11 @@

Want to test your own agents in Windows Agent Arena? You can use our default agent as a template and create your own folder under `src/win-arena-container/client/mm_agents`. You just need to ensure that your `agent.py` file includes the `predict()` and `reset()` functions.
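
As a rough illustration of the `predict()` and `reset()` requirement, here is a minimal sketch of an agent class. The class name `EchoAgent`, the `window_title` key, and the shape of the returned action strings are assumptions made for this example; check the default agent for the real observation keys and action format.

```python
from typing import Any, Dict, List


class EchoAgent:
    """Toy agent. The class name and the obs key used here are illustrative assumptions."""

    def __init__(self) -> None:
        # Keep a per-episode history so reset() has something to clear.
        self.history: List[str] = []

    def predict(self, instruction: str, obs: Dict[str, Any]) -> List[str]:
        """Return a list of actions (here: Python code strings) for one step."""
        # "window_title" is a hypothetical key, not confirmed by the project docs.
        title = obs.get("window_title", "")
        self.history.append(instruction)
        return [f"print({title!r})"]

    def reset(self) -> None:
        """Clear per-episode state before the next task starts."""
        self.history.clear()
```

The point of the sketch is only the interface: `predict()` takes an instruction plus an observation dictionary and returns a list, and `reset()` clears any state carried between tasks.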

## Steps to Create Your Custom Agent
# Steps to Create Your Custom Agent
Windows Agent Arena supports two types of agents. In client mode, the agent only produces predictions and relies on the desktop environment SDK to execute the actions.
In server mode, the agent runs entirely on the server side.

## Client mode onboarding
### 1. Create a New Agent Folder

Navigate to the `mm_agents` directory:
Expand Down Expand Up @@ -105,7 +108,40 @@ execute_actions(actions) # Function to execute the predicted actions

Once your agent is ready, submit a Pull Request (PR) to the repository with your new agent folder and code changes. Ensure your code follows the project's guidelines and is well-documented.

## Important Considerations
## Server mode onboarding
### Define your environment script

Prepare a script that sets up the Windows environment for your agent. For an example, see [UFO setup](https://github.com/microsoft/UFO/blob/dev/waa/windows_arena/setup.ps1).

### Define the startup script that accepts the WAA prompt

This script receives the WAA prompt and runs your agent on Windows. For an example, see [UFO startup](https://github.com/microsoft/UFO/blob/dev/waa/windows_arena/startup.ps1).

### Define your agent repository settings so the agent code base can be cloned

Add a JSON entry for your agent to `AgentRepoConfig.json`, as shown below:

```json
{
  "repositories": [
    {
      "name": "UFO",
      "url": "https://github.com/PaulJiangMS/UFO",
      "runningmode": "server",
      "setupscript": "windows_arena/setup.ps1",
      "startuptype": "powershell",
      "startuppoint": "windows_arena/startup.ps1"
    }
  ]
}
```
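
Before committing a new entry, it can help to sanity-check it. The sketch below is a hypothetical helper, not part of the repository. It assumes that `name`, `url`, and `runningmode` are always needed (these are the fields read by `scripts/prepare-agents.sh`) and that a server entry also needs the three script fields shown above, which is a guess based on the UFO example.

```python
# Fields read by scripts/prepare-agents.sh for every repository entry.
REQUIRED_KEYS = {"name", "url", "runningmode"}
# Assumption: server-mode entries also need these fields, as in the UFO example.
SERVER_KEYS = {"setupscript", "startuptype", "startuppoint"}
VALID_MODES = {"client", "server"}


def validate_repo_entry(entry: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []
    missing = REQUIRED_KEYS - entry.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    mode = entry.get("runningmode")
    if mode not in VALID_MODES:
        problems.append(f"invalid runningmode: {mode!r}")
    if mode == "server":
        missing_server = SERVER_KEYS - entry.keys()
        if missing_server:
            problems.append(f"server entry missing: {sorted(missing_server)}")
    return problems
```

Calling `validate_repo_entry` on each element of `repositories` and printing any non-empty result gives a quick check before running the build scripts.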

### Note for server mode

Complete the server mode steps above first, then follow the steps to prepare the Windows image.


## Important Considerations

- **Observation Data**: The `obs` dictionary contains vital information like screenshots, window titles, and clipboard content. Use this data to inform your agent's decisions.
- **Action Format**: The list returned by `predict()` should contain executable actions or code blocks that the environment can interpret.
4 changes: 4 additions & 0 deletions scripts/build-container-image.sh
Expand Up @@ -35,6 +35,10 @@ done

SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )

# Run prepare-agents.sh
echo "Running prepare-agents.sh..."
"$SCRIPT_DIR/prepare-agents.sh"

echo "$SCRIPT_DIR/../"

if [ "$build_base_image" = true ]; then
59 changes: 59 additions & 0 deletions scripts/clear-agents.sh
@@ -0,0 +1,59 @@
#!/bin/bash

# Get the directory of the current script
SCRIPT_DIR=$(dirname "$(readlink -f "$0")")

# Path to the JSON configuration file
CONFIG_FILE="$SCRIPT_DIR/../AgentRepoConfig.json"
AGENTS_JSON_FILE="$SCRIPT_DIR/../src/win-arena-container/vm/setup/agents.json"
CLIENT_AGENT_FOLDER="$SCRIPT_DIR/../src/win-arena-container/client/mm_agents"
SERVER_AGENT_FOLDER="$SCRIPT_DIR/../src/win-arena-container/vm/setup/mm_agents"
WIN_STORAGE="$SCRIPT_DIR/../src/win-arena-container/vm/storage"

# Remove the AGENTS_JSON_FILE if it exists
if [ -f "$AGENTS_JSON_FILE" ]; then
    echo "Remove $AGENTS_JSON_FILE."
    rm "$AGENTS_JSON_FILE"
fi

# Remove the WIN_STORAGE if it exists (quoted so paths with spaces are safe)
if [ -d "$WIN_STORAGE" ]; then
    echo "Remove $WIN_STORAGE."
    sudo rm -r "$WIN_STORAGE"
fi

# Check if jq is installed
if ! command -v jq &> /dev/null
then
echo "jq could not be found, installing jq..."
sudo apt-get update && sudo apt-get install -y jq
fi

# Read the JSON file and remove each repository's checkout.
# Reading line by line avoids word splitting when a JSON value contains spaces.
while IFS= read -r repo; do
    REPO_DIR_NAME=$(jq -r '.name' <<< "$repo")
    RUNNING_MODE=$(jq -r '.runningmode' <<< "$repo")

    # Refuse an empty name; otherwise the whole agent folder would be removed
    if [ -z "$REPO_DIR_NAME" ]; then
        echo "Repository entry has an empty name: $repo"
        exit 1
    fi

    # Set the target folder based on the running mode
    if [ "$RUNNING_MODE" == "client" ]; then
        TARGET_FOLDER="$CLIENT_AGENT_FOLDER"
    elif [ "$RUNNING_MODE" == "server" ]; then
        TARGET_FOLDER="$SERVER_AGENT_FOLDER"
    else
        echo "Invalid running mode: $RUNNING_MODE"
        exit 1
    fi

    REPO_DIR="$TARGET_FOLDER/$REPO_DIR_NAME"

    if [ -d "$REPO_DIR" ]; then
        echo "Remove $REPO_DIR."
        # Remove the repository directory
        sudo rm -r "$REPO_DIR"
    fi
done < <(jq -c '.repositories[]' "$CONFIG_FILE")
59 changes: 59 additions & 0 deletions scripts/prepare-agents.sh
@@ -0,0 +1,59 @@
#!/bin/bash

# Get the directory of the current script
SCRIPT_DIR=$(dirname "$(readlink -f "$0")")

# Path to the JSON configuration file
CONFIG_FILE="$SCRIPT_DIR/../AgentRepoConfig.json"
CLIENT_AGENT_FOLDER="$SCRIPT_DIR/../src/win-arena-container/client/mm_agents"
SERVER_AGENT_FOLDER="$SCRIPT_DIR/../src/win-arena-container/vm/setup/mm_agents"
AGENTS_JSON_FILE="$SCRIPT_DIR/../src/win-arena-container/vm/setup/agents.json"

# Remove the AGENTS_JSON_FILE if it exists
if [ -f "$AGENTS_JSON_FILE" ]; then
rm "$AGENTS_JSON_FILE"
fi

# Check if jq is installed
if ! command -v jq &> /dev/null
then
echo "jq could not be found, installing jq..."
sudo apt-get update && sudo apt-get install -y jq
fi

# Initialize an empty array to hold server repositories
server_repos=()

# Read the JSON file and clone the repositories.
# Reading line by line avoids word splitting when a JSON value contains spaces.
while IFS= read -r repo; do
    REPO_URL=$(jq -r '.url' <<< "$repo")
    REPO_DIR_NAME=$(jq -r '.name' <<< "$repo")
    RUNNING_MODE=$(jq -r '.runningmode' <<< "$repo")

    # Set the target folder based on the running mode
    if [ "$RUNNING_MODE" == "client" ]; then
        TARGET_FOLDER="$CLIENT_AGENT_FOLDER"
    elif [ "$RUNNING_MODE" == "server" ]; then
        TARGET_FOLDER="$SERVER_AGENT_FOLDER"
        server_repos+=("$repo")
    else
        echo "Invalid running mode: $RUNNING_MODE"
        exit 1
    fi

    REPO_DIR="$TARGET_FOLDER/$REPO_DIR_NAME"

    if [ -d "$REPO_DIR" ]; then
        echo "Directory $REPO_DIR already exists. Skipping clone."
    else
        echo "Cloning $REPO_URL into $REPO_DIR..."
        git clone "$REPO_URL" "$REPO_DIR"
    fi
done < <(jq -c '.repositories[]' "$CONFIG_FILE")

# Print the server_repos array, one repository per line
printf 'Repo is : %s\n' "${server_repos[@]}"

# Create the agents.json file with the list of server repositories
jq -n --argjson repos "$(printf '%s\n' "${server_repos[@]}" | jq -s .)" '{"server_repositories": $repos}' > "$AGENTS_JSON_FILE"
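
The flow of `prepare-agents.sh` can be summarised in a few lines of Python. This is only a sketch for readers who want to reason about the script's output: the function name `plan_agents` is made up here, and it computes the clone targets and the `agents.json` content without touching the disk or the network.

```python
def plan_agents(config: dict, client_dir: str, server_dir: str):
    """Return (clone_targets, agents_json) for a parsed AgentRepoConfig.json."""
    clone_targets = []
    server_repos = []
    for repo in config["repositories"]:
        mode = repo["runningmode"]
        if mode == "client":
            base = client_dir
        elif mode == "server":
            base = server_dir
            # Only server-mode entries end up in agents.json.
            server_repos.append(repo)
        else:
            raise ValueError(f"Invalid running mode: {mode}")
        # Each repository is cloned into <base>/<name>.
        clone_targets.append((repo["url"], f"{base}/{repo['name']}"))
    return clone_targets, {"server_repositories": server_repos}
```

With the UFO entry from `AgentRepoConfig.json`, the second return value is a dictionary whose `server_repositories` list holds that one entry, which matches the structure the script writes to `agents.json`.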
16 changes: 15 additions & 1 deletion scripts/run-local.sh
Expand Up @@ -26,6 +26,8 @@ model="gpt-4-vision-preview"
som_origin="oss"
a11y_backend="uia"
gpu_enabled=false
json_name="evaluation_examples_windows/test_all.json"
agent_settings=""

# Parse the command line arguments
while [[ $# -gt 0 ]]; do
Expand Down Expand Up @@ -110,6 +112,14 @@ while [[ $# -gt 0 ]]; do
mode=$2
shift 2
;;
--json-name)
json_name=$2
shift 2
;;
--agent-settings)
agent_settings=$2
shift 2
;;
--help)
echo "Usage: $0 [options]"
echo "Options:"
Expand All @@ -133,6 +143,8 @@ while [[ $# -gt 0 ]]; do
echo " --a11y-backend <a11y_backend>: The a11y accessibility backend to use (default: uia, available options are: uia, win32)"
echo " --gpu-enabled <true/false> : Enable GPU support (default: false)"
echo " --mode <dev/azure> : Mode (default: azure)"
echo " --json-name <name> : Path of the task JSON file to use (default: evaluation_examples_windows/test_all.json)"
echo " --agent-settings <settings> : Additional agent settings, passed as a JSON string (default: empty)"
exit 0
;;
*)
Expand Down Expand Up @@ -161,4 +173,6 @@ if [[ -z "$OPENAI_API_KEY" && (-z "$AZURE_API_KEY" || -z "$AZURE_ENDPOINT") ]];
log_error_exit "Either OPENAI_API_KEY must be set or both AZURE_API_KEY and AZURE_ENDPOINT must be set: $1"
fi

./run.sh --mode $mode --prepare-image $prepare_image --container-name $container_name --skip-build $skip_build --interactive $interactive --connect $connect --use-kvm $use_kvm --ram-size $ram_size --cpu-cores $cpu_cores --mount-vm-storage $mount_vm_storage --mount-client $mount_client --mount-server $mount_server --browser-port $browser_port --rdp-port $rdp_port --start-client $start_client --agent $agent --model $model --som-origin $som_origin --a11y-backend $a11y_backend --gpu-enabled $gpu_enabled --openai-api-key $OPENAI_API_KEY --azure-api-key $AZURE_API_KEY --azure-endpoint $AZURE_ENDPOINT
echo "$agent_settings"

./run.sh --mode $mode --prepare-image $prepare_image --container-name $container_name --skip-build $skip_build --interactive $interactive --connect $connect --use-kvm $use_kvm --ram-size $ram_size --cpu-cores $cpu_cores --mount-vm-storage $mount_vm_storage --mount-client $mount_client --mount-server $mount_server --browser-port $browser_port --rdp-port $rdp_port --start-client $start_client --agent $agent --model $model --som-origin $som_origin --a11y-backend $a11y_backend --gpu-enabled $gpu_enabled --openai-api-key $OPENAI_API_KEY --azure-api-key $AZURE_API_KEY --azure-endpoint $AZURE_ENDPOINT --json-name $json_name --agent-settings "$agent_settings"
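
The code that finally consumes `--json-name` and `--agent-settings` is not part of this diff. As a hedged sketch of how a Python entry point might parse the two flags, the example below uses `argparse`; the function name `parse_args` and the choice to treat an empty settings string as an empty dictionary are assumptions for illustration.

```python
import argparse
import json


def parse_args(argv):
    """Parse the two new flags; other options are omitted from this sketch."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--json-name", default="evaluation_examples_windows/test_all.json")
    parser.add_argument("--agent-settings", default="")
    args = parser.parse_args(argv)
    # Assumption: an empty string means "no extra settings".
    # Invalid JSON raises json.JSONDecodeError here, which surfaces bad input early.
    args.agent_settings = json.loads(args.agent_settings) if args.agent_settings else {}
    return args
```

Decoding the JSON string at the boundary means the rest of the program works with a dictionary and a malformed `--agent-settings` value fails fast instead of deep inside an agent run.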
18 changes: 17 additions & 1 deletion scripts/run.sh
Expand Up @@ -26,6 +26,8 @@ model="gpt-4-vision-preview"
som_origin="oss"
a11y_backend="uia"
gpu_enabled=false
agent_settings=""
json_name="evaluation_examples_windows/test_all.json"
OPENAI_API_KEY=""
AZURE_API_KEY=""
AZURE_ENDPOINT=""
Expand Down Expand Up @@ -109,6 +111,10 @@ while [[ $# -gt 0 ]]; do
gpu_enabled=$2
shift 2
;;
--agent-settings)
agent_settings=$2
shift 2
;;
--openai-api-key)
OPENAI_API_KEY="$2"
shift 2
Expand All @@ -125,6 +131,10 @@ while [[ $# -gt 0 ]]; do
mode=$2
shift 2
;;
--json-name)
json_name=$2
shift 2
;;
--help)
echo "Usage: $0 [options]"
echo "Options:"
Expand All @@ -151,6 +161,8 @@ while [[ $# -gt 0 ]]; do
echo " --azure-api-key <key> : The Azure OpenAI API key"
echo " --azure-endpoint <url> : The Azure OpenAI Endpoint"
echo " --mode <dev/azure> : Mode (default: azure)"
echo " --json-name <name> : Path of the task JSON file to use (default: evaluation_examples_windows/test_all.json)"
echo " --agent-settings <settings> : Additional agent settings, passed as a JSON string (default: empty)"
exit 0
;;
*)
Expand Down Expand Up @@ -201,6 +213,7 @@ echo "Using VM Setup Image path: $vm_setup_image_path"
echo "Using VM storage mount path: $vm_storage_mount_path"
echo "Using server mount path: $server_mount_path"
echo "Using client mount path: $client_mount_path"
echo "$agent_settings"

# Check if /dev/kvm exists
if [ ! -e /dev/kvm ]; then
Expand Down Expand Up @@ -301,8 +314,11 @@ invoke_docker_container() {
# Add the image name with tag
docker_command+=" $winarena_full_image_name:$winarena_image_tag"

# Escape double quotes
escaped_agent_settings=$(echo "$agent_settings" | sed 's/"/\\"/g')

# Set the entrypoint arguments
entrypoint_args=" -c './entry.sh --prepare-image $prepare_image --start-client $start_client --agent $agent --model $model --som-origin $som_origin --a11y-backend $a11y_backend'"
entrypoint_args=" -c './entry.sh --prepare-image $prepare_image --start-client $start_client --agent $agent --model $model --som-origin $som_origin --a11y-backend $a11y_backend --json-name $json_name --agent-settings \"$escaped_agent_settings\"'"
if [ "$interactive" = true ]; then
entrypoint_args=""
fi
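
Passing a JSON string through a shell command line is easy to get wrong, since the `sed` escaping above only handles double quotes. The sketch below shows a different technique from the one used in `run.sh`: building the command in Python and quoting each argument with `shlex.quote`. The function name and the sample settings key are hypothetical, and the sketch covers a single level of shell quoting only, not the nested `-c '...'` string that `run.sh` assembles for Docker.

```python
import json
import shlex


def build_entrypoint_command(json_name: str, agent_settings: dict) -> str:
    """Build an entry.sh command line with every argument safely shell-quoted."""
    settings_json = json.dumps(agent_settings)
    return "./entry.sh --json-name {} --agent-settings {}".format(
        shlex.quote(json_name),
        shlex.quote(settings_json),
    )


if __name__ == "__main__":
    # "max_steps" is a made-up setting used only for this demonstration.
    print(build_entrypoint_command(
        "evaluation_examples_windows/test_all.json",
        {"max_steps": 5, "note": "it's \"quoted\""},
    ))
```

Splitting the resulting string with `shlex.split` yields the original arguments back, including settings that contain both single and double quotes.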
4 changes: 3 additions & 1 deletion src/win-arena-container/Dockerfile-WinArena
Expand Up @@ -28,8 +28,10 @@ RUN if [ "${DEPLOY_MODE}" = "azure" ]; then \
WINDOWS_OEM_FOLDER='C:\\oem'; \
OEM_FOLDER='oem'; \
sed -i "s|${WINDOWS_DATA_FOLDER}|${WINDOWS_OEM_FOLDER}|g" "/${OEM_FOLDER}/install.bat"; \
sed -i "s|${WINDOWS_DATA_FOLDER}|${WINDOWS_OEM_FOLDER}|g" "/${OEM_FOLDER}/on-logon.ps1"; \
sed -i "s|${WINDOWS_DATA_FOLDER}|${WINDOWS_OEM_FOLDER}|g" "/${OEM_FOLDER}/setup.ps1"; \
sed -i "s|${WINDOWS_DATA_FOLDER}|${WINDOWS_OEM_FOLDER}|g" "/${OEM_FOLDER}/setupAgents.ps1"; \
sed -i "s|${WINDOWS_DATA_FOLDER}|${WINDOWS_OEM_FOLDER}|g" "/${OEM_FOLDER}/server/main.py"; \
sed -i "s|${WINDOWS_DATA_FOLDER}|${WINDOWS_OEM_FOLDER}|g" "/${OEM_FOLDER}/on-logon.vbs"; \
fi

# Copy client application
20 changes: 20 additions & 0 deletions src/win-arena-container/client/desktop_env/controllers/python.py
Expand Up @@ -713,3 +713,23 @@ def execute_shell_command(self, command):
except requests.exceptions.RequestException as e:
logger.error("An error occurred while trying to execute the command: %s", e)
return None

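    def run_agent(self, agent_name, instruction, agent_settings):
        """
        Run a server-mode agent by posting the task to the `/run_server_agent` endpoint.

        Returns the decoded JSON response on success, or None on failure.
        """
        # Prepare the data payload
        payload = {
            "agent": agent_name,
            "instruction": instruction,
            "agent_settings": agent_settings
        }

        try:
            response = requests.post(self.http_server + "/run_server_agent", json=payload)
        except requests.exceptions.RequestException as e:
            # Mirror execute_shell_command: log connection errors instead of raising
            logger.error("An error occurred while trying to run the agent: %s", e)
            return None

        if response.status_code == 200:
            result = response.json()
            logger.info("Succeeded running agent: %s", agent_name)
            logger.info("Agent response: %s", result)
            return result

        # Log the numeric status code rather than the response object
        logger.error("Failed to run agent. Status code: %s", response.status_code)
        return None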
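
To make the request and response contract of `run_agent` easier to see, here is a standalone sketch that replaces the HTTP call with an injected function. The `post` parameter and `fake_post` are hypothetical test doubles, not part of the controller; the payload keys and the `/run_server_agent` path are taken from the method above.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# A transport takes (url, json_payload) and returns (status_code, decoded_body).
Post = Callable[[str, Dict[str, Any]], Tuple[int, Any]]


def run_agent(post: Post, http_server: str, agent_name: str,
              instruction: str, agent_settings: Any) -> Optional[Any]:
    """Send one task to the server-side agent; return its response or None."""
    payload = {
        "agent": agent_name,
        "instruction": instruction,
        "agent_settings": agent_settings,
    }
    status, body = post(http_server + "/run_server_agent", payload)
    return body if status == 200 else None


def fake_post(url: str, payload: Dict[str, Any]) -> Tuple[int, Any]:
    """Stand-in transport: succeeds only for the expected endpoint."""
    if url.endswith("/run_server_agent"):
        return 200, {"echo": payload["instruction"]}
    return 404, None
```

Injecting the transport keeps the sketch runnable without a VM or network access, and the same shape can be used to unit test callers of the real method.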