
Commit c7f3174: Add DeepFinance sample (#124)
1 parent 679431b

36 files changed: 5356 additions & 3 deletions

.gitignore

Lines changed: 9 additions & 0 deletions
```diff
@@ -64,3 +64,12 @@ logs/
 
 # Agent-generated files
 **sessions_mount_dir/
+
+**/checkpoints/*
+tuner/deep_finance/ori_data/**
+tuner/deep_finance/data/**
+tuner/deep_finance/scripts/**
+tuner/deep_finance/yaml/**
+tuner/deep_finance/yaml_template/**
+tuner/deep_finance/config/**
+tuner/deep_finance/trajectory/**
```

.pre-commit-config.yaml

Lines changed: 3 additions & 2 deletions
```diff
@@ -7,7 +7,8 @@ repos:
       - id: check-yaml
         exclude: |
           (?x)^(
-              meta.yaml
+              meta.yaml|
+              tuner/deep_finance/config_template\.yaml
           )$
       - id: check-xml
       - id: check-toml
```

```diff
@@ -66,7 +67,7 @@ repos:
           | \.html$
         )
         args: [
-          "--init-hook=import sys; sys.path.insert(0, 'alias/src')",
+          "--init-hook=import sys; sys.path.insert(0, 'alias/src'); sys.path.insert(0, 'tuner/deep_finance')",
           --disable=W0511,
           --disable=W0718,
           --disable=W0122,
```

tuner/deep_finance/.env.example

Lines changed: 28 additions & 0 deletions
```bash
# API keys
OPENAI_API_KEY="sk-xxx"
OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
RM_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
RM_API_KEY="sk-xxx"
OPENJUDGE_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
OPENJUDGE_API_KEY="sk-xxx"
STRONG_MODEL_API_KEY="sk-xxx"

SWANLAB_API_KEY="xxx"

# data path, save path
ENV_SERVICE_ROOT="/path/to/env_service"
CONDA_PATH="/path/to/conda/conda.sh"
MODEL_PATH="/path/to/base_model"
CKPT_SAVE_PATH="/path/to/ckpt_path"
# New: data file path configuration
TRAIN_DATA_PATH="/path/to/train_data"
VAL_DATA_PATH="/path/to/val_data"

TRAIN_REF_ANS_PATH="/path/to/train_reference_answer"
VAL_REF_ANS_PATH="/path/to/val_reference_answer"

# Port
ADDR=""
MCP_PORT=""
```

tuner/deep_finance/README.md

Lines changed: 273 additions & 0 deletions
# Training Financial Deep Research Agent with RL using AgentScope-Tuner

## Overview

DeepFinance is a reinforcement learning training framework for financial deep research agents. Instead of relying on human-annotated "gold answers", it drives the model to autonomously explore optimal research strategies through a **multi-dimensional reward system** (evidence traceability × analytical sufficiency × readability).

## Task Setting

### Agent Goal

Given a financial research question (stock analysis / industry research / event interpretation / macro analysis / stock screening), the agent must:

- Call financial tools to collect real-world data
- Generate a Markdown research report with academic-style citations
- End the report with the `[TASK_COMPLETED]` marker

### Agent Type

The agent is implemented as a **ReActAgent**, following a two-phase deep research methodology (defined in `prompt/finance_analyst_prompt.md`):

**Phase 1: Outline First, Then Investigate**
1. Identify the query type
2. **Output a research outline first** (section headings + key questions per section) — no tool calls at this stage
3. Investigate section by section, summarizing after each round of tool calls

**Phase 2: Deep Analysis and Report Generation**
1. Generate a Markdown-format research report based on real data
2. If evidence gaps are found during writing, allow 1–2 additional rounds of tool calls
3. Append `[TASK_COMPLETED]` at the end of the report

> Why "plan first, then execute"? Letting the model freely explore in a complex tool environment typically leads not to "failing to call tools", but to "failing to form a complete research process" — the model grabs one piece of data and immediately starts local analysis, resulting in a loosely structured report. Requiring an outline first helps develop a stable research workflow and reduces ineffective exploration.
### Tool Environment

The agent communicates with the [Finance MCP](https://github.com/flowllm-ai/finance-mcp) service via MCP (Model Context Protocol), using **19 financial tools** (defined in `prompt/tool_prompt_builder.py`):

- **Entity & Computation**: entity extraction, A-share historical price calculation
- **General Capabilities**: DashScope search, Python/Shell code execution
- **THS Specialized Data**: company fundamentals, shareholders, financials, earnings forecasts, news & announcements, institutional holdings, and 13 other specialized queries

**Tool Call Conventions:**

- Up to **3 tools** per call, using multi-round progressive investigation
- Summarize after each round of tool calls before deciding the next investigation direction
### Reward Design

The reward is split into **1 core objective + 3 constraints**:

| Role | Dimension | Code Module | Core Question |
| :--- | :--- | :--- | :--- |
| **Core** | Analytical Sufficiency (RM) | `judge/finance/` | Is the analysis thorough? Is the logic sound? |
| Constraint | Presentation Quality | `judge/presentation_quality/` | Is information easy to access? Good reader experience? |
| Constraint | Citation Grounding | `judge/grounding/` | Are key facts cited? Are citations real? |
| Constraint | Citation Audit | `judge/audit/` | Do citations truly support the claims? |

**Scoring (Extract First, Then Score)**: The LLM first extracts structured information from the report (citations, evidence relationships, etc.), then Python rules compute the scores. For example, the Audit grader only requires the LLM to classify each citation as Supported / Overstated / Contradicted / Hallucinated / Irrelevant, and the final score is computed by rule-based code.
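The "extract first, then score" idea can be pictured with a small sketch. The credit values and function name below are illustrative assumptions, not the repository's actual rules: the LLM only labels each citation with a verdict, and plain Python turns the verdicts into a score.

```python
# Sketch of rule-based scoring over LLM-extracted verdicts.
# Credit values are illustrative assumptions, not the real weights.
VERDICT_CREDIT = {
    "Supported": 1.0,
    "Overstated": 0.5,
    "Irrelevant": 0.0,
    "Contradicted": -0.5,
    "Hallucinated": -1.0,
}

def audit_score(verdicts: list[str]) -> float:
    """Average per-citation credit; a report with no citations scores 0."""
    if not verdicts:
        return 0.0
    return sum(VERDICT_CREDIT[v] for v in verdicts) / len(verdicts)
```

For instance, three citations labeled Supported, Supported, Overstated would average to roughly 0.83 under these assumed credits.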
**Tool Call Penalty** (defined in `deep_finance_judge.py`):

| Tool Calls | Penalty |
| :--- | :--- |
| 0 calls | -1.0 |
| 1–2 calls | -0.5 |
| ≥ 3 calls | 0.0 (no penalty) |

**Default Weights** (configurable in `deepfinance_tuner.sh`):

```bash
RM_WEIGHT=0.5                    # Analytical sufficiency (core objective)
PRESENTATION_QUALITY_WEIGHT=0.2  # Presentation quality
GROUNDING_WEIGHT=0.1             # Citation grounding
AUDIT_WEIGHT=0.2                 # Citation audit
```
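Putting the weights and the penalty together, the final reward plausibly reduces to a weighted sum of the dimension scores plus the tool-call penalty. A minimal sketch under that assumption (the function names and exact fusion rule are not the repository's code):

```python
# Illustrative fusion of the four dimension scores with the tool-call
# penalty; weights mirror the defaults above. This is a sketch of the
# general shape, not the actual deep_finance_judge.py implementation.
WEIGHTS = {"rm": 0.5, "presentation": 0.2, "grounding": 0.1, "audit": 0.2}

def tool_call_penalty(num_calls: int) -> float:
    # Mirrors the penalty table: 0 calls -> -1.0, 1-2 calls -> -0.5, else 0.
    if num_calls == 0:
        return -1.0
    if num_calls <= 2:
        return -0.5
    return 0.0

def fuse_reward(scores: dict[str, float], num_calls: int) -> float:
    weighted = sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)
    return weighted + tool_call_penalty(num_calls)
```

Under this sketch, a report scoring 1.0 on every dimension with at least 3 tool calls would receive the maximum reward of 1.0, while the same report with zero tool calls would drop to 0.0.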
## Code Implementation

### High-Level Overview

The implementation consists of three main components:

1. **Workflow** (`run_deep_finance`): ReActAgent + Finance MCP tool interaction loop
2. **Judge** (`deep_finance_judge`): Multi-dimensional evaluation engine, combining OpenJudge + rule-based scoring
3. **Entry** (`main.py`): Calls `tune()` to launch training
### Agent Workflow

`run_deep_finance` implements the agent–tool interaction loop:

```python
async def run_deep_finance(
    task: Dict[str, Any],
    model: OpenAIChatModel,
    auxiliary_models: Dict[str, OpenAIChatModel] | None = None,
) -> WorkflowOutput:
    # 1. Extract system prompt and user query
    sys_prompt, user_query = _extract_sys_and_user(task)

    # 2. Get Finance MCP toolkit (process-local singleton, lazily loaded)
    toolkit = await get_finance_mcp_toolkit()

    # 3. Create ReActAgent
    agent = ReActAgent(
        name="deep_finance_react",
        sys_prompt=sys_prompt,
        model=model,
        enable_meta_tool=False,
        formatter=OpenAIChatFormatter(),
        toolkit=toolkit,
    )

    # 4. Execute research task
    response = await agent.reply(msg=Msg("user", user_query, role="user"))

    # 5. Extract tool call statistics
    # (timing and response serialization elided from this excerpt)
    tool_stats = await extract_tool_stats_from_agent(agent, total_time)
    metrics = compute_single_tool_metrics(tool_stats)

    return WorkflowOutput(response=response_dict, metrics=metrics)
```
**Key Features:**

- MCP Toolkit is lazily loaded as a singleton per worker process, with built-in jitter to prevent thundering herd
- System prompt is dynamically generated from `prompt/finance_analyst_prompt.md` (injecting current date and tool list)
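The lazy-singleton-with-jitter pattern behind `get_finance_mcp_toolkit` can be sketched as follows. This is a minimal illustration with assumed names (`connect_finance_mcp` is a stand-in for the real connection code), not the sample's actual implementation:

```python
import asyncio
import random

_toolkit = None
_toolkit_lock = asyncio.Lock()

async def connect_finance_mcp():
    """Placeholder for the real MCP connection logic (assumed name)."""
    return object()

async def get_toolkit_sketch():
    """Return one shared toolkit per worker process, created lazily.

    The random sleep staggers first-time connections across many worker
    processes so they do not all hit the MCP service at the same instant
    (the "thundering herd" the bullet above refers to).
    """
    global _toolkit
    if _toolkit is None:
        async with _toolkit_lock:
            if _toolkit is None:  # re-check after acquiring the lock
                await asyncio.sleep(random.uniform(0.0, 0.2))  # jitter
                _toolkit = await connect_finance_mcp()
    return _toolkit
```

Repeated calls return the same object, so each worker process pays the connection cost only once.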
### Judge Function

`deep_finance_judge` uses `DeepFinanceJudgeEngine` for multi-dimensional evaluation:

```python
async def deep_finance_judge(
    task: Dict[str, Any],
    response: Any,
    auxiliary_models: Dict[str, ChatModelBase] | None = None,
) -> JudgeOutput:
    engine = _get_judge_engine()
    reward, metrics = await engine.evaluate_one(task=task, response=response)
    return JudgeOutput(reward=reward, metrics=metrics)
```

Evaluation flow:

1. Build conversation history from the response and convert it to OpenJudge format
2. Run multiple graders in parallel (Presentation Quality / Citation Grounding / Citation Audit)
3. Run Finance RM (pairwise evaluation using a dedicated stronger model)
4. Fuse scores + tool call penalty → final reward
143+
### Launch Training with `tune()`
144+
145+
```python
146+
from agentscope.tuner import tune
147+
148+
tune(
149+
workflow_func=run_deep_finance,
150+
judge_func=deep_finance_judge,
151+
config_path="config_template.yaml",
152+
)
153+
```
154+
155+
For training configuration, refer to [config_template.yaml](./config_template.yaml). For full configuration details, see the [Trinity-RFT Configuration Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html).
156+
## How to Run

### Dependencies

```bash
# Recommended: use conda or uv to manage virtual environments
conda create -n tune_example python=3.11
conda activate tune_example

# Install core dependencies
pip install agentscope vllm ray wandb

# Install OpenJudge
git clone https://github.com/agentscope-ai/OpenJudge.git
cd OpenJudge
pip install -e .
```
### Step 1: Install and Start Finance MCP Service

Finance MCP provides the financial tool suite (search, web crawling, THS data, etc.).

**Install:**
```bash
pip install finance-mcp
```

**Start the service (SSE mode):**
```bash
finance-mcp \
  config=default,ths,crawl \
  disabled_flows='["tavily_search","mock_search","react_agent"]' \
  mcp.transport=sse \
  mcp.port=8040
```

The service will be available at `http://<server_IP>:8040/sse` (use `127.0.0.1` locally; replace it with the actual IP for remote access).

**Required API Keys (configure as needed in `.env`):**

| Variable | Purpose |
|----------|---------|
| `DASHSCOPE_API_KEY` | DashScope search |
| `TUSHARE_API_TOKEN` | China A-share historical data |
| `TAVILY_API_KEY` | Tavily search (optional) |
### Step 2: Configure Environment Variables

Copy `tuner/deep_finance/.env.example`, rename it to `.env`, and place it in the project root:

```bash
# ==================== .env ====================
# API keys (for Judge scoring and external tools)
OPENJUDGE_API_KEY="sk-xxx"
OPENJUDGE_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"

# Base model and environment paths
MODEL_PATH="/path/to/base_model"
CONDA_PATH="/path/to/conda/conda.sh"
CONDA_ENV="tune_example"

# Data and reference answer paths
DATA_PATH="/path/to/train_data_dir"
TRAIN_REF_ANS_PATH="/path/to/train_reference_answer.json"
VAL_REF_ANS_PATH="/path/to/val_reference_answer.json"

# Cluster config (set WORLD_SIZE to 1 for single-machine)
WORLD_SIZE=1
MASTER_ADDR="127.0.0.1"

# Finance MCP service URL
FINANCE_MCP_URL="http://127.0.0.1:8040/sse"
```
### Step 3: Launch Training

No need to manually edit Python or YAML files. The launch script `deepfinance_tuner.sh` dynamically generates `config_template.yaml` and automatically starts the Ray cluster.

```bash
bash deepfinance_tuner.sh
```

**Key training parameters (configurable in `deepfinance_tuner.sh`):**

| Shell Parameter | Tuner Parameter | Default | Description |
| :--- | :--- | :--- | :--- |
| `GROUP_SIZE` | `repeat_times` | 4 | Parallel rollout samples per query |
| `MAX_ENV_STEPS` | `max_env_steps` | 10 | Max agent-environment interaction rounds |
| `BATCH_SIZE` | `batch_size` | 64 | Global batch size |
| `OPENJUDGE_LLM` | `openjudge_llm` | qwen-flash | General model for OpenJudge scoring |
| `FINANCE_JUDGE_LLM` | `finance_judge_llm` | qwen-max | Stronger model for financial analysis depth evaluation |
| `ENGINE_NUM` | `engine_num` | Node // 2 | Number of vLLM async inference engines |
| `GPU_PER_NODE` | `gpu_per_node` | 8 | GPUs per node |
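A rough picture of what "dynamically generates `config_template.yaml`" means: the script substitutes shell-level parameters into a config template before training starts. The sketch below illustrates that idea with Python's `string.Template` and made-up placeholder keys; the real script does this in bash, and the actual template keys may differ.

```python
from string import Template

# Hypothetical template fragment; the placeholder names are illustrative
# and do not necessarily match the real config_template.yaml keys.
TEMPLATE = """\
buffer:
  batch_size: $batch_size
explorer:
  repeat_times: $group_size
  max_env_steps: $max_env_steps
"""

def render_config(batch_size: int, group_size: int, max_env_steps: int) -> str:
    """Substitute launch-script parameters into the YAML template."""
    return Template(TEMPLATE).substitute(
        batch_size=batch_size,
        group_size=group_size,
        max_env_steps=max_env_steps,
    )
```

With the defaults from the table above, `render_config(64, 4, 10)` yields a YAML fragment with `batch_size: 64`, `repeat_times: 4`, and `max_env_steps: 10` filled in.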
## Code Structure

```
deep_finance/
├── main.py                       # Entry: defines workflow and judge functions
├── deep_finance_judge.py         # Judge engine: multi-grader fusion + reward computation
├── config_template.yaml          # Tuner config template (dynamically generated by shell script)
├── deepfinance_tuner.sh          # Multi-node distributed launch script
├── deepfinance_tuner_single.sh   # Single-machine launch script
├── .env.example                  # Environment variable template
├── judge/
│   ├── finance/                  # RM: domain-routed pairwise evaluation
│   ├── presentation_quality/     # Presentation: 8-dimension rule-based scoring
│   ├── grounding/                # Grounding: coverage + authenticity
│   ├── audit/                    # Audit: 5-level verdict classification
│   └── traj_adapter.py           # Trajectory format normalization
├── metric_helper/
│   ├── reward_metric_helper.py   # Reward metrics aggregation
│   └── tool_metric_helper.py     # Tool call statistics
└── prompt/
    ├── finance_analyst_prompt.md # Agent system prompt (two-phase research flow)
    └── tool_prompt_builder.py    # Tool documentation generator (19 financial tools)
```
