
Commit 0296b99

Update usage instructions
1 parent: 7e11196

3 files changed: +41 -25 lines


.gitignore (+1)
@@ -4,6 +4,7 @@ keys.cfg
 **/output/**
 **/eval_results/**
 eval/logs/**
+*.h5
 
 
 # -------

README.md (+7 -1)
@@ -14,6 +14,7 @@ SciCode is a challenging benchmark designed to evaluate the capabilities of lang
 SciCode sources challenging and realistic research-level coding problems across 6 natural science disciplines, covering a total of 16 subfields. SciCode mainly focuses on (1) numerical methods, (2) simulation of systems, and (3) scientific calculation. These are the tasks we believe require intense scientific knowledge and reasoning to optimally test an LM's science capability.
 
 ## 🏆 Leaderboard
+
 | Model | Subproblem | Main Problem |
 |---------------------------|------------|--------------|
 | Claude3.5-Sonnet | **26** | **4.6** |
@@ -27,8 +28,13 @@ SciCode sources challenging and realistic research-level coding problems across
 | Mixtral-8x22B-Instruct | 16.3 | 0 |
 | Llama-3-70B-Chat | 14.6 | 0 |
 
+## Instructions to evaluate a new model
 
-
+1. Clone this repository: `git clone git@github.com:scicode-bench/SciCode.git`
+2. Install the `scicode` package with `pip install -e .`
+3. Download the [numeric test results](https://drive.google.com/drive/folders/1W5GZW6_bdiDAiipuFMqdUhvUaHIj6-pR?usp=drive_link) and save them as `./eval/data/test_data.h5`
+4. Run `eval/scripts/gencode_json.py` to generate new model outputs (see the [`eval/scripts` readme](eval/scripts/) for more information)
+5. Run `eval/scripts/test_generated_code.py` to evaluate the generated code against the unit tests
 
 ## Contact
 - Minyang Tian: [email protected]
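Taken together, the five steps added to `README.md` above correspond roughly to the following shell session (a minimal sketch: `gpt-4o` is only an example model name, and `test_data.h5` still has to be downloaded manually from the Drive link):

```bash
# Rough end-to-end sketch of the README steps above.
# gpt-4o is an example model name; test_data.h5 must be downloaded
# from the Drive link and placed at eval/data/test_data.h5 by hand.
git clone git@github.com:scicode-bench/SciCode.git
cd SciCode
pip install -e .
mkdir -p eval/data        # destination for the downloaded numeric test results
python eval/scripts/gencode_json.py --model gpt-4o
python eval/scripts/test_generated_code.py
```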

eval/scripts/README.md (+33 -24)
@@ -1,24 +1,33 @@
-- ## **Generate LLM code**
-
-To run the script, go to the root of this repo and use the following command:
-
-```bash
-python evaluation/scripts/gencode_json.py [options]
-```
-
-### Command-Line Arguments
-- `--model` - Specifies the model name used for generating responses.
-- `--output-dir` - Directory to store the generated code outputs (Default: `evaluation/eval_results/generated_code`).
-- `--input-path` - Directory containing the JSON files describing the problems (Default: `evaluation/problem_json`).
-- `--prompt-dir` - Directory where prompt files are saved (Default: `evaluation/eval_results/prompt`).
-- `--temperature` - Controls the randomness of the generation (Default: 0).
-
-- ## **Evaluate generated code**
-
-Download `test_data.h5` at the path `evaluation/test_data.h5`.
-
-To run the script, go to the root of this repo and use the following command:
-
-```bash
-python evaluation/scripts/test_generated_code.py
-```
+## **Generate LLM code**
+
+To run the script, go to the root of this repo and use the following command:
+
+```bash
+python evaluation/scripts/gencode_json.py [options]
+```
+
+For example, to create model results with `gpt-4o` and the default settings, run
+
+```bash
+python evaluation/scripts/gencode_json.py --model gpt-4o
+```
+
+### Command-Line Arguments
+
+- `--model` - Specifies the model name used for generating responses.
+- `--output-dir` - Directory to store the generated code outputs (Default: `eval_results/generated_code`).
+- `--input-path` - Path to the JSONL file describing the problems (Default: `eval/data/problems_all.jsonl`).
+- `--prompt-dir` - Directory where prompt files are saved (Default: `eval_results/prompt`).
+- `--temperature` - Controls the randomness of the generation (Default: 0).
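For illustration, several of these flags can be combined in one call (a sketch only; the directory values shown are just the documented defaults written out explicitly):

```bash
python evaluation/scripts/gencode_json.py \
    --model gpt-4o \
    --output-dir eval_results/generated_code \
    --prompt-dir eval_results/prompt \
    --temperature 0
```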
+
+## **Evaluate generated code**
+
+Download the [numeric test results](https://drive.google.com/drive/folders/1W5GZW6_bdiDAiipuFMqdUhvUaHIj6-pR?usp=drive_link) and save them as `./eval/data/test_data.h5`.
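As a quick sanity check that the file is in place and readable (a sketch assuming the `h5py` package is installed; the file's internal layout is not documented here, so this only lists top-level keys):

```python
# Verify the downloaded numeric test results open as a valid HDF5 file.
# Assumes h5py is installed (pip install h5py).
import h5py

with h5py.File("eval/data/test_data.h5", "r") as f:
    keys = list(f.keys())
    print(f"{len(keys)} top-level entries, e.g. {keys[:5]}")
```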
+
+To run the script, go to the root of this repo and use the following command:
+
+```bash
+python evaluation/scripts/test_generated_code.py
+```
+
+Please edit the `test_generated_code.py` source file to specify your model name, results directory, and problem set (if not `problems_all.jsonl`).
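The exact variable names are defined in the script itself, so the snippet below is only a hypothetical illustration of the kind of values to adjust (the names are invented; the values shown are the defaults documented above):

```python
# Hypothetical settings inside test_generated_code.py -- the real variable
# names may differ, so check the script source before editing.
model_name = "gpt-4o"                          # model whose generated code is being evaluated
code_dir = "eval_results/generated_code"       # where gencode_json.py saved the outputs
problem_set = "eval/data/problems_all.jsonl"   # problem set to evaluate against
```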
