Question about potential issues in DA-Code gold datasets #4

Dear Yiming and Jianwen,

Thank you for releasing DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models and the accompanying datasets. This work is very helpful and intriguing. I’m currently evaluating models using the dataset downloaded from your project’s Hugging Face page, and **I may have found some inconsistencies in the gold files.**

**data-wrangling-007**: The task description asks to standardize the “value configuration,” mapping all DOHC variations to “DOHC.” However, the gold output still contains variations such as “DOHC with VIS” and “DOHC with VGT.”
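One way to reproduce this observation is to scan the gold column for values that start with “DOHC” but are not exactly “DOHC”. The sketch below assumes the gold output is a CSV. The column name `configuration` and the sample rows are hypothetical placeholders, not taken from the actual gold file.

```python
import csv
import io


def non_canonical_dohc(rows, column):
    """Return the distinct values in `column` that start with "DOHC"
    but are not exactly "DOHC" (i.e. variants that should have been mapped)."""
    return sorted({
        r[column]
        for r in rows
        if r[column].startswith("DOHC") and r[column] != "DOHC"
    })


# Hypothetical excerpt of a gold CSV; the real file and column name may differ.
sample = """model,configuration
A,DOHC
B,DOHC with VIS
C,SOHC
D,DOHC with VGT
"""
rows = list(csv.DictReader(io.StringIO(sample)))
print(non_canonical_dohc(rows, "configuration"))
# → ['DOHC with VGT', 'DOHC with VIS']
```

To check the real gold file, build `rows` with `csv.DictReader(open(path, newline=""))` and pass the actual column name; an empty list would mean every DOHC variant was standardized.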

**data-wrangling-001**: The task specifies removing records where total_gross == 0, but in the gold file the first row’s total_gross is 0.
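A similar check for this case lists the rows whose `total_gross` parses to zero. This sketch assumes `total_gross` holds plain numeric strings (no currency symbols or thousands separators); the sample rows are made up for illustration.

```python
import csv
import io


def zero_gross_rows(rows, column="total_gross"):
    """Return the 0-based indices of rows whose `column` parses to 0.
    Blank cells are skipped rather than treated as zero."""
    return [
        i for i, r in enumerate(rows)
        if r[column].strip() and float(r[column]) == 0
    ]


# Hypothetical excerpt of a gold CSV, for illustration only.
sample = """title,total_gross
Film A,0
Film B,1250000
Film C,0.0
"""
rows = list(csv.DictReader(io.StringIO(sample)))
print(zero_gross_rows(rows))
# → [0, 2]
```

If the filtering rule had been applied, this would return an empty list for the gold file.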

**Beyond these two cases, I’ve noticed similar errors in other gold files as well.** Could you please confirm whether the gold files currently on Hugging Face are correct? If there has been an update or if corrected gold outputs are available, I’d be grateful for a pointer. If I’ve misunderstood the intended rules, any clarification would also be very helpful.

Many thanks for your time and for the excellent work.
