Question about potential issues in DA-Code gold datasets #4

@LearningKeqi

Dear Yiming and Jianwen,

Thank you for releasing DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models and the accompanying datasets. This work is very helpful and intriguing. I’m currently evaluating models using the dataset downloaded from your project’s Hugging Face page, and I may have found some inconsistencies in the gold files.

data-wrangling-007: The task description asks to standardize the “valve configuration,” mapping all DOHC variations to “DOHC.” However, the gold output still contains unnormalized variations such as “DOHC with VIS” and “DOHC with VGT.”

data-wrangling-001: The task specifies removing records where total_gross == 0, but in the gold file the first row’s total_gross is 0.
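
A minimal sketch of how the two checks above could be reproduced against a gold CSV, using only the Python standard library. The column name `configuration`, the helper names, and the in-memory sample rows are placeholders for illustration and are not taken from the actual gold files; only `total_gross` and the DOHC strings come from the cases described above.

```python
import csv
import io


def rows_violating(rows, column, predicate):
    """Return (row_number, value) pairs for rows whose value matches predicate."""
    return [
        (i, row[column])
        for i, row in enumerate(rows, start=1)
        if predicate(row[column])
    ]


def is_unnormalized_dohc(value):
    # A DOHC variation that was not collapsed to plain "DOHC".
    v = value.strip()
    return v.upper().startswith("DOHC") and v != "DOHC"


def is_zero(value):
    # True when the cell parses as the number 0; non-numeric cells are ignored.
    try:
        return float(value) == 0
    except ValueError:
        return False


# Illustrative stand-ins for the gold files (made-up rows, hypothetical columns).
gold_007 = io.StringIO("configuration\nDOHC\nDOHC with VIS\nSOHC\nDOHC with VGT\n")
gold_001 = io.StringIO("title,total_gross\nA,0\nB,1200\n")

bad_007 = rows_violating(list(csv.DictReader(gold_007)), "configuration", is_unnormalized_dohc)
bad_001 = rows_violating(list(csv.DictReader(gold_001)), "total_gross", is_zero)

print(bad_007)  # [(2, 'DOHC with VIS'), (4, 'DOHC with VGT')]
print(bad_001)  # [(1, '0')]
```

To run this against a real gold file, replace the `io.StringIO` objects with `open(path, newline="")` and adjust the column names to match the file's header. An empty result list would mean the gold output is consistent with the stated rule.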

Besides these two cases, I’ve noticed similar apparent errors in other gold files as well. Could you please confirm whether the gold files currently on Hugging Face are correct? If there has been an update or if corrected gold outputs are available, I’d be grateful for a pointer. If I’ve misunderstood the intended rules, any clarification would also be very helpful.

Many thanks for your time and for the excellent work.
