
Issues when running evaluation over gold standard #6

@michaelrglass

Description


As a sanity test on evaluating agents on DA-Code, I ran the evaluation of the gold standard against itself. See https://github.com/michaelrglass/da-code/blob/gold-vs-gold/tests/self_eval_check.py
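For reference, the gold-vs-gold check boils down to scoring each task's gold output as if it were an agent submission and flagging anything that doesn't score 1.0. The sketch below only illustrates that idea, not the linked script: `evaluate_task`, `GOLD_DIR`, and `CONFIG_PATH` are placeholder names, since the actual DA-Code evaluation entry points aren't reproduced here.

```python
import json
from pathlib import Path

# Placeholder paths -- the real layout is whatever the benchmark uses.
GOLD_DIR = Path("gold")                           # assumed gold-standard outputs
CONFIG_PATH = Path("configs/task_configs.json")   # assumed task config list


def self_eval_check(evaluate_task):
    """Score each task's gold output against itself; every score should be 1.0.

    `evaluate_task(task_id, output_dir, gold_dir)` is a stand-in for the
    benchmark's evaluation entry point.
    """
    failures = []
    for task_cfg in json.loads(CONFIG_PATH.read_text()):
        task_id = task_cfg["id"]
        try:
            score = evaluate_task(
                task_id,
                output_dir=GOLD_DIR / task_id,  # treat the gold output as the submission
                gold_dir=GOLD_DIR / task_id,
            )
        except Exception as exc:
            failures.append((task_id, f"error: {exc}"))
            continue
        if score < 1.0:
            failures.append((task_id, f"score {score:.3f} < 1.0"))

    for task_id, reason in failures:
        print(f"{task_id}: {reason}")
    return failures
```

The failure categories reported below are the tasks this kind of check surfaces: ones that error out entirely, and ones that evaluate but score below 1.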

This revealed a few issues:

Easily fixed

Unknown failures

  • data-wrangling-038 (maybe because it is a .db file?)

ML score below 1

  • ml-cluster-001
  • ml-cluster-002
  • ml-cluster-003
  • ml-cluster-004
  • ml-cluster-006
  • ml-cluster-007
  • ml-cluster-008
  • ml-cluster-009
  • ml-cluster-010
  • ml-cluster-015
  • ml-cluster-017
  • ml-cluster-018
  • ml-cluster-021
  • ml-competition-003
  • ml-competition-005
  • ml-competition-007
  • ml-competition-009
  • ml-competition-010
  • ml-competition-011
  • ml-competition-013
  • ml-competition-014
  • ml-competition-015
  • ml-competition-019
