As a sanity test on evaluating agents on DA-Code, I checked the evaluation of the gold standard against itself; see https://github.com/michaelrglass/da-code/blob/gold-vs-gold/tests/self_eval_check.py
This revealed a few issues:
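For context, the check itself is simple: treat each instance's gold files as if an agent had produced them, run the evaluator, and flag anything that does not come back as exactly 1.0. A minimal sketch of that idea follows (not the linked script; the evaluate hook, the gold/ and self_eval/ paths, and the per-instance directory layout are assumptions about how the evaluator is driven):

```python
import shutil
from pathlib import Path
from typing import Callable

GOLD_ROOT = Path("gold")        # assumed layout: gold/<instance-id>/...
WORK_ROOT = Path("self_eval")   # scratch directory standing in for agent output


def self_eval_check(evaluate: Callable[[Path, Path], float]) -> dict[str, float]:
    """Score every instance's own gold files against its gold answer.

    `evaluate(gold_dir, output_dir)` is a hypothetical hook for however the
    DA-Code evaluator is actually invoked; it is assumed to return a score
    in [0, 1]. Anything that does not score exactly 1.0 is reported.
    """
    failures: dict[str, float] = {}
    for instance_dir in sorted(GOLD_ROOT.iterdir()):
        if not instance_dir.is_dir():
            continue
        # Pretend the agent produced exactly the gold files.
        output_dir = WORK_ROOT / instance_dir.name
        if output_dir.exists():
            shutil.rmtree(output_dir)
        shutil.copytree(instance_dir, output_dir)

        score = evaluate(instance_dir, output_dir)
        if score != 1.0:
            failures[instance_dir.name] = score
    return failures
```

Every instance listed below was flagged by this kind of gold-vs-gold comparison.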
Easily fixed
- In calculate_list, the empty list is not equal to itself:
https://github.com/michaelrglass/da-code/blob/270c59b36c7c961c82a4d4c73b83e2932cd52638/da_agent/evaluators/metrics/text.py#L20
This is a problem in di-text-029 and di-text-035; a possible guard is sketched after this list.
- The file gold/ml-cluster-003/cluster.csv is not present; instead it is clustering.csv.
However, renaming this file gives a total score of 0.005763917074549833. Like a number of other 'ml' instances, this has a non-zero, non-perfect score.
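For the calculate_list case, I have not worked out the intended semantics, but the empty-list case looks like it just needs an explicit guard. A minimal sketch, assuming the metric scores a predicted list against a gold list as the fraction of matching entries (the real calculate_list in text.py may compute something different, and list_score here is just an illustrative name):

```python
def list_score(pred: list, gold: list) -> float:
    """Fraction of gold entries reproduced, with the empty case made explicit."""
    # Two empty lists are a perfect match; without this guard a ratio-style
    # metric either divides by zero or defaults to 0 when gold == [].
    if not gold and not pred:
        return 1.0
    if not gold or not pred:
        return 0.0
    matched = sum(1 for item in gold if item in pred)
    return matched / len(gold)
```

Whatever the actual formula is, returning 1.0 when both sides are empty should make di-text-029 and di-text-035 score perfectly against themselves.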
Unknown failures
- data-wrangling-038 (maybe because it is a .db file?)
ML score below 1
- ml-cluster-001
- ml-cluster-002
- ml-cluster-003
- ml-cluster-004
- ml-cluster-006
- ml-cluster-007
- ml-cluster-008
- ml-cluster-009
- ml-cluster-010
- ml-cluster-015
- ml-cluster-017
- ml-cluster-018
- ml-cluster-021
- ml-competition-003
- ml-competition-005
- ml-competition-007
- ml-competition-009
- ml-competition-010
- ml-competition-011
- ml-competition-013
- ml-competition-014
- ml-competition-015
- ml-competition-019