Leverage DataPerf? #99

deanwampler · 2025-02-05T23:19:43Z

DataPerf is an MLCommons benchmark system for measuring dataset quality for particular purposes, analogous to model benchmarks. How should we leverage it or even contribute to it as part of our efforts to create trusted datasets?

See also this TSEI task #40

blublinsky · 2025-02-14T13:39:55Z

This is an interesting idea, but the current approach seems a bit limited. It seems to only get the best sellers, assuming that they all provide similar data. I don't think this is exactly what we are looking for. In my mind, we are more interested in data quality evaluation, which they do not seem to have in their project. So using their project as is does not seem feasible.

One option to consider is to join forces with them to tackle data quality challenges, assuming that this is something they are interested in doing.

deanwampler added the "data pipelines" label (Defining and implementing data processing pipelines) Feb 5, 2025

deanwampler added this to FA5: OTDI Tasks Feb 5, 2025

deanwampler moved this to Todo in FA5: OTDI Tasks Feb 5, 2025
