Leverage DataPerf? #99

Open
deanwampler opened this issue Feb 5, 2025 · 1 comment
Labels
data pipelines Defining and implementing data processing pipelines

Comments

@deanwampler
Contributor

DataPerf is an MLCommons benchmark system for measuring dataset quality for particular purposes, analogous to model benchmarks. How should we leverage it or even contribute to it as part of our efforts to create trusted datasets?

See also this TSEI task #40

@deanwampler deanwampler added the data pipelines Defining and implementing data processing pipelines label Feb 5, 2025
@deanwampler deanwampler moved this to Todo in FA5: OTDI Tasks Feb 5, 2025
@blublinsky
Contributor

This is an interesting idea, but the current approach seems somewhat limited. It seems only to pick the best sellers, assuming that they all provide similar data, which is not exactly what we are looking for. In my mind, we are more interested in data quality evaluation, which their project does not seem to include. So using their project as is does not seem feasible.

One option to consider is joining forces with them to tackle data quality challenges, assuming that this is something they are interested in doing.
