Decide if and how DPK will be used #109

Open
deanwampler opened this issue Mar 12, 2025 · 0 comments
Labels
data pipelines Defining and implementing data processing pipelines

Comments

@deanwampler
Contributor

Should we use DPK, or lightweight, more ad hoc alternatives? Some processing doesn't need DPK's scalability, for example reading Croissant metadata across the HF datasets. Which pipelines need DPK, and when?
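To make the lightweight option concrete, here is a minimal sketch of reading Croissant metadata for a handful of datasets with only the Python standard library. It is an illustration, not a proposed implementation: the endpoint URL pattern is assumed from the Hugging Face Hub documentation, the helper names (`fetch_croissant`, `summarize_croissant`, `scan`) are hypothetical, and the fields it extracts (`name`, `license`, `distribution`, `recordSet`) are assumed to be present in the Croissant JSON-LD.

```python
import json
import urllib.request
from typing import Any, Callable, Iterable

# Assumed URL pattern for the Hub's Croissant endpoint; verify before relying on it.
HF_CROISSANT_URL = "https://huggingface.co/api/datasets/{repo_id}/croissant"


def fetch_croissant(repo_id: str, timeout: float = 30.0) -> dict[str, Any]:
    """Download the Croissant JSON-LD document for one dataset."""
    url = HF_CROISSANT_URL.format(repo_id=repo_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_croissant(doc: dict[str, Any]) -> dict[str, Any]:
    """Keep a few top-level fields; missing fields become None or 0."""
    return {
        "name": doc.get("name"),
        "license": doc.get("license"),
        "num_files": len(doc.get("distribution") or []),
        "num_record_sets": len(doc.get("recordSet") or []),
    }


def scan(
    repo_ids: Iterable[str],
    fetch: Callable[[str], dict[str, Any]] = fetch_croissant,
) -> dict[str, dict[str, Any]]:
    """Summarize each dataset sequentially, recording errors instead of stopping."""
    results: dict[str, dict[str, Any]] = {}
    for repo_id in repo_ids:
        try:
            results[repo_id] = summarize_croissant(fetch(repo_id))
        except Exception as err:  # e.g. network failure or no Croissant metadata
            results[repo_id] = {"error": str(err)}
    return results
```

A sequential loop like this is probably enough for hundreds or a few thousand datasets. If a task outgrows it (heavy per-record transforms, very large files), that may be a signal that the pipeline belongs in DPK.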

@deanwampler deanwampler added the data pipelines Defining and implementing data processing pipelines label Mar 12, 2025
@deanwampler deanwampler moved this to Todo in FA5: OTDI Tasks Mar 12, 2025