Data Prep Kit (DPK) is an open source data engineering framework released by IBM Research. It was implemented to support the development of IBM's open source Granite family of LLMs.
DPK offers three value propositions:
Workflows as the basis of data transformations: rather than relying on raw Python code to execute complex transforms and handle fault cases, DPK is built on Kubeflow Pipelines, which provides an abstraction for higher-value workflows.
Because DPK is based on Kubeflow and Ray, workflows developed on a laptop can be seamlessly scaled up to clusters consisting of hundreds of nodes.
Because DPK is open source, a community can collaborate on implementing workflows for difficult problems in LLM data engineering, such as detecting hate speech, personally identifiable information, and licensing issues, all of which will require collaboration to address.
The OTDI steering committee consists of several large, experienced, production-quality data producers. The purpose of this ticket is to find out whether the three value propositions listed above correspond to pain points these large data producers are facing and, if so, to discuss how they are currently addressing them.