Welcome to the AI Alliance Open Trusted Data Initiative (OTDI).
OTDI aims to provide a high-quality, trusted, open catalog and distributed repository of datasets for LLM pre-training and domain-specific fine-tuning. It is intended to serve a wide variety of use cases in enterprises, governments, regulated industries, and anywhere else that high trust in the data foundations of AI is essential.
See GITHUB_PAGES.md for information on viewing the site locally with Jekyll.
This repo will also be used for implementations, such as the planned catalog and data pipelines, until such time as it makes sense to split work into separate repos. Miscellaneous other documentation, not in the website, is also captured here:
- tools-notes: Notes on potential tool choices.
- data-processing-notes: Notes on requirements and data-specific tool choices.
- code: TBD
We welcome contributions as PRs. Please see our Alliance community repo for general information about contributing to any of our projects. This section provides some specific details you need to know.
In particular, see the AI Alliance CONTRIBUTING instructions. You will need to agree to the AI Alliance Code of Conduct.
All code contributions are licensed under the Apache 2.0 LICENSE (which is also in this repo, LICENSE.Apache-2.0).
All documentation contributions are licensed under the Creative Commons Attribution 4.0 International license (which is also in this repo, LICENSE.CC-BY-4.0).
All data contributions are licensed under the Community Data License Agreement - Permissive - Version 2.0 (which is also in this repo, LICENSE.CDLA-2.0).
Warning
Before you make any git commits with changes, understand what's required for the Developer Certificate of Origin (DCO).
See the Alliance contributing guide section on DCO for details. In practical terms, supporting this requirement means you must use the -s flag with your git commit commands.
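As a minimal sketch of what the -s flag does, the following commands create a throwaway repository and make one signed-off commit. The name, email address, file name, and commit message are placeholders chosen for illustration; in a real contribution, git takes the name and email from your own user.name and user.email settings.

```shell
# Sketch: make a signed-off commit in a throwaway repository.
# The identity, file name, and commit message below are placeholders.
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane.doe@example.com"
echo "example" > notes.txt
git add notes.txt
# -s (--signoff) appends a "Signed-off-by: Name <email>" trailer
# to the commit message, using the configured user.name and user.email.
git commit -q -s -m "Add example notes"
# Print the full commit message, including the trailer.
git log -1 --format=%B
```

If you forget the flag on your most recent commit, running git commit --amend -s --no-edit adds the sign-off without changing the message. Check the Alliance contributing guide for the exact DCO requirements, since this sketch only shows the mechanics of the flag.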
The website is published using GitHub Pages, where the pages are written in Markdown and served using Jekyll. We use the Just the Docs Jekyll theme.
See GITHUB_PAGES.md for more information.
Note
As described above, all documentation is licensed under Creative Commons Attribution 4.0 International. See LICENSE.CC-BY-4.0.