There is a need to support hidden/restricted data. Investigate what we might do. #70

Open
deanwampler opened this issue Dec 12, 2024 · 1 comment
Labels
dataset catalog: All aspects of managing the catalog and its use
dataset requirements: All aspects of the specification for acceptable datasets.

Comments

@deanwampler
Contributor

Three classes of data that shouldn't be open sourced:

  1. Benchmark data that people don't want to be "vacuumed" into training data sets.
  2. Data with export or other security restrictions.
  3. Private organizational data.

While our goal is to support open datasets, should we try to address these needs?

@deanwampler deanwampler moved this to Todo in FA5: OTDI Tasks Dec 12, 2024
@deanwampler deanwampler added the dataset catalog and dataset requirements labels Dec 12, 2024
@blublinsky
Contributor

Hugging Face already supports gated datasets. Here is their definition:

To give more control over how datasets are used, the Hub allows datasets authors to enable access requests for their datasets. Users must agree to share their contact information (username and email address) with the datasets authors to access the datasets files when enabled. Datasets authors can configure this request with additional fields. A dataset with access requests enabled is called a gated dataset. Access requests are always granted to individual users rather than to entire organizations. A common use case of gated datasets is to provide access to early research datasets before the wider release.

Will this satisfy the requirement?
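
To make the question concrete, here is a minimal sketch of how a catalog entry could model the gated behavior described in that definition: access is either open or gated, and grants are recorded per individual user. The names `Access`, `CatalogEntry`, `grant`, and `can_download` are hypothetical and used only for illustration; they are not part of the Hugging Face Hub API or of this project's catalog.

```python
from dataclasses import dataclass, field
from enum import Enum


class Access(Enum):
    """Hypothetical access levels for a catalog entry."""
    OPEN = "open"    # anyone may download
    GATED = "gated"  # each user must request access and be approved


@dataclass
class CatalogEntry:
    """Hypothetical catalog record; not an existing API."""
    name: str
    access: Access = Access.OPEN
    # Users whose access requests were granted. Grants are per user,
    # not per organization, mirroring the definition quoted above.
    approved_users: set[str] = field(default_factory=set)

    def grant(self, username: str) -> None:
        """Record that one user's access request was approved."""
        self.approved_users.add(username)

    def can_download(self, username: str) -> bool:
        """Open entries are available to all; gated ones only to approved users."""
        if self.access is Access.OPEN:
            return True
        return username in self.approved_users


# Example: a gated benchmark dataset with a single approved user.
benchmark = CatalogEntry("example-benchmark", Access.GATED)
benchmark.grant("alice")
print(benchmark.can_download("alice"))
print(benchmark.can_download("bob"))
```

Under this model, a gated benchmark is visible in the catalog but downloadable only by approved users. Whether per-user approval is enough for export-restricted or private organizational data is the open question here.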
