Releases: tensorflow/datasets
v2.0.0
- This is the last version of TFDS that will support Python 2. Going forward, we'll only support and test against Python 3.
- The default versions of all datasets now use the S3 slicing API. See the guide for details.
- The previous split API is still available but deprecated. If you wrote `DatasetBuilder`s outside the TFDS repository, please make sure they do not use `experiments={tfds.core.Experiment.S3: False}`. This will be removed in the next version, along with the `num_shards` kwarg of `SplitGenerator`.
- Several new datasets. Thanks to all the contributors!
- API changes and new features:
  - `shuffle_files` defaults to False so that dataset iteration is deterministic by default. You can customize the reading pipeline, including shuffling and interleaving, through the new `read_config` parameter in `tfds.load`.
  - The `urls` kwarg was renamed to `homepage` in `DatasetInfo`.
- Support for nested `tfds.features.Sequence` and `tf.RaggedTensor`
- Custom `FeatureConnector`s can override the `decode_batch_example` method for efficient decoding when wrapped inside a `tfds.features.Sequence(my_connector)`
- Declaring a dataset in Colab won't register it, which allows you to re-run the cell without having to change the dataset name
- Beam datasets can use a `tfds.core.BeamMetadataDict` to store additional metadata computed as part of the Beam pipeline.
- Beam datasets' `_split_generators` accepts an additional `pipeline` kwarg to define a pipeline shared between all splits.
- Various other bug fixes and performance improvements. Thank you for all the reports and fixes!
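The S3 slicing API mentioned above selects data with plain split strings such as `'train[:75%]'`. As a rough illustration of that syntax, here is a toy parser; it is a simplified sketch for percent-based slices only, not the actual TFDS implementation (which also handles absolute indices, split addition such as `'train+test'`, and more):

```python
import re


def parse_split(spec):
    """Parse a toy S3-style split string such as 'train[25%:75%]'.

    Returns (split_name, start_percent, stop_percent). Illustrative only.
    """
    m = re.fullmatch(r"(\w+)(?:\[(\d*)%?:(\d*)%?\])?", spec)
    if not m:
        raise ValueError(f"Unrecognized split spec: {spec!r}")
    name, start, stop = m.groups()
    # Missing bounds default to the full split.
    return name, int(start) if start else 0, int(stop) if stop else 100
```

For example, `parse_split("train[:75%]")` yields the split name together with the 0 to 75 percent range.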
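The `decode_batch_example` hook can be pictured without TensorFlow: by default a connector decodes one example at a time, so a sequence wrapper would otherwise call the per-example decoder once per element; overriding the batch method allows a single pass over the whole sequence. The sketch below is a framework-free analogue with made-up class names, not the real TFDS `FeatureConnector` API:

```python
class ToyConnector:
    """Toy stand-in for a feature connector (illustrative, not TFDS code)."""

    def decode_example(self, raw):
        # Per-example decoding: here, just scale the raw value.
        return raw * 2

    def decode_batch_example(self, raw_batch):
        # Default behaviour: fall back to decode_example once per element.
        return [self.decode_example(x) for x in raw_batch]


class BatchedToyConnector(ToyConnector):
    """Overrides the batch method to decode a whole sequence in one pass."""

    def decode_batch_example(self, raw_batch):
        # Single batched pass instead of one decoder call per element.
        return [x * 2 for x in raw_batch]
```

Both classes produce the same decoded values; the override only changes how many decoder invocations a wrapped sequence costs.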
v1.3.0
Bug fixes and performance improvements.
v1.2.0
Features
- Add `shuffle_files` argument to the `tfds.load` function. The semantics are the same as in `builder.as_dataset`, which for now means that by default, files are shuffled for the `TRAIN` split and not for other splits. The default behaviour will change to always be False in the next release.
- Most datasets now support the new S3 API (documentation)
- Support for uint16 PNG images
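The per-split default described above (shuffle only `TRAIN` unless the caller says otherwise) can be summarized by a small helper; this is an illustration of the stated semantics with a hypothetical function name, not TFDS code:

```python
def resolve_shuffle_files(split, shuffle_files=None):
    """Illustrate the v1.2.0 shuffle_files default (hypothetical helper).

    An explicit caller value always wins; otherwise only the 'train'
    split is shuffled. A later release changes the default to False.
    """
    if shuffle_files is not None:
        return shuffle_files
    return split == "train"
```

So `resolve_shuffle_files("train")` is True while `resolve_shuffle_files("test")` is False, and passing `shuffle_files=False` disables shuffling even for training.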
Misc
- Fixed a crash while shuffling on Windows
- Various documentation improvements
New datasets
- AFLW2000-3D
- Amazon_US_Reviews
- binarized_mnist
- BinaryAlphaDigits
- Caltech Birds 2010
- Coil100
- DeepWeeds
- Food101
- MIT Scene Parse 150
- RockYou leaked password
- Stanford Dogs
- Stanford Online Products
- Visual Domain Decathlon
v1.1.0
Features
- Add `in_memory` option to cache small datasets in RAM.
- Better sharding, shuffling and sub-splits.
- It is now possible to add arbitrary metadata to `tfds.core.DatasetInfo`, which will be stored/restored with the dataset. See `tfds.core.Metadata`.
- Better proxy support, possibility to add certificates.
- Add `decoders` kwarg to override the default feature decoding (guide).
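The arbitrary-metadata feature can be pictured as a dict serialized next to the dataset files and reloaded with it. The sketch below is a simplified pure-Python stand-in for `tfds.core.Metadata`, with method names chosen for illustration rather than copied from TFDS:

```python
import json
import os


class ToyMetadataDict(dict):
    """Toy analogue of dataset metadata: a dict saved/restored as JSON."""

    def save(self, data_dir):
        # Write the metadata alongside the dataset files.
        with open(os.path.join(data_dir, "metadata.json"), "w") as f:
            json.dump(self, f)

    def load(self, data_dir):
        # Restore the metadata recorded at save time.
        self.clear()
        with open(os.path.join(data_dir, "metadata.json")) as f:
            self.update(json.load(f))
```

Anything JSON-serializable (label maps, preprocessing parameters, summary statistics) survives the save/load round trip with the dataset directory.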
New datasets
More datasets added:
- downsampled_imagenet
- patch_camelyon
- coco 2017 (with and without panoptic annotations)
- uc_merced
- trivia_qa
- super_glue
- so2sat
- snli
- resisc45
- pet_finder
- mnist_corrupted
- kitti
- eurosat
- definite_pronoun_resolution
- curated_breast_imaging_ddsm
- clevr
- bigearthnet
v1.0.2
- Add Apache Beam support
- Add direct GCS access for MNIST (with `tfds.load('mnist', try_gcs=True)`)
- More datasets added
- Option to turn off the tqdm bar (`tfds.disable_progress_bar()`)
- Sub-splits no longer depend on the number of shards (#292)
- Various bug fixes
Thanks to all external contributors for raising issues, and for their feedback and pull requests.