
Releases: tensorflow/datasets

v2.0.0

24 Jan 20:02
  • This is the last version of TFDS that will support Python 2. Going forward, we'll only support and test against Python 3.
  • The default versions of all datasets are now using the S3 slicing API. See the guide for details.
  • The previous split API is still available but deprecated. If you wrote DatasetBuilders outside the TFDS repository, please make sure they do not use experiments={tfds.core.Experiment.S3: False}, which will be removed in the next version, along with the num_shards kwarg of SplitGenerator.
  • Several new datasets. Thanks to all the contributors!
  • API changes and new features:
    • shuffle_files defaults to False so that dataset iteration is deterministic by default. You can customize the reading pipeline, including shuffling and interleaving, through the new read_config parameter in tfds.load.
    • The urls kwarg of DatasetInfo has been renamed to homepage
    • Support for nested tfds.features.Sequence and tf.RaggedTensor
    • Custom FeatureConnectors can override the decode_batch_example method for efficient decoding when wrapped inside a tfds.features.Sequence(my_connector)
    • Declaring a dataset in Colab no longer registers it, which allows re-running the cell without having to change the dataset name
    • Beam datasets can use a tfds.core.BeamMetadataDict to store additional metadata computed as part of the Beam pipeline.
    • Beam datasets' _split_generators accepts an additional pipeline kwarg to define a pipeline shared between all splits.
  • Various other bug fixes and performance improvements. Thank you for all the reports and fixes!

v1.3.0

24 Oct 16:12

Bug fixes and performance improvements.

v1.2.0

20 Aug 08:26

Features

  • Added a shuffle_files argument to the tfds.load function. The semantics are the same as in the builder.as_dataset function, which for now means that by default, files are shuffled for the TRAIN split and not for other splits. The default behaviour will change to always be False in the next release.
  • Most datasets now support the new S3 API (documentation)
  • Support for uint16 PNG images

Misc

  • Fixed a crash while shuffling on Windows
  • Various documentation improvements

New datasets

  • AFLW2000-3D
  • Amazon_US_Reviews
  • binarized_mnist
  • BinaryAlphaDigits
  • Caltech Birds 2010
  • Coil100
  • DeepWeeds
  • Food101
  • MIT Scene Parse 150
  • RockYou leaked password
  • Stanford Dogs
  • Stanford Online Products
  • Visual Domain Decathlon

v1.1.0

22 Jul 21:24

Features

  • Added an in_memory option to cache small datasets in RAM.
  • Better sharding, shuffling and sub-splitting
  • It is now possible to add arbitrary metadata to tfds.core.DatasetInfo
    which will be stored/restored with the dataset. See tfds.core.Metadata.
  • Better proxy support, and the possibility to add a certificate
  • Added a decoders kwarg to override the default feature decoding
    (guide).

New datasets

More datasets added:

  • downsampled_imagenet
  • patch_camelyon
  • coco 2017 (with and without panoptic annotations)
  • uc_merced
  • trivia_qa
  • super_glue
  • so2sat
  • snli
  • resisc45
  • pet_finder
  • mnist_corrupted
  • kitti
  • eurosat
  • definite_pronoun_resolution
  • curated_breast_imaging_ddsm
  • clevr
  • bigearthnet

v1.0.2

01 May 20:26
  • Added Apache Beam support
  • Added direct GCS access for MNIST (with tfds.load('mnist', try_gcs=True))
  • More datasets added
  • Added an option to turn off the tqdm bar (tfds.disable_progress_bar())
  • Sub-splits no longer depend on the number of shards (#292)
  • Various bug fixes

Thanks to all external contributors for raising issues, their feedback and their pull requests.

v1.0.1

15 Feb 21:42
  • Fixed bug #52, which put the process in Eager mode by default
  • New dataset celeb_a_hq

v1.0.0

14 Feb 22:58

Note that this release had a bug (#52) that put the process in Eager mode.

tensorflow-datasets is ready for use! Please see our README and the documentation linked there. We currently have 25 datasets and are adding more. Please join in and add (or request) a dataset yourself.