diff --git a/README.md b/README.md
index c951a730d0..9e923f9386 100644
--- a/README.md
+++ b/README.md
@@ -4,35 +4,25 @@
 ![Tensorflow](https://img.shields.io/badge/tensorflow-v2.9.0+-success.svg)
 [![Contributions Welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/keras-team/keras-cv/issues)
 
-# Vision
-A computer vision library dedicated for auto-driving, robotics and on device applications.
 # Mission
-KerasCV is a layered repository consisting of core components and modeling components.
+KerasCV is a library of modular computer vision oriented Keras components.
+These components include models, layers, metrics, losses, callbacks, and utility
+functions.
 
-On the core components, it is made of modular building blocks (ops, functions, layers, metrics, losses, callbacks) that standardizes APIs for computer vision concepts such as data-augmentation pipeline, bounding boxes, keypoints, point clouds, feature pyramid network, etc, so applied computer vision engineers can leverage to quickly assemble production-grade, state-of-the-art
-training and inference pipelines for common tasks such as image classification, object detection and segmentation, image data augmentation, etc.
-
-On the modeling components, it provides the most widely used models for each task such as ResNet family, MobileNet family, transformer-based models, anchor-based and anchor-free meta architectures, unet models, that are built on top of core components, highly composable and compatible with the Keras trainer (`model.fit`). It aims to provide pre-built models that are mixed-precision compatible, QAT compatible, and xla compilable during training, and generic model optimization tools for deployment on devices such as onboard GPUs, mobile, edge chips.
-
-KerasCV provides the following values for users:
-- modular mid-level APIs and composable meta architectures
-- mixed-precision and xla enabled components
-- highly optimized, quantization aware training (QAT) enabled models, compatible between GPUs and TPUs.
-- reproducible training results and leaderboard
-- useful tools for evaluation, visualization and explanation.
-- source for inference conversion (TFLite, edge devices, TensorRT, etc) and optimization at model level.
+KerasCV's primary goal is to provide a coherent, elegant, and pleasant API to train state of the art computer vision models.
+Users should be able to train state of the art models using only `Keras`, `KerasCV`, and TensorFlow core (i.e. `tf.data`) components.
 
 KerasCV can be understood as a horizontal extension of the Keras API: the components are new first-party
-Keras objects (layers, metrics, etc) that are too specialized to be added to core Keras, but that receive
-the same level of polish and backwards compatibility guarantees as the rest of the Keras API and that
-are maintained by the Keras team itself.
+Keras objects (layers, metrics, etc.) that are too specialized to be added to core Keras. They receive the same level of polish and backwards compatibility guarantees as the core Keras API, and they are maintained by the Keras team.
 
-KerasCV's primary goal is to provide a coherent, elegant, and pleasant API to train state of the art computer vision models.
-Users should be able to train state of the art models using only `Keras`, `KerasCV`, and TensorFlow core (i.e. `tf.data`) components.
+Our APIs assist in common computer vision tasks such as data augmentation, classification, object detection, image generation, and more.
+Applied computer vision engineers can leverage KerasCV to quickly assemble production-grade, state-of-the-art training and inference pipelines for all of these common tasks.
+
+In addition to API consistency, KerasCV components aim to be mixed-precision compatible, QAT compatible, XLA compilable, and TPU compatible.
+We also aim to provide generic model optimization tools for deployment on devices such as onboard GPUs, mobile, and edge chips.
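+
+As a sketch of what this enables (a hypothetical example, not an official recipe: the
+choice of `ResNet50V2` and the hyperparameters here are illustrative assumptions), mixed
+precision and XLA can be switched on for a KerasCV model with standard Keras APIs alone:
+
+```python
+import keras_cv
+from tensorflow import keras
+
+# Run computations in float16 while keeping float32 variables.
+keras.mixed_precision.set_global_policy("mixed_float16")
+
+model = keras_cv.models.ResNet50V2(
+    include_rescaling=True,
+    include_top=True,
+    classes=3,
+)
+model.compile(
+    loss="categorical_crossentropy",
+    optimizer="adam",
+    metrics=["accuracy"],
+    jit_compile=True,  # Request XLA compilation of the train step.
+)
+```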
 
-Different from Keras IO, this product focus on meta architectures and training scripts to help users reproduce result from open datasets.
 
 To learn more about the future project direction, please check the [roadmap](.github/ROADMAP.md).
 
@@ -42,6 +32,61 @@ To learn more about the future project direction, please check the [roadmap](.github/ROADMAP.md).
 - [Roadmap](.github/ROADMAP.md)
 - [API Design Guidelines](.github/API_DESIGN.md)
 
+## Quickstart
+
+Create a preprocessing pipeline:
+
+```python
+import keras_cv
+import tensorflow as tf
+import tensorflow_datasets as tfds
+
+augmenter = keras_cv.layers.Augmenter(
+    layers=[
+        keras_cv.layers.RandomFlip(),
+        keras_cv.layers.RandAugment(value_range=(0, 255)),
+        keras_cv.layers.CutMix(),
+        keras_cv.layers.MixUp(),
+    ]
+)
+
+def augment_data(images, labels):
+    labels = tf.one_hot(labels, 3)
+    inputs = {"images": images, "labels": labels}
+    outputs = augmenter(inputs)
+    return outputs["images"], outputs["labels"]
+```
+
+Augment a `tf.data.Dataset`:
+
+```python
+dataset = tfds.load("rock_paper_scissors", as_supervised=True, split="train")
+dataset = dataset.batch(64)
+dataset = dataset.map(augment_data, num_parallel_calls=tf.data.AUTOTUNE)
+```
+
+Create a model:
+
+```python
+densenet = keras_cv.models.DenseNet121(
+    include_rescaling=True,
+    include_top=True,
+    classes=3,
+)
+densenet.compile(
+    loss="categorical_crossentropy",
+    optimizer="adam",
+    metrics=["accuracy"],
+)
+```
+
+Train your model:
+
+```python
+densenet.fit(dataset)
+```
+
 ## Contributors
 
 If you'd like to contribute, please see our [contributing guide](.github/CONTRIBUTING.md).
 
@@ -52,7 +97,7 @@ but also for active development for feature delivery. To achieve this, here is the
 process for how to contribute to this repository:
 
 1) Contributors are always welcome to help us fix an issue, add tests, or improve documentation.
-2) If contributors would like to create a backbone, we usually require a pre-trained weight
+2) If contributors would like to create a backbone, we usually require a pre-trained weight
 set with the model for one dataset as the first PR, and a training script as a follow-up. The
 training script will preferably help us reproduce the results claimed in the paper. The backbone
 should be generic, but the training script can contain paper-specific parameters such as
 learning rate schedules and weight decays. The training script will be used to produce
 leaderboard results. Exceptions apply to large transformer-based models, which are difficult
 to train. If this is the case, contributors should let us know so the team can help in
 training the model or providing GCP resources.
@@ -67,14 +112,27 @@ Thank you to all of our wonderful contributors!
 
 ## Pretrained Weights
 
-Many models in KerasCV come with pre-trained weights. With the exception of StableDiffusion,
-all of these weights are trained using Keras and KerasCV components and training scripts in this
-repository. Models may not be trained with the same parameters or preprocessing pipeline
-described in their original papers. Performance metrics for pre-trained weights can be found
-in the training history for each task. For example, see ImageNet classification training
-history for backbone models [here](examples/training/classification/imagenet/training_history.json).
-All results are reproducible using the training scripts in this repository. Pre-trained weights
-operate on images that have been rescaled using a simple `1/255` rescaling layer.
+Many models in KerasCV come with pre-trained weights.
+With the exception of StableDiffusion and the standard Vision Transformer, all of these weights
+are trained using Keras and KerasCV components and training scripts in this repository.
+While some models are not trained with the same parameters or preprocessing pipeline
+as defined in their original publications, the KerasCV team ensures strong numerical performance.
+Performance metrics for the provided pre-trained weights can be found
+in the training history for each documented task.
+An example of this can be found in the ImageNet classification training
+[history for backbone models](examples/training/classification/imagenet/training_history.json).
+All results are reproducible using the training scripts in this repository.
+
+Historically, many models have been trained on image datasets rescaled via manually
+crafted normalization schemes.
+The most common such scheme is subtraction of the ImageNet mean pixel, followed by
+normalization based on the ImageNet pixel standard deviation.
+This scheme is an artifact of the days of manual feature engineering, but it is no longer
+required to achieve state of the art results with modern deep learning architectures.
+Because of this, KerasCV is standardized to operate on images that have been rescaled using
+a simple `1/255` rescaling layer.
+This can be seen in all KerasCV training pipelines and code examples.
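+
+As a minimal sketch of this convention (the random batch below is only an illustrative
+stand-in for real image data), the expected rescaling is:
+
+```python
+import tensorflow as tf
+from tensorflow import keras
+
+# Stand-in for a batch of raw images with pixel values in [0, 255].
+images = tf.random.uniform((8, 224, 224, 3), maxval=256, dtype=tf.int32)
+
+# Rescale to [0, 1] -- the same `1/255` rescaling KerasCV models apply
+# internally when constructed with `include_rescaling=True`.
+rescale = keras.layers.Rescaling(scale=1 / 255)
+scaled = rescale(tf.cast(images, tf.float32))
+```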
 
 ## Custom Ops
 
 Note that in some of the 3D Object Detection layers, custom TF ops are used. The
@@ -85,8 +143,9 @@ If you'd like to use these custom ops, you can install from source using the
 instructions below.
 
 ### Installing KerasCV with Custom Ops from Source
-Installing from source requires the [Bazel](https://bazel.build/) build system
-(version >= 5.4.0).
+
+Installing custom ops from source requires the [Bazel](https://bazel.build/) build
+system (version >= 5.4.0).
 Steps to install Bazel can be [found here](https://github.com/keras-team/keras/blob/v2.11.0/.devcontainer/Dockerfile#L21-L23).
 ```
 git clone https://github.com/keras-team/keras-cv.git
@@ -111,8 +170,9 @@ and Windows.
 
 KerasCV provides access to pre-trained models via the `keras_cv.models` API. These pre-trained
 models are provided on an "as is" basis, without warranties or conditions of any kind.
-The following underlying models are provided by third parties, and subject to separate licenses:
-StableDiffusion
+The following underlying models are provided by third parties, and are subject to separate
+licenses:
+StableDiffusion, Vision Transformer
 
 ## Citing KerasCV