v0.6.0

@adamjstewart adamjstewart released this 01 Sep 10:12
7500ee2

TorchGeo 0.6.0 Release Notes

TorchGeo 0.6 adds 18 new datasets, 15 new datamodules, and 27 new pre-trained models, encompassing 11 months of hard work by 23 contributors from around the world.

Highlights of this release

Multimodal foundation models

Diagram of a unified multimodal Earth foundation model

There are thousands of Earth observation satellites in orbit at any given time. Historically, using one of these satellites in a deep learning pipeline meant first collecting millions of manually labeled images from that sensor to train a model. Self-supervised learning enabled label-free pre-training, but still required millions of diverse sensor-specific images, making it difficult to use newly launched or expensive commercial satellites.

TorchGeo 0.6 adds multiple new multimodal foundation models capable of being used with imagery from any satellite/sensor, even ones the model was not explicitly trained on. While GASSL and Scale-MAE only support RGB images, DOFA supports RGB, SAR, MSI, and HSI with any number of spectral bands. It uses a novel wavelength-based encoder to map the spectral wavelength of each band to a known range of wavelengths seen during training.
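
As a rough sketch of what wavelength-based input involves, the snippet below builds the per-band wavelength list that would accompany an image tensor. The band table holds approximate, nominal Sentinel-2 central wavelengths in micrometres, and the helper name wavelengths_for is hypothetical. How the list is handed to the model is an assumption here and should be checked against the DOFA model documentation.

```python
# Approximate, nominal Sentinel-2 central wavelengths in micrometres
# (illustrative values, not taken from the release notes).
SENTINEL2_WAVELENGTHS_UM = {
    "B02": 0.490,  # blue
    "B03": 0.560,  # green
    "B04": 0.665,  # red
    "B08": 0.842,  # near infrared
}


def wavelengths_for(bands: list[str]) -> list[float]:
    """Return one central wavelength per band, in the order the bands are given."""
    missing = [band for band in bands if band not in SENTINEL2_WAVELENGTHS_UM]
    if missing:
        raise KeyError(f"no wavelength known for bands: {missing}")
    return [SENTINEL2_WAVELENGTHS_UM[band] for band in bands]


# The list order must match the channel order of the image tensor.
rgb_wavelengths = wavelengths_for(["B04", "B03", "B02"])
print(rgb_wavelengths)
```

The point of the design is that the band list, not the model, carries the sensor-specific information, so a different sensor only needs a different wavelength table.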

The following table describes the dynamic spatial (resolution), temporal (time span), and/or spectral (wavelength) support, either via their training data (implicit) or via their model architecture (explicit), offered by each of these models:

Model      Spatial   Temporal  Spectral
DOFA       implicit  -         explicit
GASSL      implicit  -         -
Scale-MAE  explicit  -         -

TorchGeo 0.6 also adds multiple new unimodal foundation models, including DeCUR and SatlasPretrain.

Source Cooperative migration

Migration from Radiant MLHub to Source Cooperative

TorchGeo contains a number of datasets from the recently defunct Radiant MLHub.

These datasets were recently migrated to Source Cooperative (and AWS in the case of SpaceNet), but with a completely different file format and directory structure. It took a lot of effort, but we have finally ported all of these datasets to the new download location and file hierarchy. As an added bonus, the new data loader code is significantly simpler, allowing us to remove 2.5K lines of code in the process!

OSGeo community project

OSGeo Community logo

TorchGeo is now officially a member of the OSGeo community! OSGeo is a not-for-profit foundation for open source geospatial software, providing financial, organizational, and legal support. We are in good company, with other OSGeo projects including GDAL, PROJ, GEOS, QGIS, and PostGIS. Membership in OSGeo helps promote TorchGeo to the wider community, and also ensures that we follow best practices for the stability, health, and interoperability of the open source geospatial ecosystem.

All TorchGeo users are encouraged to join us on Slack, join our Hugging Face organization, and join us in OSGeo using any of the following badges in our README:

slack
huggingface
osgeo

Lightning Studios support

Lightning AI logo

TorchGeo has always had a close collaboration with Lightning AI, including active contributions to PyTorch Lightning and TorchMetrics. In this release, we added buttons allowing users to launch our tutorial notebooks in the new Lightning Studios platform. Lightning Studios is a more powerful version of Google Colab, with reproducible software and data environments allowing you to pick up where you left off, VS Code and terminal support, and the ability to quickly scale up to a large number of GPUs. All TorchGeo tutorials have been confirmed to work in both Lightning Studios and Google Colab, allowing users to get started with TorchGeo without having to invest in their own hardware.

Backwards-incompatible changes

  • All Radiant MLHub datasets have been ported to the Source Cooperative file hierarchy (#1830)
  • GeoDataset: the bbox sample key was renamed to bounds in order to support Kornia (#2199)
  • Chesapeake7 and Chesapeake13: datasets were removed when updating to the 2022 edition (#2214)
  • Benin Cashews and Rwanda Field Boundary: remove os.path.expanduser for consistency (#1705)
  • LEVIR-CD and OSCD: images key was split into image1 and image2 for change detection (#1684, #1696)
  • EuroSAT: B08A was renamed to B8A to match Sentinel-2 (#1646)
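
For downstream code that still uses the old names, the two renames above can be bridged with small helpers. This is a minimal sketch, not part of TorchGeo: the function names migrate_sample and migrate_eurosat_bands are hypothetical, and the sample is modelled as a plain dictionary with placeholder values.

```python
def migrate_sample(sample: dict) -> dict:
    """Return a copy of a pre-0.6 style sample with 'bbox' renamed to 'bounds'."""
    migrated = dict(sample)  # shallow copy, the input is left untouched
    if "bbox" in migrated and "bounds" not in migrated:
        migrated["bounds"] = migrated.pop("bbox")
    return migrated


def migrate_eurosat_bands(bands: list[str]) -> list[str]:
    """Replace the old EuroSAT band name 'B08A' with 'B8A'."""
    return ["B8A" if band == "B08A" else band for band in bands]


old_sample = {"image": "placeholder", "bbox": (0, 1, 0, 1, 0, 1)}
print(migrate_sample(old_sample))
print(migrate_eurosat_bands(["B04", "B08", "B08A"]))
```

Helpers like these are only a stopgap. Updating the key and band names at the call sites is the cleaner long-term fix.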

Dependencies

New (optional) dependencies

  • aws-cli: to download datasets from AWS (#2203)
  • azcopy: to download datasets from Azure (#2064)
  • prettier: for YAML file formatting (#2018)
  • ruff: for code style and documentation testing (#1994)

Removed (optional) dependencies

  • radiant-mlhub: website no longer exists (#1830)
  • rarfile: datasets rehosted as zip files (#2210)
  • zipfile-deflate: no longer needed for newer Chesapeake data (#2214)
  • black: replaced by ruff (#1994)
  • flake8: replaced by ruff (#1994)
  • isort: replaced by ruff (#1994)
  • pydocstyle: replaced by ruff (#1994)
  • pyupgrade: replaced by ruff (#1994)

Changes to existing dependencies

  • python: 3.10+ required following SPEC 0 (#1966)
  • fiona: 1.8.21+ required (#1966)
  • kornia: 0.7.3+ required (#1979, #2144)
  • lightly: 1.4.5+ required (#2196)
  • lightning: 2.3 not supported due to bug (#2155, #2211)
  • matplotlib: 3.5+ required (#1966)
  • numpy: 1.21.2+ required (#1966), numpy 2 support added (#2151)
  • pandas: 1.3.3+ required (#1966)
  • pillow: 8.4+ required (#1966), jpeg2000 support required (#2209)
  • pyproj: 3.3+ required (#1966)
  • rasterio: 1.3+ required (#1966)
  • shapely: 1.8+ required (#1966)
  • torch: 1.13+ required (#1358)
  • torchvision: 0.14+ required (#1358)
  • h5py: 3.6+ required (#1966)
  • opencv: 4.5.4+ required (#1966)
  • pycocotools: 2.0.7+ required (#1966)
  • scikit-image: 0.19+ required (#1966)
  • scipy: 1.7.2+ required (#1966)
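
Before upgrading, it can help to check an environment against these minimums. The following is a minimal sketch using only the standard library. The MINIMUM_VERSIONS table copies a few entries from the list above, the helper names are hypothetical, and the version parsing is deliberately simplified (it does not handle exclusions such as the unsupported lightning 2.3). A real project would more likely rely on packaging.version.

```python
from importlib import metadata

# A subset of the minimum versions listed above, for illustration.
MINIMUM_VERSIONS = {
    "numpy": "1.21.2",
    "rasterio": "1.3",
    "torch": "1.13",
    "kornia": "0.7.3",
}


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.21.2' into (1, 21, 2), dropping suffixes such as 'rc1' or '+cu117'."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two version strings after padding them to the same length."""
    have, need = parse_version(installed), parse_version(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def check_environment() -> dict[str, str]:
    """Map each package to 'ok', 'too old', or 'not installed'."""
    report = {}
    for name, minimum in MINIMUM_VERSIONS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else "too old"
    return report


print(check_environment())
```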

Datamodules

New datamodules

Changes to existing datamodules

  • Remove torchgeo.datamodules.utils.dataset_split (#2005)
  • EuroSAT: make sure normalization is actually applied (#2176)

Changes to existing base classes

  • Fix plotting in datamodules when dataset is a subset (#2003)

Datasets

New datasets

Changes to existing datasets

  • Benin Cashews: migrate to Source Cooperative (#2116)
  • Benin Cashews: remove os.path.expanduser for consistency (#1705)
  • BigEarthNet: fix broken download link (#2174)
  • CDL: add 2023 checksum (#1844)
  • Chesapeake: update to 2022 edition (#2214)
  • ChesapeakeCVPR: reuse NLCD colormap (#1690)
  • Cloud Cover: migrate to Source Cooperative (#2117)
  • CV4A Kenya Crop Type: migrate to Source Cooperative (#2090)
  • EuroSAT: rename B08A to B8A to match Sentinel-2 (#1646)
  • FireRisk: redistribute on Hugging Face (#2000)
  • GlobBiomass: add min/max timestamp (#2086)
  • GlobBiomass: use float32 for pixelwise regression mask (#2086)
  • GlobBiomass: fix length of dataset (#2086)
  • L7 Irish: convert to IntersectionDataset (#2034)
  • L8 Biome: convert to IntersectionDataset (#2058)
  • LEVIR-CD+: split image into image1 and image2 for change detection (#1696)
  • NASA Marine Debris: migrate to Source Cooperative (#2206)
  • OSCD: support fine-grained band selection (#1684)
  • OSCD: split image into image1 and image2 for change detection (#1696)
  • PatternNet: redistribute on Hugging Face (#2100)
  • RESISC45: redistribute on Hugging Face (#2210)
  • Rwanda Field Boundary: don't plot empty masks during testing (#2254)
  • Rwanda Field Boundary: migrate to Source Cooperative (#2118)
  • Rwanda Field Boundary: remove os.path.expanduser for consistency (#1705)
  • SpaceNet 1–7: migrate to Source Cooperative (#2203)
  • Tropical Cyclone: migrate to Source Cooperative (#2068)
  • VHR-10: redistribute on Hugging Face (#2210)
  • VHR-10: improved plotting (#2092)
  • Western USA Live Fuel Moisture: migrate to Source Cooperative (#2206)

Changes to existing base classes

  • Add support for pathlib.Path to all datasets (#2173)
  • Datasets can now use command-line utilities to download (#2064)
  • GeoDataset: bbox key was renamed to bounds (#2199)
  • GeoDataset: ignore other bands for separate files (#2222)
  • GeoDataset: don't warn about missing files for downloadable datasets (#2033)
  • RasterDataset: allow subclasses to specify which resampling algorithm to use (#2015)
  • RasterDataset: use nearest neighbors for int and bilinear for float by default (#2015)
  • RasterDataset: calculate resolution after changing CRS (#2193)
  • RasterDataset: support date_str containing % character (#2233)
  • RasterDataset: users can now specify the min/max time of a dataset (#2086)
  • VectorDataset: add dtype attribute to match RasterDataset (#1869)
  • VectorDataset: extract timestamp from filename to match RasterDataset (#1814)
  • IntersectionDataset: ignore 0 area overlap (#1985)
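
The new RasterDataset resampling default listed above follows a simple rule: interpolating integer data such as class labels would produce values that are not valid labels, so nearest neighbor is the safe choice there, while continuous floating-point data benefits from bilinear interpolation. The sketch below illustrates that rule only. The Resampling enum and default_resampling function are hypothetical stand-ins, not TorchGeo's own types, and dtypes are represented by their names as strings.

```python
from enum import Enum


class Resampling(Enum):
    """Stand-in resampling enum (hypothetical, for illustration only)."""

    NEAREST = "nearest"
    BILINEAR = "bilinear"


def default_resampling(dtype_name: str) -> Resampling:
    """Pick bilinear for floating-point data and nearest neighbor otherwise."""
    if dtype_name.startswith("float"):
        return Resampling.BILINEAR
    return Resampling.NEAREST


print(default_resampling("float32"))  # continuous imagery
print(default_resampling("uint8"))    # categorical masks
```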

New error classes

  • DatasetNotFoundError: when a dataset has not yet been downloaded (#1714, #2053)
  • DependencyNotFoundError: when an optional dependency is not installed (#2054)
  • RGBBandsMissingError: when you try to plot a dataset but don't use RGB bands (#1737, #2053)

Models

New model architectures

New model weights

Samplers

Changes to existing samplers

  • RandomGeoSampler: fix performance regression, 60% speedup with preprocessed data (#1968)

Trainers

New trainers

Changes to existing trainers

  • Explicitly specify batch size (#1928, #1933)
  • MoCo: explicitly specify memory bank size (#1931)
  • Semantic Segmentation: support ignore_index when using Jaccard loss (#1898)
  • SimCLR: switch from Adam to LARS optimizer (#2196)
  • SimCLR: explicitly specify memory bank size (#1931)

Transforms

  • Use Kornia's AugmentationSequential for all model weights (#1979)
  • Update TorchGeo's AugmentationSequential to support object detection (#1082)

Documentation

Changes to API docs

  • Datasets: add license information about every dataset (#1732)
  • Datasets: update link to cite SSL4EO-L dataset (#1942)
  • Models: emphasize new multimodal foundation models (#2236)
  • Trainers: update num_classes parameter description (#2101)

Changes to user docs

  • Alternatives: update metrics (#2259)
  • Contributing: explain how to use new I/O Bench dataset (#1972)

Changes to tutorials

  • Add button for the new Lightning Studios (#2146)
  • Remove button for the recently defunct Planetary Computer Hub (#2107)
  • Custom Raster Datasets: download the dataset before calling super (#2177)
  • Custom Raster Datasets: fix typo (#1987)
  • Transforms: update EuroSAT band names to match Sentinel-2 (#1646)

Other documentation changes

  • README: fix CLI example (#2142)
  • README: add Hugging Face badge (#1957)
  • README: fix example of creating fake raster data (#2162)
  • Read the Docs: use latest Ubuntu version to build (#1954)
  • Allow horizontal scrolling of wide tables (#1958)
  • Fix broken links and redirects (#2267)

Testing

Style

  • Use prettier for configuration files (#2018)
  • Use ruff for code files (#1994, #2001)

Type hints

  • Ensure all functions have type hints (#2217)
  • Make all class variables immutable (#2218)
  • Check for unreachable code (#2241)

Unit testing

  • Datasets: test dataset length (#2084, #2089)
  • Datamodules: don't download during testing (#2215, #2231)
  • download_url: add shared fixture to avoid code duplication (#2232)
  • load_state_dict: add shared fixture to avoid code duplication (#1932)
  • load_state_dict_from_url: add shared fixture to avoid code duplication (#2223)
  • torch_hub: add fixture to avoid downloading checkpoints to home directory (#2265)
  • Pytest: silence warnings (#1929, #1930, #2224)
  • PyVista: headless plotting (#1667)

Other CI changes

  • Check numpy 2 compliance (#2151)
  • Coverage: use newer flag to override ignores (#2260)
  • Dependabot: update devcontainer (#2025)
  • Dependabot: group torch and torchvision (#2025)
  • Labeler: update to v5 (#1759)
  • macOS: disable pip caching (#2024)
  • Windows: fail fast mode (#2225)

Contributors

This release is thanks to the following contributors:

@adamjstewart
@alhridoy
@ashnair1
@burakekim
@calebrob6
@cookie-kyu
@DarthReca
@Domejko
@favyen2
@GeorgeHuber
@isaaccorley
@kcrans
@nilsleh
@oddeirikigland
@pioneerHitesh
@piperwolters
@robmarkcole
@sfalkena
@ShadowXZT
@shreyakannan1205
@TropicolX
@wangyi111
@yichiac