Add azure support#749

Open
shrek wants to merge 25 commits into NVIDIA:main from shrek:add-azure-support

Conversation

@shrek
Collaborator

@shrek shrek commented Mar 13, 2026

Earth2Studio Pull Request

Description

This PR adds the functionality required to run inference on Azure, integrating with Azure Blob Storage for inference results and with the Planetary Computer GeoCatalog for ingestion of those results.

  • Object storage functionality is extended to support Azure Blob Storage so that inference results can be uploaded there. For this, multi-storage-client is updated to a more recent version that supports Azure default identity.

  • A GeoCatalog client is added with Python utilities that interface with the Planetary Computer GeoCatalog API.

  • The inference pipeline is updated with config knobs enabling the full chain:
    API -> inference -> upload results to Azure Blob -> trigger ingestion into the GeoCatalog

  • Two Foundry inference workflows are added, along with JSON metadata files used by the GeoCatalog ingestion APIs.

Misc enhancements unrelated to azure

  • Range support is added for downloading inference result files from the server, which helps when fetching large files directly from the server.

  • A configuration option, EXPOSED_WORKFLOWS, limits which workflows the API exposes, allowing only a subset of the available workflows to be served.

  • CPU workers were consuming GPU memory; this is fixed by not exposing any GPUs to them.

Tests

The above functionality is tested end-to-end on Azure as an online endpoint. This includes uploading inference results to Azure Blob Storage and GeoCatalog ingestion.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.
  • Assess and address Greptile feedback (AI code review bot for guidance; use discretion, addressing all feedback is not required).

Dependencies

@shrek shrek marked this pull request as ready for review March 16, 2026 19:03
@greptile-apps
Contributor

greptile-apps bot commented Mar 16, 2026

Greptile Summary

This PR adds Azure Blob Storage support for inference result uploads, a GeoCatalog (Planetary Computer) STAC ingestion client, two new foundry inference workflows (foundry_fcn3, foundry_fcn3_stormscope_goes), workflow exposure filtering, and RFC 9110-compliant range-request support for file downloads. The changes integrate a new geocatalog_ingestion pipeline stage between object-storage upload and metadata finalization.

The implementation is generally well-structured and has good test coverage. Several issues flagged in earlier review rounds (missing HTTP status checks in _create_element, polling error handling, end >= file_size RFC 9110 violation, ValueError from int() on malformed Range headers) appear to have been addressed. Key remaining concerns:

  • Geocatalog workers are unconditionally started and verified at server startup, even when AZURE_GEOCATALOG_URL is not configured. The startup script will hard-fail (exit 1) if geocatalog workers don't start, and check_admission_control always monitors the geocatalog queue depth — meaning a stalled geocatalog queue can block inference requests in non-Azure deployments.
  • Azure connection strings are written to the global process environment (os.environ["AZURE_CONNECTION_STRING"]), which exposes credentials to child processes; this may be unavoidable given MSC's env-var expansion model but should be documented.
  • test_planetary_computer.py GeoCatalog tests lack an importorskip guard for azure-identity, which will produce confusing ImportErrors instead of clean skips in environments without the serve extras.
  • A typo in foundry_fcn3_stormscope_goes.py: "preceed" should be "precede".

Confidence Score: 2/5

  • Needs attention before merging — unconditional geocatalog worker requirement will break non-Azure server startups if workers fail, and several issues from earlier rounds remain open.
  • The core Azure integration logic is sound and well-tested, but the geocatalog worker startup is now a mandatory dependency for all deployments (the script exits with code 1 if no geocatalog workers are running), even when Azure geocatalog is not configured. Combined with the unaddressed wildcard SAS URL issue from previous rounds and the timezone-naive sentinel in validate_start_time/validate_start_times, the PR carries meaningful risk for both Azure and non-Azure deployments.
  • serve/server/scripts/start_api_server.sh (mandatory geocatalog worker check), earth2studio/serve/server/object_storage.py (wildcard SAS URL from prior thread, connection string in env), serve/server/example_workflows/foundry_fcn3_stormscope_goes.py (open issues from prior threads).

Important Files Changed

Filename Overview
earth2studio/data/planetary_computer.py: Adds GeoCatalogClient for STAC ingestion into Planetary Computer GeoCatalog. Token refresh happens once per create_feature call; the polling loop may run up to 5 minutes but tokens generally outlive that. Prior review threads addressed missing status checks and polling error handling — those issues appear resolved in this version.
earth2studio/serve/server/cpu_worker.py: Adds process_geocatalog_ingestion pipeline stage and significant refactoring for Azure storage support. Key concern: geocatalog workers are unconditionally included in admission control checks regardless of whether AZURE_GEOCATALOG_URL is configured, which could cause unnecessary queue-depth-related request blocking in non-Azure deployments.
earth2studio/serve/server/object_storage.py: Adds Azure Blob Storage support to MSCObjectStorage. The connection string (which may contain AccountKey) is written to the global process environment via os.environ. The wildcard SAS URL issue previously flagged in review threads remains unaddressed in this iteration.
earth2studio/serve/server/utils.py: Adds RFC 9110-compliant parse_range_header and create_file_stream utilities for partial content delivery. Implementation looks correct; end >= file_size is now clamped rather than rejected per §14.1.2.
earth2studio/serve/server/workflow.py: Adds is_workflow_exposed and updates list_workflows to support filtering. Warmup workflows are intentionally accessible via API endpoints but excluded from the public listing — design is clearly documented and tests cover both behaviors.
serve/server/scripts/start_api_server.sh: Adds geocatalog worker startup and CUDA_VISIBLE_DEVICES="" isolation for all CPU workers. Geocatalog workers are always started (and verified to have started) regardless of whether AZURE_GEOCATALOG_URL is set, consuming a process slot unconditionally.
serve/server/example_workflows/foundry_fcn3_stormscope_goes.py: New FCN3+StormScopeGOES workflow. Contains if not seeds check (flagged in previous thread) and timezone-naive sentinel in validate_start_times (flagged in previous thread). Also contains typo "preceed" → "precede" in error message.

Comments Outside Diff (2)

  1. serve/server/scripts/start_api_server.sh, line 208-215 (link)

    P1 Geocatalog workers always required, even without Azure

    The script unconditionally starts NUM_GEOCATALOG_WORKERS geocatalog workers and then hard-fails (exit 1) if none are found running. This means every deployment — including those that never set AZURE_GEOCATALOG_URL — must have geocatalog workers running or the server won't start.

    Similarly, check_admission_control() in main.py always monitors the geocatalog_ingestion queue depth. If the geocatalog queue fills up for any reason (e.g., stale jobs, worker restart lag), it will block new inference requests even in non-Azure deployments.

    Consider making both the worker startup and the admission-control check conditional on AZURE_GEOCATALOG_URL being configured:

    # Only start geocatalog workers when geocatalog is enabled
    if [ -n "$AZURE_GEOCATALOG_URL" ]; then
        echo "Starting $NUM_GEOCATALOG_WORKERS geocatalog ingestion workers..."
        GEOCATALOG_WORKER_PIDS=()
        for i in $(seq 1 $NUM_GEOCATALOG_WORKERS); do
            CUDA_VISIBLE_DEVICES="" rq worker -w rq.worker.SimpleWorker geocatalog_ingestion &
            GEOCATALOG_WORKER_PIDS+=($!)
            echo "Started geocatalog ingestion worker $i (PID: $!)"
        done
    fi
  2. earth2studio/serve/server/object_storage.py, line 1437-1442 (link)

    P2 Connection string written to global process environment

    os.environ["AZURE_CONNECTION_STRING"] = azure_connection_string persists the full connection string (which typically includes the storage account key) into the process environment. This value is visible to all child processes spawned after this call and is readable from /proc/self/environ on Linux.

    While MSC requires the env-var reference for the ${AZURE_CONNECTION_STRING} substitution in the profile config, it's worth checking whether MSC supports passing the value inline in the profile dict rather than via env-var expansion. If inline values are supported, that would avoid leaking credentials into the process environment. Otherwise, please add a comment explaining why the env var must be set here so future readers understand the trade-off.

Last reviewed commit: 056b714

storage_info["remote_path"] = (
    f"azure://{container_name}/{remote_prefix}"
)
# Build HTTPS blob URL for primary netcdf file (for GeoCatalog ingestion)
Collaborator


Will this only work for single NetCDF4 files or also Zarr archives?

Collaborator Author


Thanks, good catch! This works only for NetCDF4; I am fixing it so it works for Zarr as well.

)


class GeoCatalogClient:
Collaborator


Will let @NickGeneva comment whether this is the right place to put the client that starts the data ingestion into Microsoft Planetary Computer from Azure Blob Storage. It is not a data source, more like an IO utility so we may want to put it somewhere else.

"Install with the serve extra or pip install azure-identity."
) from e
self._DefaultAzureCredential = _DefaultAzureCredential
self._workflow_name = workflow_name
Collaborator Author


Create filenames with the workflow name as a prefix, and keep the workflow name consistent throughout. The parameter mapping is awkward; fix that too.

