Fix recursive search in Client.get_items #799

Open: wants to merge 9 commits into base: main

1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -11,6 +11,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

 - Fix usage documentation of `ItemSearch`
 - Fix fields argument to CLI ([#797](https://github.com/stac-utils/pystac-client/pull/797))
+- Fix recursive search in `Client.get_items` ([#799](https://github.com/stac-utils/pystac-client/pull/799))

 ## [v0.8.6] - 2025-02-11

17 changes: 11 additions & 6 deletions pystac_client/client.py
@@ -443,21 +443,26 @@ def get_collections(self) -> Iterator[Collection]:
                 call_modifier(self.modifier, collection)
                 yield collection

-    def get_items(
-        self, *ids: str, recursive: bool | None = None
-    ) -> Iterator["Item_Type"]:
+    def get_items(self, *ids: str, recursive: bool = False) -> Iterator["Item_Type"]:

Member

I'm not convinced we should change the function signature. If we need to change the underlying behavior, that might be ok.

Author

It's up to you but this overrides a method from pystac.Catalog and changes the function signature of the method that it overrides. In my experience, it is best not to change the interface of an inherited method unless absolutely necessary.

An example of how this sort of thing causes problems can be found here: #799 (comment)
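
To illustrate the concern in general terms, here is a minimal sketch with stand-in classes. `Base`, `Sub`, and `count_direct_items` are hypothetical names, and the item lists are made up; the sketch only assumes that the parent defaults to `recursive=False` while the override treats `None` like `True`:

```python
from typing import Iterator, Optional


class Base:
    """Simplified stand-in for a parent class such as pystac.Catalog."""

    def __init__(self) -> None:
        self.own_items = ["a"]
        self.child_items = ["b", "c"]

    def get_items(self, recursive: bool = False) -> Iterator[str]:
        # Only descend into children when explicitly asked to.
        yield from self.own_items
        if recursive:
            yield from self.child_items


class Sub(Base):
    """Override that changes the default: None is treated like True."""

    def get_items(self, recursive: Optional[bool] = None) -> Iterator[str]:
        yield from super().get_items(recursive=recursive is None or recursive)


def count_direct_items(catalog: Base) -> int:
    # Written against Base, so it relies on the non-recursive default.
    return sum(1 for _ in catalog.get_items())
```

Code written against the parent's default silently gets a different result when handed the subclass, which is the kind of surprise described above.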

Member

> In my experience, it is best not to change the interface of an inherited method unless absolutely necessary.

Agreed, which is why I think it was a mistake to inherit from Catalog in the first place, but here we are 😄

Author

oh well 😆

         """Return all items of this catalog.

         Args:
             ids: Zero or more item ids to find.
-            recursive: unused in pystac-client, but needed for falling back to pystac
+            recursive : If True, search this catalog and all children for the
+                item; otherwise, only search the items of this catalog. Defaults
+                to False.

         Return:
             Iterator[Item]: Iterator of items whose parent is this
                 catalog.
         """
         if self.conforms_to(ConformanceClasses.ITEM_SEARCH):
-            search = self.search(ids=ids)
+            # Previously, recursive=None was treated the same as recursive=True.
+            # This if statement maintains this behaviour for backwards compatibility.
+            if recursive is not False:
+                search = self.search(ids=ids)
+            else:
+                search = self.search(ids=ids, collections=[self.id])

Member

This doesn't feel quite right, since the client is a Catalog, not a Collection.

Author

You're right that the naming is not ideal but that's the parameter name that the API provides.

Items can be direct children of Catalogs and the API spec does not provide a separate catalogs= parameter to differentiate between catalogs and collections. Specifying the catalog id in the collections parameter works with at least one API implementation (stac-fastapi) but I guess the spec doesn't specify what to do in this edge case.
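
As a rough sketch of the two request shapes being discussed, assuming an item-search endpoint that accepts `ids` and `collections`: the helper `build_search_params` is hypothetical and not part of pystac-client, and whether a server accepts a catalog id in `collections` is implementation-specific.

```python
from typing import Any


def build_search_params(
    catalog_id: str, ids: tuple[str, ...], recursive: bool
) -> dict[str, Any]:
    """Build illustrative /search parameters (hypothetical helper)."""
    params: dict[str, Any] = {}
    if ids:
        params["ids"] = list(ids)
    if not recursive:
        # Restrict the search to the catalog itself by passing its id in
        # `collections`; the spec does not define this case for catalogs.
        params["collections"] = [catalog_id]
    return params
```

The recursive call sends no `collections` filter at all, so the server is free to return items from every collection it knows about.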

The other option is to skip the search endpoint entirely for non-recursive calls and do something like:

        if self.conforms_to(ConformanceClasses.ITEM_SEARCH) and recursive:
            yield from self.search(ids=ids).items()
        else:
            if not self.conforms_to(ConformanceClasses.ITEM_SEARCH):
                self._warn_about_fallback("ITEM_SEARCH")
            for item in super().get_items(
                *ids, recursive=recursive is None or recursive
            ):
                call_modifier(self.modifier, item)
                yield item

Member

> Specifying the catalog id in the collections parameter works with at least one API implementation (stac-fastapi) but I guess the spec doesn't specify what to do in this edge case.

Yeah, we try to expand pystac-client with heuristics to help it work with real-world instances (rather than being strictly spec-enforcing) but this use-case is unusual enough that I'm not sure it's worth the complexity to manage.

I'm still not sure the problem we're trying to solve here is pystac-client's problem. As the original docstring said, we're not using recursive in pystac-client at all, we only use it when we fall back to pystac for non-API searches. So I'm a bit inclined to say "if pystac-client's recursion behavior isn't what you want, just use pystac directly"?

Author

I think that the confusing thing for me as a user of pystac-client is that the recursive behaviour is inconsistent depending on whether it's using the /search endpoint or not.

Currently the behaviour is:

  • if using /search: always recursive
  • otherwise: it depends on the recursive argument
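
The two cases can be written down as a small truth function. This is a simplified model of the behaviour described in this thread, not pystac-client code; `effective_recursive` and `uses_search` are hypothetical names, and the handling of `None` on the fallback path is taken from the snippet earlier in this conversation:

```python
from typing import Optional


def effective_recursive(uses_search: bool, recursive: Optional[bool]) -> bool:
    """Model whether get_items ends up recursive (simplified sketch)."""
    if uses_search:
        # The /search path ignores the argument and always recurses.
        return True
    # The fallback path honours the argument, treating None like True.
    return recursive is None or recursive
```

Under this model, `recursive=False` is only honoured when the client is not using /search.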

If the solution is to just say don't use pystac-client in this case then let's at least document this better. Maybe change this

> recursive: unused in pystac-client, but needed for falling back to pystac

to

> recursive: If this client conforms to the ITEM_SEARCH conformance class, this is unused and this will always yield items recursively. Otherwise, this will only return items recursively if True.

Or something similar that talks about the distinction.

On a personal note... I don't think I'll be able to use pystac-client in my applications if we go this route.

Member

👍🏼 to the docs update. pystac-client is for STAC APIs, not static STAC catalogs, and our fallback to pystac is more of a convenience than a core feature.

Author

Ok that's fine. Just so you know, the documentation talks about pystac in a way that makes it seem like pystac is more than just a convenience so you might understand why people might assume that pystac-client would align more closely with pystac than it does:

In that last link you even have the line (in the consequences heading):

"Special care should be taken to ensure that we do not break any of PySTAC’s functionality through inheritance."

Which is exactly the issue that this PR is trying to address.

Member

Yup, I appreciate the call-out. There have been discussions over the years on whether we should even have the two libraries be separate (for one example, stac-utils/pystac#1334 (comment)). Any documentation cleanup/fixes to make things clearer for folks would be appreciated 🙇🏼.

FWIW My current thinking is that if we ever wanted to go to a v1.0 release of pystac-client, we'd want to drop inheritance altogether to avoid these problems.

Author

No worries, that makes sense. I understand now why pystac-client is taking this approach.

I've created a separate PR #800 that just updates the docstring as we discussed.

             yield from search.items()
         else:
             self._warn_about_fallback("ITEM_SEARCH")
@@ -476,7 +481,7 @@ def get_all_items(self) -> Iterator["Item_Type"]:
                 catalogs or collections connected to this catalog through
                 child links.
         """
-        yield from self.get_items()
+        yield from self.get_items(recursive=True)

     def search(
         self,
232 changes: 232 additions & 0 deletions tests/cassettes/test_client/test_get_items_non_recursion.yaml

Large diffs are not rendered by default.

10,560 changes: 0 additions & 10,560 deletions tests/cassettes/test_client/test_get_items_without_ids.yaml

This file was deleted.

354 changes: 354 additions & 0 deletions tests/cassettes/test_client/test_recursion_on_fallback.yaml

Large diffs are not rendered by default.

45 changes: 42 additions & 3 deletions tests/test_client.py
@@ -738,19 +738,58 @@ def test_collections_are_clients() -> None:


 @pytest.mark.vcr
-def test_get_items_without_ids() -> None:
+def test_get_items_recursion_collections_required_without_ids() -> None:
+    """
+    Make sure recursion using /search works when the server requires collections
+    when searching
+    """
     client = Client.open(
-        "https://planetarycomputer.microsoft.com/api/stac/v1/",
+        "https://stac.sage.uvt.ro/",
     )
     next(client.get_items())


 @pytest.mark.vcr
+def test_get_items_recursion_no_collections_without_ids() -> None:
+    """
+    Make sure recursion using /search works when the server does not require collections
+    when searching
+    """
+    client = Client.open(
+        "https://paituli.csc.fi/geoserver/ogc/stac/v1/",
+    )
+    next(client.get_items())
+
+
+@pytest.mark.vcr
+def test_get_items_non_recursion() -> None:
+    """Make sure that non-recursive search is used when using /search"""
+    client = Client.open(
+        "https://planetarycomputer.microsoft.com/api/stac/v1/",
+    )
+    with pytest.raises(StopIteration):
+        next(client.get_items(recursive=False))
+
+
+@pytest.mark.vcr
+def test_non_recursion_on_fallback() -> None:
+    """
+    Make sure that non-recursive search using fallback only looks for
+    non-recursive items
+    """
+    path = "https://raw.githubusercontent.com/stac-utils/pystac/v1.9.0/docs/example-catalog/catalog.json"
+    catalog = Client.from_file(path)
+    with pytest.warns(FallbackToPystac), pytest.raises(StopIteration):
+        next(catalog.get_items(recursive=False))
+
+
+@pytest.mark.vcr
 def test_recursion_on_fallback() -> None:
+    """Make sure that recursive search using fallback looks for recursive items"""
     path = "https://raw.githubusercontent.com/stac-utils/pystac/v1.9.0/docs/example-catalog/catalog.json"
     catalog = Client.from_file(path)
     with pytest.warns(FallbackToPystac):
-        [i for i in catalog.get_items()]
+        next(catalog.get_items(recursive=True))


 @pytest.mark.vcr
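
The non-recursive tests in this diff assert that nothing is returned by calling `next()` on the iterator and expecting `StopIteration`. A minimal stdlib-only sketch of that pattern follows; `no_items` and `is_empty` are hypothetical names and not part of the test suite:

```python
from typing import Iterator


def no_items() -> Iterator[str]:
    """Hypothetical stand-in for a get_items() call that finds nothing."""
    return
    yield  # unreachable, but makes this function a generator


def is_empty(items: Iterator[str]) -> bool:
    """Return True if the iterator yields nothing, consuming at most one item."""
    try:
        next(items)
    except StopIteration:
        return True
    return False
```

Checking only the first element avoids exhausting a potentially large or paginated iterator, which is presumably why the tests use `next()` rather than building a full list.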