Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add a page to discuss performance #64

Merged
merged 4 commits into from
May 12, 2022
Merged

add a page to discuss performance #64

merged 4 commits into from
May 12, 2022

Conversation

jdries
Copy link
Contributor

@jdries jdries commented May 12, 2022

No description provided.

Copy link
Member

@soxofaan soxofaan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just some minor notes

In general, process graphs are first analyzed as a whole before the actual processing starts. The analysis phase serves
to reveal the optimal processing strategy and parameters.

These are a few examples of things that can be derived from a process graph and subsequent optimizations:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Please correct me if I'm wrong, but these examples don't seem all that backend-specific - if that is the case, wouldn't we want to come up with a selection of backend-independent pre-optimisations (editing the process graph directly), publish these as an openeo library and link to that here?
  • Are there any other relevant repos to link here? (I hesitate to link to openeo-odc, because it isn't very readable)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, part of this is already part of openeo-python-driver which is a package that does not have too many dependencies on backend specific stuff.
I believe original authors of the odc backend simply choose to write and maintain their own basic process graph handling code, but there's indeed more opportunity to have shared libraries, especially for python related stuff.
Of course, most time is spent writing the actual processing engine, you can also reuse that, but then you don't run on dask anymore :-).

For scalability, the openEO processes clearly define along which set of dimension labels of the datacube they operate. When
a user writes a process graph, it should never instruct the backend to apply a black box algorithm or function on the
entire datacube. For most algorithms, this is not necessary, and loading the complete datacube of a Copernicus mission at once
is simply not possible. Hence, users run 'callbacks' over a 1-dimensional array, or even multidimensional arrays or 'chunks'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, as someone fairly new to openEO jargon, it is unclear to me what "callback" means here - is the idea to "chunk" an array of processing tasks into individual callables? If so, I think that wants to be explicitly described!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"callback" is meant to refer to the reducer argument in reduce_dimension or process argument in apply_dimension. In openEO these are called "processes", but I think that is too generic to be used as replacement for "callback" in this context. Maybe citing an example (e.g. reducer in reduce_dimension) could help in the text?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a link to python client docs as I don't know of a better alternative.
@m-mohr I would expect something here: https://openeo.org/documentation/1.0/developers/api/reference.html#section/Processes/Process-Graphs
But now it seems to be explained as 'user defined process', which I actually think of as something else.

Copy link
Member

@m-mohr m-mohr May 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, callback people complained about it because it is "javascript" so now it is called "user-defined process" in openEO terminology, which is also used in the processes actually. Still not ideal though, indeed. What we recently used also is "child process" or "sub process", I think. But yeah maybe use "user-defined (child) process" and link to the API docs, indeed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@jdries
Copy link
Contributor Author

jdries commented May 12, 2022

@m-mohr should I just merge this one, so that we have a page for tomorrow's meeting?

@m-mohr
Copy link
Member

m-mohr commented May 12, 2022

@jdries I haven't read it, but it has some reviews so I added it to the menu and merged it. Thanks.

@m-mohr m-mohr merged commit ee869b3 into master May 12, 2022
@m-mohr m-mohr deleted the performance-guidelines branch May 12, 2022 15:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants