-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add a page to discuss performance #64
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just some minor notes
In general, process graphs are first analyzed as a whole before the actual processing starts. The analysis phase serves | ||
to reveal the optimal processing strategy and parameters. | ||
|
||
These are a few examples of things that can be derived from a process graph and subsequent optimizations: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Please correct me if I'm wrong, but these examples don't seem all that backend-specific - if that is the case, wouldn't we want to come up with a selection of backend-independent pre-optimisations (editing the process graph directly), publish these as an openeo library and link to that here?
- Are there any other relevant repos to link here? (I hesitate to link to openeo-odc, because it isn't very readable)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, part of this is already part of openeo-python-driver which is a package that does not have too many dependencies on backend specific stuff.
I believe original authors of the odc backend simply choose to write and maintain their own basic process graph handling code, but there's indeed more opportunity to have shared libraries, especially for python related stuff.
Of course, most time is spent writing the actual processing engine, you can also reuse that, but then you don't run on dask anymore :-).
For scalability, the openEO processes clearly define along which set of dimension labels of the datacube they operate. When | ||
a user writes a process graph, it should never instruct the backend to apply a black box algorithm or function on the | ||
entire datacube. For most algorithms, this is not necessary, and loading the complete datacube of a Copernicus mission at once | ||
is simply not possible. Hence, users run 'callbacks' over a 1-dimensional array, or even multidimensional arrays or 'chunks' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, as someone fairly new to openEO jargon, it is unclear to me what "callback" means here - is the idea to "chunk" an array of processing tasks into individual callables? If so, I think that wants to be explicitly described!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"callback" is meant to refer to the reducer
argument in reduce_dimension
or process
argument in apply_dimension
. In openEO these are called "processes", but I think that is too generic to be used as replacement for "callback" in this context. Maybe citing an example (e.g. reducer
in reduce_dimension
) could help in the text?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a link to python client docs as I don't know of a better alternative.
@m-mohr I would expect something here: https://openeo.org/documentation/1.0/developers/api/reference.html#section/Processes/Process-Graphs
But now it seems to be explained as 'user defined process', which I actually think of as something else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, callback people complained about it because it is "javascript" so now it is called "user-defined process" in openEO terminology, which is also used in the processes actually. Still not ideal though, indeed. What we recently used also is "child process" or "sub process", I think. But yeah maybe use "user-defined (child) process" and link to the API docs, indeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
@m-mohr should I just merge this one, so that we have a page for tomorrow's meeting? |
@jdries I haven't read it, but it has some reviews so I added it to the menu and merged it. Thanks. |
No description provided.