XPublish: Platform standardization improvements #42
Comments
I'd be interested in contributing here (likely remotely).
@cjolson64 I'd like to be involved in contributing (remotely) to this.
Thank you for taking the time to propose this topic! From the Code Sprint topic survey, it has garnered a lot of interest. Following the contributing guidelines on selecting a code sprint topic, I have assigned this topic to @jonmjoyce. Unless indicated otherwise, the assignee will be responsible for identifying a plan for the code sprint topic, establishing a team, and taking the lead on executing that plan. The first action for the lead is to:
@jonmjoyce, the description in this issue is a little different from the one on the project website. There, you mentioned the following goal:
I'd be very interested to hear the group's thoughts on how this development might alter the schematic shown there. That image is pretty dated, but it's still in use. @srstsavage had a hand in the old 52N server back in the day. As Xpublish and its plugins become better tested and more easily deployable, we need a better description (both textual and graphical) of how they integrate into, and possibly replace parts of, the tech stack in that picture. It's beyond the scope of the code sprint, but as more data moves into the cloud (or perhaps into all the clouds), we'll need to think more about data architecture.
Thanks for your efforts in leading this development, and I look forward to hearing about the group's progress!
The at-rest data architecture (the data lake) should stay the same. Both workflows can access the cloud-optimized data directly, ideally through catalogs such as Intake. In addition, Jupyter instances can pull from Xpublish (for example, to get a subset), but more complex calculations should still access the ARCO data directly.
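To make the two access paths concrete, here is a minimal Python sketch. The catalog URL, Xpublish endpoint, dataset entry, and variable names are all hypothetical placeholders, and it assumes an Intake driver that returns a Dask-backed xarray Dataset:

```python
import intake
import xarray as xr

# Heavy computation: open the cloud-optimized (ARCO) data directly from
# the data lake via an Intake catalog (URL and entry name are hypothetical).
cat = intake.open_catalog("https://example.org/catalogs/ioos-arco.yaml")
ds = cat["gfs_forecast"].to_dask()  # lazy, Dask-backed xarray Dataset
monthly_mean = ds["air_temperature"].resample(time="1MS").mean()

# Lightweight access: pull a small subset through an Xpublish Zarr
# endpoint instead (endpoint path is hypothetical).
remote = xr.open_zarr("https://example.org/xpublish/datasets/gfs_forecast/zarr")
point = remote["air_temperature"].sel(lon=-70.6, lat=41.5, method="nearest")
```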
Technically, one. What are NOAA's requirements? We can keep one copy and apply policies through the cloud provider to implement cross-region backups, disaster recovery, high availability, and so on.
Regional data replication in the cloud is one option for heavily used data. But given that most data usage is real-time (latest forecasts and obs), it might make sense to regionally cache just the recent products for use by Xpublish and treat the archive as a separate (and cheaper) storage solution: a tiered approach, with IOOS defining requirements for hot, warm, and cold data.
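One way to express such a tiered policy is an S3 lifecycle configuration; below is a sketch using boto3. The bucket name, prefix, and day thresholds are assumptions for illustration, not IOOS requirements:

```python
import boto3

s3 = boto3.client("s3")

# Transition aging objects to cheaper storage classes: hot (STANDARD) for
# recent products, warm (STANDARD_IA) after 30 days, cold (GLACIER) after 180.
s3.put_bucket_lifecycle_configuration(
    Bucket="ioos-arco-data",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tiered-storage",
                "Status": "Enabled",
                "Filter": {"Prefix": "forecasts/"},  # hypothetical prefix
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 180, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)
```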
We have a good handle on the model workflows now, and I think integrating these other datasets will be a key prototype to explore. More or less, we can follow similar patterns, adjusting the underlying ARCO data model to fit the data (instead of kerchunk). The workflow is notify -> ARCO -> data lake. Services then pull from the lake; those could be additional Xpublish plugins that support these data types (see the sketch below), but they don't necessarily need to live in the same project, since they are different access methods than grids.
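For the "more Xpublish plugins" idea, a dataset-level plugin skeleton might look like the following, based on Xpublish's Plugin/hookimpl API. The plugin name, route prefix, and returned fields are illustrative only:

```python
import xarray as xr
from fastapi import APIRouter, Depends
from xpublish import Dependencies, Plugin, hookimpl


class DatasetInfoPlugin(Plugin):
    """Adds a /info route under each served dataset (names illustrative)."""

    name: str = "dataset-info"
    dataset_router_prefix: str = "/info"

    @hookimpl
    def dataset_router(self, deps: Dependencies):
        router = APIRouter(prefix=self.dataset_router_prefix)

        @router.get("/")
        def get_info(dataset: xr.Dataset = Depends(deps.dataset)):
            # Summarize the dimensions and variables of the requested dataset
            return {
                "dims": dict(dataset.sizes),
                "data_vars": list(dataset.data_vars),
            }

        return router


# Registration sketch:
#   rest = xpublish.Rest(datasets)
#   rest.register_plugin(DatasetInfoPlugin())
```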
Project Description
Discuss and decide on standard configurations for Xpublish deployments. Xpublish is currently hosted by a few groups as prototypes. To increase adoption, a recommended "standard" deployment option should be documented, along with a standard Dockerfile. We want Xpublish to keep running on many cloud platforms, but it would be nice to make it easier for people to try it out without having to create a Python environment.
One opinionated deployment we have been using is XREDS.
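As a starting point for such a standard image, the Python entrypoint it runs could be as small as the sketch below (served, for example, with `uvicorn app:app --host 0.0.0.0 --port 9000`). The dataset, names, and port are placeholders, not a settled convention; XREDS itself is considerably more full-featured:

```python
# app.py: minimal Xpublish server entrypoint
import xarray as xr
import xpublish

# Any xarray Dataset works; a real deployment would open ARCO data
# from the data lake instead of this tutorial dataset.
ds = xr.tutorial.open_dataset("air_temperature")

rest = xpublish.Rest({"air": ds})
app = rest.app  # FastAPI application, ready for uvicorn/gunicorn
```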
Expected Outcomes
Skills required
Python, Docker
Expertise
Intermediate
Topic Lead(s)
Jonathan Joyce (@jonmjoyce), Matt Iannucci (@mpiannucci)
Relevant links
https://github.com/xpublish-community/xpublish
https://github.com/asascience-open/xreds