Maintenance of Auto-ingestion Data Pipelines #127

Open
iwensu0313 opened this issue Jan 10, 2025 · 2 comments

iwensu0313 commented Jan 10, 2025

Who is requesting this?

IOOS Marine Life

What is being requested?

This is an open ticket to track ongoing maintenance and updates of the auto-ingestion pipelines developed for the ATN DAC. We will also record the causes of data issues, where identified, so we can learn from them and improve the robustness of the existing pipelines and of new pipelines for other telemetry manufacturers (e.g., Ornitela).

What is the requested deadline and why?

No response

What is the current status quo (i.e., what happens if this does not get done)?

No response

What indicates this is done (i.e., how do we know this is complete)?

This is an ongoing effort.

Provide a description or any other important information.

Current auto-ingestion pipelines have been set up for:

  • Wildlife Computers (WC) trajectory and profile
  • SMRU trajectory and profile

Notes:

  • The bulk of incoming data are from WC.

iwensu0313 commented Jan 10, 2025

Updates

  • We should be ready next week to push changes live that address the GPE3 issue (Wildlife Computers GPE3 Auto-Ingestion Pipeline Update #117)
    • roughly 10% of datasets we ingest for ATN have GPE3 data, so this will help minimize manual troubleshooting
    • this will unblock Skomal
    • Orbesen also has GPE3 data (a few deployments are problematic, but troubleshooting is required to determine the cause; it is not yet clear whether this implementation will fix them)
  • Around Nov 2024, while checking the ATN all layer, we noticed that data from several real-time/active projects had stopped flowing into the portal. After some checks, we confirmed that these deployments had not yet ended and began troubleshooting
  • A few improvements were made to address this issue:
    • Updated the scheduler to re-run ingestions periodically
    • Identified and addressed an infrastructure design issue (specifically, locks) that was killing some processes
    • Added deployment logging to more quickly identify which deployments failed or succeeded
  • This appears to have resolved a large number of previously failing deployments, yay!
  • Through the above troubleshooting, we also identified some aniMotum failures; determining their scope will require additional troubleshooting and investigation. We do know that a failed aniMotum run does not prevent raw data and other files from being ingested

@MathewBiddle @conniekot

iwensu0313 commented Feb 6, 2025

Update: the GPE3 pipeline improvements referenced in the previous update were pushed in mid-January. They allowed us to ingest the rest of the Skomal data automatically instead of manually, and they should help with future ingests of GPE3 data as well. No new issues.

Projects
Status: In Progress