Archiver: explore a time-series database #8563

Open
4 tasks
Tom-Willemsen opened this issue Nov 1, 2024 · 4 comments
@Tom-Willemsen
Contributor

Tom-Willemsen commented Nov 1, 2024

Issue Description

As a developer I would like to understand whether a time-series database such as InfluxDB or TimescaleDB would be more supportable / easier to adapt to our needs than the archiver appliance.

This is exploratory work, where the goal is to form an opinion on whether this would be better or worse than the archiver appliance implementation we started a PoC of.

This ticket should be timeboxed.


Some things we want in a new archiver in addition to what we have are:

  • Move away from the CSS archiver as part of the ongoing move away from CSS tools.
  • Support for decimation/downsampling of older data
    • So that we can stop manually truncating DBs
    • Multi-stage is nice, but a single stage would probably be sufficient in reality.
  • Support for configurable retention policies (eventually dropping old data)
    • So that we can stop manually truncating DBs
  • Support for adding/removing/reconfiguring PVs to archive without restarting the entire archiver
  • Some way of displaying in the existing GUI - this could mean something that is already supported by databrowser (e.g. influx, rdb, pbraw), or we'll need to write code to support it.
  • "support" (or at least, the archiver doesn't mind too much) if:
    • PVs change datatype over time (e.g. blocks may do this if one IOC uses ints and another uses floats for the same measurement - and both define the same block)
    • Lots of PVs are disconnected (this is something the existing CSS archiver does ok, but the archiver appliance struggles with)
  • Generally handles PV disconnection/reconnection gracefully and doesn't assume "most" PVs will be connected "most" of the time
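
To make the decimation and retention items above concrete, here is a minimal sketch of a single decimation stage plus a retention cutoff in plain Python. It is not tied to any particular database; the `Sample` type, the function name and all parameters are hypothetical, and averaging per fixed-width bucket is just one possible decimation strategy.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass(frozen=True)
class Sample:
    timestamp: float  # seconds since epoch
    value: float


def decimate_and_expire(samples, now, decimate_after, drop_after, bucket):
    """Apply one decimation stage and a retention cutoff to a list of samples.

    - samples older than `drop_after` seconds are dropped (retention policy)
    - samples older than `decimate_after` seconds are averaged per `bucket` seconds
    - newer samples are kept at full resolution
    """
    recent = []
    buckets = {}
    for s in samples:
        age = now - s.timestamp
        if age > drop_after:
            continue  # past the retention window
        if age > decimate_after:
            # Group by the start time of the bucket this sample falls in.
            start = (s.timestamp // bucket) * bucket
            buckets.setdefault(start, []).append(s.value)
        else:
            recent.append(s)
    decimated = [Sample(start, fmean(values)) for start, values in buckets.items()]
    return sorted(decimated + recent, key=lambda s: s.timestamp)
```

In a time-series database this logic would normally be expressed as a scheduled downsampling job and a retention policy rather than in client code; the sketch only shows the behaviour being asked for.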

How & Where?

Some items for the reading list:

Acceptance Criteria

  • Read the above material and any other related material around archiving into time-series databases at comparable facilities
  • Write an exploratory proof-of-concept that puts data into a time-series database (InfluxDB or TimescaleDB is suggested, but if an obvious alternative emerges that is also fine)
    • Decimation/downsampling - prove it's possible (it should be simple!)
  • Document any pros/cons of the approach
  • Generate tickets to further the implementation
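
As a starting point for the proof-of-concept, here is a minimal sketch that formats a PV sample as a line of InfluxDB line protocol (`measurement,tag=value field=value timestamp`). The measurement name `pv_samples`, the tag key `name` and the type-specific field keys (`value_int`, `value_float`, `value_str`) are assumptions made for illustration. Using a separate field key per type is one possible way to cope with a PV changing datatype over time, and would need checking against the database actually chosen.

```python
def escape_tag(text: str) -> str:
    """Escape commas, equals signs and spaces, which are special in tag values."""
    return text.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line_protocol(pv_name, value, timestamp_ns, measurement="pv_samples"):
    """Format one PV sample as a single line of InfluxDB line protocol.

    The field key depends on the Python type of `value` (an assumption of this
    sketch), so int and float samples of the same PV land in different fields.
    """
    if isinstance(value, bool):
        raise TypeError("bool samples are not handled in this sketch")
    if isinstance(value, int):
        field = f"value_int={value}i"  # integers carry an 'i' suffix
    elif isinstance(value, float):
        field = f"value_float={value!r}"
    elif isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        field = f'value_str="{escaped}"'  # strings are double-quoted
    else:
        raise TypeError(f"unsupported sample type: {type(value).__name__}")
    return f"{measurement},name={escape_tag(pv_name)} {field} {timestamp_ns}"
```

Array-valued PVs are deliberately left out of this sketch; how to store them is one of the open questions for the exploration.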

How to Review

Before making a PR...

  • Provide verbose instructions for the reviewer to test that your changes work and fix the issue
  • Describe if/how you have implemented testing for this issue
  • Provide screenshots of the feature to help the reviewer if relevant

If not applicable, write "Not applicable"

...

To the reviewer: Make sure to update submodules!

@FreddieAkeroyd
Member

  • I think I recall a conversation with Kai about influx - the work was done as a proof of concept by a fixed-term student. I can't remember how complete it was, as I thought the work stopped when they finished, but maybe it was continued further.
  • From a previous conversation with accelerators, I think there were issues with array data and influxdb

@FreddieAkeroyd
Member

FreddieAkeroyd commented Nov 2, 2024

I was looking at TimescaleDB at one point - I may have thought it could potentially drop in as a replacement DB with better performance, but that was a while back and I may misremember. I think it may potentially handle logging of an array datatype better than influx.

@Tom-Willemsen
Contributor Author

Tom-Willemsen commented Nov 8, 2024

Another option I discussed with @rerpha is Kafka - as that's the direction we look to be going facility-wide for other data in the medium term, so we'd then be able to re-use that same infrastructure rather than setting up something different (and maintaining it) just for the archiver, i.e. less duplicated work long-term.

This would mean our "archiver" is conceptually an instance of the forwarder, and we'd need to write a Kafka reader for the CS-Studio log plotter.

@FreddieAkeroyd
Member

Do we have a list of the issues we have with the archiver appliance in a ticket somewhere? Was it just the case of PVs not existing when it was started, or are there more? If the archiver appliance did work sufficiently well, could it also use Kafka via a storage plugin backend?

Projects
Status: Backlog

2 participants