Archiver: explore a time-series database #8563

Tom-Willemsen · 2024-11-01T22:01:45Z

Issue Description

As a developer I would like to understand whether a time-series database such as influxdb or timescaledb would be more supportable / easier to adapt to our needs than the archiver appliance.

This is exploratory work, where the goal is to form an opinion on whether this would be better or worse than the archiver appliance implementation we started a PoC of.

This ticket should be timeboxed.

Some things we want in a new archiver in addition to what we have are:

Move away from the CSS archiver as part of the ongoing move away from CSS tools.
Support for decimation/downsampling of older data
- So that we can stop manually truncating DBs
- Multi-stage is nice, but a single stage would probably be sufficient in reality.
Support for configurable retention policies (eventually dropping old data)
- So that we can stop manually truncating DBs
Support for adding/removing/reconfiguring PVs to archive without restarting the entire archiver
Some way of displaying in the existing GUI - this could mean something that is already supported by databrowser e.g. influx, rdb, pbraw, or we'll need to write code to support it.
"support" (or at least, the archiver doesn't mind too much) if:
- PVs change datatype over time (e.g. blocks may do this if one IOC uses ints and another uses float for the same measurement - and both define the same block)
- Lots of PVs are disconnected (this is something the existing CSS archiver does ok, but the archiver appliance struggles with)
Generally handles PV disconnection/reconnection gracefully and doesn't assume "most" PVs will be connected "most" of the time
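
To make the decimation and retention requirements above concrete, here is a minimal sketch in plain Python of what a single-stage policy could look like: samples older than one threshold are averaged into fixed-width time buckets, and samples older than a second threshold are dropped. The names (`Sample`, `decimate_and_expire`) and the choice of a mean per bucket are assumptions made for illustration only. In a real deployment this would presumably be done by the database's own downsampling and retention features rather than by our code.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Iterable, List


@dataclass(frozen=True)
class Sample:
    """One archived value (hypothetical type, for illustration only)."""
    timestamp: float  # seconds since the epoch
    value: float


def decimate_and_expire(
    samples: Iterable[Sample],
    now: float,
    decimate_after: float,
    retain_for: float,
    bucket: float,
) -> List[Sample]:
    """Apply a single-stage decimation policy plus a retention policy.

    - Samples older than `retain_for` seconds are dropped.
    - Samples older than `decimate_after` seconds are replaced by one
      mean value per `bucket`-second window, stamped at the window start.
    - Newer samples are kept at full resolution.
    """
    recent: List[Sample] = []
    buckets: dict = {}
    for s in sorted(samples, key=lambda s: s.timestamp):
        age = now - s.timestamp
        if age > retain_for:
            continue  # retention: too old, drop entirely
        if age > decimate_after:
            # decimation: collect values per time window
            buckets.setdefault(int(s.timestamp // bucket), []).append(s.value)
        else:
            recent.append(s)
    decimated = [
        Sample(index * bucket, fmean(values))
        for index, values in sorted(buckets.items())
    ]
    # Decimated samples are all older than the recent ones, so the
    # concatenation is still in time order.
    return decimated + recent
```

For example, with `decimate_after` set to one week and `retain_for` set to a few years, recent data stays at full resolution and nothing needs truncating by hand.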

How & Where?

Some items for the reading list:

https://accelconf.web.cern.ch/icalepcs2023/papers/tupdp108.pdf (from our colleagues over on the ISIS accelerator - we can also just talk to them about what they like/dislike...)
https://indico.jacow.org/event/80/contributions/5947/contribution.pdf
https://inspirehep.net/files/0b1eac4f436be0b3e220459d5d4cea9f
https://accelconf.web.cern.ch/icalepcs2017/papers/thpha032.pdf
https://github.com/ControlSystemStudio/cs-studio/pull/2246/files
- hypothetically if we wrote our data in that format into influxdb, without necessarily actually using the CSS archiver, the existing databrowser in the GUI would likely work... this is something I think is worth exploring.
- At the very least this is an archive implementation where someone has already used influxdb, so will likely contain useful implementation ideas/inspiration.

Acceptance Criteria

Read the above material and any other related material around archiving into timeseries databases at comparable facilities
Write an exploratory proof-of-concept that puts data into a timeseries database (influxdb or timescaledb is suggested, but if an obvious alternative emerges that is also fine)
- Decimation/downsampling - prove it's possible (it should be simple!)
Document any pros/cons of the approach
Generate tickets to further the implementation
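
As a possible starting point for the proof-of-concept, the sketch below formats a single PV update as an InfluxDB line protocol string using only the standard library. The measurement name `pv`, the tag `pv_name` and the per-type field names (`value_float`, `value_int`, ...) are assumptions for illustration, not an existing schema and not the format used by the CSS InfluxDB archiver. Using one field per type is one way a PV could change datatype over time without different types landing in the same field, which as far as I know InfluxDB can reject as a field type conflict.

```python
import math
from typing import Union

PVValue = Union[bool, int, float, str]


def _escape_tag(text: str) -> str:
    """Escape the characters that are special in tag keys and values."""
    return text.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line(pv_name: str, value: PVValue, timestamp_ns: int,
            measurement: str = "pv") -> str:
    """Format one PV update as a line protocol string.

    The field key depends on the Python type of `value`, so an int and a
    float written for the same PV end up in different fields.
    """
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        key, encoded = "value_bool", "true" if value else "false"
    elif isinstance(value, int):
        key, encoded = "value_int", f"{value}i"  # trailing "i" marks an integer
    elif isinstance(value, float):
        if not math.isfinite(value):
            raise ValueError("NaN and infinity cannot be written as float fields")
        key, encoded = "value_float", repr(value)
    elif isinstance(value, str):
        quoted = value.replace("\\", "\\\\").replace('"', '\\"')
        key, encoded = "value_str", f'"{quoted}"'
    else:
        raise TypeError(f"unsupported value type: {type(value).__name__}")
    return f"{measurement},pv_name={_escape_tag(pv_name)} {key}={encoded} {timestamp_ns}"


if __name__ == "__main__":
    # Hypothetical PV name, shown with a float and then an int value.
    print(to_line("TE:BLOCK:TEMP", 21.5, 1730498505000000000))
    print(to_line("TE:BLOCK:TEMP", 21, 1730498506000000000))
```

Array-valued PVs are deliberately left out of this sketch; how to store them would need its own investigation.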

How to Review

_{Before making a PR...}

Provide verbose instructions for the reviewer to test that your changes work and fix the issue
Describe if/how you have implemented testing for this issue
Provide screenshots of the feature to help the reviewer if relevant

_{If not applicable, write "Not applicable"}

...

_{To the reviewer: Make sure to update submodules!}


FreddieAkeroyd · 2024-11-02T00:20:58Z

I think I recall a conversation with Kai about influx - the work was done as a proof of concept by a fixed-term student, and I can't remember how complete it was, as I thought the work stopped when they finished, but maybe it was continued further.
From a previous conversation with accelerators, I think there were issues with array data and influxdb.

FreddieAkeroyd · 2024-11-02T00:30:08Z

I was looking at TimescaleDB at one point - I may have thought it could potentially drop in as a replacement DB with better performance, but that was a while back and I may misremember. I think it may potentially handle logging of an array datatype better than influx.

Tom-Willemsen · 2024-11-08T17:37:20Z

Another option I discussed with @rerpha is kafka - as that's the direction we look to be going facility-wide for other data in the medium term, so we'd then be able to re-use that same infrastructure rather than setting up something different (and maintaining it) just for the archiver, i.e. less duplicated work long-term.

This would mean our "archiver" is conceptually an instance of the forwarder, and we'd need to write a kafka reader for the CS-Studio log plotter.

FreddieAkeroyd · 2024-11-11T01:17:40Z

Do we have a list of the issues we have with the archiver appliance in a ticket somewhere? Was it just to do with when PVs didn't exist and it was started or are there more? If the archiver appliance did work sufficiently well, then it could also use kafka via a storage plugin backend?

Tom-Willemsen added the proposal label Nov 1, 2024

ISISBuilder added this to PI_2024_08 Nov 1, 2024

ISISBuilder moved this to Backlog in PI_2024_08 Nov 1, 2024
