Extend "take over" mode in filestream to cover the changing ID use-case
#42884
Comments
Pinging @elastic/elastic-agent (Team:Elastic-Agent)
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
I also like the idea of having a single implementation. There is an issue of migrating
For the For the If we change the way we access the store, we can circumvent this limitation.
This is a valid point; however, I don't think it's a common use-case. It seems like we could allow migration without setting the IDs by default: we skip all files that are present multiple times, printing a warning each time and recommending the use of the ID. Or we could even fail the migration with an error message (the input does not start, states don't change). Another important moment is that when I introduced
take_over:
  enabled: true
  # if empty, we try and fail if the same file is tracked by multiple inputs
  # if it has items, we take over states ONLY from inputs with the given IDs.
  from_ids: []
I think having
I might be wrong but it should be possible to detect that the "taking over" input already has a state for the file and just skip it without migrating again.
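To make the two skip rules discussed above concrete (skip a file tracked by multiple inputs, skip a file the "taking over" input already has a state for), here is a minimal sketch in Go. It is not the Filebeat implementation: `Decision`, `Plan`, and the shape of the two maps are hypothetical names made up for illustration, and the real registry store looks different.

```go
package main

import "fmt"

// Decision is a hypothetical label for what the migration does with one file.
type Decision string

const (
	Migrate       Decision = "migrate"
	SkipAmbiguous Decision = "skip: tracked by multiple inputs"
	SkipHasState  Decision = "skip: taking-over input already has a state"
)

// Plan decides, per file path, whether its state should be migrated.
// owners maps a file path to the IDs of the old inputs holding a state for it.
// existing is the set of paths the "taking over" input already tracks.
// Both structures are assumptions for this sketch.
func Plan(owners map[string][]string, existing map[string]bool) map[string]Decision {
	plan := make(map[string]Decision, len(owners))
	for path, ids := range owners {
		switch {
		case existing[path]:
			// Already migrated (or already harvested): never migrate twice.
			plan[path] = SkipHasState
		case len(ids) > 1:
			// Ambiguous source: a real implementation would log a warning
			// recommending explicit input IDs.
			plan[path] = SkipAmbiguous
		default:
			plan[path] = Migrate
		}
	}
	return plan
}

func main() {
	owners := map[string][]string{
		"/var/log/a.log": {"old-1"},
		"/var/log/b.log": {"old-1", "old-2"},
		"/var/log/c.log": {"old-1"},
	}
	existing := map[string]bool{"/var/log/c.log": true}

	plan := Plan(owners, existing)
	for _, p := range []string{"/var/log/a.log", "/var/log/b.log", "/var/log/c.log"} {
		fmt.Printf("%s -> %s\n", p, plan[p])
	}
}
```

The stricter variant mentioned above (fail the whole migration instead of skipping) would return an error as soon as any path resolves to `SkipAmbiguous`.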
@rdner and I talked, here is what we decided and some corner case we
Filestream and Log input are handled the same way
The state migration for
During the
@belimawr this is a great summary, thank you so much!
In theory we could just detect that the "taking over" input already has a state for this file path and just skip it during the next migration run. This should ensure migrating a file state only once. However, this also means that the user would see data duplication until the old input is removed from the config. This puts us in a tricky situation:
I think it's okay to delete the state entries if we are very clear about that in the docs and we explain how to make a registry backup manually if needed. A few things I'd like to add to the summary:
I agree, thanks!
From a safety standpoint, I like this idea, but I'm not sure how feasible this is during runtime as it might mean locking up the whole
I believe we should only support the Honestly, in a correct use-case I have a hard time picturing any reason to have the "old" and "new" inputs running at the same time. We cannot really enforce it at runtime, but we can clearly document that the risk of having the "old" and "new" inputs running at the same time is data duplication.
Filestream "take over" mode
The "take over" mode was originally created for a single use-case – migrating state (current file offsets) from a log input to a filestream input.
The implementation is currently contained by this package #42624
Short description of how it works:
filestream that has take_over: true in its configuration
log inputs
log input state we try to match its file path against paths of any of filestream inputs with take_over: true
filestream state for this matched file converting all the data from the log input state
log input states that matched any filestream with take_over: true
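The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Filebeat code: `LogState`, `FilestreamInput`, `FilestreamState`, and `TakeOver` are hypothetical, simplified types and names, and glob matching uses the standard library's `filepath.Match`, which is simpler than what Filebeat really supports.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// LogState is a hypothetical, simplified log input registry entry.
type LogState struct {
	Path   string
	Offset int64
}

// FilestreamInput is a hypothetical, simplified filestream input config.
type FilestreamInput struct {
	ID       string
	Paths    []string // glob patterns
	TakeOver bool
}

// FilestreamState is a hypothetical, simplified filestream registry entry.
type FilestreamState struct {
	InputID string
	Path    string
	Offset  int64
}

// matches reports whether path matches any glob pattern of the input.
func matches(in FilestreamInput, path string) bool {
	for _, pattern := range in.Paths {
		if ok, err := filepath.Match(pattern, path); err == nil && ok {
			return true
		}
	}
	return false
}

// TakeOver converts every log state whose path matches a filestream input
// with TakeOver set into a filestream state. Matched log states are dropped;
// unmatched ones are returned unchanged in remaining.
func TakeOver(logStates []LogState, inputs []FilestreamInput) (migrated []FilestreamState, remaining []LogState) {
	for _, st := range logStates {
		taken := false
		for _, in := range inputs {
			if in.TakeOver && matches(in, st.Path) {
				migrated = append(migrated, FilestreamState{InputID: in.ID, Path: st.Path, Offset: st.Offset})
				taken = true
				break // first matching input wins in this sketch
			}
		}
		if !taken {
			remaining = append(remaining, st)
		}
	}
	return migrated, remaining
}

func main() {
	logStates := []LogState{
		{Path: "/var/log/app.log", Offset: 1024},
		{Path: "/var/log/other.txt", Offset: 7},
	}
	inputs := []FilestreamInput{
		{ID: "app-logs", Paths: []string{"/var/log/*.log"}, TakeOver: true},
	}
	migrated, remaining := TakeOver(logStates, inputs)
	fmt.Println("migrated:", migrated)
	fmt.Println("remaining log states:", remaining)
}
```

Note that the sketch works on in-memory slices; where this logic runs (at startup versus inside the input) is exactly the placement limitation described next.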
At the moment it has some limitations due to the placement of this implementation: dynamically created inputs don't perform the state migration even though take_over: true is set. This means that running Filebeat under the agent that communicates input changes, or running Filebeat with the autodiscover mode enabled, would not migrate the states.
This issue was created to address this problem #36777
A new use-case
Due to recent development we found a new migration use-case similar to the existing take_over mode – migrating the state (file offsets) from one filestream input to another after an ID change.
The current solution introduces separate migration logic for just this use-case and a new configuration parameter previous_ids #42624
Proposed alternative
From the UX perspective I think it would be better to have only one way to migrate the state between inputs – the already existing take_over: true mode.
For that reason we should consolidate the effort of working on #36777 and #42472 into one single solution:
take_over logic into the filestream implementation
filestream -> filestream migration (due to an ID change) exactly like log -> filestream migration matching by paths (we might need additional checks like inode/fingerprint and reset the offset on mismatch)
cc @belimawr @cmacknz
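As a rough illustration of the "additional checks" idea for filestream -> filestream migration, here is a minimal Go sketch. It assumes the old state records some file identity (inode and/or fingerprint) and that the new input can compute the identity of the file currently at the same path. `FileIdentity`, `State`, and `MigrateState` are hypothetical names, not part of Filebeat.

```go
package main

import "fmt"

// FileIdentity is a hypothetical identity record for a file on disk.
type FileIdentity struct {
	Inode       uint64
	Fingerprint string
}

// State is a hypothetical, simplified filestream registry entry.
type State struct {
	InputID  string
	Path     string
	Identity FileIdentity
	Offset   int64
}

// MigrateState builds the state for the input with the new ID from the state
// of the old one. The path is assumed to have matched already. If the file
// currently at that path is not the file the old state described, the offset
// is reset so the new input reads it from the beginning.
func MigrateState(old State, newID string, current FileIdentity) State {
	migrated := State{
		InputID:  newID,
		Path:     old.Path,
		Identity: current,
		Offset:   old.Offset,
	}
	if old.Identity != current {
		migrated.Offset = 0 // mismatch: different file behind the same path
	}
	return migrated
}

func main() {
	old := State{
		InputID:  "old-id",
		Path:     "/var/log/app.log",
		Identity: FileIdentity{Inode: 42, Fingerprint: "abc"},
		Offset:   2048,
	}

	same := MigrateState(old, "new-id", FileIdentity{Inode: 42, Fingerprint: "abc"})
	rotated := MigrateState(old, "new-id", FileIdentity{Inode: 99, Fingerprint: "def"})

	fmt.Println("same file, offset kept:", same.Offset)
	fmt.Println("different file, offset reset:", rotated.Offset)
}
```

Whether a mismatch should reset the offset or skip the migration for that file entirely is a design choice this sketch does not settle.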