default log store backend to WAL and allow disabling verification #21700
🔥
Few docs ❓ and minor suggestions, but otherwise LGTM! 🚀
agent/consul/server.go
Outdated
if s.config.LogStoreConfig.Backend == LogStoreBackendDefault && !boltFileExists {
if (s.config.LogStoreConfig.Backend == LogStoreBackendDefault || s.config.LogStoreConfig.Backend == LogStoreBackendWAL) && !boltFileExists {
	s.config.LogStoreConfig.Backend = LogStoreBackendWAL
~ Should we consider moving the original
if s.config.LogStoreConfig.Backend == LogStoreBackendDefault && !boltFileExists {
s.config.LogStoreConfig.Backend = LogStoreBackendWAL
}
bit up above the rest of this `if` block, and just check explicitly for WAL (not default) after?
Main thought that crossed my mind is we're treating default and WAL as equivalent in these checks once we get past the BoltDB detection gate, so normalizing in one place is less error-prone in case of future changes and separates the defaulting from the business logic.
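The idea of normalizing in one place can be sketched roughly as follows. This is only an illustration, not the code from the PR: `resolveBackend` is a hypothetical helper, the constant values are placeholders, and the fallback behavior is taken from the PR description (an existing BoltDB file wins and a warning is logged).

```go
package main

import "fmt"

// Placeholder values: the real constants live in the Consul codebase and may
// be defined differently.
const (
	LogStoreBackendDefault = ""
	LogStoreBackendBoltDB  = "boltdb"
	LogStoreBackendWAL     = "wal"
)

// resolveBackend is a hypothetical helper, not a function from the PR.
// It returns the backend to use and whether a "using BoltDB" warning
// should be logged.
func resolveBackend(configured string, boltFileExists bool) (string, bool) {
	// Defaulting happens exactly once, up front.
	if configured == LogStoreBackendDefault {
		configured = LogStoreBackendWAL
	}

	// From here on, only the explicit WAL value needs to be checked.
	if configured == LogStoreBackendWAL && boltFileExists {
		// Existing BoltDB data wins over the WAL setting.
		return LogStoreBackendBoltDB, true
	}
	return configured, false
}

func main() {
	for _, boltFileExists := range []bool{false, true} {
		backend, warn := resolveBackend(LogStoreBackendDefault, boltFileExists)
		fmt.Printf("boltFileExists=%t backend=%s warn=%t\n", boltFileExists, backend, warn)
	}
}
```

Because the default is folded into WAL before any other check, later conditions never need to repeat "default or WAL", which is the duplication this comment is concerned about.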
Is that what you had in mind?
I was more thinking we could simplify the checks following the first defaulting block, so that we aren't repeating stuff like "default or WAL" and "!boltDB". Maybe something like this? (also switches the warning to "using BoltDB" since "ignoring 'wal'" might be confusing when default is used)
- Take a snapshot prior to testing.
- Monitor Consul server metrics and logs, and set an alert on specific log events that occur when WAL is enabled. Refer to [Monitor Raft metrics and logs for WAL](/consul/docs/agent/wal-logstore/monitoring) for more information.
- Enable WAL in a pre-production environment and run it for a several days before enabling it in production.
WAL LogStore is now enabled by default
We're still ignoring config if there's a BoltDB file found - should we call that out here (new installs only) similar to the main doc, and keep some instructions to transition existing servers if desired?
@boruszak Can you please check the wording in here?
## Enable log verification

You must enable log verification on all voting servers in Enterprise and all servers in CE because the leader writes verification checkpoints.

1. On each voting server, add the following to the server's configuration file:

    ```hcl
    raft_logstore {
      verification {
        enabled = true
        interval = "60s"
      }
    }
    ```

1. Restart the server to apply the changes. The `consul reload` command is not sufficient to apply `raft_logstore` configuration changes.
1. Run the `consul operator raft list-peers` command to wait for each server to become a healthy voter before moving on to the next. This may take a few minutes for large snapshots.

When complete, the server's logs should contain verifier reports that appear like the following example:

```log hideClipboard
2023-01-31T14:44:31.174Z [INFO] agent.server.raft.logstore.verifier: verification checksum OK: elapsed=488.463268ms leaderChecksum=f15db83976f2328c rangeEnd=357802 rangeStart=298132 readChecksum=f15db83976f2328c
```
Isn't this section on log verification still relevant info even when WAL is defaulted on?
Log verification is now enabled by default as part of WAL. The reasoning is that it has minimal impact but offers great benefits in case of bugs.
I will double check if it's documented as part of the logstore config properly.
I don't think we should be deleting the instructions in the `wal-logstore/enable` page. We should rephrase the language around "experimental" and call out that it's the default. But what if someone changes from WAL to BoltDB and then wants to change back?
Co-authored-by: Michael Zalimeni <[email protected]> Co-authored-by: Jeff Boruszak <[email protected]>
@boruszak I'm not sure I get your point 🤔. The aim of that page is to help users enable an experimental feature and make sure that it's working safely for them. Now that WAL is the default, that logic doesn't hold anymore: by making it the default, we implicitly acknowledge that it is stable enough. I agree that reverting from WAL to BoltDB is important, but it's a simple configuration change and doesn't need any extra steps. The only thing I can think of that we should call out, and that we can probably document on that page, is that:
So to sum it up, when changing the log store backend it's always recommended to:
WYT?
@dhiaayachi For someone who already has BoltDB, what steps would they have to take to make the switch? Do we need/have some migration docs somewhere? 🤔
@JadhavPoonam the procedure I highlighted in the comment above would be needed. We can add that as documentation.
Co-authored-by: Michael Zalimeni <[email protected]>
Code changes LGTM! Will defer to you and docs team for remaining open questions about what to retain/drop/change.
@boruszak This is ready from a code perspective. Can you please check what changes are needed to the docs to get this into a mergeable state?
Small changes in these suggestions, and I'm approving to unblock this PR on my end.
@dhiaayachi I proposed keeping the "Enable WAL" instructions assuming that if someone had WAL and reverted to BoltDB, they might want to go back to WAL. But from your comments, it sounds like what we actually need instead is a page that describes the steps to migrate a datacenter running BoltDB to one that runs WAL (using the steps you describe with the snapshot agent).
- Take a snapshot prior to testing.
- Monitor Consul server metrics and logs, and set an alert on specific log events that occur when WAL is enabled. Refer to [Monitor Raft metrics and logs for WAL](/consul/docs/agent/wal-logstore/monitoring) for more information.
- Enable WAL in a pre-production environment and run it for a several days before enabling it in production.
WAL LogStore is now enabled by default
WAL LogStore is now enabled by default
The WAL LogStore backend is now enabled in Consul by default.
@@ -7,30 +7,7 @@ description: >-

# Enable the experimental WAL LogStore backend
# Enable the experimental WAL LogStore backend
# Enable the WAL LogStore backend

This topic provides an overview of the WAL (write-ahead log) LogStore backend.
The WAL backend is an experimental feature. Refer to
The WAL backend is now the default Consul LogStore when a boltdb database is not already in place. Refer to
The WAL backend is now the default Consul LogStore when a boltdb database is not already in place. Refer to
The WAL backend is now the default LogStore for Consul server agents when a BoltDB database is not already in place. Refer to
Co-authored-by: Jeff Boruszak <[email protected]>
Thank you for the review @boruszak, but there's no need to rush this anymore, as we decided not to include it in the next release. I think we should have a page that describes both:
I will try to add that page and ping you for a review.
📄 Content Checks
Updated: Tue, 15 Oct 2024 20:10:33 GMT
Found 4 error(s)

| Position | Description | Rule |
|---|---|---|
| 1:1-1:1 | Document does not have a page_title key in its frontmatter. Add a page_title key at the top of the document. | ensure-valid-frontmatter |
| 1:1-1:1 | Document does not have a description key in its frontmatter. Add a description key at the top of the document. | ensure-valid-frontmatter |
| 1:1-1:1 | This file is not present in the nav data file at data/docs-nav-data.json. Either add a path that maps to this file in the nav data or remove the file. If you want the page to exist but not be linked in the navigation, add a hidden property to the associated nav node. | no-unlinked-pages |
Description
This PR changes the default log store config to use WAL when starting with a fresh database. If a BoltDB database already exists, BoltDB is used as the backend and a warning is logged.
It also allows the log verifier, which is enabled by default, to be disabled.
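As a rough illustration of the second point, disabling the verifier might look like the following in a server's configuration file. This assumes the opt-out reuses the existing `raft_logstore.verification.enabled` key from the docs under review; the description here does not spell out the exact key.

```hcl
raft_logstore {
  verification {
    # Assumption: setting this to false opts out of the default-on verifier.
    enabled = false
  }
}
```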
Testing & Reproduction steps
Added tests to verify combinations of configs.
PR Checklist