[Questions] 4.2 Missing khepri files after join_cluster #14748
What I found so far is that, according to the logs, khepri_db is not enabled on rabbit-2 at this point. It therefore looks like all stable feature flags are disabled on rabbit-2 at the time of the reset.
The solution seems to be to not call
Per @dumbbell, the explicit However,
@gomoripeti I have updated the docs: rabbitmq/rabbitmq-website@90881de, rabbitmq/rabbitmq-website@7fc0833, rabbitmq/rabbitmq-website@6237ad3.
Community Support Policy
RabbitMQ version used
other (please specify) 4.2.0-rc.1
Erlang version used
27.2.x
Operating system (distribution) used
Linux/MacOS
How is RabbitMQ deployed?
Debian package
rabbitmq-diagnostics status output
N/A
Logs from node 2 aka rabbit-2
Logs from node 3 (if applicable, with sensitive values edited out)
N/A
rabbitmq.conf
N/A
Steps to deploy RabbitMQ cluster
Steps to reproduce the behavior in question
From a rabbitmq-server git checkout, start two independent brokers.
Then follow the steps in https://www.rabbitmq.com/docs/clustering#creating to join rabbit-2 to rabbit-1.
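For readers who want to retrace this, the manual join can be sketched as the shell sequence below. It is only a sketch under assumptions: the node names rabbit-1@localhost and rabbit-2@localhost, the DRY_RUN switch, and the run and join_steps helpers are made up for illustration, and it assumes the documented sequence at the time was stop_app, reset, join_cluster, start_app (the explicit reset being the step discussed elsewhere in this thread). By default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical node names; adjust to the names your brokers actually use.
REMOTE_NODE="${REMOTE_NODE:-rabbit-1@localhost}"
JOINING_NODE="${JOINING_NODE:-rabbit-2@localhost}"
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# Helper (not part of RabbitMQ): print or execute one rabbitmqctl call.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "rabbitmqctl $*"
    else
        rabbitmqctl "$@"
    fi
}

# Assumed manual join sequence, all executed against the joining node.
join_steps() {
    run -n "$JOINING_NODE" stop_app
    run -n "$JOINING_NODE" reset          # the explicit reset in question
    run -n "$JOINING_NODE" join_cluster "$REMOTE_NODE"
    run -n "$JOINING_NODE" start_app
}

join_steps
```

Running it with DRY_RUN=0 would execute the same four commands against the joining node.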
shell logs showing when the files disappear
advanced.config
N/A
Application code
Kubernetes deployment file
What problem are you trying to solve?
Tested on 4.2.0-rc.1. Let's call the remote node rabbit-1 and the node that joins it rabbit-2.
When creating a cluster manually (not by peer discovery, but with rabbitmqctl commands), after calling join_cluster the coordination directory will be missing some essential ra files (WAL, segment, dets). (See repro steps.)
After this the cluster still works, because the Khepri ra_server process holds all data in memory, so it is possible to, for example, define a queue, which will be added to Khepri. However, other operations will fail: for example, forcing a WAL file rollover on rabbit-2 will crash because the WAL file is missing on disk. We noticed this because manually removing rabbit-2 from the cluster (as described in https://www.rabbitmq.com/docs/clustering#removal-of-a-reachable-node) also fails.
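A quick way to confirm the symptom is to look for the three kinds of files in the coordination directory after the join. The helper below is a minimal sketch, not something RabbitMQ ships: check_ra_files is a hypothetical name, the directory is passed in by the caller, and the file name patterns (*.wal, *.segment, *.dets) are an assumption about how ra names its files on disk.

```shell
#!/bin/sh
# Hypothetical helper: report which kinds of ra files are absent under a
# directory. The patterns below are assumed ra naming conventions.
check_ra_files() {
    dir="$1"
    missing=0
    for pattern in '*.wal' '*.segment' '*.dets'; do
        # Search recursively, since the files may sit in subdirectories.
        if [ -z "$(find "$dir" -type f -name "$pattern" 2>/dev/null | head -n 1)" ]; then
            echo "missing: $pattern"
            missing=1
        fi
    done
    # Exit status 0 when every kind of file was found, 1 otherwise.
    return "$missing"
}

# Usage (the path is a placeholder for the node's coordination directory):
# check_ra_files "/path/to/coordination"
```

The function prints one line per missing file type and returns a non-zero status if anything is missing, so it can be run before and after join_cluster to see when the files disappear.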
rabbitmq logs added under "Logs from node 2"
Logs from a forced WAL roll-over