Describe the performance issue
A pattern emerged in one deployment with a cluster of 3 CouchDB nodes, version 3.3.3 (medic version 4.15): one of the CouchDB nodes would start consuming alarming amounts of RAM (peaking above 200 GB) until the node was evicted by Kubernetes.
This persisted after numerous restarts.
This turned out to be caused by setting the log level on the troublesome node to `debug`, while the other nodes had the default CHT log level of `info`.
I have replicated this behavior locally with a clustered CouchDB at the same version and at the newest version, AND with a vanilla latest CouchDB in single-node mode.
This appears to occur ONLY if the logs are followed, which seems to be the case in our infrastructure, as the logs are streamed to Loki for observability purposes.
It also appears to be linked to having multiple `validate_doc_update` functions (which might not be directly correlated, just through the fact that they generate more log entries).
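For reference, per-node log levels can be inspected and reset through CouchDB's config API. A minimal sketch (host, admin credentials, and the node name in the last command are assumptions; adjust to your deployment):

```shell
# Assumed admin credentials and host.
COUCH=http://admin:password@localhost:5984

# List cluster nodes and print each node's effective log level.
for node in $(curl -s "$COUCH/_membership" | jq -r '.all_nodes[]'); do
  printf '%s: ' "$node"
  curl -s "$COUCH/_node/$node/_config/log/level"
done

# Reset a single node back to the CHT default of "info"
# (node name here is an example):
curl -s -X PUT "$COUCH/_node/couchdb@couchdb-1/_config/log/level" -d '"info"'
```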
Describe the improvement you'd like
As this appears to be an issue in CouchDB itself:

- Document that the `debug` log level should not be used in production, except for a limited time.
- Determine whether this is related to our `validate_doc_update` functions specifically, or whether any vdu function would yield the same result.
- Follow up through official CouchDB channels to get a fix, or at least an answer on whether this is expected behavior.
To Reproduce
Steps to record the performance metrics:
1. Launch a single-node or clustered CHT Core (any recent version).
2. Set the CouchDB config `log -> level` to `debug` on one node.
3. Tail that node's output.
4. Start any document-generating script (I used test-data-generator to create batches of 1_000_000 documents).
5. Run `docker stats <your container>` to see the gradual increase in memory footprint.
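The steps above can be sketched as shell commands. The container name, credentials, and target database are assumptions, and the `_bulk_docs` loop is a stand-in for test-data-generator:

```shell
COUCH=http://admin:password@localhost:5984   # assumed host and credentials
CONTAINER=couchdb                            # assumed container name

# Step 2: set the log level to debug (single-node shown; on a cluster,
# substitute that node's full Erlang name for _local).
curl -s -X PUT "$COUCH/_node/_local/_config/log/level" -d '"debug"'

# Step 3: follow that node's output -- the leak only appeared while the
# logs were actually being consumed.
docker logs -f "$CONTAINER" > /dev/null &

# Step 4: generate write load ("medic" database is an assumption).
for i in $(seq 1 1000); do
  curl -s -X POST "$COUCH/medic/_bulk_docs" \
    -H 'Content-Type: application/json' \
    -d '{"docs":[{"type":"data_record"},{"type":"data_record"}]}' > /dev/null
done

# Step 5: watch the container's memory footprint climb.
docker stats "$CONTAINER"
```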
Environment
Instance: local, production, vanilla
App: CouchDB
Version: My guess is that any CouchDB v3 will display this behavior; this means CHT > 4.4. I have not tested on CouchDB v2.
Additional context
Observed by @mrjones-plip in production. Thank you for your diligence!!