CouchDb memory leak when log level is set to debug #9826

Open
dianabarsan opened this issue Mar 5, 2025 · 1 comment

@dianabarsan
Member

Describe the performance issue
A pattern emerged in one deployment running a cluster of 3 CouchDB nodes (version 3.3.3, medic version 4.15): one of the nodes would consume alarming amounts of RAM (peaking above 200 GB) until it was evicted by Kubernetes.
This persisted across numerous restarts.

This turned out to be caused by the log level on the troublesome node being set to debug, while the other nodes had the default CHT log level of info.

I have replicated this behavior locally with a clustered CouchDB of the same version, with the newest version, AND with a vanilla single-node CouchDB at the latest version.

This appears to occur ONLY if the logs are followed, which is the case in our infrastructure, as the logs are streamed to Loki for observability purposes.
It also appears to be linked to having multiple validate_doc_update functions (the link might be indirect: they may simply generate more log entries).

Describe the improvement you'd like
As this appears to be an issue in CouchDB itself, we should:

  1. Document that the debug log level should not be used in production, except for a limited time.
  2. Determine whether this is related to our validate_doc_update functions specifically, or whether any validate_doc_update function would yield the same result.
  3. Follow up through official CouchDB channels to get a fix, or at least an answer on whether this is expected behavior.
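
Until item 1 is documented, a small audit script could flag nodes that were left at debug. The following is a minimal sketch, not an existing CHT tool: the function names `make_fetcher` and `nodes_with_debug_logging` are hypothetical, and it assumes the standard CouchDB endpoints `GET /_membership` and `GET /_node/{node-name}/_config/log/level` are reachable with admin credentials over Basic auth.

```python
import base64
import json
import urllib.request
from typing import Any, Callable, List


def make_fetcher(user: str, password: str) -> Callable[[str], Any]:
    """Build a GET helper that sends Basic auth and decodes the JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")

    def fetch(url: str) -> Any:
        request = urllib.request.Request(
            url, headers={"Authorization": f"Basic {token}"}
        )
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read().decode("utf-8"))

    return fetch


def nodes_with_debug_logging(base_url: str, fetch: Callable[[str], Any]) -> List[str]:
    """Return the names of cluster nodes whose log level is set to debug."""
    membership = fetch(f"{base_url}/_membership")
    flagged = []
    for node in membership["cluster_nodes"]:
        # The config endpoint returns the value as a JSON string, e.g. "info".
        level = fetch(f"{base_url}/_node/{node}/_config/log/level")
        if str(level).lower() == "debug":
            flagged.append(node)
    return flagged
```

For example, `nodes_with_debug_logging("http://localhost:5984", make_fetcher("admin", "password"))` would list the offending nodes; the URL and credentials here are placeholders.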

To Reproduce
Steps to record the performance metrics:

  1. Launch single node or clustered CHT core (any recent version).
  2. Set the CouchDB config option log -> level to debug on one node.
  3. Tail that node's output.
  4. Start any document generating script (I used test-data-generator to create batches of 1_000_000 documents).
  5. Run docker stats <your container> to see the memory footprint gradually increase.
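
Steps 2 and 5 above can be scripted so the memory growth is recorded rather than eyeballed. This is a rough sketch under stated assumptions, not the procedure used in the original report: the helper names are hypothetical, the log level is set through CouchDB's `PUT /_node/{node-name}/_config/log/level` endpoint (which needs admin credentials, passed here as a ready-made Authorization header), and memory is read from `docker stats --no-stream --format "{{.MemUsage}}"`, whose output is assumed to look like `1.5GiB / 15.6GiB`.

```python
import json
import re
import subprocess
import urllib.request
from typing import Optional

# Unit suffixes assumed to appear in the docker stats MemUsage column.
_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def build_log_level_request(
    base_url: str,
    level: str,
    node: str = "_local",
    auth_header: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) the PUT request that sets the node's log level."""
    headers = {"Content-Type": "application/json"}
    if auth_header is not None:
        headers["Authorization"] = auth_header
    return urllib.request.Request(
        f"{base_url}/_node/{node}/_config/log/level",
        data=json.dumps(level).encode("utf-8"),  # body is a JSON string
        headers=headers,
        method="PUT",
    )


def parse_mem_usage(mem_usage: str) -> float:
    """Convert the 'used' half of a MemUsage string into bytes."""
    used = mem_usage.split("/")[0].strip()
    match = re.fullmatch(r"([\d.]+)\s*([A-Za-z]+)", used)
    if match is None:
        raise ValueError(f"unrecognized memory usage: {mem_usage!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit]


def sample_memory_bytes(container: str) -> float:
    """Take one memory reading for the container via the docker CLI."""
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.MemUsage}}", container],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_mem_usage(result.stdout.strip())
```

Sending the request with `urllib.request.urlopen(build_log_level_request(...))` and then calling `sample_memory_bytes` in a loop with a sleep between readings should show whether usage keeps climbing while the logs are being tailed.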

Environment

  • Instance: local, production, vanilla
  • App: CouchDb
  • Version: My guess is that any CouchDB v3 will display this behavior, which means CHT > 4.4. I have not tested on CouchDB v2.

Additional context
Observed by @mrjones-plip in production. Thank you for your diligence!!

@dianabarsan dianabarsan added the Type: Performance Make something faster label Mar 5, 2025
@dianabarsan dianabarsan self-assigned this Mar 5, 2025
@mrjones-plip
Contributor

Observed by @mrjones-plip in production

Noting this occurred on Muso Mali's new 4.x cluster in EKS - see private repo issue
