Conversation

vladd-bit
Member

Bug: when processing documents for a long period of time, memory usage keeps increasing, exceeding 7-8 GB for de-id models after processing 20-30k documents over 3+ hours. The bug is reproducible with normal annotation models too.

Fix: limit the lru_cache to a maximum of 1 instance for the processor and other objects.

@tomolopolis
Member

Task linked: CU-869ah00vt fix memory leak (service side)

@vladd-bit vladd-bit requested a review from mart-r September 17, 2025 16:16
Collaborator

@mart-r mart-r left a comment

IMHO this doesn't fix the underlying issue.

We expect to have 1 Settings instance. And as such, we expect to have 1 MedCATProcessor instance.
But what this PR does is allow for multiple instances of MedCATProcessor while only caching one at a time.
Not only does this mean that there could still be multiple instances around (if they're still in use), but it also means that each one must have a slightly different Settings instance (because if they were identical they should hash to the same thing and you should still get the same processor).
Now, why you'd have multiple different Settings instances is a little bizarre to me, because the only reason I can think of is os.environ getting changed manually.

PS:
Open to discussion on leaving this in the state you've proposed. But I'm just a little concerned regarding the underlying issue(s).
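
To make the hashing point above concrete, here is a minimal, hypothetical sketch (a frozen dataclass stands in for the frozen Settings model and plain classes stand in for the service code, so none of these names are the actual repo's): equal settings hash the same and the cached processor is reused, while a different settings instance evicts the maxsize=1 entry and yields a second processor without the first one going away.

from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class FakeSettings:          # stand-in for the frozen Settings model
    model_path: str = "/models/deid"


class FakeProcessor:         # stand-in for the processor class
    def __init__(self, settings: FakeSettings) -> None:
        self.settings = settings


@lru_cache(maxsize=1)
def get_processor(settings: FakeSettings) -> FakeProcessor:
    return FakeProcessor(settings)


a, b = FakeSettings(), FakeSettings()
assert a == b and hash(a) == hash(b)          # identical settings hash identically
assert get_processor(a) is get_processor(b)   # so the cached processor is reused

first = get_processor(a)
other = get_processor(FakeSettings(model_path="/models/other"))
assert other is not first                     # a different Settings evicts the entry,
                                              # but `first` stays alive while referenced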



@lru_cache
@lru_cache(maxsize=1)
Collaborator

I feel like this is kind of counterproductive. It may (effectively) fix the memory leak. But it doesn't actually fix the underlying issue. And as far as I know, that is the fact that we expect only 1 instance of MedCatProcessor. Since the Settings class is frozen, one should always be using the same instance, and as such, getting the same instance of the cached MedCatProcessor.

So I think that the issue is in how FastAPI uses the dependency and by doing so seems to omit the lru_cache when calling the get_settings method.

Member Author

This was just the easy way to enforce the mcat proc singleton; we can switch to the lifespan manager then and see how it behaves.

Collaborator

That's the thing though - it doesn't enforce a singleton. You'll still have multiple instances. They just won't be cached. And I understand the caching was the culprit for the memory leak, but having multiple instances doesn't seem to be the intended design.

Collaborator

I agree that working out how there are multiple instances is the main thing.

In general I'm surprised by this being the fix, so I'm not really feeling this is likely the right answer?

I literally followed the fastapi docs for settings here https://fastapi.tiangolo.com/advanced/settings/#lru-cache-technical-details. The only diff I see is that I imported Settings directly from config instead of importing config.

I'd be really surprised if we have some situation that nobody else has encountered; FastAPI + torch is pretty standard. The only rare thing I'd think we do is that I set it to use gunicorn just to keep the pre-existing torch thread code, so maybe there's an issue somewhere there?
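
For reference, here is a condensed sketch of the pattern that linked docs page describes (module and class names follow the docs rather than this repo's layout; recent versions of the docs import BaseSettings from pydantic-settings):

from functools import lru_cache
from typing import Annotated

from fastapi import Depends, FastAPI
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    app_name: str = "MedCAT Service"


@lru_cache
def get_settings() -> Settings:
    # Resolved via Depends(get_settings); after the first call lru_cache
    # should keep handing back the same Settings instance.
    return Settings()


app = FastAPI()


@app.get("/info")
async def info(settings: Annotated[Settings, Depends(get_settings)]):
    return {"app_name": settings.app_name}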

Collaborator

Wildcard suggestion: we switch to uvicorn (and turn the logging up to info on those lru_cache methods) and confirm whether it still has the error. My hope is that code copied from the FastAPI docs + torch + everything default = no issue. We could add a flag in the startup script to switch between uvicorn and gunicorn.

Member Author

@vladd-bit vladd-bit Sep 19, 2025

Ok, clearly I was oblivious to my misuse of the term 'singleton', but yes, the proposed 'fix' is hacky, so, update time. Regarding gunicorn with uvicorn: we are using 1 worker anyway, so it shouldn't be an issue unless there's some specific bug we haven't discovered yet. We could also try an alternative to uvicorn and just use ASGI middleware (but for now I'm not keen on this approach, and I haven't tested it yet). And now, with the singleton mc-proc, if there are still memory leaks they will be coming from the CAT object itself. I will give it a test today.



@lru_cache
@lru_cache(maxsize=1)
Collaborator

Refer to the comment below for more detail.

Perhaps we can get away with something simpler for the settings singleton, i.e.:

_def_settings: Optional[Settings] = None

def get_settings() -> Settings:
    global _def_settings
    if _def_settings is None:
        _def_settings = Settings()
    return _def_settings

That way the get_settings method will always return the same instance and thus the caching for get_medcat_processor should always result in the same instance since the argument is always the same.

Collaborator

This makes sense to me; it feels right to make it explicitly a global singleton. I would keep that log line in from before and really confirm it never gets created again.
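
A rough sketch of what "keep the log line and confirm it" could look like (illustrative names only; MedCatProcessor is stubbed here and would be the existing import in the real dependencies module):

import itertools
import logging
from functools import lru_cache

log = logging.getLogger(__name__)
_creations = itertools.count(1)


class MedCatProcessor:               # stub; the real class comes from the service
    def __init__(self, settings):
        self.settings = settings


@lru_cache
def get_medcat_processor(settings) -> MedCatProcessor:
    n = next(_creations)
    # With a genuine settings singleton this should log exactly once;
    # seeing "#2" or higher in the service logs means another Settings
    # instance slipped through and triggered a second construction.
    log.info("Creating MedCatProcessor #%d using settings: %s", n, settings)
    return MedCatProcessor(settings)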

Collaborator

Yeah, if we fully force a singleton settings instance, I fail to see how we could ever have multiple MedCATProcessor instances from the cached method.

But yes, keeping the log message does still make sense!

Collaborator

@mart-r mart-r left a comment

A few duplicate lines, and a few questions about the relevance of the global settings.

But I think this looks better!

settings_singleton = settings


def get_global_settings() -> Settings:
Collaborator

Is this needed? Doesn't seem to be called anywhere?

summary="MedCAT Service",
contact={
@asynccontextmanager
async def lifespan(app: FastAPI):
Collaborator

@alhendrickson alhendrickson Sep 19, 2025

As a more general comment:

Is this really the way to go?

I really don't understand how mart's suggestion wouldn't have worked to make a single global settings object, and then stick with the lru_cache from before. I don't see how that could ever make a new Settings object and trigger the cache with a new input.

The reason I'm really hesitant is that your change here is basically saying "following FastAPI documentation causes memory leaks", which I don't think is correct. It's also saying "don't use FastAPI dependencies" in this project, which also doesn't seem right - it's really not a unique project...

_def_settings: Optional[Settings] = None

def get_settings() -> Settings:
    global _def_settings
    if _def_settings is None:
        _def_settings = Settings()
    return _def_settings

Member Author

I'm not suggesting FastAPI deps and guides are wrong or that the cache can't work. Since we already converted the m-cat proc to a singleton and got rid of the cache, I opted to do the same for settings (as per Mart's suggestion), that's all; but if we want to revert to the cache for settings only, that's fine. We cannot have both a singleton and @lru_cache; that would not be consistent and would not add any value. Regarding the lifespan, I opted for it because we get explicit control over when and where the processor gets created (there is the potential problem of having the model loaded as soon as the app starts, though).
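
For context, a hedged sketch of the lifespan shape being described (not the exact PR code; Settings and MedCatProcessor are stubbed stand-ins for the real classes): the processor is built once when the app starts and endpoints read it from app.state.

from contextlib import asynccontextmanager

from fastapi import FastAPI, Request


class Settings:                       # stand-in for the real frozen Settings
    pass


class MedCatProcessor:                # stand-in for the real processor
    def __init__(self, settings: Settings) -> None:
        self.settings = settings


@asynccontextmanager
async def lifespan(app: FastAPI):
    settings = Settings()                                   # the single settings instance
    app.state.medcat_processor = MedCatProcessor(settings)  # model loads at startup
    yield
    # any teardown/cleanup for the processor would go here


app = FastAPI(lifespan=lifespan)


def get_medcat_processor(request: Request) -> MedCatProcessor:
    # Endpoints can still use Depends(get_medcat_processor); it just hands
    # back the one instance created in lifespan instead of building anything.
    return request.app.state.medcat_processor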

Collaborator

To clarify - I mean, taking a step back, is it not possible to keep it all basically how it was before, just with the settings being explicitly global?

dependencies.py

_def_settings: Optional[Settings] = None

def get_settings() -> Settings:
    global _def_settings
    if _def_settings is None:
        _def_settings = Settings()
    return _def_settings
...
@lru_cache
def get_medcat_processor(settings: Annotated[Settings, Depends(get_settings)]) -> MedCatProcessor:
    log.debug("Creating new Medcat Processor using settings: %s", settings)
    return MedCatProcessor(settings)

Then there are no changes needed anywhere else (I think?)
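
Purely for illustration, a hypothetical route consuming that dependency unchanged (the import path and the process call are assumptions for the sketch, not the service's actual API):

from typing import Annotated

from fastapi import Depends, FastAPI

# Assumed import path for the sketch above; adjust to the real dependencies module.
from dependencies import MedCatProcessor, get_medcat_processor

app = FastAPI()


@app.post("/process")  # illustrative route name
async def process(
    text: str,
    processor: Annotated[MedCatProcessor, Depends(get_medcat_processor)],
):
    # Hypothetical call; whatever the processor's real entry point is, the
    # dependency wiring stays exactly as before.
    return processor.process(text)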
