Conversation

vladd-bit
Member

Bug: when processing documents for a long period of time, memory usage keeps increasing, exceeding 7-8 GB for de-id models after processing 20-30k documents over 3+ hours. The bug is reproducible with normal annotation models too.

Fix: limit the lru_cache to a maximum of 1 instance for the processor and other objects.

@tomolopolis
Member

Task linked: CU-869ah00vt fix memory leak (service side)

@vladd-bit vladd-bit requested a review from mart-r September 17, 2025 16:16
Collaborator

@mart-r mart-r left a comment

IMHO this doesn't fix the underlying issue.

We expect to have 1 Settings instance. And as such, we expect to have 1 MedCATProcessor instance.
But what this PR does is allow for multiple instances of MedCATProcessor while only caching one at a time.
Not only does this mean that there could still be multiple instances around (if they're still in use), but it also means that each one must have a slightly different Settings instance (because if they were identical they should hash to the same thing and you should still get the same processor).
Now, why you'd have multiple different Settings instances is a little bizarre to me, because the only reason I can think of is os.environ getting changed manually.

PS:
Open to discussion on leaving this in the state you've proposed. But I'm just a little concerned regarding the underlying issue(s).
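
To make the hashing point above concrete, here is a minimal, hypothetical sketch (a frozen dataclass stands in for the frozen Settings model and plain classes stand in for the service code, so none of these names are the actual repo's): equal settings hash the same and the cached processor is reused, while a different settings instance evicts the maxsize=1 entry and yields a second processor without the first one going away.

from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class FakeSettings:          # stand-in for the frozen Settings model
    model_path: str = "/models/deid"


class FakeProcessor:         # stand-in for the processor class
    def __init__(self, settings: FakeSettings) -> None:
        self.settings = settings


@lru_cache(maxsize=1)
def get_processor(settings: FakeSettings) -> FakeProcessor:
    return FakeProcessor(settings)


a, b = FakeSettings(), FakeSettings()
assert a == b and hash(a) == hash(b)          # identical settings hash identically
assert get_processor(a) is get_processor(b)   # so the cached processor is reused

first = get_processor(a)
other = get_processor(FakeSettings(model_path="/models/other"))
assert other is not first                     # a different Settings evicts the entry,
                                              # but `first` stays alive while referenced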



@lru_cache
@lru_cache(maxsize=1)
Collaborator

I feel like this is kind of counterproductive. It may (effectively) fix the memory leak. But it doesn't actually fix the underlying issue. And as far as I know, that is the fact that we expect only 1 instance of MedCatProcessor. Since the Settings class is frozen, one should always be using the same instance, and as such, getting the same instance of the cached MedCatProcessor.

So I think that the issue is in how FastAPI uses the dependency and by doing so seems to omit the lru_cache when calling the get_settings method.

Member Author

This was just the easy way to enforce the mcat proc singleton; we can switch to the lifespan manager then and see how it behaves.

Collaborator

That's the thing though - it doesn't enforce a singleton. You'll still have multiple instances. They just won't be cached. And I understand the caching was the culprit for the memory leak, but having multiple instances doesn't seem to be the intended design.

Collaborator

I agree that working out how there are multiple instances is the main thing.

In general I'm surprised by this being the fix, so I'm not really feeling this is likely the right answer?

I literally followed the fastapi docs for settings here https://fastapi.tiangolo.com/advanced/settings/#lru-cache-technical-details. The only diff I see is that I imported Settings directly from config instead of importing config.

I'd be really surprised if we have some situation that nobody else has encountered; FastAPI + torch is pretty standard. The only rare thing I'd think we do is that I set it to use gunicorn just to keep the pre-existing torch thread code, so maybe there's an issue somewhere there?
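
For reference, here is a condensed sketch of the pattern that linked docs page describes (module and class names follow the docs rather than this repo's layout; recent versions of the docs import BaseSettings from pydantic-settings):

from functools import lru_cache
from typing import Annotated

from fastapi import Depends, FastAPI
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    app_name: str = "MedCAT Service"


@lru_cache
def get_settings() -> Settings:
    # Resolved via Depends(get_settings); after the first call lru_cache
    # should keep handing back the same Settings instance.
    return Settings()


app = FastAPI()


@app.get("/info")
async def info(settings: Annotated[Settings, Depends(get_settings)]):
    return {"app_name": settings.app_name}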

Collaborator

Wildcard suggestion: we switch to uvicorn (and turn the logging up to info on those lru_cache methods) and confirm whether it still has the error. My hope is that code copied from the FastAPI docs + torch + everything default = no issue. We could add a flag in the startup script to switch between uvicorn and gunicorn.

Member Author

@vladd-bit vladd-bit Sep 19, 2025

Ok, clearly I was oblivious to my misuse of the term 'singleton', but yes, the proposed 'fix' is hacky, so, update time. Regarding gunicorn with uvicorn: we are using 1 worker anyway, so it shouldn't be an issue unless there's some specific bug we haven't discovered yet. We could also try an alternative to uvicorn and just use ASGI middleware (but for now I'm not keen on this approach, and I haven't tested it yet). And now, with the singleton mc-proc, if there are still memory leaks they will be coming from the CAT object itself. I will give it a test today.



@lru_cache
@lru_cache(maxsize=1)
Collaborator

Refer to the comment below for more detail.

Perhaps we can get away with something simpler for the settings singleton, i.e.:

_def_settings: Optional[Settings] = None

def get_settings() -> Settings:
    global _def_settings
    if _def_settings is None:
        _def_settings = Settings()
    return _def_settings

That way the get_settings method will always return the same instance and thus the caching for get_medcat_processor should always result in the same instance since the argument is always the same.

Collaborator

This makes sense to me; it feels right to make it explicitly a global singleton. I would keep that log line in from before and really confirm it never gets created again.
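
A rough sketch of what "keep the log line and confirm it" could look like (illustrative names only; MedCatProcessor is stubbed here and would be the existing import in the real dependencies module):

import itertools
import logging
from functools import lru_cache

log = logging.getLogger(__name__)
_creations = itertools.count(1)


class MedCatProcessor:               # stub; the real class comes from the service
    def __init__(self, settings):
        self.settings = settings


@lru_cache
def get_medcat_processor(settings) -> MedCatProcessor:
    n = next(_creations)
    # With a genuine settings singleton this should log exactly once;
    # seeing "#2" or higher in the service logs means another Settings
    # instance slipped through and triggered a second construction.
    log.info("Creating MedCatProcessor #%d using settings: %s", n, settings)
    return MedCatProcessor(settings)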

Collaborator

Yeah, if we fully force a singleton settings instance, I fail to see how we could ever have multiple MedCATProcessor instances from the cached method.

But yes, keeping the log message does still make sense!

Collaborator

@mart-r mart-r left a comment

A few duplicate lines, and a few questions about the relevance of the global settings.

But I think this looks better!

settings_singleton = settings


def get_global_settings() -> Settings:
Collaborator

Is this needed? Doesn't seem to be called anywhere?

summary="MedCAT Service",
contact={
@asynccontextmanager
async def lifespan(app: FastAPI):
Collaborator

@alhendrickson alhendrickson Sep 19, 2025

As a more general comment:

Is this really the way to go?

I really don't understand how mart's suggestion wouldn't have worked to make a single global settings object, and then stick with the lru_cache from before. I don't see how that could ever make a new Settings object and trigger the cache with a new input.

The reason I'm really hesitant is that your change here is basically saying "following FastAPI documentation causes memory leaks", which I don't think is correct. It's also saying "don't use FastAPI dependencies" in this project, which also doesn't seem right - it's really not a unique project...

_def_settings: Optional[Settings] = None

def get_settings() -> Settings:
    global _def_settings
    if _def_settings is None:
        _def_settings = Settings()
    return _def_settings

Member Author

I'm not suggesting FastAPI deps and guides are wrong or that the cache can't work. Since we already converted the m-cat proc to a singleton and got rid of the cache, I opted to do the same for settings (as per Mart's suggestion), that's all; but if we want to revert to the cache for settings only, that's fine. We cannot have both a singleton and @lru_cache; that would not be consistent and would not add any value. Regarding the lifespan, I opted for it because we get explicit control over when and where the processor gets created (there is the potential problem of having the model loaded as soon as the app starts, though).
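
For context, a hedged sketch of the lifespan shape being described (not the exact PR code; Settings and MedCatProcessor are stubbed stand-ins for the real classes): the processor is built once when the app starts and endpoints read it from app.state.

from contextlib import asynccontextmanager

from fastapi import FastAPI, Request


class Settings:                       # stand-in for the real frozen Settings
    pass


class MedCatProcessor:                # stand-in for the real processor
    def __init__(self, settings: Settings) -> None:
        self.settings = settings


@asynccontextmanager
async def lifespan(app: FastAPI):
    settings = Settings()                                   # the single settings instance
    app.state.medcat_processor = MedCatProcessor(settings)  # model loads at startup
    yield
    # any teardown/cleanup for the processor would go here


app = FastAPI(lifespan=lifespan)


def get_medcat_processor(request: Request) -> MedCatProcessor:
    # Endpoints can still use Depends(get_medcat_processor); it just hands
    # back the one instance created in lifespan instead of building anything.
    return request.app.state.medcat_processor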

Collaborator

To clarify - I mean, taking a step back, is it not possible to keep it all basically how it was before, just with the settings being explicitly global?

dependencies.py

_def_settings: Optional[Settings] = None

def get_settings() -> Settings:
    global _def_settings
    if _def_settings is None:
        _def_settings = Settings()
    return _def_settings
...
@lru_cache
def get_medcat_processor(settings: Annotated[Settings, Depends(get_settings)]) -> MedCatProcessor:
    log.debug("Creating new Medcat Processor using settings: %s", settings)
    return MedCatProcessor(settings)

Then there are no changes needed anywhere else (I think?)
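
Purely for illustration, a hypothetical route consuming that dependency unchanged (the import path and the process call are assumptions for the sketch, not the service's actual API):

from typing import Annotated

from fastapi import Depends, FastAPI

# Assumed import path for the sketch above; adjust to the real dependencies module.
from dependencies import MedCatProcessor, get_medcat_processor

app = FastAPI()


@app.post("/process")  # illustrative route name
async def process(
    text: str,
    processor: Annotated[MedCatProcessor, Depends(get_medcat_processor)],
):
    # Hypothetical call; whatever the processor's real entry point is, the
    # dependency wiring stays exactly as before.
    return processor.process(text)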
