CU-869ah00vt Fix memory leak (mc-service) #141
base: main
Conversation
…medcat_service_rev1
…dels (docker) + deid bulk processing OOB anns.
Task linked: CU-869ah00vt fix memory leak (service side)
IMHO this doesn't fix the underlying issue.
We expect to have 1 Settings instance, and as such, we expect to have 1 MedCATProcessor instance. But what this PR does is allow for multiple instances of MedCATProcessor while only caching one at a time.
Not only does this mean that there could still be multiple instances around (if they're still in use), but it also means that each one must have a slightly different Settings instance (because if they were identical they should hash to the same thing and you should still get the same processor).
Now, why you'd have multiple different Settings instances is a little bizarre to me, because the only reason I can think of is if os.environ gets manually changed.
PS:
Open to discussion on leaving this in the state you've proposed. But I'm just a little concerned regarding the underlying issue(s).
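To illustrate the hashing point, here is a minimal sketch (assuming pydantic v2-style frozen settings; the field name is just a placeholder). Two Settings instances built from the same environment compare and hash equal, so lru_cache hands back the same cached object for both:

from functools import lru_cache

from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(frozen=True)
    app_model_name: str = "deid"  # placeholder field


@lru_cache
def make_processor(settings: Settings) -> object:
    # stand-in for MedCATProcessor(settings)
    return object()


s1, s2 = Settings(), Settings()
assert s1 == s2 and hash(s1) == hash(s2)
# equal args with the same hash hit the same cache entry
assert make_processor(s1) is make_processor(s2)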
- @lru_cache
+ @lru_cache(maxsize=1)
I feel like this is kind of counterproductive. It may (effectively) fix the memory leak, but it doesn't actually fix the underlying issue. And as far as I know, that is the fact that we expect only 1 instance of MedCatProcessor. Since the Settings class is frozen, one should always be using the same instance, and as such, getting the same instance of the cached MedCatProcessor.
So I think that the issue is in how FastAPI uses the dependency and, by doing so, seems to omit the lru_cache when calling the get_settings method.
This was just the easy way to enforce the MedCAT processor singleton; we can switch to the lifespan manager then and see how it behaves.
That's the thing though - it doesn't enforce a singleton. You'll still have multiple instances. They just won't be cached. And I understand the caching was the culprit for the memory leak, but having multiple instances doesn't seem to be the intended design.
I agree that working out how there are multiple instances is the main thing.
In general I'm surprised by this being the fix, so I'm not really feeling this is likely the right answer?
I literally followed the FastAPI docs for settings here: https://fastapi.tiangolo.com/advanced/settings/#lru-cache-technical-details. The only difference I see is that I imported Settings directly from config instead of import config.
I'd be really surprised if we have some situation here that nobody else has encountered; FastAPI + torch is standard. The only unusual thing I'd think we do is that I set it to use gunicorn just to keep the pre-existing torch thread code that exists, so maybe there's an issue somewhere there?
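For reference, the pattern from that part of the FastAPI docs looks roughly like this (the route and field name below are only placeholders):

from functools import lru_cache
from typing import Annotated

from fastapi import Depends, FastAPI

from config import Settings  # imported directly, as described above


@lru_cache
def get_settings() -> Settings:
    return Settings()


app = FastAPI()


@app.get("/info")
def info(settings: Annotated[Settings, Depends(get_settings)]):
    # FastAPI resolves the dependency on every request, but lru_cache
    # should keep handing back the same Settings instance
    return {"model_name": settings.app_model_name}  # placeholder field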
Wildcard suggestion: we switch to uvicorn (and turn the logging up to info on those lru_cache methods) and confirm whether it still has the error. My hope is FastAPI copied code + torch + everything default = no issue. We could add a flag to the startup script to switch between uvicorn and gunicorn, along the lines of the sketch below.
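A possible shape for that flag, sketched as a small Python launcher (the env var name, module path and port here are assumptions):

import os

import uvicorn


def main() -> None:
    # hypothetical APP_SERVER flag: plain uvicorn by default, or fall back
    # to the existing gunicorn setup when APP_SERVER=gunicorn is set
    if os.environ.get("APP_SERVER", "uvicorn") == "uvicorn":
        uvicorn.run("medcat_service.main:app", host="0.0.0.0", port=5000)
    else:
        os.execvp("gunicorn", [
            "gunicorn",
            "-k", "uvicorn.workers.UvicornWorker",
            "--workers", "1",
            "--bind", "0.0.0.0:5000",
            "medcat_service.main:app",
        ])


if __name__ == "__main__":
    main()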
OK, clearly my misuse of the term 'singleton' was obvious, but yes, the 'fix' proposed is hacky, so: update time.
Regarding gunicorn with uvicorn, we are using 1 worker anyway, so it shouldn't be an issue unless there's some specific bug we didn't discover yet. We could also try an alternative to uvicorn and just use ASGI middleware (but for now I'm not into this approach; I didn't test it yet). And now, with the singleton mc-proc, if there are still memory leaks they will be coming from the CAT object itself. I will give it a test today.
- @lru_cache
+ @lru_cache(maxsize=1)
Refer to the comment below for more detail.
Perhaps we can get away with something simpler for the settings singleton, i.e.:
_def_settings: Optional[Settings] = None

def get_settings() -> Settings:
    global _def_settings
    if _def_settings is None:
        _def_settings = Settings()
    return _def_settings
That way the get_settings method will always return the same instance, and thus the caching for get_medcat_processor should always result in the same instance, since the argument is always the same.
This makes sense to me; it feels right to make it explicitly a global singleton. I would keep that log line in from before and really confirm it never gets created again.
Yeah, if we fully force a singleton settings instance, I fail to see how we could ever have multiple MedCATProcessor instances from the cached method.
But yes, keeping the log message does still make sense!
A few duplicate lines, and a few questions about the relevance of the global settings. But I think this does look better to me!
settings_singleton = settings
def get_global_settings() -> Settings:
Is this needed? Doesn't seem to be called anywhere?
summary="MedCAT Service",
contact={
@asynccontextmanager
async def lifespan(app: FastAPI):
As the more general comment: is this really the way to go?
I really don't understand how Mart's suggestion wouldn't have worked to make a single global settings object, and then stick with the lru_cache from before. I don't see how that could ever make a new Settings object and trigger the cache with a new input.
The reason I'm really hesitant is that your change here is basically saying "following FastAPI documentation causes memory leaks", which I don't think is correct. It's also saying "don't use FastAPI dependencies" in this project, which also doesn't seem right - it's really not a unique project...
_def_settings: Optional[Settings] = None

def get_settings() -> Settings:
    global _def_settings
    if _def_settings is None:
        _def_settings = Settings()
    return _def_settings
I'm not suggesting FastAPI deps and guides are wrong, or that the cache can't work. Since we already converted the MedCAT proc to a singleton and got rid of the cache, I opted to do the same for settings (as per Mart's suggestion), that's all; but if we want to revert to the cache for settings only, that's fine. We cannot have both the singleton and @lru_cache, though - that would not be consistent and would not add any value.
Regarding the lifespan, I opted for it because we get explicit control over when and where the processor gets created (there is the potential problem of having the model loaded as soon as the app starts, though).
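Roughly, the lifespan wiring described here would look like the sketch below (the import path and the app.state attribute name are placeholders; get_settings and MedCatProcessor are the existing pieces of this service):

from contextlib import asynccontextmanager

from fastapi import FastAPI, Request

from medcat_service.dependencies import MedCatProcessor, get_settings  # placeholder import path


@asynccontextmanager
async def lifespan(app: FastAPI):
    # build the single MedCatProcessor when the app starts; this is the
    # "model loaded as soon as the app starts" trade-off mentioned above
    app.state.medcat_processor = MedCatProcessor(get_settings())
    yield
    # optional teardown on shutdown
    app.state.medcat_processor = None


app = FastAPI(lifespan=lifespan)


def get_medcat_processor(request: Request) -> MedCatProcessor:
    # dependency that hands back the processor created at startup
    return request.app.state.medcat_processor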
To clarify - I mean, taking a step back, is it not possible to keep it all basically how it was before, just with the settings being explicitly global?
dependencies.py
_def_settings: Optional[Settings] = None

def get_settings() -> Settings:
    global _def_settings
    if _def_settings is None:
        _def_settings = Settings()
    return _def_settings

...

@lru_cache
def get_medcat_processor(settings: Annotated[Settings, Depends(get_settings)]) -> MedCatProcessor:
    log.debug("Creating new MedCat Processor using settings: %s", settings)
    return MedCatProcessor(settings)
Then there are no changes needed anywhere else (I think?)
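For completeness, a route consuming that dependency would stay as it is now, something along these lines (the endpoint path, payload shape and process method name are assumptions):

from typing import Annotated

from fastapi import Depends, FastAPI

from medcat_service.dependencies import MedCatProcessor, get_medcat_processor  # placeholder path

app = FastAPI()


@app.post("/api/process")
def process(payload: dict,
            processor: Annotated[MedCatProcessor, Depends(get_medcat_processor)]):
    # the same cached MedCatProcessor is injected on every request,
    # so no new processor (or model copy) is created per call
    return processor.process(payload)  # hypothetical method name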
Bug: when processing docs for a long time, memory usage keeps increasing, exceeding 7-8 GB for de-id models after processing 20-30k documents over 3+ hours. This bug is reproducible for normal annotation models too.
Fix: try to limit the LRU cache to a maximum of 1 instance for the processor and other objects.
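The core of the change, as shown in the diff fragments above, is capping the processor cache; a sketch of the decorated dependency (Settings, get_settings and MedCatProcessor as already defined or imported in dependencies.py):

from functools import lru_cache
from typing import Annotated

from fastapi import Depends

# Settings, get_settings and MedCatProcessor as already present in dependencies.py


@lru_cache(maxsize=1)
def get_medcat_processor(settings: Annotated[Settings, Depends(get_settings)]) -> MedCatProcessor:
    # at most one processor (and its loaded model) stays cached at a time
    return MedCatProcessor(settings)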