-
Notifications
You must be signed in to change notification settings - Fork 371
Cpu memory optimization #3602
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: moe-support
Are you sure you want to change the base?
Cpu memory optimization #3602
Conversation
a609cfd
to
85107c3
Compare
85107c3
to
53c1bcb
Compare
f147421
to
711446c
Compare
|
||
class TRTInterpreterResult(NamedTuple): | ||
serialized_engine: bytes | ||
engine: trt.ICudaEngine | bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@narendasan Should we set it to be engine when it is python runtime and serialized engine if it is cpp runtime? In this way we can do serialization in Interpreter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can just be trt.ICudaEngine
and post processing for the cpp runtime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like I would have the CPP runtime determine if the engine is live or serialized, and if its live serialize the engine.
|
||
return rt_cls( | ||
serialized_engine=interpreter_result.serialized_engine, | ||
return PythonTorchTensorRTModule( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be the rt_class still but there should be some preprocessing modifying the arguments
|
||
class TRTInterpreterResult(NamedTuple): | ||
serialized_engine: bytes | ||
engine: trt.ICudaEngine | bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can just be trt.ICudaEngine
and post processing for the cpp runtime
|
||
class TRTInterpreterResult(NamedTuple): | ||
serialized_engine: bytes | ||
engine: trt.ICudaEngine | bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like I would have the CPP runtime determine if the engine is live or serialized, and if its live serialize the engine.
@needs_refit # type: ignore[misc] | ||
def _insert_engine_to_cache(self, hash_val: str, serialized_engine: bytes) -> None: | ||
def _insert_engine_to_cache(self, hash_val: str, engine: bytes) -> None: | ||
serialized_engine = engine.serialize() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are doing this don't we end up paying the serialization cost again?
self.ctx.requires_output_allocator, | ||
) | ||
else: | ||
serialized_engine = cuda_engine.serialize() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just have the TRTInterpreter return a live engine in all cases
def __init__( | ||
self, | ||
serialized_engine: Optional[bytes] = None, | ||
cuda_engine: trt.ICudaEngine = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API should support both since uses may use this runtime independently, its just now we either support cuda_engine
or serialized_engine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing for the CPP runtime
Description
Delete the extra copy INetworkDefinition creates
Fixes # (issue)
Type of change
Please delete options that are not relevant and/or add your own.
Checklist: