Add a tensorrt backend #33

Draft · BestDriverCN wants to merge 22 commits into master
Conversation

@BestDriverCN commented Apr 28, 2021

Add BaseTensorRTBackend that provides support for TensorRT models.

@apockill (Contributor):

I'll add the rest of my comments once there's a clear story for multi-GPU support.

@BryceBeagle marked this pull request as draft April 28, 2021 21:54
self.context = self.trt_engine.create_execution_context()
# create buffers for inference
self.buffers = {}
for batch_size in range(1, self.trt_engine.max_batch_size + 1):
Contributor:

I'm leaving a comment here to remind me:

We need to do some memory measurements to figure out whether all of these buffers are necessary. I wonder if allocating buffers only for batch sizes [1, 2, 5, 10] or some other combination might be better (rough sketch below).

Things to test:

  1. How many TensorRT models can the NX hold?
  2. How much extra memory does this allocate? (Related to "Use an s3 bucket to store large files instead of Git LFS" #1.)
  3. What's the speed performance if we do [1, 2, 10] vs [1, 2, 5, 10] vs [1, 2, 3, 4, 5, 6, ..., 10]?
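A minimal sketch of what that could look like, assuming a hypothetical self._allocate_buffers(batch_size) helper that builds the host/device buffers, bindings, and stream for one batch size (the helper and constant names are illustrative, not part of this PR):

BUFFER_BATCH_SIZES = [1, 2, 5, 10]  # candidate combination to benchmark

self.buffers = {}
for batch_size in BUFFER_BATCH_SIZES:
    if batch_size > self.trt_engine.max_batch_size:
        break  # sizes are sorted, so everything after this is too large
    # only allocate host/device memory for the selected batch sizes
    self.buffers[batch_size] = self._allocate_buffers(batch_size)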

Contributor:

Another question we'll have to figure out: Should this be configurable via the init?

Author:

I'll do some tests to figure out how much memory is needed for those buffers. Another thought: if we don't get a performance improvement with a larger batch size, we don't have to do this at all. Based on my tests, a larger batch size improves inference time by about 10% but lowers preprocessing performance, so the overall performance is actually a little lower than with a small batch size.

self.ctx.pop()
return final_outputs

def _prepare_post_process(self):
@apockill (Contributor) commented Apr 30, 2021:

I'm starting to think that there are too many constants and GridNet-specific functions here, and it might be easier to make a separate class specifically for parsing GridNet bounding boxes.

For now, let's clean up the rest of the code first, then discuss how that would work.

Author:

These constants are only necessary for detectors. Maybe we need another parameter like is_detector in the constructor to indicate whether this capsule is a detector or a classifier?

Author:

Or we could check whether these constants exist before we call the post-process function.

Contributor:

Yeah, but I'm thinking that this is super specific to GridNet detectors in particular. Maybe we can just offer a function for parsing GridNet detector outputs, and name it as such.

class GridNetParser:
    def __init__(self, parameters):
        ...

    def parse_detection_results(self, prediction):
        ...

class BaseTensorRTBackend:
    ...

The benefit would be to separate all of these GridNet-specific parameters out of BaseTensorRTBackend 🤔
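Rough sketch of how the two could fit together; the constructor arguments and the parse_results name below are just assumptions for discussion, not a settled API:

class BaseTensorRTBackend:
    def __init__(self, engine_bytes, parser=None):
        # parser is optional: classifier capsules simply skip it
        self.parser = parser

    def parse_results(self, prediction):
        if self.parser is None:
            return prediction
        return self.parser.parse_detection_results(prediction)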

Author:

Great idea, we should have separate parsers for different architectures.

outputs.append(HostDeviceMem(host_mem, device_mem))
return inputs, outputs, bindings, stream

def do_inference(self, bindings: List[int], inputs: List[HostDeviceMem], outputs: List[HostDeviceMem],
@apockill (Contributor) commented Apr 30, 2021:

Suggested change:

-def do_inference(self, bindings: List[int], inputs: List[HostDeviceMem], outputs: List[HostDeviceMem],
+def _do_inference(self, bindings: List[int],
+                  inputs: List[HostDeviceMem],
+                  outputs: List[HostDeviceMem],
+                  stream: cuda.Stream,
+                  batch_size: int = 1) -> List[List[float]]:

def batch_predict(self, input_data_list: List[Any]) -> List[Any]:
    task_size = len(input_data_list)
    curr_index = 0
    while curr_index < task_size:
@apockill (Contributor) commented Apr 30, 2021:

This logic may need to be revisited if we decide not to have buffers for every size in [1->10] and instead have combinations like [1, 2, 5, 10], for example.
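For example, if self.buffers is keyed by the allocated batch sizes, the loop body could pick the smallest allocated size that covers the remaining inputs (sketch only, not part of this PR):

remaining = task_size - curr_index
available_sizes = sorted(self.buffers.keys())
# smallest allocated buffer that fits the remaining work, else the largest one
batch_size = next((b for b in available_sizes if b >= remaining),
                  available_sizes[-1])
batch = input_data_list[curr_index:curr_index + batch_size]
curr_index += len(batch)
# on the final iteration the batch may be smaller than batch_size, so the
# chosen buffer would run partially filled (or need padding)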

@BestDriverCN (Author):
@apockill I resolved most of your comments except the post-process stuff. The code is tested and working, and the performance is almost the same. I'll continue working on the GridParser stuff; in the meantime, you can take another look.

out_lists = [out_array.tolist() for out_array in out_array_by_batch]
batch_outputs.append(out_lists)
final_outputs = list(zip(*batch_outputs))
final_outputs = [list(item) for item in final_outputs]
Contributor:

Is there a reason we need to cast each item to a list? After the zip, each item is held as a tuple.

Author:

I just wanted to match the original type hint. I can also change the type hint instead.

Contributor:

Got it. Yeah, just change the type hint. Tuples are cheaper and faster anyways.
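For example, assuming the zip happens inside batch_predict, the annotation could become something like this (element type left as Any for the sketch):

from typing import Any, List, Tuple

def batch_predict(self, input_data_list: List[Any]) -> List[Tuple[Any, ...]]:
    ...
    # keep the tuples produced by zip(); no per-item list() copies needed
    final_outputs = list(zip(*batch_outputs))
    return final_outputs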
