Hi, thank you for this amazing work!
I am benchmarking inference speed and noticed that the per-image inference time on an A100 GPU is around 940 ms at float32 precision, which is more than 2× the 341 ms on a V100 reported in the paper.
I am wondering if there are any specific optimizations or settings I might have missed that could explain this discrepancy.
import time
import torch

inference_time = []
for i in range(20):
    torch.cuda.synchronize()   # make sure pending GPU work is finished before starting the timer
    start_time = time.time()
    prediction = model.infer(image)
    torch.cuda.synchronize()   # wait for the GPU to finish before stopping the timer
    end_time = time.time()
    inference_time.append(end_time - start_time)
average_time = sum(inference_time) / len(inference_time)
print(f"Inference time: {average_time:.4f}s")