Hi, thank you for this amazing work!
I am benchmarking inference speed and noticed that the per-image inference time on an A100 GPU is around 940 ms at float32 precision, which is more than 2× the 341 ms on a V100 reported in the paper.
I am wondering if there are any specific optimizations or settings I might have missed that could explain this discrepancy.
import time
import torch

inference_time = []
for i in range(20):
    torch.cuda.synchronize()   # make sure pending GPU work is finished before starting the timer
    start_time = time.time()
    prediction = model.infer(image)
    torch.cuda.synchronize()   # wait for the GPU to finish before stopping the timer
    end_time = time.time()
    inference_time.append(end_time - start_time)
average_time = sum(inference_time) / len(inference_time)
print(f"Inference time: {average_time:.4f}s")