
After converting the SAM2 model to ONNX, the inference results are significantly worse than the original model. #21

Open
lhz123 opened this issue Aug 9, 2024 · 7 comments


@lhz123

lhz123 commented Aug 9, 2024

No description provided.

@cile98

cile98 commented Aug 9, 2024

@lhz123 can you provide any comparison examples?

@vietanhdev
Owner

Hi! :D
The inference flow is not identical to the original one from the SAM2 official repo, including the dimensions of the input images. Therefore, the results are not directly comparable.

@marwand

marwand commented Aug 16, 2024

@vietanhdev I've noticed that all of the converted SAM2 models output masks at 256x256 resolution. Is this configurable? Ideally I want it to match the input resolution (1024x1024).

The reason 256 isn't good enough is that after upscaling to 1024, the edges are very rough and don't overlay perfectly on the source image. I've applied some basic post-processing (roughly the kind of thing sketched below), but the result isn't very accurate, especially for small objects/surfaces.

Does the original SAM model output masks at 256 resolution? What limitations make the ONNX version different from the PyTorch one?
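
For context, the upscale and post-processing I mean is roughly the following (a sketch only; `low_res_mask` stands in for the 256x256 logit mask coming out of the ONNX decoder):

```python
import cv2
import numpy as np

# Sketch: `low_res_mask` is a placeholder for the 256x256 logits
# returned by the ONNX decoder.
low_res_mask = np.random.randn(256, 256).astype(np.float32)

# Bilinear upscale from 256x256 to the 1024x1024 input resolution.
upscaled = cv2.resize(low_res_mask, (1024, 1024), interpolation=cv2.INTER_LINEAR)

# SAM-style masks are binarized by thresholding the logits at 0.
binary = (upscaled > 0.0).astype(np.uint8)

# Basic smoothing of the jagged edges; still not accurate for small objects.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
binary = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
```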

@cile98

cile98 commented Aug 16, 2024

> Does the original SAM model output masks at 256 resolution? What limitations make the ONNX version different from the PyTorch one?

Pretty sure SAM1 also originally outputs them at 256x256 and then upscales them.
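
For reference, SAM1's postprocessing does roughly the following (paraphrased from the segment-anything repo's `postprocess_masks`, so treat names and details as approximate):

```python
import torch
import torch.nn.functional as F

def postprocess_masks(masks: torch.Tensor, input_size, original_size, img_size: int = 1024):
    # masks: (B, C, 256, 256) low-res logits from the mask decoder.
    # 1) Upscale to the padded model input resolution (1024x1024).
    masks = F.interpolate(masks, (img_size, img_size), mode="bilinear", align_corners=False)
    # 2) Crop away the padding added during preprocessing.
    masks = masks[..., : input_size[0], : input_size[1]]
    # 3) Resize to the original image resolution.
    masks = F.interpolate(masks, original_size, mode="bilinear", align_corners=False)
    return masks
```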

@ibaiGorordo

ibaiGorordo commented Aug 17, 2024

@vietanhdev I recommend adding `masks = F.interpolate(masks, (img_size[0], img_size[1]), mode="bilinear", align_corners=False)` to the decoder to get smoother results than doing the upscale with OpenCV.
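
In other words, something like this around the decoder before export (a sketch: the wrapper name and the decoder's exact inputs/outputs are placeholders for whatever the export script actually uses):

```python
import torch
import torch.nn.functional as F

class DecoderWithUpsample(torch.nn.Module):
    """Wraps the mask decoder so the bilinear upscale is baked into the ONNX graph."""

    def __init__(self, decoder: torch.nn.Module, img_size=(1024, 1024)):
        super().__init__()
        self.decoder = decoder
        self.img_size = img_size

    def forward(self, *decoder_inputs):
        # Placeholder outputs: adjust to the real decoder's signature.
        masks, scores = self.decoder(*decoder_inputs)
        masks = F.interpolate(
            masks,
            (self.img_size[0], self.img_size[1]),
            mode="bilinear",
            align_corners=False,
        )
        return masks, scores
```

Exporting this wrapper instead of the bare decoder puts the Resize op inside the ONNX graph itself, so consumers get full-resolution masks without a separate OpenCV upscale.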

Here is the updated Colab notebook for export: https://colab.research.google.com/drive/1tqdYbjmFq4PK3Di7sLONd0RkKS0hBgId?usp=sharing

@vietanhdev
Owner

Hi @ibaiGorordo
Thank you for your great code! Could you help with a PR to this repo?

@marwand

marwand commented Aug 17, 2024

> I recommend adding `masks = F.interpolate(...)` to the decoder to get smoother results than doing the upscale with OpenCV.

This is great, thank you!
