
After converting the SAM2 model to ONNX, the inference results are significantly worse than the original model. #21

Open
lhz123 opened this issue Aug 9, 2024 · 7 comments


@lhz123

lhz123 commented Aug 9, 2024

No description provided.

@cile98

cile98 commented Aug 9, 2024

@lhz123 can you provide any comparison examples?

@vietanhdev
Owner

Hi! :D
The inference flow is not identical to the original one from the SAM2 official repo, including the dimensions of the input images. Therefore, the results are not directly comparable.

@marwand

marwand commented Aug 16, 2024

@vietanhdev I've noticed that all of the converted SAM2 models output masks at 256x256 resolution. Is this configurable? Ideally I want it to match the input resolution (1024x1024).

The reason 256 isn't good enough is that after upscaling to 1024, the edges are very rough and don't overlay perfectly on the source image. I've applied some basic post-processing (roughly the kind of thing sketched below), but the result isn't very accurate, especially for small objects/surfaces.

Does the original SAM model output masks at 256 resolution? What limitations make the ONNX version different from the PyTorch one?
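
For context, the upscale and post-processing I mean is roughly the following (a sketch only; `low_res_mask` stands in for the 256x256 logit mask coming out of the ONNX decoder):

```python
import cv2
import numpy as np

# Sketch: `low_res_mask` is a placeholder for the 256x256 logits
# returned by the ONNX decoder.
low_res_mask = np.random.randn(256, 256).astype(np.float32)

# Bilinear upscale from 256x256 to the 1024x1024 input resolution.
upscaled = cv2.resize(low_res_mask, (1024, 1024), interpolation=cv2.INTER_LINEAR)

# SAM-style masks are binarized by thresholding the logits at 0.
binary = (upscaled > 0.0).astype(np.uint8)

# Basic smoothing of the jagged edges; still not accurate for small objects.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
binary = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
```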

@cile98

cile98 commented Aug 16, 2024

> Does the original SAM model output masks at 256 resolution? What limitations make the ONNX version different from the PyTorch one?

Pretty sure SAM1 also originally outputs them at 256x256 and then upscales them.
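
For reference, SAM1's postprocessing does roughly the following (paraphrased from the segment-anything repo's `postprocess_masks`, so treat names and details as approximate):

```python
import torch
import torch.nn.functional as F

def postprocess_masks(masks: torch.Tensor, input_size, original_size, img_size: int = 1024):
    # masks: (B, C, 256, 256) low-res logits from the mask decoder.
    # 1) Upscale to the padded model input resolution (1024x1024).
    masks = F.interpolate(masks, (img_size, img_size), mode="bilinear", align_corners=False)
    # 2) Crop away the padding added during preprocessing.
    masks = masks[..., : input_size[0], : input_size[1]]
    # 3) Resize to the original image resolution.
    masks = F.interpolate(masks, original_size, mode="bilinear", align_corners=False)
    return masks
```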

@ibaiGorordo

ibaiGorordo commented Aug 17, 2024

@vietanhdev I recommend adding `masks = F.interpolate(masks, (img_size[0], img_size[1]), mode="bilinear", align_corners=False)` to the decoder to get smoother results than doing the upscale with OpenCV.
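
In other words, something like this around the decoder before export (a sketch: the wrapper name and the decoder's exact inputs/outputs are placeholders for whatever the export script actually uses):

```python
import torch
import torch.nn.functional as F

class DecoderWithUpsample(torch.nn.Module):
    """Wraps the mask decoder so the bilinear upscale is baked into the ONNX graph."""

    def __init__(self, decoder: torch.nn.Module, img_size=(1024, 1024)):
        super().__init__()
        self.decoder = decoder
        self.img_size = img_size

    def forward(self, *decoder_inputs):
        # Placeholder outputs: adjust to the real decoder's signature.
        masks, scores = self.decoder(*decoder_inputs)
        masks = F.interpolate(
            masks,
            (self.img_size[0], self.img_size[1]),
            mode="bilinear",
            align_corners=False,
        )
        return masks, scores
```

Exporting this wrapper instead of the bare decoder puts the Resize op inside the ONNX graph itself, so consumers get full-resolution masks without a separate OpenCV upscale.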

Here is the updated Colab notebook for export: https://colab.research.google.com/drive/1tqdYbjmFq4PK3Di7sLONd0RkKS0hBgId?usp=sharing

@vietanhdev
Owner

Hi @ibaiGorordo
Thank you for your great code! Could you help with a PR to this repo?

@marwand

marwand commented Aug 17, 2024

> I recommend adding `masks = F.interpolate(...)` to the decoder to get smoother results than doing the upscale with OpenCV.

This is great, thank you!
