Add EoMT from "Your ViT is Secretly an Image Segmentation Model"

Hi, I'd like to share a new image segmentation paper that uses a plain ViT!

Paper: https://arxiv.org/abs/2503.19108
Code: https://github.com/tue-mps/eomt

This paper reaches near-SOTA results with a considerably less complex architecture (a vision transformer only), provided the ViT is already well pretrained. EoMT uses only the plain ViT architecture, plus a few extra learned queries and a small mask prediction module. It performs on par with ViT-Adapter + Mask2Former while being much less complex.
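To make the idea concrete, here is a rough PyTorch sketch of how I understand the design: plain Transformer blocks, learned queries that join the patch tokens, and a small head that turns each query into a class prediction and a mask. This is not the authors' code. The class name `TinyEoMTSketch`, all sizes, and the choice to insert the queries before the last `query_blocks` blocks are my own assumptions for illustration; position embeddings, pretrained weights, and mask upsampling are left out.

```python
import torch
from torch import nn


class TinyEoMTSketch(nn.Module):
    """Toy illustration (hypothetical, not the official EoMT implementation)."""

    def __init__(self, dim=64, depth=4, query_blocks=2,
                 num_queries=8, num_classes=5, patch=16):
        super().__init__()
        self.query_blocks = query_blocks
        # Stand-in for a pretrained ViT: patch embedding + standard encoder blocks.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.blocks = nn.ModuleList([
            nn.TransformerEncoderLayer(dim, nhead=4, dim_feedforward=4 * dim,
                                       batch_first=True, norm_first=True)
            for _ in range(depth)
        ])
        # The "few extra learned queries".
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        # The "small mask prediction module": a class head and a mask MLP.
        self.class_head = nn.Linear(dim, num_classes + 1)  # +1 for "no object"
        self.mask_head = nn.Sequential(
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )

    def forward(self, images):
        feats = self.patch_embed(images)            # (B, C, H/p, W/p)
        b, _, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)   # (B, N, C) patch tokens

        first_query_block = len(self.blocks) - self.query_blocks
        for i, block in enumerate(self.blocks):
            if i == first_query_block:
                # Assumption: queries are concatenated to the patch tokens
                # and processed jointly by the remaining blocks.
                q = self.queries.unsqueeze(0).expand(b, -1, -1)
                tokens = torch.cat([q, tokens], dim=1)
            tokens = block(tokens)

        nq = self.queries.shape[0]
        q, patches = tokens[:, :nq], tokens[:, nq:]
        class_logits = self.class_head(q)           # (B, Q, num_classes + 1)
        # One mask per query: dot product between query and patch embeddings.
        mask_logits = torch.einsum("bqc,bnc->bqn", self.mask_head(q), patches)
        return class_logits, mask_logits.reshape(b, nq, h, w)
```

With the default sizes, a batch of two 64x64 images gives class logits of shape `(2, 8, 6)` and patch-resolution mask logits of shape `(2, 8, 4, 4)`. A real integration would swap the toy blocks for a pretrained ViT encoder and upsample the masks to the input resolution.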

It would be interesting to have it in this library!

Add EoMT from ViT is Secretly an Image Segmentation Model #1132
