When using the `UperNet` decoder with a plain ViT, `decoder_scale_modules` should be set to `True` so that the layers are upscaled to simulate a "hierarchical output". This upscaling can also be done by the `LearnedInterpolateToPyramidal` neck. The tests for the Unet, which also needs this hierarchical output, already use this neck.
Since both do the same job, I suggest deprecating `scale_modules` for the `UperNetDecoder` and recommending the neck instead.
Also, we should reconsider the architecture of this neck: it currently only supports 4 layers, and the last one is downscaled using maxpool. I am not sure this is the best option, and it might be interesting to look into it.
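To make the "hierarchical output" concrete, here is a minimal pure-Python sketch of the feature-map sizes such a 4-level pyramid would produce from a plain ViT token grid. The function name `pyramid_grid_sizes` and the default scale factors `(4, 2, 1, 0.5)` are assumptions for illustration only; the only thing stated above is that there are 4 layers and that the last one is downscaled. This is not the neck's actual code.

```python
def pyramid_grid_sizes(image_size, patch_size, scale_factors=(4, 2, 1, 0.5)):
    """Return the spatial side length of each simulated pyramid level.

    scale_factors is a hypothetical default: three levels at or above the
    ViT grid resolution and a last level that is halved (e.g. by maxpool).
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    # A plain ViT emits the same token grid at every selected layer.
    grid = image_size // patch_size
    # Rescale that single grid to imitate a multi-resolution backbone.
    return [int(grid * s) for s in scale_factors]


print(pyramid_grid_sizes(224, 16))  # [56, 28, 14, 7]
```

Under these assumptions, the number of levels is fixed by the length of `scale_factors`, which is the kind of limitation worth revisiting if the neck should support more or fewer than 4 layers.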
CC: @blumenstiel