To train your newly released checkpoint that supports semantic-level segmentation, you added a special token "[semantic]" before the input prompt in the training data. What difference does this token make? You have not modified the model architecture, which still gives a one-to-one mapping between each prompt and its prediction. The training strategy for the referring segmentation datasets and the semantic segmentation datasets is the same, apart from this token. Why would adding the "[semantic]" token improve the model's ability? Would training without it yield similar performance?
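To make the question concrete, here is a minimal sketch of how I understand the data preparation. The function name `build_prompt`, the example prompts, and the single space after the token are my own assumptions for illustration, not taken from your code.

```python
# Hypothetical illustration only: names and spacing are assumptions,
# not the repository's actual preprocessing code.
SEMANTIC_TOKEN = "[semantic]"


def build_prompt(text: str, is_semantic: bool) -> str:
    """Prefix the prompt with the special token for semantic seg samples."""
    if is_semantic:
        return f"{SEMANTIC_TOKEN} {text}"
    # Referring segmentation prompts are passed through unchanged.
    return text


# Referring segmentation sample: the prompt describes one specific object.
referring_prompt = build_prompt("the man in the red jacket", is_semantic=False)

# Semantic segmentation sample: the prompt is a category name.
semantic_prompt = build_prompt("person", is_semantic=True)
```

If this is accurate, the only signal that tells the model which kind of mask to predict is the prefix on the text, since everything downstream of the prompt is shared.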
Thank you very much in advance!