Your work is very impressive! I want to clarify: when evaluating clip-level retrieval, are the candidates all clips in the test set, or only those from the same video?