
Evaluate zero-shot image classification #422

Description

@zhenchaoni

Zero-shot image classification predicts a label for an image without any task-specific fine-tuning. A dual-encoder model (e.g., a CLIP-family model or SigLIP) produces an image embedding and one text embedding per candidate class, then picks the class whose text embedding has the highest cosine similarity with the image embedding. The class vocabulary is defined at inference time rather than at training time, hence "zero-shot".
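The selection step described above can be sketched in plain Python. This is a minimal illustration, not any model's actual API: it assumes the image and text embeddings have already been produced by the two encoders, and the 3-dimensional vectors and the labels `cat`, `dog`, `car` are made-up toy values. The helper names `normalize` and `zero_shot_classify` are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def zero_shot_classify(image_emb, class_text_embs):
    """Return the label whose text embedding is most cosine-similar to the image.

    image_emb: list[float], assumed to come from the image encoder.
    class_text_embs: dict label -> list[float], assumed to come from the text
        encoder; the label set is chosen at inference time.
    """
    img = normalize(image_emb)
    scores = {}
    for label, text_emb in class_text_embs.items():
        txt = normalize(text_emb)
        # Cosine similarity between two unit vectors is their dot product.
        scores[label] = sum(a * b for a, b in zip(img, txt))
    best = max(scores, key=scores.get)
    return best, scores


# Toy, hypothetical embeddings (real ones would have hundreds of dimensions).
image = [0.9, 0.1, 0.0]
classes = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0], "car": [0.0, 0.0, 1.0]}
label, scores = zero_shot_classify(image, classes)
print(label)  # cat
```

Because the class set is just a dictionary argument, changing the vocabulary means passing different text embeddings; nothing about the image side is retrained.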

This issue tracks the evaluation work for this kind of model.
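The issue does not say which metrics the evaluation will use. Top-1 and top-k accuracy are common choices for image classification benchmarks, so the following is a hedged sketch under that assumption. It assumes a precomputed image-by-class similarity matrix; the function name `top_k_accuracy` and the example scores are hypothetical.

```python
def top_k_accuracy(similarities, true_indices, k=1):
    """Fraction of images whose true class is among the k highest-scoring classes.

    similarities: one row per image; row[j] is the image's cosine similarity
        with class j's text embedding.
    true_indices: ground-truth class index for each image.
    """
    if len(similarities) != len(true_indices):
        raise ValueError("need one ground-truth label per image")
    hits = 0
    for row, truth in zip(similarities, true_indices):
        # Indices of the k largest scores in this row (ties keep lower index first).
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += truth in top_k
    return hits / len(similarities)


# Hypothetical scores for 3 images over 3 classes.
sims = [[0.9, 0.2, 0.1], [0.3, 0.4, 0.35], [0.1, 0.6, 0.5]]
labels = [0, 2, 2]
top1 = top_k_accuracy(sims, labels, k=1)
top2 = top_k_accuracy(sims, labels, k=2)
print(top1, top2)
```

Here only the first image is correct at top-1, while all three have their true class within the top two scores.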
