Evaluate zero-shot image classification #422

Zero-shot image classification predicts a label for an image without any task-specific fine-tuning. A dual-encoder model (CLIP-family, SigLIP) produces an image embedding and one text embedding per class, then picks the class whose text embedding has the highest cosine similarity with the image embedding. The class vocabulary is defined at inference time, not at training time, hence *zero-shot*.
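The selection step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the evaluation code for this issue: the function names `cosine_similarity` and `zero_shot_classify` are hypothetical, and the three-dimensional vectors are made-up stand-ins for the embeddings a real image and text encoder would produce.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_embedding, class_text_embeddings):
    """Return the label whose text embedding is most similar to the image.

    class_text_embeddings maps each class label to its text embedding.
    The label set is supplied at call time, which is what makes the
    prediction zero-shot.
    """
    scores = {
        label: cosine_similarity(image_embedding, text_embedding)
        for label, text_embedding in class_text_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy embeddings (assumed values, not real model output).
class_text_embeddings = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "car": [0.0, 0.0, 1.0],
}
image_embedding = [0.9, 0.1, 0.0]

predicted_label, scores = zero_shot_classify(image_embedding, class_text_embeddings)
print(predicted_label)  # cat
```

Because the class vocabulary is just the keys of the dictionary, changing the label set requires no retraining, only new text embeddings.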

This issue tracks the evaluation work for this kind of model.
