Create workshop: Write Your Own Domain-Specific Benchmark #43

deanwampler · 2025-03-04T20:45:14Z

As part of the TSEI promotion plan, the Write Your Own Domain-Specific Benchmark workshop goes deeper into, and generalizes, the material in the RAG benchmark workshop #42.

How does someone create their own benchmark for their use cases or domain? The session will use the TSEI reference stack, including lm-evaluation-harness and unitxt, with candidate benchmark data, either hand-curated Q&A pairs or synthetic data generated with a teacher model. The session will demonstrate the basics of running the benchmark and interpreting the results.
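To make the flow concrete, here is a minimal sketch of the three steps (curate data, run, interpret) using only the Python standard library. It deliberately does not use the lm-evaluation-harness or unitxt APIs, which have their own task and dataset formats. The Q&A pairs, the JSONL layout, the `stub_model` function, and the exact-match metric are all assumptions chosen for illustration, not the workshop's actual materials.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical hand-curated Q&A pairs; a real benchmark would hold many more.
QA_PAIRS = [
    {"question": "What does RAG stand for?", "answer": "Retrieval-augmented generation"},
    {"question": "What is 2 + 2?", "answer": "4"},
]


def write_jsonl(records, path):
    """Write one JSON object per line (an assumed, commonly used layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_jsonl(path):
    """Read the records back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not count as misses."""
    return " ".join(text.lower().split())


def exact_match(records, predict):
    """Fraction of records where the normalized prediction equals the normalized answer."""
    hits = sum(
        normalize(predict(r["question"])) == normalize(r["answer"]) for r in records
    )
    return hits / len(records)


def stub_model(question):
    """Placeholder for a real model call; it only knows one canned answer."""
    canned = {"What is 2 + 2?": "4"}
    return canned.get(question, "unknown")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_path = Path(tmp) / "benchmark.jsonl"
        write_jsonl(QA_PAIRS, data_path)
        records = read_jsonl(data_path)
    score = exact_match(records, stub_model)
    print(f"exact match: {score:.2f} over {len(records)} questions")
```

Exact match is the simplest metric to interpret but is strict for free-form answers; a real domain benchmark would likely swap in a metric suited to the task and replace `stub_model` with a call to the model under test.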


deanwampler added this to FA2: TSEI Tasks Mar 4, 2025

deanwampler converted this from a draft issue Mar 4, 2025

This was referenced Mar 4, 2025

Create workshop: Running Evaluations #44

Open

Create workshop: Running Safety Benchmark Suites #45

Open
