Office Hours: April 1, 2026 #58
andrewmchang started this conversation in Office Hours
Replies: 1 comment
-
Notes
Attendees/intros
Community Q&A
Eval tool from Ariel
Questions/comments? Youth AI use cases?
Nik's use of CoPE
Building golden data sets (from Reddit, crawling for hate speech) shows a lot of variation in outputs, and the goal is to find ways to reduce it. For example, given a policy on hate speech, tiny variations in the policy text (such as the last paragraph ending with a comma versus a period) could measurably change the outputs. A useful check: running the same prompt 30 times, do you get the same outputs? One suggestion was to try a "jury" of models approach, which is more expensive but can give more consistent results.
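As a rough sketch of the two ideas above (repeating the same prompt to measure consistency, and taking a majority vote across a "jury" of models), something like the following could be a starting point. Everything here is an assumption for illustration: `Classifier`, `repeat_runs`, `agreement_rate`, and `jury_vote` are hypothetical names, the labels `"violating"` and `"ok"` are placeholders, and the stub classifiers stand in for real model calls, which are not shown. It is not CoPE's API or anyone's actual pipeline.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: (policy_text, content) -> label string.
# A real implementation would wrap a model call here.
Classifier = Callable[[str, str], str]


def repeat_runs(classify: Classifier, policy: str, content: str, n: int = 30) -> List[str]:
    """Run the same policy/content pair n times and collect the labels."""
    return [classify(policy, content) for _ in range(n)]


def agreement_rate(labels: Sequence[str]) -> float:
    """Fraction of labels that match the most common label (1.0 = fully consistent)."""
    if not labels:
        raise ValueError("labels must be non-empty")
    _, top_count = Counter(labels).most_common(1)[0]
    return top_count / len(labels)


def jury_vote(jurors: Sequence[Classifier], policy: str, content: str) -> Tuple[str, float]:
    """Ask each juror once and return (majority label, share of votes it received).

    Ties go to whichever label was seen first, so an odd number of jurors
    is a sensible default for binary labels.
    """
    votes = [juror(policy, content) for juror in jurors]
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the demo is repeatable

    def flaky_model(policy: str, content: str) -> str:
        # Stub that simulates an inconsistent model; not a real classifier.
        return "violating" if rng.random() < 0.8 else "ok"

    # Two policy variants differing only in final punctuation, as in the notes.
    for policy in ("No hate speech.", "No hate speech,"):
        labels = repeat_runs(flaky_model, policy, "example post", n=30)
        print(repr(policy), "agreement:", round(agreement_rate(labels), 2))

    print("jury:", jury_vote([flaky_model] * 3, "No hate speech.", "example post"))
```

Comparing `agreement_rate` across policy variants gives a number to track when trying to reduce variation, and the vote share from `jury_vote` can flag low-consensus items for human review. The jury costs one call per juror per item, which matches the "more expensive" trade-off mentioned above.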
-
Our next office hours are happening on Wednesday, April 1, 2026, at 16:30–17:00 UTC!
Learn how T&S peers are using AI for safety workflows, ask for support in implementing open safety models, and connect directly with RMC model partners. We'll also discuss how you can contribute to ongoing RMC projects!
Proposed Agenda
Introduction
Q&A / Show and Tell
Admin:
What's Next: