Summary
Create a skill documenting anti-hallucination training data generation and calibration workflows.
Context
ReAlign already provides AntiHallucinationGenerator and calibration training capabilities, but neither is documented in a skill.
Implementation Approach
Create .claude/skills/anti-hallucination-training.md documenting:
Anti-Hallucination Data Types
- Appropriate Refusal
  - Refuse when asked about unknown facts
  - Example: "I don't have information about events after my training cutoff"
- Hedged Uncertainty
  - Express uncertainty for partial knowledge
  - Example: "I'm not certain, but I believe..."
- Confident Answers
  - Answer confidently for verified facts
  - Provides contrast for calibration
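To make the three data types concrete, here is a minimal sketch of how one example of each might be represented and serialized to JSONL. The `Example` dataclass, its field names, and the category labels are hypothetical and chosen for illustration only; they are not taken from the ReAlign generators.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Example:
    """One training example. Field names are assumptions, not ReAlign's schema."""
    prompt: str
    response: str
    category: str  # hypothetical label for the data type


EXAMPLES = [
    Example(
        prompt="What happened in the news last week?",
        response="I don't have information about events after my training cutoff.",
        category="appropriate_refusal",
    ),
    Example(
        prompt="In what year was the first email sent?",
        response="I'm not certain, but I believe it was around 1971.",
        category="hedged_uncertainty",
    ),
    Example(
        prompt="What is the chemical formula for water?",
        response="The chemical formula for water is H2O.",
        category="confident_answer",
    ),
]


def to_jsonl(examples):
    """Serialize examples to a JSONL string, one JSON object per line."""
    return "\n".join(json.dumps(asdict(example)) for example in examples)


if __name__ == "__main__":
    print(to_jsonl(EXAMPLES))
```

Keeping the category on each record would make it straightforward to verify the mix of a generated dataset later.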
Generator Classes
| Class | Purpose | File |
|---|---|---|
| AntiHallucinationGenerator | Generate refusal examples | src/realign/data/antihallucination_generator.py |
| RefusalPreferenceGenerator | Preference pairs for refusal | src/realign/data/refusal_preference_generator.py |
| CalibrationTrainer | Confidence calibration | src/realign/backends/olmo/calibration_trainer.py |
Confidence Levels
CONFIDENCE_LEVELS = {
"high": "I am confident that...",
"medium": "I believe that...",
"low": "I'm not certain, but...",
"uncertain": "I don't know..."
}
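As an illustration of how these prefixes might be selected, the sketch below maps a numeric confidence score in [0, 1] to one of the four levels. The `THRESHOLDS` cutoffs and the `confidence_prefix` helper are hypothetical; the issue does not say how ReAlign assigns a level.

```python
CONFIDENCE_LEVELS = {
    "high": "I am confident that...",
    "medium": "I believe that...",
    "low": "I'm not certain, but...",
    "uncertain": "I don't know...",
}

# Hypothetical cutoffs, checked from highest to lowest.
THRESHOLDS = [(0.9, "high"), (0.6, "medium"), (0.3, "low")]


def confidence_prefix(score: float) -> str:
    """Return the response prefix for a confidence score between 0 and 1."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    for minimum, level in THRESHOLDS:
        if score >= minimum:
            return CONFIDENCE_LEVELS[level]
    # Anything below the lowest cutoff falls through to "uncertain".
    return CONFIDENCE_LEVELS["uncertain"]


if __name__ == "__main__":
    for score in (0.95, 0.7, 0.4, 0.1):
        print(score, confidence_prefix(score))
```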
Training Data Mix
Recommended anti-hallucination mix:
- 40% - Appropriate refusals (unknown facts)
- 30% - Hedged uncertainty (partial knowledge)
- 20% - Confident answers (verified facts)
- 10% - Edge cases (ambiguous queries)
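The mix above can be turned into per-category example counts for a given dataset size. The following is a minimal sketch; `allocate_counts` and the category keys are hypothetical names, and integer percentages are used so that the counts always sum exactly to the requested total.

```python
# Percentages from the recommended mix; keys are hypothetical labels.
MIX_PERCENT = {
    "appropriate_refusal": 40,
    "hedged_uncertainty": 30,
    "confident_answer": 20,
    "edge_case": 10,
}


def allocate_counts(total: int, mix: dict[str, int] = MIX_PERCENT) -> dict[str, int]:
    """Split `total` examples across categories according to `mix`."""
    if total < 0:
        raise ValueError("total must be non-negative")
    if sum(mix.values()) != 100:
        raise ValueError("mix percentages must sum to 100")
    # Floor each share first.
    counts = {name: total * pct // 100 for name, pct in mix.items()}
    # Hand leftover examples to the categories with the largest remainders.
    leftover = total - sum(counts.values())
    by_remainder = sorted(mix, key=lambda name: (total * mix[name]) % 100, reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


if __name__ == "__main__":
    print(allocate_counts(5000))
```

For a 5000-example dataset this yields 2000 refusals, 1500 hedged examples, 1000 confident answers, and 500 edge cases.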
Commands
# Generate anti-hallucination data
python -m realign.data.antihallucination_generator \
  --input sft.jsonl \
  --output antihall.jsonl \
  --refusal-ratio 0.4 \
  --uncertainty-ratio 0.3

# Generate calibration data
python scripts/calibration_generator.py \
  --count 5000 \
  --output calibration.jsonl
Evaluation
# Check TruthfulQA score (target: ≥60%)
realign evaluate --model your-model --benchmark truthfulqa
# Check calibration ECE (target: ≤0.10)
realign evaluate --model your-model --benchmark calibration
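For readers unfamiliar with the ECE target, the sketch below shows the standard expected calibration error computation with equal-width confidence bins: each bin contributes its share of the examples multiplied by the gap between its accuracy and its mean confidence. This is an illustration of the metric, not necessarily how `realign evaluate` computes it; the function name, the default bin count, and the half-open binning convention are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Compute ECE from per-example confidences (0..1) and correctness flags."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("at least one example is required")

    # Assign each example to an equal-width bin [k/n_bins, (k+1)/n_bins);
    # a confidence of exactly 1.0 goes into the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_confidence = sum(conf for conf, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        # Weight the confidence/accuracy gap by the bin's share of examples.
        ece += (len(bucket) / total) * abs(accuracy - mean_confidence)
    return ece


if __name__ == "__main__":
    confs = [0.9, 0.9, 0.6, 0.6]
    hits = [True, True, True, False]
    print(expected_calibration_error(confs, hits))
```

A perfectly calibrated model scores 0; the ≤0.10 target means the weighted average gap between stated confidence and actual accuracy stays within ten percentage points.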
Acceptance Criteria
Related