feat(skills): Create anti-hallucination-training skill for calibration and refusal data #308

## Summary

Create a skill documenting anti-hallucination training data generation and calibration workflows.

## Context

ReAlign has `AntiHallucinationGenerator` and calibration training capabilities, but no skill documents them yet.

## Implementation Approach

Create `.claude/skills/anti-hallucination-training.md` documenting:

### Anti-Hallucination Data Types

1. **Appropriate Refusal**
   - Refuse when asked about unknown facts
   - Example: "I don't have information about events after my training cutoff"

2. **Hedged Uncertainty**
   - Express uncertainty for partial knowledge
   - Example: "I'm not certain, but I believe..."

3. **Confident Answers**
   - Answer confidently for verified facts
   - Provides contrast for calibration
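
To make the three data types concrete, here is a minimal sketch of one record per type, serialized as JSONL. The field names (`category`, `prompt`, `response`), the example prompts, and the confident answer are assumptions for illustration; they are not the schema `AntiHallucinationGenerator` actually emits.

```python
import json

# Hypothetical record schema: "category", "prompt", and "response" are
# illustrative field names, not the format ReAlign's generator writes.
EXAMPLES = [
    {
        "category": "appropriate_refusal",
        "prompt": "Who won the most recent election in my city?",
        "response": "I don't have information about events after my training cutoff.",
    },
    {
        "category": "hedged_uncertainty",
        "prompt": "Roughly how many people live in this town?",
        "response": "I'm not certain, but I believe it is a few thousand.",
    },
    {
        "category": "confident_answer",
        "prompt": "What is the chemical symbol for water?",
        "response": "The chemical formula for water is H2O.",
    },
]


def to_jsonl(records):
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)


if __name__ == "__main__":
    print(to_jsonl(EXAMPLES))
```

Keeping an explicit `category` field on each record would make it straightforward to check the mix ratios later.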

### Generator Classes

| Class | Purpose | File |
|-------|---------|------|
| `AntiHallucinationGenerator` | Generate refusal examples | `src/realign/data/antihallucination_generator.py` |
| `RefusalPreferenceGenerator` | Preference pairs for refusal | `src/realign/data/refusal_preference_generator.py` |
| `CalibrationTrainer` | Confidence calibration | `src/realign/backends/olmo/calibration_trainer.py` |
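
For the preference-pair row above, a record might look like the sketch below: the refusal is the preferred response and a fabricated answer is the rejected one. The `prompt`/`chosen`/`rejected` field names follow a common convention for preference data and are an assumption here; `PreferencePair` and `make_refusal_pair` are hypothetical helpers, not part of `RefusalPreferenceGenerator`.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PreferencePair:
    """Hypothetical preference record: 'chosen' is preferred over 'rejected'."""
    prompt: str
    chosen: str
    rejected: str


def make_refusal_pair(prompt: str, refusal: str, fabricated: str) -> PreferencePair:
    """Build a pair where the refusal is preferred over a fabricated answer."""
    if refusal.strip() == fabricated.strip():
        raise ValueError("chosen and rejected responses must differ")
    return PreferencePair(prompt=prompt, chosen=refusal, rejected=fabricated)


if __name__ == "__main__":
    pair = make_refusal_pair(
        prompt="What did the company announce last week?",
        refusal="I don't have information about events after my training cutoff.",
        fabricated="Last week the company announced a new product line.",
    )
    print(asdict(pair))
```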

### Confidence Levels

```python
CONFIDENCE_LEVELS = {
    "high": "I am confident that...",
    "medium": "I believe that...",
    "low": "I'm not certain, but...",
    "uncertain": "I don't know..."
}
```
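
One way these prefixes could be used is to pick a level from a numeric confidence estimate. The sketch below assumes the estimate is a probability in [0, 1]; the thresholds (0.9, 0.6, 0.3) and the helper names `confidence_level` and `prefix_for` are illustrative assumptions, not values or functions taken from ReAlign.

```python
CONFIDENCE_LEVELS = {
    "high": "I am confident that...",
    "medium": "I believe that...",
    "low": "I'm not certain, but...",
    "uncertain": "I don't know...",
}

# Assumed thresholds, checked from highest to lowest; tune these for your data.
THRESHOLDS = [(0.9, "high"), (0.6, "medium"), (0.3, "low")]


def confidence_level(probability: float) -> str:
    """Map a probability in [0, 1] to one of the four level names."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for cutoff, level in THRESHOLDS:
        if probability >= cutoff:
            return level
    return "uncertain"


def prefix_for(probability: float) -> str:
    """Return the response prefix for a given probability."""
    return CONFIDENCE_LEVELS[confidence_level(probability)]


if __name__ == "__main__":
    for p in (0.95, 0.7, 0.4, 0.1):
        print(p, "->", prefix_for(p))
```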

### Training Data Mix

Recommended anti-hallucination mix:
- 40% - Appropriate refusals (unknown facts)
- 30% - Hedged uncertainty (partial knowledge)
- 20% - Confident answers (verified facts)
- 10% - Edge cases (ambiguous queries)
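
The mix above can be turned into per-category example counts for a target dataset size. This is a minimal sketch using integer percentages and largest-remainder rounding so the counts always sum to the requested total; the category keys and the `counts_for` helper are hypothetical names, not ReAlign identifiers.

```python
# Percentages from the recommended mix; the key names are illustrative.
MIX_PERCENT = {
    "appropriate_refusal": 40,
    "hedged_uncertainty": 30,
    "confident_answer": 20,
    "edge_case": 10,
}


def counts_for(total: int, mix: dict = MIX_PERCENT) -> dict:
    """Split `total` examples across categories according to `mix`."""
    if total < 0:
        raise ValueError("total must be non-negative")
    if sum(mix.values()) != 100:
        raise ValueError("mix percentages must sum to 100")
    # Floor each share, then hand leftover examples to the largest remainders.
    counts = {name: total * pct // 100 for name, pct in mix.items()}
    leftover = total - sum(counts.values())
    by_remainder = sorted(mix, key=lambda name: (total * mix[name]) % 100, reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


if __name__ == "__main__":
    print(counts_for(5000))
    # {'appropriate_refusal': 2000, 'hedged_uncertainty': 1500,
    #  'confident_answer': 1000, 'edge_case': 500}
```

Working in integer percentages avoids floating-point rounding surprises when the total is not a multiple of 10.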

### Commands

```bash
# Generate anti-hallucination data
python -m realign.data.antihallucination_generator \
  --input sft.jsonl \
  --output antihall.jsonl \
  --refusal-ratio 0.4 \
  --uncertainty-ratio 0.3

# Generate calibration data
python scripts/calibration_generator.py \
  --count 5000 \
  --output calibration.jsonl
```

### Evaluation

```bash
# Check TruthfulQA score (target: ≥60%)
realign evaluate --model your-model --benchmark truthfulqa

# Check calibration ECE (target: ≤0.10)
realign evaluate --model your-model --benchmark calibration
```
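
For reference when reading the calibration target, expected calibration error (ECE) is commonly computed by binning predictions by confidence and averaging the gap between mean confidence and accuracy in each bin, weighted by bin size. The sketch below shows that standard equal-width-bin formulation; whether `realign evaluate` uses the same binning or bin count is an assumption.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |avg confidence - accuracy| per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        raise ValueError("at least one prediction is required")

    # Group (confidence, correctness) pairs into equal-width bins over [0, 1].
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidence values must be between 0 and 1")
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, bool(ok)))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece


if __name__ == "__main__":
    # Four answers at 0.9 confidence, three of them correct.
    print(expected_calibration_error([0.9] * 4, [True, True, True, False]))
```

A model that says 0.9 and is right 75% of the time in that bin contributes a gap of about 0.15, which would miss a ≤0.10 target if all predictions fell in that bin.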

## Acceptance Criteria

- [ ] All generator classes documented
- [ ] Confidence levels defined
- [ ] Training data mix ratios specified
- [ ] CLI commands provided
- [ ] Evaluation benchmarks documented

## Related

- Issue #305 (data-curation-workflow)
- Issue #306 (training-methods)
- Issue #307 (dpo-rlvr-generation)
