bilallamal07

Summary

Fixes a critical issue where OpenAI fine-tuning jobs fail during the post-training safety evaluation (refusals_v3) because of sensitive content in product descriptions.

Problem

When running the Week 6 Day 5 fine-tuning exercise, users encounter this error:

Error while running moderation eval refusals_v3 for snapshot ft:gpt-4o-mini-2024-07-18:personal:pricer:CQxxxx
Error while running eval for category hate/threatening

Root Cause: The Amazon product dataset contains items with sensitive keywords (weapon, knife, tactical, combat, etc.) that trigger OpenAI's post-training safety checks.

Solution

1. Updated Notebook (day5.ipynb)

Added check_moderation() function that:

  • Implements two-stage filtering (keyword pre-filter + OpenAI Moderation API)
  • Provides detailed reporting of flagged items
  • Returns clean items ready for fine-tuning
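The two-stage idea described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the signature of `check_moderation()`, the helper `openai_flagged()`, the use of plain strings instead of the course's item objects, and the four-word keyword list (only the words named in this PR) are all assumptions. The one real API call is `client.moderations.create(...)` from the OpenAI Python SDK v1+.

```python
import re

# Assumption: only the four keywords named in this PR; the real list is longer.
SENSITIVE_KEYWORDS = ("weapon", "knife", "tactical", "combat")
_KEYWORD_RE = re.compile(r"\b(" + "|".join(SENSITIVE_KEYWORDS) + r")", re.IGNORECASE)


def openai_flagged(text):
    """Stage 2 backend (hypothetical helper): ask the OpenAI Moderation API."""
    from openai import OpenAI  # imported lazily so stage 1 works without the SDK

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.moderations.create(input=text)
    return response.results[0].flagged


def check_moderation(texts, is_flagged=openai_flagged):
    """Split texts into (clean, flagged), where flagged holds (text, reason) pairs."""
    clean, flagged = [], []
    for text in texts:
        match = _KEYWORD_RE.search(text)
        if match:
            # Stage 1: cheap local keyword pre-filter, no API call needed.
            flagged.append((text, f"keyword: {match.group(1).lower()}"))
        elif is_flagged(text):
            # Stage 2: only texts that pass stage 1 reach the moderation backend.
            flagged.append((text, "moderation API"))
        else:
            clean.append(text)
    return clean, flagged
```

Running the keyword stage first keeps API calls down, since obviously sensitive descriptions never reach the Moderation API. Passing a different `is_flagged` callable makes it possible to try the filter offline.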

2. Standalone Scripts

  • fix_moderation.py: Batch filtering script with 25+ sensitive keywords
  • test_moderation.py: JSONL verification utility
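A JSONL verification pass of the kind test_moderation.py is described as providing might look like the sketch below. It is an illustration under assumptions, not the contributed script: the function name `scan_jsonl()`, the reported problem strings, and the short keyword list are hypothetical, and the records are assumed to follow the chat fine-tuning layout (a `messages` list of objects with a `content` field).

```python
import json
import re

# Assumption: a short illustrative list built from the keywords named in this PR.
KEYWORDS = ("weapon", "knife", "tactical", "combat")
_PATTERN = re.compile(r"\b(" + "|".join(KEYWORDS) + r")", re.IGNORECASE)


def scan_jsonl(lines):
    """Return (line_number, problem) pairs for lines that fail verification."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append((number, "invalid JSON"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list):
            problems.append((number, "missing messages list"))
            continue
        for message in messages:
            content = message.get("content", "") if isinstance(message, dict) else ""
            match = _PATTERN.search(content) if isinstance(content, str) else None
            if match:
                # Report only the first keyword hit per line.
                problems.append((number, f"keyword: {match.group(1).lower()}"))
                break
    return problems
```

Because the function accepts any iterable of lines, an open file handle can be passed directly; an empty result means no line was rejected by this local check, which does not by itself guarantee that the fine-tuning job will pass OpenAI's own evaluation.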

3. Documentation

  • DAY5_MODERATION_FIX_README.md: PR-focused documentation
  • MODERATION_FIX_README.md: Technical deep-dive

Results

Before Fix

  • Training: 200 examples
  • Validation: 50 examples
  • Status: ❌ Failed during post-training moderation

After Fix

  • Training: 190 examples (10 filtered)
  • Validation: 48 examples (2 filtered)
  • Status: ✅ Successfully completed and deployed

Testing

Verified with successful fine-tuning job:

  • Job ID: ftjob-moQGns3ajsS5UWIxxxxx
  • Model: ft:gpt-4o-mini-2024-07-18:personal:pricer:CQUNxxxx
  • Confirmed via OpenAI completion email

Benefits

  1. Prevents users from wasting time/money on failed fine-tuning jobs
  2. Demonstrates pre-filtering training data so that jobs pass OpenAI's post-training safety evaluations
  3. Provides reusable tools for content filtering
  4. No breaking changes to original notebook structure

Files Changed

  • ✏️ week6/day5.ipynb - Added moderation function
  • week6/fix_moderation.py - New filtering script
  • week6/test_moderation.py - New verification utility
  • 📚 week6/DAY5_MODERATION_FIX_README.md - New documentation
  • 📚 week6/MODERATION_FIX_README.md - New technical docs

Compatibility

  • ✅ Python 3.8+
  • ✅ OpenAI Python SDK v1.0+
  • ✅ No breaking changes
  • ✅ Works with W&B integration

Community Contribution - Week 6 Day 5 Moderation Fix by @bilallamal07

…ning

Fixes critical issue where fine-tuning jobs fail during post-training safety
evaluations due to sensitive content in product descriptions.

Changes:
- Add check_moderation() function to day5.ipynb for content filtering
- Implement two-stage filtering (keyword pre-filter + OpenAI Moderation API)
- Create standalone fix_moderation.py script for batch processing
- Add test_moderation.py utility for JSONL verification
- Include comprehensive documentation in DAY5_MODERATION_FIX_README.md
  and MODERATION_FIX_README.md

Results:
- Training examples: 190 (10 filtered from 200)
- Validation examples: 48 (2 filtered from 50)
- Fine-tuning jobs now pass refusals_v3 safety evaluation
- Successfully deployed model: ft:gpt-4o-mini-2024-07-18:personal:pricer

This contribution prevents users from encountering moderation failures and
provides reusable tools for content filtering in fine-tuning workflows.
@bilallamal07 changed the title from "fix(week6): Add comprehensive moderation filtering for OpenAI fine-tuning" to "fix(week6): Add comprehensive moderation filtering to resolve fine-tuning eval errors" on Oct 14, 2025
@ed-donner
Owner

Oh gosh - would you be OK to move this to community-contributions folder? I'm grateful to have this change, and I will make this update to the main repo at some point, but in the meantime it's best not to affect the main repo where possible..

@bilallamal07
Author

> Oh gosh - would you be OK to move this to community-contributions folder? I'm grateful to have this change, and I will make this update to the main repo at some point, but in the meantime it's best not to affect the main repo where possible..

Hi, Ed
Thanks for pointing this out. The PR submitted outside the community contribution process was unintentional. I’ll ensure all future updates align with the community contribution guidelines moving forward.

I appreciate your support and guidance!
Best regards,
