feat: add legal and scam training datasets for safety model by Fadma1234 · Pull Request #54 · Resilient-Labs/multilingual-ai-document-assistant

Fadma1234 · 2026-04-13T23:33:43Z

Feature Summary

What does this PR change?
Adds two labeled training datasets for the Safety model under data/safety/.

Why was this change made?
The Safety model needs labeled examples of both legitimate legal documents and scam documents so it can learn to tell them apart for red flag detection.

What is the code meant to do?
Provides training data for the Safety model. Legitimate legal documents are labeled likely_legitimate, and real fraud emails impersonating legal or government entities are labeled likely_scam. Both datasets follow the schema defined in app/api/safety/system-prompt.md.

Feature Team / Lane

Team #: (1–5)
Team 2
DevOps Lane: (if applicable)

Type of Change

Testing

How was this tested?
Automated Testing:
N/A — data files only, no code changes

Automated Testing

Unit tests added or updated
Integration tests added or updated
Existing tests pass locally
CI pipeline passes

Manual Testing

Files validated locally against safety schema
JSON structure verified
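
The manual checks above could be scripted roughly as follows. This is a minimal sketch, not the validation actually used for this PR: the two label values come from the PR description, but the field names `text` and `label` are assumptions, since the real schema lives in app/api/safety/system-prompt.md and is not shown here.

```python
import json

# Label values are taken from the PR description.
ALLOWED_LABELS = {"likely_legitimate", "likely_scam"}
# Hypothetical field names; the real schema may differ.
REQUIRED_FIELDS = ("text", "label")


def validate_records(records):
    """Return a list of human-readable problems; an empty list means all records passed."""
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: expected an object, got {type(rec).__name__}")
            continue
        for field in REQUIRED_FIELDS:
            if field not in rec:
                problems.append(f"record {i}: missing field '{field}'")
        if "label" in rec:
            label = rec["label"]
            # Check the type first so unhashable values never reach the set lookup.
            if not isinstance(label, str) or label not in ALLOWED_LABELS:
                problems.append(f"record {i}: unexpected label {label!r}")
    return problems


def validate_json_text(raw):
    """Parse a JSON array of records from a string and validate it."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, list):
        return ["top-level value must be a list of records"]
    return validate_records(data)
```

To check a file, read its contents and pass the string to `validate_json_text`. Any non-empty result lists the records that need attention.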

Screenshots (if UI changes)

Attach screenshots or screen recordings here if the PR includes UI changes.

Risks / Edge Cases

The scam dataset is email-based, so some records may be shorter than typical legal documents.
Category assignment is keyword-based and may need refinement once the model has been tested.
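
For reviewers unfamiliar with keyword-based categorization, the two risks above can be illustrated with a small sketch. The category names, keyword lists, and the 30-word threshold are all hypothetical; the PR does not show the actual keyword map or any length cutoff.

```python
import re

# Hypothetical categories and keywords, for illustration only.
CATEGORY_KEYWORDS = {
    "immigration": ("visa", "uscis", "deportation"),
    "tax": ("irs", "tax refund", "back taxes"),
    "court": ("subpoena", "court order", "warrant"),
}
DEFAULT_CATEGORY = "other"
MIN_WORDS = 30  # hypothetical threshold for flagging short records


def assign_category(text, keyword_map=CATEGORY_KEYWORDS):
    """Return the category with the most whole-word keyword hits.

    Falls back to DEFAULT_CATEGORY when nothing matches. On a tie, the
    category listed first in keyword_map wins.
    """
    lowered = text.lower()
    best, best_hits = DEFAULT_CATEGORY, 0
    for category, keywords in keyword_map.items():
        hits = sum(
            len(re.findall(r"\b" + re.escape(kw) + r"\b", lowered))
            for kw in keywords
        )
        if hits > best_hits:
            best, best_hits = category, hits
    return best


def is_short(text, min_words=MIN_WORDS):
    """Flag records with fewer whitespace-separated words than min_words."""
    return len(text.split()) < min_words
```

A scheme like this is easy to audit but brittle: a document that never uses a listed keyword falls into the default category, which is one reason the assignment may need refinement after the model is evaluated.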

Environment Variables Added or Changed

None

Checklist

[x] Lint passes
Type check passes
No console logs remain
Deployment preview verified

vercel · 2026-04-13T23:33:48Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
multilingual-ai-document-assistant	Ready	Preview, Comment	Apr 13, 2026 11:33pm

aidan-diaz · 2026-04-14T02:24:04Z

This looks good! One possible recommendation: add the training data to .gitignore once training is finished, since large files can slow down clones and other git operations. No actual features are being modified, so everything in the app still works exactly the same as before. Nice work!

feat: add legal and scam training datasets for safety model

686c826

Fadma1234 requested a review from rakimdevcraig April 13, 2026 23:35

aidan-diaz requested review from aidan-diaz and removed request for rakimdevcraig April 14, 2026 02:07

aidan-diaz merged commit 1437fdd into dev Apr 14, 2026
3 checks passed

aidan-diaz merged 1 commit into dev from feature/safety-training-data
