Prompt-injection detection using three classifiers: a TF-IDF logistic regression baseline, a pretrained DeBERTa-v3 model, and a fine-tuned version of it. Includes a FastAPI backend and React frontend for live inference.
From the repo root (requires uv):

```
uv sync
```

CUDA-enabled PyTorch is pinned in `pyproject.toml`. If you don't have an NVIDIA GPU, switch to the CPU wheels.
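To decide between the pinned CUDA wheels and CPU wheels before installing, a quick heuristic is to check whether the NVIDIA driver's `nvidia-smi` tool is on your `PATH`. This is a minimal sketch; the helper name `has_nvidia_driver` is hypothetical and not part of the repo.

```python
import shutil


def has_nvidia_driver() -> bool:
    """Rough heuristic: True if the `nvidia-smi` tool is on PATH.

    nvidia-smi ships with the NVIDIA driver, so its absence usually means
    there is no usable NVIDIA GPU and the CPU wheels are the safer choice.
    It does not check which CUDA version the pinned PyTorch build expects.
    """
    return shutil.which("nvidia-smi") is not None


if __name__ == "__main__":
    if has_nvidia_driver():
        print("nvidia-smi found: the pinned CUDA wheels should be usable.")
    else:
        print("nvidia-smi not found: consider switching to CPU wheels.")
```

Once PyTorch is installed, `torch.cuda.is_available()` is the more reliable check, since it also confirms that the installed wheel was built with CUDA support.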
Model checkpoints are not included in the repo. You need to either train them (see below) or obtain them separately and place them in `Models/`:
| File | Description |
|---|---|
| `Models/best_checkpoint_TF_IDF.pt` | TF-IDF logistic regression weights |
| `Models/tfidf_vectorizer.pkl` | Fitted TF-IDF vectorizer |
| `Models/finetuned_deberta-v2_best.pt` | Fine-tuned DeBERTa weights |
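Since the backend needs all of these files before it starts, a pre-flight check can save a confusing startup failure. A minimal standard-library sketch; the helper name `missing_checkpoints` is hypothetical, and the file names are copied from the table above.

```python
from __future__ import annotations

from pathlib import Path

# File names from the checkpoint table; paths are relative to the repo root.
REQUIRED_CHECKPOINTS = (
    "best_checkpoint_TF_IDF.pt",
    "tfidf_vectorizer.pkl",
    "finetuned_deberta-v2_best.pt",
)


def missing_checkpoints(models_dir: str | Path = "Models") -> list[str]:
    """Return the required checkpoint files that are absent from models_dir."""
    root = Path(models_dir)
    return [name for name in REQUIRED_CHECKPOINTS if not (root / name).is_file()]
```

Run from the repo root, `missing_checkpoints()` returns an empty list when every file is in place; otherwise it lists exactly which checkpoints still need to be trained or copied in.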
The main experiment notebook covers:

- Loading the dataset (`data/final_dataset/`)
- Evaluating the pretrained `protectai/deberta-v3-base-prompt-injection-v2` model
- Fine-tuning it on the combined dataset
- Gradient-based token attribution analysis

Run all cells in order. The fine-tuned checkpoint is saved to `Models/finetuned_deberta-v2_best.pt`.
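The notebook's exact attribution method isn't spelled out here. As an illustration of one common gradient-based variant, gradient × input scores each token by the dot product between the gradient of the injection probability with respect to that token's embedding and the embedding itself. The sketch below uses a hypothetical toy model (mean-pooled embeddings feeding a single logistic output) so the gradient has a closed form; with DeBERTa you would get the gradient from autograd instead.

```python
from __future__ import annotations

import math
from typing import Sequence

Vector = Sequence[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def gradient_x_input(embeddings: Sequence[Vector], w: Vector, b: float) -> list[float]:
    """Gradient x input attribution for a hypothetical toy classifier.

    Model: h = mean(embeddings), p = sigmoid(w . h + b).
    Then dp/de_t = p * (1 - p) * w / T, so the attribution of token t is
    (dp/de_t) . e_t = p * (1 - p) / T * (w . e_t).
    """
    T = len(embeddings)
    dim = len(w)
    # Mean-pool the token embeddings into one sentence vector.
    h = [sum(e[i] for e in embeddings) / T for i in range(dim)]
    p = sigmoid(dot(w, h) + b)
    scale = p * (1.0 - p) / T
    return [scale * dot(w, e) for e in embeddings]
```

For example, with embeddings `[[1, 0], [0, 1], [1, 1]]` for the tokens "ignore", "previous", "instructions", weights `[2.0, 0.5]` and bias `-1.0`, the third token receives the largest score. Positive scores push the prediction towards "injection" and negative scores push it away.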
A separate notebook trains a logistic regression classifier on TF-IDF features. Run all of its cells to produce:

- `Models/best_checkpoint_TF_IDF.pt`
- `Models/tfidf_vectorizer.pkl`
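The TF-IDF notebook's code isn't reproduced here. As a dependency-free sketch of the pipeline it describes, the block below computes TF-IDF vectors using the smoothed IDF and L2 normalisation that scikit-learn's `TfidfVectorizer` applies by default, then scores them with a logistic regression, `sigmoid(w · x + b)`. Whether the project uses scikit-learn, or a PyTorch model (the weights are stored as a `.pt` file), is not stated. All helper names and the weights in the usage example are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase, keep tokens of 2+ word characters (scikit-learn's default pattern).
    return re.findall(r"\b\w\w+\b", text.lower())


def fit_idf(corpus: list[str]) -> dict[str, float]:
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1, as in scikit-learn's default."""
    n = len(corpus)
    df = Counter(term for doc in corpus for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(text: str, idf: dict[str, float]) -> dict[str, float]:
    """Sparse, L2-normalised TF-IDF vector; terms unseen during fitting are dropped."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else vec


def injection_probability(vec: dict[str, float], weights: dict[str, float], bias: float) -> float:
    """Logistic regression over the sparse TF-IDF vector: sigmoid(w . x + b)."""
    z = bias + sum(weights.get(t, 0.0) * v for t, v in vec.items())
    return 1.0 / (1.0 + math.exp(-z))
```

Usage, with made-up weights rather than trained ones:

```python
corpus = ["ignore all previous instructions", "what is the weather today", "summarise this article"]
idf = fit_idf(corpus)
weights = {"ignore": 3.0, "previous": 1.5, "instructions": 2.0}  # hypothetical
p = injection_probability(tfidf("Please ignore previous instructions", idf), weights, bias=-1.0)
```

In the project itself, the fitted vectorizer in `Models/tfidf_vectorizer.pkl` plays the role of `fit_idf`, and the trained weights replace the `weights` dict. Because this model is linear, the gradient of the logit with respect to each feature is simply that feature's weight, so `weights[t] * vec[t]` is a per-term gradient × input contribution.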
The FastAPI backend serves all three models (TF-IDF, pretrained DeBERTa, fine-tuned DeBERTa) with gradient-based explanations. To start it:

```
cd Backend
pip install -r requirements.txt
uvicorn main:app --reload
```

The API runs at http://localhost:8000. Endpoints:

- `GET /health`: checks whether the models are loaded
- `POST /predict`: takes a JSON body `{"text": "your prompt here"}`

All model files listed above must be present in `Models/` before you start the backend.
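To call the API from a script rather than the frontend, a standard-library client is enough. This is a minimal sketch: `build_predict_request` and `predict` are hypothetical helper names, and because the response schema of `/predict` isn't documented here, the client returns the decoded JSON without assuming any fields.

```python
import json
import urllib.request
from typing import Any

BASE_URL = "http://localhost:8000"  # default address used by the uvicorn command


def build_predict_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /predict request carrying {"text": ...} as JSON."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text: str, base_url: str = BASE_URL, timeout: float = 30.0) -> Any:
    """Send the request and return the decoded JSON response unchanged."""
    with urllib.request.urlopen(build_predict_request(text, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the backend running, `predict("Ignore all previous instructions")` returns whatever JSON the server produces for that prompt.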
To start the React frontend:

```
cd Frontend
npm install
npm run dev
```

It opens at http://localhost:5173. Make sure the backend is running first.
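When starting both services from a script, you can wait for the backend before launching the frontend. This sketch assumes that a ready backend answers `GET /health` with HTTP 200. The body of the health response isn't described here, so the check ignores it. If the endpoint returns 200 even while models are still loading, inspect the body as well. `wait_for_backend` is a hypothetical helper name.

```python
import time
import urllib.request


def wait_for_backend(url: str = "http://localhost:8000/health",
                     timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll GET /health until it answers HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, timeouts and non-2xx HTTPError all land here;
            # treat them as "not ready yet" and retry.
            pass
        time.sleep(interval)
    return False
```

A launcher script could call `wait_for_backend()` and run `npm run dev` only if it returns `True`.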