Production-ready SMS spam detection project built from a deep learning notebook and hardened for portfolio demonstrations, local macOS development (Metal), and VPS deployment (CPU now, NVIDIA-ready later).
- Classifies SMS text messages as `ham` or `spam`.
- Provides a deployable API (`/healthz`, `/predict`) via FastAPI.
- Supports runtime profiles:
  - `cpu` (VPS-safe default)
  - `nvidia` (CUDA-ready profile)
  - `metal` (Apple Silicon local profile)
- Keeps a deep learning path (TensorFlow training + exported model artifacts).
- Includes a fallback keyword adapter for deterministic smoke testing and resilient startup.
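A fallback keyword adapter like the one described can be sketched in a few lines. The keyword set and function name below are illustrative, not the repository's actual identifiers:

```python
# Minimal sketch of a deterministic keyword fallback classifier.
# The keyword list and API are hypothetical; the real adapter lives
# under src/spam_detector/adapters/.
SPAM_KEYWORDS = {"free", "winner", "prize", "urgent", "claim"}

def keyword_predict(message: str) -> str:
    """Label a message 'spam' if any known spam keyword appears, else 'ham'."""
    tokens = (tok.strip(".,!?") for tok in message.lower().split())
    return "spam" if any(tok in SPAM_KEYWORDS for tok in tokens) else "ham"
```

Because the output depends only on the input text, this adapter makes smoke tests reproducible even when no trained model artifact is present.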
The notebook is the original deep learning experiment and contains:
- **Data loading and EDA**
  - Loads the AT&T SMS dataset from public S3.
  - Inspects null values and class distribution.
- **Text preprocessing**
  - Cleans messages (lowercasing, character filtering).
  - Removes stop words and lemmatizes text.
- **Tokenization and sequence preparation**
  - Builds a tokenizer and encodes text.
  - Pads sequences to a fixed length for neural input.
- **Deep learning model training**
  - Embedding + global pooling + dense layers.
  - Early stopping and TensorBoard logging.
- **Evaluation and visualization**
  - Validation loss/accuracy curves.
  - Confusion matrix for ham/spam quality.
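The cleaning, tokenization, and padding steps above can be sketched in plain Python. This is a simplified stand-in for the notebook's Keras tokenizer and `pad_sequences` (which post-pads here for readability); the function names are illustrative:

```python
import re

def clean_text(message: str) -> str:
    """Lowercase and strip non-letter characters, as in the cleaning step."""
    return re.sub(r"[^a-z\s]", "", message.lower()).strip()

def encode(texts: list[str], max_len: int = 5) -> list[list[int]]:
    """Toy tokenizer + padding: build a word index, encode each text as
    integer IDs, then pad/truncate every sequence to max_len."""
    vocab: dict[str, int] = {}
    sequences = []
    for text in texts:
        seq = []
        for word in clean_text(text).split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1  # index 0 is reserved for padding
            seq.append(vocab[word])
        sequences.append((seq + [0] * max_len)[:max_len])  # post-pad with zeros
    return sequences
```

The fixed-length integer sequences produced this way are what the embedding layer of the model consumes.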
This repository keeps that notebook as the experiment artifact and adds production-oriented code around it.
- `src/spam_detector/domain/` - Pure business logic (text normalization, prediction entity).
- `src/spam_detector/application/` - Use cases and ports (model inference contract).
- `src/spam_detector/adapters/` - API adapter (FastAPI), ML adapters (TensorFlow and keyword fallback).
- `src/spam_detector/composition_root/` - Runtime wiring (`MODEL_BACKEND`, `MODEL_RUNTIME`, `MODEL_PATH`).
Dependency flow: adapters -> application -> domain.
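That dependency flow can be illustrated with a structural port in the application layer that concrete adapters satisfy. The names below are hypothetical sketches, not the repository's actual contract:

```python
from typing import Protocol

class SpamModelPort(Protocol):
    """Hypothetical application-layer inference contract: any backend
    (TensorFlow, keyword fallback) must provide this method."""
    def predict(self, message: str) -> str: ...

class KeywordAdapter:
    """Toy adapter satisfying the port with a single keyword check."""
    def predict(self, message: str) -> str:
        return "spam" if "winner" in message.lower() else "ham"

def classify(model: SpamModelPort, message: str) -> str:
    """Use case depends only on the port, never on a concrete adapter."""
    return model.predict(message)
```

Because the use case references only the port, the composition root can swap backends via `MODEL_BACKEND` without touching application or domain code.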
```
src/spam_detector/
  domain/
  application/
  adapters/
  composition_root/
scripts/
  train_tensorflow.py
  predict_cli.py
  smoke_api.py
requirements/
  base.txt
  cpu.txt
  nvidia.txt
  metal.txt
  dev.txt
tests/
Dockerfile.cpu
Dockerfile.nvidia
docker-compose.yml
```
Create a virtual environment and install dev dependencies:

```shell
python3.13 -m venv .venv313
.venv313/bin/python -m pip install -r requirements/dev.txt
```

Run the unit tests:

```shell
.venv313/bin/python -m pytest
```

Start the local API with the keyword fallback backend and smoke-test it:

```shell
PYTHONPATH=src MODEL_BACKEND=keyword MODEL_RUNTIME=cpu .venv313/bin/python -m uvicorn spam_detector.main:app
.venv313/bin/python scripts/smoke_api.py --base-url http://127.0.0.1:8000
```

Train and export TensorFlow model artifacts:

```shell
PYTHONPATH=src .venv313/bin/python scripts/train_tensorflow.py --epochs 10 --output-dir artifacts
```

Then run inference with the TensorFlow backend:

```shell
PYTHONPATH=src MODEL_BACKEND=tensorflow MODEL_RUNTIME=cpu MODEL_PATH=artifacts/model.keras .venv313/bin/python -m uvicorn spam_detector.main:app
```

Bring up the CPU profile with Docker Compose and smoke-test it:

```shell
docker compose --profile cpu up --build -d
python3 scripts/smoke_api.py --base-url http://127.0.0.1:8000
```

Bring up the NVIDIA profile and smoke-test it:

```shell
docker compose --profile nvidia up --build -d
python3 scripts/smoke_api.py --base-url http://127.0.0.1:8001
```

Apple Metal acceleration is available in local macOS Python environments (`requirements/metal.txt`); it is not a Linux Docker runtime feature.
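Beyond the bundled smoke script, the `/predict` endpoint can be exercised with a short standard-library client. This sketch assumes the endpoint accepts JSON of the form `{"message": "..."}`; check the FastAPI adapter for the actual request schema:

```python
import json
import urllib.request

def build_predict_request(base_url: str, message: str) -> urllib.request.Request:
    """Build a POST request for the /predict endpoint.
    Send it with urllib.request.urlopen(req) and parse the JSON response."""
    return urllib.request.Request(
        f"{base_url}/predict",
        data=json.dumps({"message": message}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, `urllib.request.urlopen(build_predict_request("http://127.0.0.1:8000", "free prize"))` would post against a locally running instance.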
- Environment-based configuration via `.env.example`.
- Local-only security notes/checklists kept under the git-ignored `.local/`.
- Generated logs and artifacts ignored by git.
- Non-root Docker users and container health checks.
- Dependency audit command included (`pip-audit`).
- `make test` - run unit tests.
- `make lint` - run lint checks.
- `make run` - start the local API.
- `make smoke` - run API smoke tests.
- `make train` - train and export the model.
- `make audit` - run the vulnerability audit.