Deep Learning NLP Spam Detector

A production-ready SMS spam detection project, built from a deep learning notebook and hardened for portfolio demonstrations, local macOS development (Metal), and VPS deployment (CPU today, with an NVIDIA profile ready for future GPU hosts).

What this project does

- Classifies SMS text messages as ham or spam.
- Provides a deployable API (/healthz, /predict) via FastAPI.
- Supports runtime profiles:
  - cpu (VPS-safe default)
  - nvidia (CUDA-ready profile)
  - metal (Apple Silicon local profile)
- Keeps a deep learning path (TensorFlow training + exported model artifacts).
- Includes a fallback keyword adapter for deterministic smoke testing and resilient startup.
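
The fallback behaviour mentioned in the last point can be pictured roughly as follows. This is a minimal sketch, not the repository's adapter code: the names keyword_predict, select_predictor, and SPAM_KEYWORDS are hypothetical, the keyword list is made up for illustration, and it assumes that "resilient startup" means falling back to the keyword classifier when the TensorFlow model cannot be loaded.

```python
import re
from typing import Callable

# Hypothetical keyword list, chosen only for illustration.
SPAM_KEYWORDS = frozenset({"free", "winner", "prize", "claim", "urgent"})

Predictor = Callable[[str], str]


def keyword_predict(message: str) -> str:
    """Deterministic classifier: 'spam' if any keyword appears as a word."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    return "spam" if words & SPAM_KEYWORDS else "ham"


def select_predictor(backend: str, load_model: Callable[[], Predictor]) -> Predictor:
    """Return the requested predictor, falling back to keywords on failure.

    load_model is a hypothetical zero-argument loader for the trained model.
    """
    if backend == "tensorflow":
        try:
            return load_model()
        except Exception:
            # Assumption: any load failure (missing file, missing package)
            # should not stop the service from starting.
            return keyword_predict
    return keyword_predict


if __name__ == "__main__":
    def missing_model() -> Predictor:
        raise FileNotFoundError("artifacts/model.keras not found")

    predict = select_predictor("tensorflow", missing_model)
    print(predict("Claim your FREE prize!"))
    print(predict("See you at lunch"))
```

Because the keyword path has no learned weights, its output for a given message never changes, which is what makes it usable for smoke tests.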

Notebook contents (`main.ipynb`)

The notebook is the original deep learning experiment and contains:

Data loading and EDA
- Loads the AT&T SMS dataset from a public S3 bucket.
- Inspects null values and class distribution.
Text preprocessing
- Cleans messages (lowercasing, character filtering).
- Removes stop words and lemmatizes text.
Tokenization and sequence preparation
- Builds a tokenizer and encodes text.
- Pads sequences to fixed length for neural input.
Deep learning model training
- Embedding + global pooling + dense layers.
- Early stopping and TensorBoard logging.
Evaluation and visualization
- Validation loss/accuracy curves.
- Confusion matrix for ham/spam quality.
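
The preprocessing and sequence-preparation steps listed above can be sketched in plain Python. This is an illustration under stated assumptions rather than the notebook's code: the stop-word set is a tiny hypothetical sample, lemmatization is left out, index 0 is reserved for padding and 1 for unknown words, and padding is applied at the end of each sequence (the notebook may pad differently).

```python
import re
from collections import Counter

# Tiny illustrative stop-word set; a real run would use a full list.
STOP_WORDS = {"a", "an", "the", "to", "is", "you", "your", "and", "for"}

PAD_ID = 0  # assumed padding index
OOV_ID = 1  # assumed index for words missing from the vocabulary


def clean(text: str) -> list[str]:
    """Lowercase, replace non-alphanumeric characters, drop stop words."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def build_vocab(token_lists: list[list[str]], max_words: int = 1000) -> dict[str, int]:
    """Map the most frequent words to integer ids starting at 2."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    return {word: idx + 2 for idx, (word, _) in enumerate(counts.most_common(max_words))}


def encode(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    """Replace each token with its id, using OOV_ID for unknown words."""
    return [vocab.get(tok, OOV_ID) for tok in tokens]


def pad(sequence: list[int], length: int) -> list[int]:
    """Truncate or right-pad a sequence to exactly `length` ids."""
    sequence = sequence[:length]
    return sequence + [PAD_ID] * (length - len(sequence))


if __name__ == "__main__":
    messages = ["WIN a FREE prize now!!!", "Are you free for lunch?"]
    tokens = [clean(m) for m in messages]
    vocab = build_vocab(tokens)
    for toks in tokens:
        print(toks, pad(encode(toks, vocab), 5))
```

Fixed-length integer sequences like these are the kind of input an embedding layer expects, which is why padding comes before model training in the list above.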

This repository keeps that notebook as the experiment artifact and adds production-oriented code around it.

Architecture (Hexagonal)

src/spam_detector/domain/
- Pure business logic (text normalization, prediction entity).
src/spam_detector/application/
- Use cases and ports (model inference contract).
src/spam_detector/adapters/
- API adapter (FastAPI), ML adapters (TensorFlow and keyword fallback).
src/spam_detector/composition_root/
- Runtime wiring (MODEL_BACKEND, MODEL_RUNTIME, MODEL_PATH).

Dependency flow: adapters -> application -> domain.
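
To make the dependency flow concrete, here is a compact single-file sketch of the four layers. All class and function names (Prediction, normalize_text, SpamModelPort, ClassifyMessage, StubAdapter, build_use_case) are hypothetical and are not taken from the repository; the stub adapter stands in for the real TensorFlow and keyword adapters, and the "keyword" default for MODEL_BACKEND is an assumption.

```python
from dataclasses import dataclass
from typing import Mapping, Protocol


# --- domain: pure logic, no framework or ML imports -------------------
@dataclass(frozen=True)
class Prediction:
    label: str
    score: float


def normalize_text(text: str) -> str:
    """Lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


# --- application: use case plus the port it depends on ----------------
class SpamModelPort(Protocol):
    def predict(self, text: str) -> Prediction: ...


class ClassifyMessage:
    def __init__(self, model: SpamModelPort) -> None:
        self._model = model

    def execute(self, raw_message: str) -> Prediction:
        return self._model.predict(normalize_text(raw_message))


# --- adapters: implement the port, depend inward only -----------------
class StubAdapter:
    """Stand-in ML adapter: flags any text containing 'prize'."""

    def predict(self, text: str) -> Prediction:
        is_spam = "prize" in text
        return Prediction("spam" if is_spam else "ham", 1.0 if is_spam else 0.0)


# --- composition root: the only place that reads configuration --------
def build_use_case(env: Mapping[str, str]) -> ClassifyMessage:
    backend = env.get("MODEL_BACKEND", "keyword")  # default is an assumption
    if backend != "keyword":
        raise ValueError(f"backend not wired in this sketch: {backend}")
    return ClassifyMessage(StubAdapter())


if __name__ == "__main__":
    use_case = build_use_case({"MODEL_BACKEND": "keyword"})
    print(use_case.execute("  You WON a   PRIZE "))
```

The use case only knows the port, so swapping the stub for a TensorFlow-backed adapter would change the composition root and nothing in the domain or application layers.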

Repository structure

src/spam_detector/
  domain/
  application/
  adapters/
  composition_root/
scripts/
  train_tensorflow.py
  predict_cli.py
  smoke_api.py
requirements/
  base.txt
  cpu.txt
  nvidia.txt
  metal.txt
  dev.txt
tests/
Dockerfile.cpu
Dockerfile.nvidia
docker-compose.yml

Quickstart

1) Local dev (Python 3.13)

python3.13 -m venv .venv313
.venv313/bin/python -m pip install -r requirements/dev.txt

2) Run tests

.venv313/bin/python -m pytest

3) Run API locally (keyword fallback)

PYTHONPATH=src MODEL_BACKEND=keyword MODEL_RUNTIME=cpu .venv313/bin/python -m uvicorn spam_detector.main:app

4) Smoke test API

.venv313/bin/python scripts/smoke_api.py --base-url http://127.0.0.1:8000
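
For orientation, a health probe of the kind a smoke test performs can be written with the standard library alone. The contents of scripts/smoke_api.py are not shown here, so this is only a sketch: check_health is a hypothetical helper, and it probes /healthz only, because the request format of /predict is not described in this README.

```python
import urllib.error
import urllib.request


def check_health(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/healthz answers with HTTP 200."""
    url = base_url.rstrip("/") + "/healthz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response.
        return False


if __name__ == "__main__":
    ok = check_health("http://127.0.0.1:8000")
    print("healthy" if ok else "unreachable or unhealthy")
```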

Deep learning training path

Train and export TensorFlow model artifacts:

PYTHONPATH=src .venv313/bin/python scripts/train_tensorflow.py --epochs 10 --output-dir artifacts

Then run inference with TensorFlow backend:

PYTHONPATH=src MODEL_BACKEND=tensorflow MODEL_RUNTIME=cpu MODEL_PATH=artifacts/model.keras .venv313/bin/python -m uvicorn spam_detector.main:app

Docker deployment

CPU profile (recommended for current VPS)

docker compose --profile cpu up --build -d
python3 scripts/smoke_api.py --base-url http://127.0.0.1:8000

NVIDIA profile (for future GPU hosts)

docker compose --profile nvidia up --build -d
python3 scripts/smoke_api.py --base-url http://127.0.0.1:8001

Metal note

Apple Metal acceleration is available only in local macOS Python environments (installed from requirements/metal.txt). It is not available inside Linux Docker containers, so the Docker profiles cover CPU and NVIDIA only.

Security posture improvements

Environment-based configuration, documented in .env.example.
Local-only security notes and checklists kept under the git-ignored .local/ directory.
Generated logs and artifacts are ignored by git.
Docker images run as non-root users and define health checks.
Dependency audit command included (pip-audit).

Commands

make test - run unit tests.
make lint - run lint checks.
make run - start local API.
make smoke - run API smoke tests.
make train - train and export model.
make audit - run vulnerability audit.
