A compact, production‑style MT system that trains a Transformer on Hugging Face datasets, exports to ONNX, serves low‑latency inference via FastAPI, and ships a C++ client on ONNX Runtime. Built as a capstone project for a Machine Learning class.
End-to-end machine translation capstone project: train a Transformer model on multilingual parallel corpora (e.g., OPUS Books or WMT), export encoder/decoder to ONNX, serve via FastAPI (GPU/CPU), and run C++ inference with ONNX Runtime.
```bash
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
```
- Default (auto‑downloaded by code): OPUS Books on Hugging Face (`opus_books`)
- Alternatives (also auto‑downloaded via HF): `wmt14`, `wmt16`, `wmt17`
- Original sources (FYI):
  - OPUS portal: https://opus.nlpl.eu/
  - WMT tasks: https://www.statmt.org/wmt23/ (see “Shared Tasks” pages for older years)

You select datasets via CLI flags (`--dataset` and `--lang_pair`). No manual download needed; HF Datasets handles caching.
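For orientation, this is roughly what the loader does under the hood with HF Datasets (a minimal sketch; note that Hub config names for `opus_books` pair the languages alphabetically, so an `en-de` request presumably maps to the `de-en` config):

```python
from datasets import load_dataset

# opus_books exposes alphabetical language-pair configs ("de-en", "en-fr", ...),
# so an en-de request is served by the "de-en" config.
ds = load_dataset("opus_books", "de-en", split="train")
print(ds[0]["translation"])  # {'de': '...', 'en': '...'}
```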
```bash
set -euo pipefail
python -m src.tokenize \
  --dataset opus_books \
  --lang_pair en-de \
  --vocab_size 16000 \
  --output models/spm
```
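Under the hood, `src.tokenize` presumably drives SentencePiece training with the special‑token IDs listed in the notes at the end of this README. A minimal sketch (the input file name is illustrative):

```python
import sentencepiece as spm

# Train a 16k-piece model; pad/unk/bos/eos IDs match what the model expects.
spm.SentencePieceTrainer.train(
    input="corpus.en-de.txt",         # illustrative: one sentence per line
    model_prefix="models/spm/en-de",  # writes en-de.model and en-de.vocab
    vocab_size=16000,
    pad_id=0, unk_id=1, bos_id=2, eos_id=3,
)
```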
```bash
set -euo pipefail
python -m src.train \
  --dataset opus_books \
  --lang_pair en-de \
  --sp_model models/spm/en-de.model \
  --save_dir models/transformer_en-de \
  --epochs 5 \
  --batch_size 128 \
  --max_len 128
```
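One training detail worth making explicit: with `pad=0`, the cross‑entropy loss should skip padded target positions. A sketch of that setup under teacher forcing (not necessarily how `src.train` implements it):

```python
import torch
import torch.nn as nn

PAD_ID = 0  # matches the SentencePiece pad_id above

criterion = nn.CrossEntropyLoss(ignore_index=PAD_ID)

def loss_fn(logits: torch.Tensor, tgt: torch.Tensor) -> torch.Tensor:
    # Teacher forcing: the decoder saw tgt[:, :-1] and predicts tgt[:, 1:];
    # logits is (B, T-1, V). Padded positions contribute zero loss.
    return criterion(logits.reshape(-1, logits.size(-1)), tgt[:, 1:].reshape(-1))
```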
```bash
set -euo pipefail
python -m src.export_onnx \
  --checkpoint models/transformer_en-de/checkpoint.pt \
  --sp_model models/spm/en-de.model \
  --output_dir onnx \
  --max_len 128
```

Tip: For CPU‑only boxes, install `onnxruntime` instead of `onnxruntime-gpu`; everything else works unchanged.
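The notes at the bottom mention dynamic axes `B`, `S`, `T`; this is what that plausibly looks like in the export call. The stand‑in encoder and the tensor names `src_ids`/`memory` are assumptions for illustration:

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Stand-in for the real encoder; the checkpoint holds the trained one."""
    def __init__(self, vocab=16000, d_model=256):
        super().__init__()
        self.emb = nn.Embedding(vocab, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, src_ids):
        return self.enc(self.emb(src_ids))

enc, dummy = TinyEncoder().eval(), torch.randint(4, 16000, (1, 12))
torch.onnx.export(
    enc, (dummy,), "onnx/encoder.onnx",
    input_names=["src_ids"], output_names=["memory"],
    # Dynamic axes: batch (B) and source length (S) vary at inference time;
    # the decoder export would additionally mark target length (T).
    dynamic_axes={"src_ids": {0: "B", 1: "S"}, "memory": {0: "B", 1: "S"}},
    opset_version=17,
)
```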
```bash
python -m src.infer_python \
  --checkpoint models/transformer_en-de/checkpoint.pt \
  --sp_model models/spm/en-de.model \
  --src "Hello world!" --max_len 64
```

```bash
python -m src.evaluate \
  --checkpoint models/transformer_en-de/checkpoint.pt \
  --sp_model models/spm/en-de.model \
  --dataset opus_books --lang_pair en-de \
  --n_samples 200 --max_len 64
```
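`src.evaluate` presumably scores the sampled translations with BLEU; if you want to check a run by hand, sacreBLEU’s corpus API is the usual tool (assuming BLEU is indeed the metric):

```python
import sacrebleu

hyps = ["Hallo Welt !"]   # system outputs, one per source sentence
refs = [["Hallo Welt!"]]  # one reference stream, parallel to hyps
print(sacrebleu.corpus_bleu(hyps, refs).score)
```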
```bash
set -euo pipefail
cp -n .env.example .env || true
uvicorn api.server:app --host 0.0.0.0 --port 8000
```

```bash
curl -X POST http://localhost:8000/translate \
  -H "Content-Type: application/json" \
  -d '{"text":"Hello world!","lang_pair":"en-de","beam":4,"max_len":64}'
```
```bash
# Install sentencepiece dev libs (Debian/Ubuntu):
sudo apt-get install -y libsentencepiece-dev

# Download ONNX Runtime (GPU or CPU) that matches your CUDA (if GPU).
# Releases: https://github.com/microsoft/onnxruntime/releases
export ONNXRUNTIME_DIR=/path/to/onnxruntime-linux-x64-gpu-<ver>

cmake -S cpp -B build -DONNXRUNTIME_DIR=$ONNXRUNTIME_DIR
cmake --build build -j
```
```bash
./build/mt_infer \
  --spm models/spm/en-de.model \
  --encoder onnx/encoder.onnx \
  --decoder onnx/decoder.onnx \
  --src "Hello world!" --beam 4 --max_len 64
```
If you want a single copy‑paste to do tokenizer → train → export → serve:

```bash
# === run_all.sh ===
set -euo pipefail

# 1) Tokenizer
python -m src.tokenize --dataset opus_books --lang_pair en-de --vocab_size 16000 --output models/spm

# 2) Train
python -m src.train --dataset opus_books --lang_pair en-de \
  --sp_model models/spm/en-de.model --save_dir models/transformer_en-de \
  --epochs 5 --batch_size 128 --max_len 128

# 3) Export ONNX
python -m src.export_onnx --checkpoint models/transformer_en-de/checkpoint.pt \
  --sp_model models/spm/en-de.model --output_dir onnx --max_len 128

# 4) Serve API
cp -n .env.example .env || true
uvicorn api.server:app --host 0.0.0.0 --port 8000
```
- Dataset switch: use `--dataset wmt14 --lang_pair en-de` (or `wmt16`, `wmt17`).
- Special tokens: `pad=0, unk=1, bos=2, eos=3` are set in SentencePiece training and used by the model.
- ONNX export: dynamic axes for `B`, `S`, `T` allow variable lengths at inference.
- CPU‑only: swap `onnxruntime-gpu` → `onnxruntime` in `requirements.txt` and you’re good (see the provider sketch below).
- HF cache: datasets are cached under `~/.cache/huggingface/datasets` by default.
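On the CPU‑only note: ONNX Runtime picks execution providers from a priority list, so the same Python code runs under either package. A small sketch (the model path assumes the export step above):

```python
import onnxruntime as ort

# Prefer CUDA when the GPU package is installed; otherwise run on CPU.
available = ort.get_available_providers()
wanted = [p for p in ("CUDAExecutionProvider", "CPUExecutionProvider")
          if p in available]
sess = ort.InferenceSession("onnx/encoder.onnx", providers=wanted)
print(sess.get_providers())  # providers actually in use
```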