# Faster Translate

A high-performance translation library powered by state-of-the-art models. Faster Translate offers optimized inference through CTranslate2 and vLLM backends, providing an easy-to-use interface for applications that require efficient, accurate translation.
- High-performance inference using CTranslate2 and vLLM backends
- Seamless integration with Hugging Face models
- Flexible API for single sentence, batch, and large-scale translation
- Dataset translation with direct Hugging Face integration
- Multi-backend support for both traditional (CTranslate2) and LLM-based (vLLM) models
- Text normalization for improved translation quality
## Installation

```bash
# Base install (CTranslate2 backend)
pip install faster-translate

# With vLLM backend support
pip install "faster-translate[vllm]"

# With all optional dependencies
pip install "faster-translate[all]"
```
## Quick Start

```python
from faster_translate import TranslatorModel

# Initialize with a pre-configured model
translator = TranslatorModel.from_pretrained("banglanmt_bn2en")

# Translate a single sentence
english_text = translator.translate_single("দেশে বিদেশি ঋণ নিয়ে এখন বেশ আলোচনা হচ্ছে।")
print(english_text)

# Translate a batch of sentences
bengali_sentences = [
    "দেশে বিদেশি ঋণ নিয়ে এখন বেশ আলোচনা হচ্ছে।",
    "রাত তিনটার দিকে কাঁচামাল নিয়ে গুলিস্তান থেকে পুরান ঢাকার শ্যামবাজারের আড়তে যাচ্ছিলেন লিটন ব্যাপারী।"
]
translations = translator.translate_batch(bengali_sentences)
```
Both backend types are loaded through the same interface:

```python
# Using a CTranslate2-based model
ct2_translator = TranslatorModel.from_pretrained("banglanmt_bn2en")

# Using a vLLM-based model
vllm_translator = TranslatorModel.from_pretrained("bangla_qwen_en2bn")
```
You can also load any compatible model directly from the Hugging Face Hub:

```python
# Load a specific model from Hugging Face
translator = TranslatorModel.from_pretrained(
    "sawradip/faster-translate-banglanmt-bn2en-t5",
    normalizer_func="buetnlpnormalizer"
)
```
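Text normalization (the `normalizer_func` hook above) cleans input text before it reaches the model. If a model has no registered normalizer, you can pre-normalize inputs yourself before calling the translator; the sketch below is plain Python, independent of the faster-translate API, showing Unicode NFC plus whitespace normalization:

```python
import unicodedata

def normalize_text(text: str) -> str:
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.split())

# Decomposed "e" + combining accent becomes the single character "é",
# and stray spacing is collapsed before translation.
cleaned = normalize_text("  cafe\u0301   menu ")
```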
## Dataset Translation

Translate an entire dataset with a single function call:
```python
translator = TranslatorModel.from_pretrained("banglanmt_en2bn")

# Translate the entire dataset
translator.translate_hf_dataset(
    "sawradip/bn-translation-mega-raw-noisy",
    batch_size=16
)

# Translate specific subsets
translator.translate_hf_dataset(
    "sawradip/bn-translation-mega-raw-noisy",
    subset_name=["google"],
    batch_size=16
)

# Translate a portion of the dataset
translator.translate_hf_dataset(
    "sawradip/bn-translation-mega-raw-noisy",
    subset_name="alt",
    batch_size=16,
    translation_size=0.5  # Translate 50% of the dataset
)
```
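A fractional `translation_size` selects a leading portion of the dataset. As a back-of-the-envelope check (this mirrors the documented behavior, not the library's internal code), the number of rows actually translated can be computed like this:

```python
def rows_to_translate(total_rows: int, translation_size: float = 1.0) -> int:
    """Rows covered by a fractional translation_size in (0, 1]."""
    if not 0.0 < translation_size <= 1.0:
        raise ValueError("translation_size must be in (0, 1]")
    return int(total_rows * translation_size)

# A 10,000-row subset with translation_size=0.5 covers 5,000 rows.
covered = rows_to_translate(10_000, 0.5)
```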
Push translated datasets directly to the Hugging Face Hub:

```python
translator.translate_hf_dataset(
    "sawradip/bn-translation-mega-raw-noisy",
    subset_name="alt",
    batch_size=16,
    push_to_hub=True,
    token="your_huggingface_token",
    save_repo_name="your-username/translated-dataset"
)
```
When working with challenging datasets, the `verification_mode="no_checks"` setting relaxes Hugging Face dataset verification:

```python
# Using the vLLM backend with relaxed dataset verification
vllm_translator = TranslatorModel.from_pretrained("bangla_qwen_en2bn")

# The verification_mode parameter is automatically set to "no_checks"
# to handle datasets with formatting inconsistencies
vllm_translator.translate_hf_dataset(
    "difficult/dataset-with-formatting-issues",
    batch_size=16
)
```
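If you load datasets yourself, the same "try strict, then relax" behavior can be sketched with a small fallback wrapper; the loader callables below are stand-ins for illustration (in practice they would be `datasets.load_dataset` calls with different `verification_mode` values):

```python
def load_with_fallback(strict_loader, relaxed_loader):
    """Try strict dataset verification first; fall back to relaxed checks."""
    try:
        return strict_loader()
    except ValueError:
        # e.g. split-size or checksum mismatches in a noisy dataset
        return relaxed_loader()

# Stand-in loaders for illustration:
def strict():
    raise ValueError("split sizes do not match")

def relaxed():
    return {"train": ["..."]}

dataset = load_with_fallback(strict, relaxed)
```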
`translate_hf_dataset` supports numerous parameters for fine-grained control:

```python
# Full parameter example
translator.translate_hf_dataset(
    dataset_repo="example/dataset",            # Hugging Face dataset repository
    subset_name="subset",                      # Optional dataset subset
    split=["train", "validation"],             # Splits to translate (default: ["train"])
    columns=["text", "instructions"],          # Columns to translate
    batch_size=32,                             # Number of texts per batch
    token="hf_token",                          # Hugging Face token for private datasets
    translation_size=0.7,                      # Translate 70% of the dataset
    start_idx=100,                             # Start from the 100th example
    end_idx=1000,                              # End at the 1000th example
    output_format="json",                      # Output format
    output_name="translations.json",           # Output file name
    push_to_hub=True,                          # Push translated dataset to the HF Hub
    save_repo_name="username/translated-data"  # Repository name for upload
)
```
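For planning long runs, it helps to estimate how many batches a `start_idx`/`end_idx` window produces at a given `batch_size`. This is a simple ceiling division, assuming a half-open `[start_idx, end_idx)` window; the helper is illustrative, not part of the library:

```python
import math

def batch_count(start_idx: int, end_idx: int, batch_size: int) -> int:
    """Batches needed to cover rows in [start_idx, end_idx)."""
    rows = max(0, end_idx - start_idx)
    return math.ceil(rows / batch_size)

# 900 rows at batch_size=32 -> 29 batches
batches = batch_count(100, 1000, 32)
```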
## Supported Models
| Model ID | Source Language | Target Language | Backend | Description |
|----------|----------------|----------------|---------|-------------|
| `banglanmt_bn2en` | Bengali | English | CTranslate2 | BanglaNMT model from BUET |
| `banglanmt_en2bn` | English | Bengali | CTranslate2 | BanglaNMT model from BUET |
| `bangla_mbartv1_en2bn` | English | Bengali | CTranslate2 | MBart-based translation model |
| `bangla_qwen_en2bn` | English | Bengali | vLLM | Qwen-based translation model |
## Advanced Configuration
### Custom Sampling Parameters for vLLM Models
```python
from vllm import SamplingParams

# Create custom sampling parameters
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=512
)

# Initialize the translator with custom parameters
translator = TranslatorModel.from_pretrained(
    "bangla_qwen_en2bn",
    sampling_params=sampling_params
)
```
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Citation

If you use Faster Translate in your research, please cite:

```bibtex
@software{faster_translate,
  author = {Sawradip Saha and Contributors},
  title  = {Faster Translate: High-Performance Machine Translation Library},
  url    = {https://github.com/sawradip/faster-translate},
  year   = {2024}
}
```