This repository contains an implementation of Learning-to-Rank methods based on the Transformer Encoder architecture. The project is designed for ranking documents by relevance to queries using deep neural networks.
The model is built on an Encoder architecture and supports multiple loss functions (pointwise, listwise, combined), enabling efficient training of models for document ranking.
- Transformer Encoder model for document ranking
- Multiple loss functions: Pointwise (Cross-Entropy), Listwise (ListNet), Combined Loss
- Comprehensive metric evaluation: NDCG@5, NDCG@10, NDCG (full), Recall@5, Recall@10, Recall (full), MRR
- Analysis utilities: inference time measurement, memory usage estimation
- Fine-tuning support for models
- Visualization of training results and comparison of different architectures
The model is a Transformer-based Encoder:
Input (num_docs × num_features)
    ↓
Input Projection (Linear)
    ↓
Transformer Blocks (× N layers)
    ├─ Multi-Head Self-Attention
    ├─ Residual Connection + Layer Norm
    ├─ Feed-Forward Network
    └─ Residual Connection + Layer Norm
    ↓
Output Layer (Linear → num_classes)
    ↓
Scores (num_docs × num_classes)
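For intuition, the snippet below sketches what a single block in this architecture computes. It is a minimal illustrative PyTorch module, not the repository's make_Encoder_model implementation; the class name and layer choices here are hypothetical.

import torch
import torch.nn as nn

class IllustrativeEncoderBlock(nn.Module):
    """Simplified sketch of one transformer block from the diagram above."""
    def __init__(self, d_model=512, n_heads=2, ffn_hidden=512, dropout_rate=0.15):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout_rate, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, ffn_hidden),
            nn.ReLU(),
            nn.Dropout(dropout_rate),
            nn.Linear(ffn_hidden, d_model),
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, key_padding_mask=None):
        # Multi-head self-attention + residual connection + layer norm
        attn_out, _ = self.attn(x, x, x, key_padding_mask=key_padding_mask)
        x = self.norm1(x + attn_out)
        # Feed-forward network + residual connection + layer norm
        return self.norm2(x + self.ffn(x))

# One query with 140 documents, each already projected to d_model dimensions
block = IllustrativeEncoderBlock()
docs = torch.randn(1, 140, 512)
print(block(docs).shape)  # torch.Size([1, 140, 512])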
Model hyperparameters:
- d_model: Model dimension (default: 512)
- n_heads: Number of attention heads (default: 2-4)
- n_layers: Number of transformer blocks (default: 2)
- ffn_hidden: FFN hidden layer dimension (default: 512)
- input_dim: Input feature dimension (depends on dataset)
- output_dim: Number of relevance classes (default: 5)
- dropout_rate: Dropout coefficient (default: 0.15)
pip install torch torchvision torchaudio
pip install numpy pandas scikit-learn matplotlib
pip install thop  # For FLOPs counting (optional)

Data should be in pickle file format with the following structure:
- fl_features: document features
- labels: relevance labels (0-4)
- query_id: query identifiers
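As an illustration of that layout, a toy single-query file could be written as below. This is only a sketch: the exact container (dict, DataFrame, etc.) and dtypes are assumptions, so consult utils/preprocess.py for the format the loader actually expects.

import pickle
import numpy as np

# Hypothetical toy data: 3 documents for one query, 699 features each
data = {
    'fl_features': np.random.rand(3, 699).astype(np.float32),  # document features
    'labels': np.array([0, 3, 1]),                              # relevance labels (0-4)
    'query_id': np.array([101, 101, 101]),                      # query identifiers
}

with open('train.pkl', 'wb') as f:
    pickle.dump(data, f)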
Usage example:
from utils.preprocess import preprocess_data
train_data = preprocess_data(
file_path='path/to/train.pkl',
num_docs=140, # Maximum number of documents per query
which=0, # Dataset index (0 for train, -1 for test)
is_shuffle=True, # Whether to shuffle documents
device='cuda'
)

from utils.Encoder_model import make_Encoder_model
model = make_Encoder_model(
d_model=512,
n_heads=2,
n_layers=2,
ffn_hidden=512,
input_dim=699, # Feature dimension
output_dim=5, # Number of classes
dropout_rate=0.15,
device='cuda'
)

The main training pipeline is in Training and evaluation.ipynb:
from utils.train_eval_utils import train_eval
from utils.loss_mask_utils import Combined_Loss, create_mask
from sklearn.metrics import ndcg_score
loss_fn = Combined_Loss(theta=0.01, num_of_labels=5, distribution='polynomial', degree=2)
train_params = {
'train_loader': train_loader,
'model': model,
'optimizer': optimizer,
'loss_fn': loss_fn,
'num_epochs': 25,
'create_mask': create_mask,
'val_loader': val_loader,
'score_fn': ndcg_score,
'name': 'best_model'
}
losses, metrics = train_eval(**train_params)

The train_eval function automatically computes multiple ranking metrics:
- NDCG@5: Normalized Discounted Cumulative Gain on top-5 documents
- NDCG@10: NDCG on top-10 documents
- NDCG (full): NDCG on all documents in the ranking
- Recall@5: Proportion of relevant documents found in top-5 results
- Recall@10: Proportion of relevant documents found in top-10 results
- Recall (full): Proportion of relevant documents found in the entire ranking
- MRR (Mean Reciprocal Rank): Average of the reciprocal ranks of the first relevant document for each query
All metrics are computed during validation and displayed in the console output. The metrics dictionary returned by train_eval contains lists of all metric values for each epoch, enabling detailed analysis of model performance over time.
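For example, the per-epoch history can be plotted once training finishes. The key name 'ndcg@10' below is an assumption about the dictionary layout, so inspect the returned keys first:

import matplotlib.pyplot as plt

# Inspect which metrics were recorded before picking one to plot
print(metrics.keys())

# 'ndcg@10' is a hypothetical key name; use whichever key train_eval actually returns
history = metrics.get('ndcg@10', next(iter(metrics.values())))
plt.plot(range(1, len(history) + 1), history)
plt.xlabel('Epoch')
plt.ylabel('NDCG@10 (validation)')
plt.title('Validation NDCG@10 per epoch')
plt.show()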
To fine-tune an existing model, use finetune.ipynb:
from utils.preprocess import Dataset_for_finetune, preprocess_for_finetune
from utils.train_eval_utils import train_eval
from utils.loss_mask_utils import cross_entropy_for_finetune
# Load model
model = make_Encoder_model(**model_params)
state_dict = torch.load('path/to/model.pth')
model.load_state_dict(state_dict)
# Fine-tuning with new loss function
loss_fn = cross_entropy_for_finetune
# ... further configuration

The project supports several loss functions:
Pointwise (Cross-Entropy): a classical approach that treats ranking as classification over relevance classes.
from utils.loss_mask_utils import Cross_Entropy_point
loss_fn = Cross_Entropy_point(num_of_label=5)

Listwise (ListNet): considers the relevance distribution over the entire document list.
from utils.loss_mask_utils import ListNet_Loss
loss_fn = ListNet_Loss(distribution='polynomial', degree=2)
# or
loss_fn = ListNet_Loss(distribution='softmax')

Combined Loss: a weighted combination of the pointwise and listwise losses.
from utils.loss_mask_utils import Combined_Loss
loss_fn = Combined_Loss(
theta=0.01, # Weight for listwise loss
num_of_labels=5,
distribution='polynomial',
degree=2
)

The framework provides comprehensive evaluation metrics for ranking tasks:
- NDCG (Normalized Discounted Cumulative Gain)
  - Measures ranking quality considering position and relevance
  - Computed at different cutoffs: @5, @10, and full ranking (see the example after this list)
- Recall@k
  - Measures the proportion of relevant documents retrieved in top-k results
  - Useful for understanding coverage of relevant items
  - Configurable relevance threshold (default: > 0.0)
- MRR (Mean Reciprocal Rank)
  - Measures the average reciprocal rank of the first relevant document
  - Particularly useful when the position of the first relevant result matters
  - Returns 0 if no relevant documents are found
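As a quick sanity check of the NDCG cutoffs, sklearn's ndcg_score (the same score_fn passed to train_eval above) can be applied directly to a toy query; the relevance grades and scores here are made up for illustration:

import numpy as np
from sklearn.metrics import ndcg_score

# One query with 6 documents: ground-truth relevance grades and predicted scores
y_true = np.array([[3, 2, 0, 1, 0, 0]])
y_pred = np.array([[0.9, 0.3, 0.8, 0.7, 0.2, 0.1]])

print(f"NDCG@5:    {ndcg_score(y_true, y_pred, k=5):.4f}")
print(f"NDCG@10:   {ndcg_score(y_true, y_pred, k=10):.4f}")
print(f"NDCG full: {ndcg_score(y_true, y_pred):.4f}")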
You can customize the relevance threshold for Recall and MRR:
from utils.train_eval_utils import evaluate
# Evaluate with custom relevance threshold
avg_ndcg5, avg_ndcg10, avg_ndcg, avg_recall5, avg_recall10, avg_recall, avg_mrr = evaluate(
val_loader,
model,
ndcg_score,
create_mask,
relevance_threshold=0.5 # Documents with score > 0.5 are considered relevant
)

Use inference_time.ipynb for model performance analysis:
- Static memory estimation (parameters, buffers)
- Peak memory usage during inference
- Forward pass execution time
from inference_time import estimate_inference_memory_static, measure_inference_peak_memory
# Static estimation
static_info = estimate_inference_memory_static(model)
print(static_info['pretty'])
# Peak usage during inference
memory_info = measure_inference_peak_memory(model, sample_input, warmup=5, steps=10)
print(memory_info['cuda_peak']['pretty'])

The done pictures/ folder contains results from experiments with various:
- Model architectures
- Loss functions (pointwise, listwise, combined)
- Hyperparameters (dropout, polynomial degree)
- Datasets (Web10k, Istella)
from utils.loss_mask_utils import create_mask
mask = create_mask(input_tensor)  # Boolean mask for documents

You can compute metrics individually using utility functions:
from utils.train_eval_utils import compute_recall_at_k, compute_mrr
import numpy as np
# Example: Compute Recall@10
y_true = np.array([0.0, 1.0, 0.0, 1.0, 0.5, 0.0]) # Ground truth relevance
y_pred = np.array([0.1, 0.9, 0.2, 0.8, 0.7, 0.3]) # Predicted scores
recall_10 = compute_recall_at_k(y_true, y_pred, k=10, relevance_threshold=0.0)
print(f"Recall@10: {recall_10:.4f}")
# Example: Compute MRR
mrr = compute_mrr(y_true, y_pred, relevance_threshold=0.0)
print(f"MRR: {mrr:.4f}")The Dataset_for_transformer class creates a Dataset for PyTorch DataLoader:
from utils.preprocess import Dataset_for_transformer
from torch.utils.data import DataLoader
dataset = Dataset_for_transformer(preprocessed_data)
loader = DataLoader(dataset, batch_size=128, shuffle=True)
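Before training, it can help to pull one batch from the loader and check its shape. Exactly what each batch contains depends on Dataset_for_transformer, so the generic inspection below is only a sketch:

# Fetch a single batch and print tensor shapes (the batch layout is assumed here;
# adjust the unpacking to match what Dataset_for_transformer actually yields)
batch = next(iter(loader))
if isinstance(batch, (list, tuple)):
    for i, item in enumerate(batch):
        print(f"batch element {i}: shape {tuple(item.shape)}")
else:
    print(f"batch shape: {tuple(batch.shape)}")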