A 3D object retrieval system for room elements and shapes, designed for the SHREC 2025 challenge.
This project is a comprehensive 3D object retrieval system designed for the SHREC 2025 challenge. The primary goal is to retrieve relevant 3D models of room elements and furniture from a large database based on natural language text queries.
The system provides an end-to-end solution, from data processing and feature extraction to a web-based interface for interactive retrieval and visualization.
The methodology is divided into three main stages: Data Preparation, Feature Extraction, and Retrieval.
3D models are first converted into a format suitable for feature extraction. This involves two steps:
- Multi-View Rendering: Each 3D object is rendered from multiple viewpoints to capture its geometric and textural information from different angles. This process generates a set of 2D images for each 3D model.
- Point Cloud Generation: The rendered images and their corresponding depth maps are used to generate colored point clouds, which serve as the primary 3D representation for our feature extraction model. Each point in the cloud has both spatial coordinates (X, Y, Z) and color information (R, G, B).
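For illustration, a minimal back-projection sketch of this step is shown below. It assumes pinhole camera intrinsics (`fx`, `fy`, `cx`, `cy`) and per-view RGB and depth arrays from the renderer; it is a simplified sketch, not the project's `ObjectToNpy.py` implementation.

```python
import numpy as np

def backproject_rgbd(rgb, depth, fx, fy, cx, cy):
    """Back-project one rendered RGB-D view into a colored point cloud.

    rgb:   (H, W, 3) uint8 image from the multi-view renderer
    depth: (H, W) float32 depth map in the same camera frame
    fx, fy, cx, cy: pinhole intrinsics of the rendered view (assumed known)
    Returns an (N, 6) array of [X, Y, Z, R, G, B] points.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0                      # skip background pixels with no depth

    z = depth[valid]
    x = (u[valid] - cx) * z / fx           # pinhole back-projection
    y = (v[valid] - cy) * z / fy
    colors = rgb[valid].astype(np.float32) / 255.0

    return np.concatenate([np.stack([x, y, z], axis=1), colors], axis=1)

# Points from all rendered views would then be merged in a common frame and
# down-sampled to a fixed size (e.g. 10,000 points) before feature extraction.
```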
We employ a deep learning model from the OpenShape project to learn a joint embedding space for 3D shapes and text.
- Model Architecture: The system uses a point cloud-based encoder, such as PointBERT or PointNeXt, to transform a 3D point cloud $P$ into a high-dimensional feature vector $v_P \in \mathbb{R}^d$. Simultaneously, a pre-trained text encoder (from OpenCLIP) is used to convert a text query $T$ into a feature vector $v_T \in \mathbb{R}^d$ of the same dimension.
- Training: The model is trained using a contrastive learning approach. Given a set of $N$ pairs of point clouds and their corresponding text descriptions $\{(P_i, T_i)\}_{i=1}^N$, the objective is to minimize the contrastive loss. The similarity between a point cloud $P_i$ and a text query $T_j$ is measured by the cosine similarity of their embeddings:

  $$s_{ij} = \frac{v_{P_i} \cdot v_{T_j}}{\|v_{P_i}\| \, \|v_{T_j}\|}$$

  The training objective is to maximize the similarity of corresponding pairs $(P_i, T_i)$ relative to mismatched pairs, expressed as a symmetric cross-entropy (InfoNCE-style) loss:

  $$\mathcal{L} = -\frac{1}{2N} \sum_{i=1}^{N} \left[ \log \frac{\exp(s_{ii}/\tau)}{\sum_{j=1}^{N} \exp(s_{ij}/\tau)} + \log \frac{\exp(s_{ii}/\tau)}{\sum_{j=1}^{N} \exp(s_{ji}/\tau)} \right]$$

  where $\tau$ is a temperature parameter that scales the similarities.
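A minimal PyTorch sketch of this objective is shown below, assuming already-computed batches of point cloud and text embeddings; the symmetric formulation and the fixed temperature follow the standard CLIP-style recipe rather than the exact OpenShape training code.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(point_feats, text_feats, temperature=0.07):
    """CLIP-style symmetric contrastive loss over a batch of (P_i, T_i) pairs.

    point_feats: (N, d) embeddings v_P from the point cloud encoder
    text_feats:  (N, d) embeddings v_T from the OpenCLIP text encoder
    """
    # Normalize so the dot product equals the cosine similarity s_ij
    point_feats = F.normalize(point_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)

    logits = point_feats @ text_feats.t() / temperature   # (N, N) similarity matrix
    targets = torch.arange(point_feats.size(0), device=point_feats.device)

    # Matching pairs (i, i) are the positives; all other entries are negatives.
    loss_p2t = F.cross_entropy(logits, targets)        # point -> text direction
    loss_t2p = F.cross_entropy(logits.t(), targets)    # text -> point direction
    return (loss_p2t + loss_t2p) / 2
```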
The retrieval process is facilitated by a vector database and a web-based user interface.
- Indexing: The extracted feature vectors for all 3D models in the database are indexed and stored in a Qdrant vector database for efficient similarity search.
- Querying: When a user enters a text query, it is encoded into a feature vector using the same text encoder used during training. This query vector is then used to search the Qdrant database for the 3D models with the most similar feature vectors, based on cosine similarity.
- Ranking: The retrieved models are ranked by their similarity scores and presented to the user through a web interface.
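For illustration, the query step might look like the following sketch, assuming the `qdrant-client` Python package; the collection name and embedding dimension are placeholders, not the project's actual configuration.

```python
import numpy as np
from qdrant_client import QdrantClient

# Connect to the Qdrant service started by Docker Compose
client = QdrantClient(url="http://localhost:6333")

# In the real system this vector comes from the same OpenCLIP text encoder used
# during training; a random vector stands in here only to show the call shape.
query_vector = np.random.rand(512).astype(np.float32)  # placeholder dimension

hits = client.search(
    collection_name="shrec2025_objects",  # placeholder collection name
    query_vector=query_vector.tolist(),
    limit=10,                             # top-10 candidates for ranking
)

for hit in hits:
    # hit.id identifies the 3D model, hit.score is its cosine similarity to the query
    print(hit.id, hit.score)
```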
The performance of the retrieval system is evaluated using a combination of similarity metrics. The final score for a retrieved object is a weighted combination of multiple similarity measures:
- Cosine Similarity: The primary metric for retrieval is the cosine similarity between the query text embedding and the 3D shape embedding in the learned joint feature space.
- Chamfer Distance: For more fine-grained shape similarity assessment, we use the Chamfer Distance. It measures the distance between two point clouds, $P_1$ and $P_2$, and is defined as:

  $$d_{CD}(P_1, P_2) = \frac{1}{|P_1|} \sum_{x \in P_1} \min_{y \in P_2} \|x - y\|_2^2 + \frac{1}{|P_2|} \sum_{y \in P_2} \min_{x \in P_1} \|x - y\|_2^2$$

  To improve the accuracy of the Chamfer Distance, point clouds are first aligned using Principal Component Analysis (PCA) and the Iterative Closest Point (ICP) algorithm.
- Combined Score: The final ranking score is a weighted sum of the cosine similarity and a score derived from the Chamfer distance. The `advanced_scoring` function in `RetrievalSystem/score/calculate_score.py` implements this logic:

  $$\text{score} = w_{\cos} \cdot s_{\cos} + w_{cd} \cdot s_{cd}$$

  where $s_{\cos}$ is the cosine similarity in the joint embedding space, $s_{cd}$ is a similarity score derived from the Chamfer distance, and $w_{\cos}$, $w_{cd}$ are the corresponding weights.
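The following sketch approximates such a combined score. The Chamfer term uses SciPy KD-trees, and the weights and the distance-to-similarity mapping are illustrative assumptions, not the actual `advanced_scoring` implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(p1, p2):
    """Symmetric Chamfer distance between two (N, 3) point clouds.

    Assumes both clouds have already been aligned (e.g. with PCA + ICP).
    """
    d12, _ = cKDTree(p2).query(p1)   # nearest-neighbour distances P1 -> P2
    d21, _ = cKDTree(p1).query(p2)   # nearest-neighbour distances P2 -> P1
    return float(np.mean(d12 ** 2) + np.mean(d21 ** 2))

def combined_score(cosine_sim, chamfer_dist, w_cos=0.7, w_cd=0.3):
    """Weighted combination of embedding similarity and geometric similarity.

    The weights and the 1 / (1 + d) mapping are illustrative assumptions.
    """
    chamfer_sim = 1.0 / (1.0 + chamfer_dist)   # map a distance to a (0, 1] similarity
    return w_cos * cosine_sim + w_cd * chamfer_sim
```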
Here is the performance of our team (Ai-Yahh) in comparison to others on the SHREC 2025 challenge leaderboard.
| Team Name | R@1 | R@5 | R@10 | MRR |
|---|---|---|---|---|
| Stubborn_Strawberries | 0.94 | 1.00 | 1.00 | 0.97 |
| Ai-Yahh | 0.92 | 1.00 | 1.00 | 0.96 |
| MealsRetrieval | 0.92 | 1.00 | 1.00 | 0.96 |
| BUCCI_GANG | 0.90 | 1.00 | 1.00 | 0.95 |
| NoResources | 0.88 | 1.00 | 1.00 | 0.93 |
- Docker (version 20.10 or higher)
- Docker Compose (version 1.29 or higher)
- NVIDIA Docker Runtime (for GPU support)
```bash
# Clone the repository
git clone <repository-url>
cd ROOMELSA-SHREC2025

# Build the Docker image
docker-compose build

# Start all services (Qdrant + ROOMELSA)
docker-compose up -d

# Access the container shell
docker-compose exec roomelsa bash

# Inside the container, activate the conda environment
source activate roomelsa
```

The Docker setup includes:
- ROOMELSA container: All dependencies for rendering, training, and inference
- Qdrant container: Vector database for retrieval system
- Persistent volumes: For data, pretrained models, and outputs
- GPU support: Automatic GPU access via NVIDIA Docker runtime
```bash
# Check if PyTorch detects GPU
docker-compose exec roomelsa bash -c "source activate roomelsa && python -c 'import torch; print(torch.cuda.is_available())'"

# Check Qdrant status
curl http://localhost:6333/
```

All commands should be executed inside the Docker container:
```bash
# Access the container
docker-compose exec roomelsa bash
source activate roomelsa
```

- Organize your 3D models in the `data/` directory (automatically mounted in the container).
- Use the `Render/Object.ipynb` notebook to create a JSON file that catalogs your dataset.
- Run the `Render/main.py` script to perform multi-view rendering of the 3D models:

  ```bash
  python Render/main.py --data_root ./data --json_path ./data/object.json --output_dir ./output
  ```

- Convert the rendered outputs to point clouds using the `ObjectToNpy.py` script:

  ```bash
  python ObjectToNpy.py --input_dir ./output --output_dir ./data/point_clouds
  ```
- Prepare your dataset of point clouds and corresponding text embeddings as described in `OpenShape_code/OUR_GUIDELINE.md`.
- Configure your training in `OpenShape_code/src/configs/custom_train.yaml`.
- Start training by running (inside the Docker container):

  ```bash
  python OpenShape_code/src/main.py --config OpenShape_code/src/configs/custom_train.yaml
  ```
The Qdrant service is automatically started via Docker Compose and accessible at http://qdrant:6333 (inside container) or http://localhost:6333 (from host).
- Extract features for all your 3D models (inside the Docker container):

  ```bash
  python OpenShapeInference.py --model_path ./pretrained/model.pt --input_dir ./data/point_clouds --output_dir ./output/features
  ```

- Use the `RetrievalSystem/Qdrant.ipynb` notebook to create collections in Qdrant and upload the extracted feature vectors (a minimal sketch of this step follows the list).
- Start the Flask server (inside the Docker container):

  ```bash
  cd RetrievalSystem
  python app.py
  ```

- Open your web browser and navigate to `http://localhost:5000` to use the retrieval system.
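As referenced above, a minimal sketch of the indexing step performed by `RetrievalSystem/Qdrant.ipynb` might look as follows, assuming the `qdrant-client` package and per-model `.npy` feature files; the collection name, vector size, and payload layout are placeholders rather than the notebook's exact settings.

```python
from pathlib import Path

import numpy as np
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

# http://qdrant:6333 inside the container, http://localhost:6333 from the host
client = QdrantClient(url="http://localhost:6333")

# Create a collection sized to the extracted embeddings (dimension is a placeholder)
client.recreate_collection(
    collection_name="shrec2025_objects",
    vectors_config=VectorParams(size=512, distance=Distance.COSINE),
)

# Upload one point per 3D model, keeping the model identifier in the payload
points = []
for idx, feat_file in enumerate(sorted(Path("./output/features").glob("*.npy"))):
    vector = np.load(feat_file).astype(np.float32).reshape(-1).tolist()
    points.append(PointStruct(id=idx, vector=vector, payload={"model_id": feat_file.stem}))

client.upsert(collection_name="shrec2025_objects", points=points)
```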
```bash
# Stop all services
docker-compose down

# Stop and remove all data (including volumes)
docker-compose down -v
```