zju3dv
diff --git a/.gitmodules b/.gitmodules
Lines changed: 6 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 171 additions & 2 deletions
diff --git a/arguments/__init__.py b/arguments/__init__.py
Lines changed: 146 additions & 0 deletions
diff --git a/assets/STDLoc_pipeline.png b/assets/STDLoc_pipeline.png
859 KB
diff --git a/assets/logo.jpg b/assets/logo.jpg
143 KB
diff --git a/configs/stdloc_7scenes.yaml b/configs/stdloc_7scenes.yaml
Lines changed: 27 additions & 0 deletions
diff --git a/configs/stdloc_cambridge.yaml b/configs/stdloc_cambridge.yaml
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,6 @@
+[submodule "submodules/Mask2Former"]
+	path = submodules/Mask2Former
+	url = https://github.com/facebookresearch/Mask2Former.git
+[submodule "submodules/gsplat"]
+	path = submodules/gsplat
+	url = https://github.com/nerfstudio-project/gsplat.git
@@ -1,2 +1,171 @@
-# STDLoc
-[CVPR2025] From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from Feature Gaussian Splatting
+<br>
+<p align="center">
+<img src="assets/logo.jpg" style="height:70px"></img>
+<h1 align="center"><strong>From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from Feature Gaussian Splatting</strong></h1>
+  <p align="center">
+    <a href='' target='_blank'>Zhiwei Huang<sup>1,2</sup><sup>*</sup></a>&emsp;
+    <a href='' target='_blank'>Hailin Yu<sup>2</sup><sup>*</sup><sup>&dagger;</sup></a>&emsp;
+    <a href='' target='_blank'>Yichun Shentu<sup>2</sup></a>&emsp;
+    <a href='' target='_blank'>Jin Yuan<sup>2</sup></a>&emsp;
+    <a href='' target='_blank'>Guofeng Zhang<sup>1,2</sup><sup>&dagger;</sup></a>&emsp;
+    <br>
+    <sup>1</sup>State Key Lab of CAD&CG, Zhejiang University&emsp;<sup>2</sup>SenseTime Research
+    <br>
+    <sup>*</sup> Equal Contribution
+    <sup>&dagger;</sup> Corresponding Authors
+    <br>
+    <strong style="font-size: 20px; color:rgb(219, 39, 119);"> CVPR2025 </strong>
+  </p>
+</p>
+
+<p align="center">
+  <a href="" target='_blank'>
+    <img src="https://img.shields.io/badge/arXiv-None-blue?">
+  </a> 
+  <a href="" target='_blank'>
+    <img src="https://img.shields.io/badge/Paper-📖-blue?">
+  </a> 
+  <a href="https://zju3dv.github.io/STDLoc/" target='_blank'>
+    <img src="https://img.shields.io/badge/Project-&#x1F680-blue">
+  </a>
+  <a href="" target='_blank'>
+    <img src="https://visitor-badge.laobi.icu/badge?page_id=zju3dv.STDLoc">
+  </a>
+</p>
+
+## 🏠 About
+<div style="text-align: center;">
+    <img src="assets/STDLoc_pipeline.png" alt="STDLoc pipeline" width=100% >
+</div>
+This paper presents <b>STDLoc</b>, a camera relocalization method that uses Feature Gaussian Splatting (Feature GS) as its scene representation. STDLoc is a full relocalization pipeline that achieves accurate relocalization without relying on any pose prior. Unlike previous coarse-to-fine localization methods, which require image retrieval followed by feature matching, we propose a sparse-to-dense localization paradigm. Based on this scene representation, we introduce a matching-oriented Gaussian sampling strategy and a scene-specific detector to achieve efficient and robust initial pose estimation. Then, starting from the initial localization result, we align the query feature map to the Gaussian feature field by dense feature matching to obtain an accurate pose. Experiments on indoor and outdoor datasets show that <b>STDLoc outperforms current state-of-the-art localization methods in terms of localization accuracy and recall</b>.
+
+
+<!-- contents with emoji -->
+<!-- ## 📋 Contents
+- [🔍 Overview](#-overview)
+- [📦 Training and Evaluation](#-training-and-evaluation)
+- [🔗 Citation](#-citation)
+- [👏 Acknowledgements](#-acknowledgements) -->
+
+## 🔍 Performance
+
+
+The code in this repository performs better than the results reported in our paper, thanks to a few small fixes:
+1. Set ```align_corners=False``` in interpolation.
+2. Use a smaller learning rate for outdoor datasets.
+3. Use the anti-aliasing feature of gsplat.
+
+#### 7-Scenes
+| Method | Chess | Fire | Heads | Office | Pumpkin | Redkitchen | Stairs | Avg.↓[cm/°] |
+|---|---|---|---|---|---|---|---|---|
+| STDLoc (paper) | 0.46/0.15 | 0.57/0.24 | 0.45/0.26 | 0.86/0.24 | 0.93/0.21 | 0.63/0.19 | 1.42/0.41 | 0.76/0.24 |
+| STDLoc (repo) | 0.42/0.13 | 0.49/0.2 | 0.41/0.26 | 0.74/0.21 | 0.89/0.23 | 0.57/0.14 | 1.18/0.35 | 0.67/0.22 |
+
+
+
+#### Cambridge Landmarks
+| Method | Court | King’s | Hospital | Shop | St. Mary’s | Avg.↓[cm/°] |
+|---|---|---|---|---|---|---|
+| STDLoc (paper) | 15.7/0.06 | 15.0/0.17 | 11.9/0.21 | 3.0/0.13 | 4.7/0.14 | 10.1/0.14 |
+| STDLoc (repo) | 11.3/0.05 | 15.0/0.15 | 11.3/0.21 | 2.5/0.12 | 3.6/0.12 | 8.7/0.13 |
+
+## 📦 Training and Evaluation
+### Environment Setup
+
+1. Clone this repository.
+```bash
+git clone --recursive https://github.com/zju3dv/STDLoc.git
+```
+2. Install the packages.
+```bash
+conda create -n stdloc python=3.8 -y && conda activate stdloc
+pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu124 
+pip install -r requirements.txt
+# install gsplat
+cd submodules/gsplat
+pip install -e .
+cd ../..
+```
+
+### Data Preparation
+We use two public datasets:
+- [Microsoft 7-Scenes](https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/)
+- [Cambridge Landmarks](https://www.repository.cam.ac.uk/handle/1810/251342/)
+
+#### 7-Scenes Dataset
+1. Download the images, following HLoc.
+```bash
+export dataset=datasets/7scenes
+for scene in chess fire heads office pumpkin redkitchen stairs; \
+do wget http://download.microsoft.com/download/2/8/5/28564B23-0828-408F-8631-23B1EFF1DAC8/$scene.zip -P $dataset \
+&& unzip $dataset/$scene.zip -d $dataset && unzip $dataset/$scene/'*.zip' -d $dataset/$scene; done
+```
+
+2. Download Full Reconstructions
+ from [visloc_pseudo_gt_limitations](https://github.com/tsattler/visloc_pseudo_gt_limitations/tree/main?tab=readme-ov-file#full-reconstructions):
+```bash
+pip install gdown
+gdown 1ATijcGCgK84NKB4Mho4_T-P7x8LSL80m -O $dataset/7scenes_reference_models.zip
+unzip $dataset/7scenes_reference_models.zip -d $dataset
+# copy sfm_gt into each scene's sparse/0 directory
+for scene in chess fire heads office pumpkin redkitchen stairs; \
+do mkdir -p $dataset/$scene/sparse && cp -r $dataset/7scenes_reference_models/$scene/sfm_gt $dataset/$scene/sparse/0 ; done
+```
+
+<!-- 3. Generate test files -->
+
+#### Cambridge Landmarks Dataset
+1. Download the dataset from the PoseNet project page:
+```bash
+export dataset=datasets/cambridge
+export scenes=( "KingsCollege" "OldHospital" "StMarysChurch" "ShopFacade" "GreatCourt" )
+export IDs=( "251342" "251340" "251294" "251336" "251291" )
+for i in "${!scenes[@]}"; do
+wget https://www.repository.cam.ac.uk/bitstream/handle/1810/${IDs[i]}/${scenes[i]}.zip -P $dataset \
+&& unzip $dataset/${scenes[i]}.zip -d $dataset ; done
+```
+
+
+2. Install Mask2Former to mask dynamic objects and sky:
+```bash
+cd submodules/Mask2Former
+pip install -r requirements.txt
+wget https://dl.fbaipublicfiles.com/maskformer/mask2former/coco/panoptic/maskformer2_swin_large_IN21k_384_bs16_100ep/model_final_f07440.pkl
+cd ../..
+```
+
+3. Preprocess data:
+```bash
+bash scripts/dataset_preprocess.sh
+```
+
+
+### Training Feature Gaussian
+For 7-Scenes: 
+```bash
+bash scripts/train_7scenes.sh
+```
+For Cambridge Landmarks: 
+```bash
+bash scripts/train_cambridge.sh
+```
+### Evaluation
+For 7-Scenes: 
+```bash
+bash scripts/evaluate_7scenes.sh
+```
+For Cambridge Landmarks: 
+```bash
+bash scripts/evaluate_cambridge.sh
+```
+
+## 🔗 Citation
+
+```bibtex
+
+```
+
+
+## 👏 Acknowledgements
+- [Feature 3DGS](https://github.com/ShijieZhou-UCLA/feature-3dgs): Our codebase is built upon Feature 3DGS.
+- [gsplat](https://github.com/nerfstudio-project/gsplat): We use gsplat as our rasterization backend.
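The performance tables in the README hunk above report errors as `cm/°`. As a minimal sketch of how such numbers are conventionally computed on 7-Scenes and Cambridge Landmarks, the snippet below takes the median translation error (in centimetres) and the median rotation error (in degrees) over a set of query poses. The function names, the use of camera centres in metres, and the choice of the median are assumptions for illustration; the repository's own evaluation scripts are not part of this diff and may differ.

```python
import math
import statistics


def rotation_error_deg(R_est, R_gt):
    """Angle of the relative rotation R_gt^T @ R_est, in degrees.

    trace(R_gt^T @ R_est) equals the sum of element-wise products of the two matrices.
    """
    trace = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to guard against values slightly outside [-1, 1] from rounding.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def translation_error_cm(c_est, c_gt):
    """Euclidean distance between camera centres, assumed to be given in metres."""
    return 100.0 * math.dist(c_est, c_gt)


def median_errors(estimates, ground_truth):
    """estimates / ground_truth: lists of (R, c) pairs for the same queries."""
    t_errs = [translation_error_cm(ce, cg) for (_, ce), (_, cg) in zip(estimates, ground_truth)]
    r_errs = [rotation_error_deg(Re, Rg) for (Re, _), (Rg, _) in zip(estimates, ground_truth)]
    return statistics.median(t_errs), statistics.median(r_errs)


# Hypothetical example: one query, 0.2 degrees off around the z axis and 5 mm off in position.
a = math.radians(0.2)
R_gt = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
R_est = [[math.cos(a), -math.sin(a), 0.0], [math.sin(a), math.cos(a), 0.0], [0.0, 0.0, 1.0]]
t_cm, r_deg = median_errors([(R_est, (0.003, 0.004, 0.0))], [(R_gt, (0.0, 0.0, 0.0))])
```

Under these assumptions, an entry such as `0.42/0.13` would read as a median error of 0.42 cm and 0.13°.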
@@ -0,0 +1,146 @@
+#
+# Copyright (C) 2023, Inria
+# GRAPHDECO research group, https://team.inria.fr/graphdeco
+# All rights reserved.
+#
+# This software is free for non-commercial, research and evaluation use 
+# under the terms of the LICENSE.md file.
+#
+# For inquiries contact  [email protected]
+#
+
+from argparse import ArgumentParser, Namespace
+import sys
+import os
+
+class GroupParams:
+    pass
+
+class ParamGroup:
+    def __init__(self, parser: ArgumentParser, name : str, fill_none = False):
+        group = parser.add_argument_group(name)
+        for key, value in vars(self).items():
+            shorthand = False
+            if key.startswith("_"):
+                shorthand = True
+                key = key[1:]
+            t = type(value)
+            value = value if not fill_none else None 
+            if shorthand:
+                if t == bool:
+                    group.add_argument("--" + key, ("-" + key[0:1]), default=value, action="store_true")
+                else:
+                    group.add_argument("--" + key, ("-" + key[0:1]), default=value, type=t)
+            else:
+                if t == bool:
+                    group.add_argument("--" + key, default=value, action="store_true")
+                else:
+                    group.add_argument("--" + key, default=value, type=t)
+
+    def extract(self, args):
+        group = GroupParams()
+        for arg in vars(args).items():
+            if arg[0] in vars(self) or ("_" + arg[0]) in vars(self):
+                setattr(group, arg[0], arg[1])
+        return group
+
+class ModelParams(ParamGroup): 
+    def __init__(self, parser, sentinel=False):
+        self.sh_degree = 3
+        self._source_path = ""
+        self._feature_type = ""
+        self._gaussian_type = "3dgs"
+        self._model_path = ""
+        self._images = "images"
+        self._resolution = -1
+        self._white_background = True
+        self.longest_edge = 640
+        self.data_device = "cuda"
+        self.eval = False
+        self.speedup = False ###
+        self.norm_before_render = True
+        self.render_items = ['RGB', 'Depth', 'Edge', 'Normal', 'Curvature', 'Feature Map']
+        super().__init__(parser, "Loading Parameters", sentinel)
+
+    def extract(self, args):
+        g = super().extract(args)
+        g.source_path = os.path.abspath(g.source_path)
+        return g
+
+class PipelineParams(ParamGroup):
+    def __init__(self, parser):
+        self.convert_SHs_python = False
+        self.compute_cov3D_python = False
+        self.debug = True
+        super().__init__(parser, "Pipeline Parameters")
+
+class OptimizationParams(ParamGroup):
+    def __init__(self, parser):
+        self.iterations = 30_000
+        self.position_lr_init = 0.00016
+        self.position_lr_final = 0.0000016
+        self.position_lr_delay_mult = 0.01
+        self.position_lr_max_steps = 30_000
+        self.feature_lr = 0.0025
+        self.opacity_lr = 0.05
+        self.scaling_lr = 0.005
+        self.rotation_lr = 0.001
+#################################################
+        self.loc_feature_lr = 0.001 
+#################################################
+        self.percent_dense = 0.01
+        self.lambda_dssim = 0.2
+        self.densification_interval = 100
+        self.opacity_reset_interval = 3000  # tried 100000, but results were worse
+        self.densify_from_iter = 500
+        self.densify_until_iter = 15_000  # 6000; compare with 2-stage
+        self.densify_grad_threshold = 0.0002
+        super().__init__(parser, "Optimization Parameters")
+
+class OptimizationParams_2dgs(ParamGroup):
+    def __init__(self, parser):
+        self.iterations = 30_000
+        self.position_lr_init = 0.00016
+        self.position_lr_final = 0.0000016
+        self.position_lr_delay_mult = 0.01
+        self.position_lr_max_steps = 30_000
+        self.feature_lr = 0.0025
+        self.opacity_lr = 0.05
+        self.scaling_lr = 0.005
+        self.rotation_lr = 0.001
+#################################################
+        self.loc_feature_lr = 0.001 
+#################################################
+        self.percent_dense = 0.01
+        self.lambda_dssim = 0.2
+        self.lambda_dist = 0.0
+        self.lambda_normal = 0.05
+        self.opacity_cull = 0.05
+        self.densification_interval = 100
+        self.opacity_reset_interval = 3000  # tried 100000, but results were worse
+        self.densify_from_iter = 500
+        self.densify_until_iter = 15_000  # 6000; compare with 2-stage
+        self.densify_grad_threshold = 0.0002
+        super().__init__(parser, "Optimization Parameters")
+
+def get_combined_args(parser : ArgumentParser):
+    cmdlne_string = sys.argv[1:]
+    cfgfile_string = "Namespace()"
+    args_cmdline = parser.parse_args(cmdlne_string)
+
+    try:
+        cfgfilepath = os.path.join(args_cmdline.model_path, "cfg_args")
+        print("Looking for config file in", cfgfilepath)
+        with open(cfgfilepath) as cfg_file:
+            print("Config file found: {}".format(cfgfilepath))
+            cfgfile_string = cfg_file.read()
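+    except (TypeError, FileNotFoundError):
+        # model_path is None (TypeError) or cfg_args is missing: keep the empty Namespace
+        print("Config file not found; using command-line arguments only")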
+    args_cfgfile = eval(cfgfile_string)
+
+    merged_dict = vars(args_cfgfile).copy()
+    for k,v in vars(args_cmdline).items():
+        if v is not None:
+            merged_dict[k] = v
+    return Namespace(**merged_dict)
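The `ParamGroup` class in the hunk above turns every instance attribute into a command-line option, and a leading underscore additionally registers a one-letter shorthand. The following is a condensed, self-contained sketch of that pattern so the resulting flags can be seen in isolation. `DemoParams` and the example argument values are hypothetical and not part of the repository; the `fill_none` branch is omitted for brevity.

```python
from argparse import ArgumentParser


class ParamGroup:
    """Condensed version of the registration logic in the diff, for illustration only."""

    def __init__(self, parser, name):
        group = parser.add_argument_group(name)
        for key, value in vars(self).items():
            flags = ["--" + key]
            if key.startswith("_"):
                # "_source_path" is exposed as both --source_path and -s
                flags = ["--" + key[1:], "-" + key[1:2]]
            if isinstance(value, bool):
                # Booleans become switches; the attribute value is the default.
                group.add_argument(*flags, default=value, action="store_true")
            else:
                # Other values are parsed with the type of their default.
                group.add_argument(*flags, default=value, type=type(value))


class DemoParams(ParamGroup):  # hypothetical group, mirrors a few ModelParams fields
    def __init__(self, parser):
        self._source_path = ""
        self.longest_edge = 640
        self.eval = False
        super().__init__(parser, "Demo Parameters")


parser = ArgumentParser()
DemoParams(parser)
args = parser.parse_args(["-s", "data/chess", "--longest_edge", "320", "--eval"])
defaults = parser.parse_args([])
```

One consequence of this design: because booleans are registered with `store_true`, a field whose default is already `True` (such as `_white_background` or `debug` in the diff) cannot be switched off from the command line with this pattern.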
@@ -0,0 +1,27 @@
+sparse:
+  nms: 4
+  detect_num: 4096
+  mnn_match: False # default False, use topk match
+  dual_softmax: False
+  dual_softmax_temp: 0.1
+  topk: 1
+  threshold: 0
+  solver: poselib
+  confidence: 0.99999
+  reprojection_error: 12.0
+  max_iterations: 100000
+  min_iterations: 1000
+  detector_path: detector/30000_detector.pth
+  landmark_path: detector/sampled_idx.pkl
+
+dense:
+  iters: 4
+  coarse_dual_softmax_temp: 0.1
+  fine_dual_softmax_temp: 0.1
+  coarse_threshold: 0
+  fine_threshold: 0
+  solver: poselib
+  confidence: 0.99999
+  reprojection_error: 8.0
+  max_iterations: 1000
+  min_iterations: 100
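The `confidence`, `min_iterations` and `max_iterations` keys in the config above are typical RANSAC controls. As a rough sketch of how they interact, the snippet below applies the textbook RANSAC stopping criterion, N = log(1 - p) / log(1 - w^s), and clamps the result to the configured bounds. The function name, the inlier ratios, and the minimal sample size of 3 (as for a P3P solver) are assumptions; how PoseLib applies these settings internally is not shown in this diff and may differ.

```python
import math


def ransac_iterations(confidence, inlier_ratio, sample_size=3,
                      min_iterations=1000, max_iterations=100000):
    """Textbook estimate of the iterations needed to draw one all-inlier sample."""
    # Probability that a random minimal sample contains at least one outlier.
    p_bad_sample = 1.0 - inlier_ratio ** sample_size
    if p_bad_sample <= 0.0:
        needed = 0  # every sample is outlier-free
    elif p_bad_sample >= 1.0:
        needed = max_iterations  # no inliers at all: run to the cap
    else:
        needed = math.ceil(math.log(1.0 - confidence) / math.log(p_bad_sample))
    # Clamp to the configured bounds.
    return max(min_iterations, min(max_iterations, needed))


# Values taken from the `sparse` section above; the inlier ratios are hypothetical.
few_inliers = ransac_iterations(0.99999, inlier_ratio=0.10)
many_inliers = ransac_iterations(0.99999, inlier_ratio=0.90)
almost_none = ransac_iterations(0.99999, inlier_ratio=0.01)
```

Under this model, the large `max_iterations` of the sparse stage only matters when the inlier ratio is low, which is plausible for an initial pose estimate with no prior; the dense stage starts from a better hypothesis and uses much smaller bounds.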
@@ -0,0 +1,27 @@
+sparse:
+  nms: 4
+  detect_num: 2048
+  mnn_match: False # default False, use topk match
+  dual_softmax: False
+  dual_softmax_temp: 0.1
+  topk: 1
+  threshold: 0
+  solver: poselib
+  confidence: 0.99999
+  reprojection_error: 12.0
+  max_iterations: 100000
+  min_iterations: 1000
+  detector_path: detector/30000_detector.pth
+  landmark_path: detector/sampled_idx.pkl
+
+dense:
+  iters: 1
+  coarse_dual_softmax_temp: 0.1
+  fine_dual_softmax_temp: 0.1
+  coarse_threshold: 0
+  fine_threshold: 0
+  solver: poselib
+  confidence: 0.99999
+  reprojection_error: 12.0
+  max_iterations: 1000
+  min_iterations: 100
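Both configs expose `mnn_match`, `topk`, `threshold`, `dual_softmax` and `dual_softmax_temp` for the sparse matching stage. The sketch below illustrates what these options commonly mean: top-k matching keeps the best-scoring landmarks for each query feature, mutual nearest-neighbour matching keeps a pair only if each is the other's best match, and dual-softmax rescales a similarity matrix by multiplying its row-wise and column-wise softmax. All function names are hypothetical, and dividing the similarity by the temperature follows the common LoFTR-style convention; the repository's matcher is not part of this diff and may be implemented differently.

```python
import math


def softmax(values):
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dual_softmax(sim, temp=0.1):
    """sim[i][j]: similarity between query feature i and landmark feature j."""
    scaled = [[s / temp for s in row] for row in sim]
    row_sm = [softmax(row) for row in scaled]
    col_sm = [softmax(list(col)) for col in zip(*scaled)]  # indexed as col_sm[j][i]
    return [[row_sm[i][j] * col_sm[j][i] for j in range(len(sim[0]))]
            for i in range(len(sim))]


def topk_matches(score, k=1, threshold=0.0):
    """For each query, keep its k best landmarks whose score exceeds the threshold."""
    matches = []
    for i, row in enumerate(score):
        best = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        matches.extend((i, j) for j in best if row[j] > threshold)
    return matches


def mutual_nn_matches(score, threshold=0.0):
    """Keep (i, j) only if j is the best landmark for i and i is the best query for j."""
    matches = []
    for i, row in enumerate(score):
        j = max(range(len(row)), key=lambda c: row[c])
        i_back = max(range(len(score)), key=lambda r: score[r][j])
        if i_back == i and row[j] > threshold:
            matches.append((i, j))
    return matches


# Hypothetical 3x3 similarity matrix: queries 0 and 1 both prefer landmark 0.
sim = [
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.1],
    [0.1, 0.2, 0.7],
]
topk = topk_matches(sim, k=1, threshold=0.0)
mutual = mutual_nn_matches(sim, threshold=0.0)
probs = dual_softmax(sim, temp=0.1)
```

In this toy example, top-1 matching keeps the ambiguous pair for query 1 while mutual nearest-neighbour matching drops it, which shows the trade-off between match count and match purity that the `mnn_match` switch controls.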