Description
Hello, thank you for this fascinating work. I’m new to this area, and I had a question while reading your paper.
In Table 2 of the main paper, the "No Depth Encoding" method seems to leave the scale ambiguity problem completely unresolved from a theoretical standpoint — is that a correct understanding?
My interpretation is that this variant still uses the Epipolar Encoder, but omits the concatenation with r(d) when computing s in Equation 1. If that's the case, the model can still leverage epipolar line information, but the epipolar line itself is invariant to the scale of the camera translation (is that right?). So even if the model accurately finds corresponding pixel pairs along epipolar lines, the scale ambiguity would remain: given only the information available to the model, there seems to be no way to recover the scale of the SfM reconstruction used for training.
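To make the point concrete (this is my own illustration, not code from the paper): a minimal NumPy sketch showing that scaling the relative translation scales the essential matrix, so each epipolar line is the same line and carries no information about scale. The pose values here are arbitrary placeholders.

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Hypothetical relative pose: rotation about z by 10 degrees, translation t.
theta = np.deg2rad(10)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([1.0, 0.2, 0.1])

# Essential matrices for the true translation and a 5x-scaled one.
E1 = skew(t) @ R
E2 = skew(5.0 * t) @ R

# Epipolar line of a pixel x (normalized coordinates) in the other view: l = E @ x
x = np.array([0.3, -0.4, 1.0])
l1 = E1 @ x
l2 = E2 @ x

# The two line vectors differ only by an overall scale factor,
# i.e. they describe the SAME epipolar line in the image:
print(np.allclose(l2, 5.0 * l1))  # True
```

So any pixel satisfying l1 also satisfies l2, which is why I'd expect correspondences along epipolar lines alone to leave the metric scale undetermined.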
That said, it is surprising that the "No Depth Encoding" variant still performs quite well in Table 2. Could it be that the scale of the training/test datasets is constrained to a fairly narrow range?
I’d really appreciate any clarification. Thank you!