This repository has been archived by the owner on Mar 22, 2022. It is now read-only.

Transform/Projectionmatrix of current frame (Locatable camera) #83

Open
Daniel4144 opened this issue Sep 24, 2019 · 37 comments
Labels
enhancement New feature or request

Comments

@Daniel4144

Hi,
I am currently using this project to stream mixed reality capture (MRC) from a HoloLens to a desktop application, which works great so far.
Now I want to visualize the desktop mouse position on the MRC in 3D space on the HoloLens. For that I need the position of the camera from which the frame was captured.
Is it possible to get the projection matrix / transformation matrix of the camera for each frame?

Thanks in advance!

@djee-ms
Member

djee-ms commented Sep 24, 2019

Hi @Daniel4144,

This is not supported out of the box. There are several ways to do it; I suggest you have a look at the discussions on webrtc-uwp/webrtc-uwp-sdk#10, w3c/strategy#133, https://groups.google.com/forum/#!topic/discuss-webrtc/CcJnxzUsVBE, and the example at https://github.com/phongcao/webrtc-mrvc-sample. There is so far no clean, agreed-upon solution, and those approaches feel more like hacks, so I am a bit reluctant to engage in any work on this for MixedReality-WebRTC.

@djee-ms djee-ms added the question General question about the project label Sep 24, 2019
@djee-ms
Member

djee-ms commented Sep 24, 2019

Let me be a bit clearer: I totally see the value of this feature for MR apps, and I am sure you are not the only one to want it. But at this time we are focusing our efforts on other areas where there is a clear path for improvement, ahead of the v1.0 release and the HoloLens 2 shipping to the public, whereas this problem is still not well defined in my opinion. I would like to see some kind of consensus among MR people, possibly a draft standard or at least a cleaner solution than hacking RTP headers, and then I could look at adding an implementation for MixedReality-WebRTC. But as interesting as it is, we unfortunately don't have the dev resources at this time to drive that research ourselves.

For your particular application, I feel the synchronization with video is maybe not as critical as in other use cases I know of, so perhaps using data channels is enough?
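
For example, something along these lines could work (a rough sketch only, assuming the Unity integration and the DataChannel.SendMessage(byte[]) API of the C# library; the serialization format is entirely up to you):

using System;
using UnityEngine;
using Microsoft.MixedReality.WebRTC;

// Rough sketch: push the current head pose over a WebRTC data channel.
public static class PoseSender
{
    public static void SendCameraPose(DataChannel channel)
    {
        var cam = Camera.main.transform;
        // 3 floats for position + 4 floats for rotation = 28 bytes.
        var values = new float[]
        {
            cam.position.x, cam.position.y, cam.position.z,
            cam.rotation.x, cam.rotation.y, cam.rotation.z, cam.rotation.w
        };
        var payload = new byte[values.Length * sizeof(float)];
        Buffer.BlockCopy(values, 0, payload, 0, payload.Length);
        channel.SendMessage(payload);
    }
}

On the receiving side you would deserialize the 7 floats back into a pose; the limitation, as said, is that this pose is not tied to any particular video frame.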

@Daniel4144
Author

I understand your point; unfortunately I do not have enough time to implement a custom solution either.
Regarding your suggestion to use data channels: I already use them at the moment. It feels a bit inaccurate/laggy because of the delay between video and data, but I think I can leave it like that.
Thanks again for the answer.

@djee-ms djee-ms added enhancement New feature or request and removed question General question about the project labels Sep 24, 2019
@djee-ms
Member

djee-ms commented Sep 24, 2019

I will leave that issue open as a feature request. We can re-prioritize if there's more demand.

@zhangazheng

My case is almost the same: the HoloLens handles the location information returned by the desktop application and then places an annotation in 3D space. But now I have a problem: I try to use Camera.main.ViewportToWorldPoint to determine the location, but the resulting position is not right.

I am still looking for other ways.

@Daniel4144
Author

Hey @zhangazheng,
The location I get on the HoloLens is pretty good (as long as the HoloLens is not moving).
I don't know about your implementation, but I guess the problem is using ViewportToWorldPoint. You need the projection matrix and transform of the RGB camera to calculate the world position from a 2D coordinate in the image. The RGB camera, however, has a different projection matrix than the Unity camera, and there is an offset between them.
Here is an example of how to get these values in Unity (take a look at PhotoCaptureFrame): https://docs.unity3d.com/Manual/windowsholographic-photocapture.html
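
Something like this (a minimal sketch based on that manual page; depending on your Unity version the namespace is UnityEngine.Windows.WebCam or UnityEngine.XR.WSA.WebCam):

using System.Linq;
using UnityEngine;
using UnityEngine.Windows.WebCam; // UnityEngine.XR.WSA.WebCam on older Unity versions

public class CameraMatrixReader : MonoBehaviour
{
    PhotoCapture photoCapture;

    void Start()
    {
        PhotoCapture.CreateAsync(false, capture =>
        {
            photoCapture = capture;
            var resolution = PhotoCapture.SupportedResolutions.OrderByDescending(r => r.width * r.height).First();
            var parameters = new CameraParameters
            {
                cameraResolutionWidth = resolution.width,
                cameraResolutionHeight = resolution.height,
                pixelFormat = CapturePixelFormat.BGRA32
            };
            photoCapture.StartPhotoModeAsync(parameters, startResult =>
            {
                photoCapture.TakePhotoAsync((photoResult, frame) =>
                {
                    // The projection matrix is constant for a given resolution, so it can be read once and reused.
                    if (frame.TryGetProjectionMatrix(out Matrix4x4 projection))
                        Debug.Log("Projection:\n" + projection);
                    // Camera-to-world pose of the RGB camera at capture time.
                    if (frame.TryGetCameraToWorldMatrix(out Matrix4x4 cameraToWorld))
                        Debug.Log("CameraToWorld:\n" + cameraToWorld);
                    photoCapture.StopPhotoModeAsync(_ => photoCapture.Dispose());
                });
            });
        });
    }
}
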
If you need more details feel free to ask.

@zhangazheng

@Daniel4144
Thanks for the reply, I will look at it.

@kspark-scott

I can add a bit more context, for the record if nothing else. I've done this using just the data channel and was impressed at the low latency, though it was between peers on the same network. But even then, there was a bit of "swim". The magic of HoloLens is how solidly the rendered objects are anchored to the real world, and you really need per-frame accuracy to maintain that.

The https://github.com/phongcao/webrtc-mrvc-sample project (referenced above) is specific to the VP8 codec (or it might be VP9). There is some unused space in the encoded frame header for that codec that is used to embed a frame ID. The camera transform for that frame is then sent over the data channel along with the ID. When the video frame is received, the ID is extracted and used to look up the transform for that frame. It works very well, but only for the VP8/9 codec, and it uses reserved space in the header, which is arguably risky. It also requires customization of the public WebRTC code base.
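
Just to illustrate the receiver-side bookkeeping that this approach implies (my own sketch, not code from that project; the hook names are hypothetical):

using System.Collections.Generic;
using UnityEngine;

// Hypothetical sketch: match camera poses received over the data channel
// with frame IDs embedded in the video frames.
public class FramePoseRegistry
{
    readonly Dictionary<uint, Matrix4x4> poseByFrameId = new Dictionary<uint, Matrix4x4>();

    // Called when a (frameId, cameraToWorld) pair arrives on the data channel.
    public void OnPoseReceived(uint frameId, Matrix4x4 cameraToWorld)
    {
        poseByFrameId[frameId] = cameraToWorld;
    }

    // Called when a decoded video frame arrives with its embedded frame ID.
    public bool TryGetPoseForFrame(uint frameId, out Matrix4x4 cameraToWorld)
    {
        if (poseByFrameId.TryGetValue(frameId, out cameraToWorld))
        {
            poseByFrameId.Remove(frameId); // each pose is only needed once
            return true;
        }
        return false;
    }
}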

WebRTC now has a multiplex codec that allows metadata to be multiplexed with each frame. This provides a mainstream way to embed a frame ID, or even the camera transform itself, with each frame. But the base WebRTC code currently has no public APIs to provide metadata input to encoding and extract it again upon receipt of each frame. Also problematic is that SDP (to my knowledge) doesn't allow for negotiation of nested chains of codecs. So if you negotiate multiplex, there has to be a hard-coded assumption of what video codec it wraps -- which in the Google codebase is now VP9. So you lose the benefits of codec negotiation with this approach.

I would also love to see any enhancement along these lines, but I think it would require non-trivial work in the base WebRTC and may not even be possible at all without imposing constraints like forcing the use of specific codecs.

@djee-ms
Member

djee-ms commented Oct 3, 2019

All very good info, thanks @kspark-scott and the others who also contributed. I would also be interested to see something done in this area; there seems to be some demand, and this is well within the boundaries of the project. I think we will consider it for a next milestone.

@stephenatwork
Member

Note to self. Link to webrtc multiplex issue https://bugs.chromium.org/p/webrtc/issues/detail?id=9632

@astaikos316

@Daniel4144 I would like to get some more details on how you were able to get the projection matrix and transform from the RGB camera while running the WebRTC Unity app.

@Daniel4144
Author

Hi @astaikos316,
Actually I don't get it while WebRTC is running.
The projection matrix is constant anyway (at least for a specific resolution), so I read the values beforehand with the Unity PhotoCapture API I linked above. The offset between the eye and RGB cameras should also be constant; however, I wasn't able to calculate it from the PhotoCapture data, so I just tweaked the offset until I got acceptable results.
I have to admit that it's not a very elegant solution, but it worked for me.

@astaikos316

@Daniel4144 I guess I am confused as to how to use the matrix to map an object placed in the 2D view of a desktop app to a world coordinate on the HoloLens.

@Daniel4144
Author

@astaikos316 I'll try to explain what I did (a rough code sketch of steps 2-6 follows after the list):

  1. Get the projection matrix of the RGB camera once on the HoloLens (with the same resolution you use in WebRTC).
    Here is an example of how to access it in Unity (photoCaptureFrame.TryGetProjectionMatrix): https://docs.unity3d.com/Manual/windowsholographic-photocapture.html
  2. Calculate the ray direction in camera space from your 2D image coordinate on the desktop,
    similar to this implementation (skip the line that transforms to world space): https://github.com/VulcanTechnologies/HoloLensCameraStream/blob/master/HoloLensVideoCaptureExample/Assets/CamStream/Scripts/LocatableCameraUtils.cs
    With the projection matrix from above, the lens-shift values were wrong, so I had to flip their signs. I also had to flip the sign of the direction.
  3. Send the ray direction to the HoloLens.
  4. Transform the direction to world space and calculate the world position of the RGB camera (position of the main Unity camera + estimated offset to the RGB camera).
  5. Raycast with that position and direction against the environment (make sure to add a spatial mapping collider to the scene).
  6. Place an object at the collision position; it should at least roughly match the selected 2D view position (if not, try to tweak the camera offset and make sure the 2D coordinates aren't flipped).
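
Here is a rough sketch of steps 2-6 in C# (the helper names are just for illustration; estimatedRgbCameraOffset and the various signs are exactly the values you may have to tweak for your own setup):

using UnityEngine;

public static class LocatableCameraRay
{
    // Step 2 (desktop side): ray direction in RGB-camera space from a 2D image coordinate,
    // using the projection matrix read once with PhotoCapture (see step 1).
    public static Vector3 ImagePointToCameraDirection(Vector2 pixel, int imageWidth, int imageHeight,
                                                      Matrix4x4 projection)
    {
        // Pixel -> normalized device coordinates in [-1, 1]; flip Y if your 2D coordinates are top-left based.
        float ndcX = (pixel.x / imageWidth) * 2f - 1f;
        float ndcY = 1f - (pixel.y / imageHeight) * 2f;

        // m00/m11 are the focal lengths, m02/m12 the lens shifts.
        // In my case I had to flip the signs of the lens shifts and of the resulting direction.
        float dirX = (ndcX + projection.m02) / projection.m00;
        float dirY = (ndcY + projection.m12) / projection.m11;
        return new Vector3(dirX, dirY, 1f).normalized;
    }

    // Steps 4-6 (HoloLens side): transform the received direction to world space,
    // raycast against the spatial mapping mesh and place a marker at the hit point.
    public static bool PlaceAtHit(Vector3 cameraSpaceDirection, Vector3 estimatedRgbCameraOffset,
                                  GameObject marker, float maxDistance = 10f)
    {
        Transform head = Camera.main.transform;
        Vector3 worldDirection = head.TransformDirection(cameraSpaceDirection);
        Vector3 rgbCameraPosition = head.position + head.TransformVector(estimatedRgbCameraOffset);

        if (Physics.Raycast(rgbCameraPosition, worldDirection, out RaycastHit hit, maxDistance))
        {
            marker.transform.position = hit.point;
            return true;
        }
        return false;
    }
}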

One more thing I've noticed: the MRC I get back from the HoloLens is shifted to the left relative to the camera image (but gets more accurate with greater distance), so I couldn't use the MRC to check the positioning and had to check how it actually looks on the HoloLens.

Maybe @djee-ms knows something about this: is it possible, or even necessary, to calibrate the MRC (it is always shifted to the left)? The only option I could find is to turn it on/off.
I think it should be possible to get a better MRC, because in the other direction (2D position on the image to 3D position on the HoloLens) I was able to calculate a more accurate positioning.

@kspark-scott

Hi @Daniel4144, regarding alignment: there are known alignment issues with MRC on HoloLens 1. Specifically, alignment drifts as you get further away from the focus point/plane for the scene. Might that be what you're seeing? If you're not familiar with this, you can read at least a high-level summary here (search for the section "Enabling improved alignment for MRC in your app"):
https://docs.microsoft.com/en-us/windows/mixed-reality/mixed-reality-capture-for-developers. HoloLens 2 adds a third render camera aligned with the photo/video camera to address the problem. I have no deep expertise in this area, but I would speculate that this wouldn't have been necessary if there were reliable ways to deal with it in software on HL1. :-)

@Daniel4144
Author

According to https://docs.microsoft.com/en-us/windows/mixed-reality/focus-point-in-unity, setting the focus point manually should not be necessary (if Enable Depth Buffer Sharing is set), but I'll give it a try. Thanks for the hint, @kspark-scott.

@djee-ms
Member

djee-ms commented Oct 11, 2019

@kspark-scott beat me to it. Yes this is likely the alignment drift from using the wrong focus point.
Unfortunately there is currently no control over that in MixedReality-WebRTC.

@astaikos316

@Daniel4144 I believe I am having trouble reversing the ray direction I get from the 2D view on the desktop. Right now the holograms appear behind me instead of where I expect them to. I've tried changing the sign of the direction ray, but when I do that I cannot get holograms to instantiate at all. I am also not sure whether I am properly transforming the direction to world space once it is transmitted to the HoloLens. What function would be used for that?

@Daniel4144
Author

You can simply use the Unity transform functions (https://docs.unity3d.com/ScriptReference/Transform.TransformDirection.html):
To get the direction in world space, use Camera.main.transform.TransformDirection(direction).

Regarding my MRC problem: setting the focus point manually for every frame solved it. I think the shaders I used didn't write into the depth buffer, so the automatic approach I linked above did not work.

@astaikos316

@Daniel4144 I am not sure where to set the focus point manually for every frame. I'm also just using standard MRTK shaders for this project.

@Daniel4144
Author

Set it in any Update() in your scene, like in the example at https://docs.microsoft.com/en-us/windows/mixed-reality/focus-point-in-unity
The position is, in my case (and probably also in yours), the calculated collision point on the HoloLens.
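
For example (a minimal sketch using the legacy UnityEngine.XR.WSA.HolographicSettings API from that page; focusPosition is whatever your raycast hit):

using UnityEngine;
using UnityEngine.XR.WSA; // legacy built-in XR (WSA) API used on the docs page linked above

public class FocusPointSetter : MonoBehaviour
{
    // In my case this is the collision point calculated from the ray cast above.
    public Vector3 focusPosition;

    void Update()
    {
        // Tell holographic reprojection which point to stabilize;
        // in my case this also fixed the MRC alignment at that depth.
        HolographicSettings.SetFocusPointForFrame(focusPosition);
    }
}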

@christjt

> Hi @astaikos316,
> Actually I don't get it while WebRTC is running.
> The projection matrix is constant anyway (at least for a specific resolution), so I read the values beforehand with the Unity PhotoCapture API I linked above. The offset between the eye and RGB cameras should also be constant; however, I wasn't able to calculate it from the PhotoCapture data, so I just tweaked the offset until I got acceptable results.
> I have to admit that it's not a very elegant solution, but it worked for me.

These assumptions are going to get challenged with HoloLens 2, as I believe neither of them will be true then.

@astaikos316

@Daniel4144 et al., thank you for all the valuable information and discussion so far. One other thing I cannot figure out is which values the camera offsets you described earlier are, and where I should set and apply them. I've tried adding some offsets to a few values, but nothing has worked so far.

@astaikos316

@Daniel4144 I am still having trouble with the offsets and figuring out where those values are or where to add them in.

@zhangazheng

> These assumptions are going to get challenged with HoloLens 2, as I believe neither of them will be true then.

What happens on HoloLens 2?

@christjt

christjt commented Jan 6, 2020

@zhangazheng The HoloLens 2 uses eye tracking to ensure more correct placement of holograms. As such, the transformation between the headset and the eyes will not be constant: it changes depending on which user is wearing the HoloLens and how they place it on their head. It can also change at runtime if the user repositions the HoloLens. The projection matrix is also dynamic, since the projection changes with the position of the user's eyes relative to the display.

TL;DR: Neither the projection matrix nor the virtual-camera-to-physical-camera transform is constant.

@qazqaz12378

Guys, any good ideas? Dynamics 365 Remote Assist is very stable. I guess that is because MS can access deeper-level information.

@djee-ms
Member

djee-ms commented Mar 16, 2020

Any idea about what? I don't think Remote Assist uses WebRTC, so they won't have the issue of finding a way to send the head position through the WebRTC protocol.

@Peskey

Peskey commented Apr 14, 2020

At a minimum, is it possible to get the camera-to-world matrix locally whenever the local video frame is ready? I'm trying to do something more static. I was able to accomplish this previously with the WebRTC-universal-samples project. From my remote source, I want to send a command (which can be done over the data channel) to my HoloLens 2 that causes a local positional "snapshot", storing the head position local to the device when the command is received. I do this so I can place an object in my scene according to commands given at the remote viewing station. The first step is creating the snapshot; later commands then use it to position new objects.
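
As a sketch of what I mean (the OnSnapshotCommand method is just illustrative; it would be wired to whatever data channel message handler is used):

using UnityEngine;

public class PoseSnapshot : MonoBehaviour
{
    // Last stored head pose; later commands position new objects relative to it.
    public Vector3 snapshotPosition;
    public Quaternion snapshotRotation;

    // Wire this to the data channel message handler for the "snapshot" command.
    public void OnSnapshotCommand()
    {
        Transform head = Camera.main.transform;
        snapshotPosition = head.position;
        snapshotRotation = head.rotation;
    }
}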

@tarukosu

I want to know how to get a still image and world matrix while WebRTC video streaming is active.

My use case is almost the same as @Peskey 's.

  1. Remote peer (PC) sends "capture" command to HoloLens.
  2. HoloLens takes a snapshot image and world matrix.
  3. HoloLens sends the image to the peer.
  4. The peer draws lines on the image.
  5. The peer sends lines data to HoloLens.
  6. HoloLens displays lines in 3D space.

I don't have to send information via WebRTC.

@astaikos316

@Peskey have you been able to get the camera projection matrix from the HoloLens 2? I have tried for a few days now, since I just got my headset, but I only get an identity matrix no matter what resolution I set when using the Unity PhotoCapture API.

@Peskey

Peskey commented Jun 3, 2020

@djee-ms will this be a feature? WebRTC in Mixed Reality is fairly useless for my purposes without the ability to get the camera projection matrix associated with my video stream. With webrtc-uwp-sdk being deprecated, it's important for me to find an alternative that runs on the HoloLens 2 but also supports getting the camera matrix. I couldn't see how to do this without modifying multiple levels of your code. Is there a simpler way I'm missing?

@tarukosu

I am trying to take a photo with a shared-mode MediaCapture while using WebRTC.
But the following error happens when I try to get mediaCapture.FrameSources:

Error HRESULT E_FAIL has been returned from a call to a COM component.

This method works well when used with webrtc-uwp-sdk.

Code:
// Requires the UWP namespaces Windows.Media.Capture and Windows.Media.Capture.Frames.

// Usage: find the color VideoRecord frame source, then initialize MediaCapture in shared mode.
var frameSource = await GetFrameSourceAsync();
var result = await StartMediaFrameReaderAsync(frameSource);

// Simple container for the selected source group and source info.
private class FrameSource
{
    public MediaFrameSourceGroup Group;
    public MediaFrameSourceInfo Info;
}

private async Task<FrameSource> GetFrameSourceAsync()
{
    var frameSourceGroups = await MediaFrameSourceGroup.FindAllAsync();

    foreach (var sourceGroup in frameSourceGroups)
    {
        foreach (var sourceInfo in sourceGroup.SourceInfos)
        {
            // Look for the color video-record stream (the HoloLens PV camera).
            if (sourceInfo.MediaStreamType == MediaStreamType.VideoRecord
                    && sourceInfo.SourceKind == MediaFrameSourceKind.Color)
            {
                return new FrameSource()
                {
                    Group = sourceGroup,
                    Info = sourceInfo
                };
            }
        }
    }
    return null;
}

private async Task<bool> StartMediaFrameReaderAsync(FrameSource frameSource)
{
    try
    {
        var mediaCapture = new MediaCapture();
        // Shared read-only mode so this MediaCapture can coexist with the WebRTC capture.
        var settings = new MediaCaptureInitializationSettings()
        {
            SourceGroup = frameSource.Group,
            SharingMode = MediaCaptureSharingMode.SharedReadOnly,
            MemoryPreference = MediaCaptureMemoryPreference.Cpu,
            StreamingCaptureMode = StreamingCaptureMode.Video
        };
        await mediaCapture.InitializeAsync(settings);
    }
    catch (Exception ex)
    {
        Debug.Log(ex.Message);
        return false;
    }
    ...
}

@tarukosu

I've found the cause.
When the following methods are called on the UI thread, MediaCapture doesn't work well:

RequestAccessAndInitAsync(token) in libs/unity/library/Runtime/Scripts/PeerConnection.cs
RequestAccessAsync() in libs/unity/library/Runtime/Scripts/Media/WebcamSource.cs
RequestAccessAsync() in libs/unity/library/Runtime/Scripts/Media/MicrophoneSource.cs
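
As a workaround on my side, this is the general pattern I mean for keeping such initialization off the UI thread (just a sketch; I haven't verified that this is the right fix inside the library itself):

using System.Threading.Tasks;
using Windows.Media.Capture;

// Sketch: run the MediaCapture initialization on a thread-pool thread instead of the UI thread.
private async Task<bool> InitializeOffUiThreadAsync(MediaCaptureInitializationSettings settings)
{
    return await Task.Run(async () =>
    {
        var mediaCapture = new MediaCapture();
        await mediaCapture.InitializeAsync(settings);
        return true;
    });
}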

@qazqaz12378

@tarukosu
How do you get a PhotoCaptureFrame in real time?
I need to know the real-time camera-to-world and projection matrices.

@ccc7861

ccc7861 commented Aug 5, 2021

I wonder if there's any progress now? When will WebRTC be able to get the locatable camera data?

@Davidwang007

@zhangazheng have you solved this problem? Thank you!
