
Clarify handling of multi-plane formats #9

Open
lexaknyazev opened this issue Oct 27, 2018 · 14 comments

Comments

@lexaknyazev
Member

Vulkan 1.1 introduced multi-planar formats that require a special layout. They consist of 1-3 planes, and the planes need not have the same dimensions across components. For example:

VK_FORMAT_G10X6_B10X6_R10X6_3PLANE_420_UNORM_3PACK16
Each plane is a one-component image with pixel data stored in the top 10 bits of each 16-bit word; the bottom 6 bits are set to 0.

  • Plane 0: G component, full resolution
  • Plane 1: B component, half horizontal and half vertical resolution
  • Plane 2: R component, half horizontal and half vertical resolution
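
A minimal C sketch of deriving the per-plane dimensions for such a 4:2:0 three-plane format (the helper is hypothetical; only the plane-resolution rules listed above are assumed):

```c
#include <stdint.h>

/* Hypothetical helper: per-plane extent of a 3-plane 4:2:0 format such as
 * VK_FORMAT_G10X6_B10X6_R10X6_3PLANE_420_UNORM_3PACK16.
 * Plane 0 (G) is full resolution; planes 1 (B) and 2 (R) are halved in
 * both dimensions. Each sample occupies one 16-bit word. */
static void plane_extent_420_3plane(uint32_t width, uint32_t height, uint32_t plane,
                                    uint32_t *planeWidth, uint32_t *planeHeight)
{
    if (plane == 0) {
        *planeWidth  = width;
        *planeHeight = height;
    } else {
        /* Chroma planes: round up to cover odd image sizes. */
        *planeWidth  = (width  + 1) / 2;
        *planeHeight = (height + 1) / 2;
    }
}
```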

KTX2 must do one of:

  • disallow such formats;
  • require a specific layout for storing multi-plane images and document it;
  • explicitly delegate specification of multi-plane layout to DFD.
@MarkCallow
Contributor

I favor the 3rd option. Please submit a PR.

@fluppeteer

The DFD describes the interpretation of memory within "a texel" - where a texel is made up of a sequence of consecutive bytes from some number of planes. (In the case of a 4:2:0 format, this is achieved by pointing two "planes" at the Y plane, with an offset and a two-line stride.) It doesn't cover padding or order of storage of planes (or, indeed, tile swizzling to map coordinates to the bytes contributing to texels).

So the layout can be delegated to the DFD (which can also describe the contents of a VkSamplerYcbcrConversionCreateInfo), but KTX2 will still need to describe the stride and location in the file of each plane - where, for the DFD's purposes, "plane" has a somewhat unusual meaning. Assuming it can do that, all is good - and the mapping from the Vulkan types isn't all that hard.

@MarkCallow
Contributor

@fluppeteer please describe the 4:2:0 case a bit more. How are the bits arranged in memory, and how do the offset and two-line stride relate to that? How can the DFD not cover the order of storage of the planes? Isn't the offset related to the distance in memory from one plane to the next?

@fluppeteer

fluppeteer commented Nov 1, 2018

@MarkCallow there's an example 4:2:0 data format descriptor at the end of the spec - that particular example assumes the U/Cb and V/Cr planes are stored independently rather than interleaved (in FourCC terms, "I420" rather than "NV12"). The expectation is that a typical implementation will store the Y plane as 8-bit values at some location; for the purposes of discussion, let us assume (although this is not necessary) that the Y values are addressed as "Ybase + x + y×Ystride" (i.e. linear). Similarly, there is, somewhere, a U plane addressed as "Ubase + floor(x/2) + floor(y/2)×Ustride" and a V plane addressed as "Vbase + floor(x/2) + floor(y/2)×Vstride".

The data format descriptor does not treat this as having "downsampled planes", because this concept does not extend well to Bayer formats (especially X-Trans); instead it considers a texel block as being a repeating pattern that encompasses some number of coordinates in each axis (currently up to 128; this may be reduced to 16 in a future revision, if there's no counter-example which this would break, so as to allow more precise sub-pixel sample positioning). Typical compressed texel blocks are stored as a consecutive sequence of bytes, covering some area (e.g. 4×4 for the ETC formats). To extend this concept to multi-planar formats, the data format descriptor treats the bytes of each plane addressed at the texel coordinates as though they were concatenated, and then the existing mechanism which applies to RGB formats is used to pluck bits out of the planes as needed. This mechanism allows true bit-planar representations (as supported, for example, by the Amiga).

For some proprietary ways to store YUV (such as 4 bytes of 2×2 Y data plus U and V, which is a not-uncommon way to store all the necessary data with good spatial locality) this encoding "just works" in a single plane. For a true planar format, we could consider 4:2:2 as encoding a 2×1 texel block, with three planes: 2 bytes in the Y plane, 1 byte of U plane, and 1 byte of V plane.

That is, the bytes for the Y plane of 4:2:2 start at:

plane 1 = Ybase + floor(x/2)×2 + y×Ystride

4:2:0 poses a question: not all the bytes in a "plane" that contribute to a texel block are consecutive in memory. Rather than providing a special case for this format, the solution is to describe the Y plane as two "planes" from the data format descriptor's perspective, each of which contain only consecutive bytes.

That is, rather than:

plane 1 = Ybase + x + y×Ystride

...we say:

plane 1 = Ybase + floor(x/2)×2 + floor(y/2)×2×Ystride
plane 2 = Ybase + floor(x/2)×2 + (floor(y/2)×2 + 1)×Ystride

This is not the conventional view of a "plane" (I freely admit that it's "weird"), but it allows the existing mechanism to be extended to arbitrary YUV alignments - so I don't think weirdness is a reason not to do it. Similarly, YUV 4:1:1 (YYYYUV) is a single plane of Y, but the transposed representation takes four Y planes. Depending on how a 6×6 X-Trans output is stored, this may require six planes, addressed by (floor(y/6)×6 + [0..5])×stride - but there is already a (floor(x/6)×6) term in there, so I don't consider this to be such a reach.
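
To make the two-"planes"-per-Y-plane view concrete, here is a small sketch (an illustration only, assuming the linear addressing above) that locates a Y sample of an 8-bit 4:2:0 image through the two DFD "planes" with a two-line stride:

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch: address a Y sample of a 2x2 4:2:0 texel block via the two DFD
 * "planes" described above. Both point into the same Y buffer; "plane 2"
 * is offset by one line, and each block row advances by two lines. */
static const uint8_t *y_sample(const uint8_t *yBase, size_t yStride,
                               uint32_t x, uint32_t y)
{
    size_t blockCol = (x / 2) * 2;                    /* floor(x/2) × 2 */
    size_t blockRow = (y / 2) * 2;                    /* floor(y/2) × 2 */
    /* "plane 1" holds the top line of the block, "plane 2" the bottom
     * line; within a line, the two Y samples are consecutive bytes.    */
    const uint8_t *line = yBase + blockRow * yStride  /* plane 1        */
                        + (y & 1) * yStride;          /* plane 2        */
    return line + blockCol + (x & 1);
}
```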

If you have a proprietary mapping between coordinates and bytes (such as Morton order), this complicates the relationship between the planes. But the actual relationship is not defined by the data format descriptor, so that's the user's problem. (And it's not that complicated.)

How can the DFD not cover order of storage of the planes?

Because the planes are stored independently in memory. Indeed, many systems allow arbitrary independent Y, U and V planes (which may have been processed separately) to be combined to give a single "YUV" image. The data format descriptor describes "formats"; the memory location of pixels (let alone planes) is independent of this.

Isn't the offset related to the distance in memory from one plane to the next?

And indeed the stride of the planes. I've also met architectures for which the planes are consecutive, but have a defined amount of padding between them (because the different planes are accessed by coordinates, but the data will be given a proprietary tile swizzle before use). Since the "format" doesn't change just because the size of the image changes, this is considered to be outside the remit of the data format descriptor. As mentioned in the 'required concepts not in the "format"' chapter, the intent isn't to provide a complete description required for an image - there's quite enough to worry about with the pure "format".

I do intend to provide a slightly more explicit example of all of this in a forthcoming spec revision - the questions in this discussion help to guide that, so thank you.

@lexaknyazev
Member Author

This is still a blocking issue.

From the spec's vkFormat definition:

It can be any value defined in core Vulkan 1.1

and

Table 1. Prohibited Formats doesn't list these formats.

@MarkCallow
Contributor

MarkCallow commented May 20, 2019

CTTF TSG Telecon 5/20/19. Given that no hardware correctly supports the transfer functions needed for YUV, etc., and that hardware returns RGB to shaders during sampling, are the multi-plane formats really useful as texture storage formats? The main use is texturing video into a scene, and nearest filtering is generally recommended because of the transfer-function issues.

Actions:

  • @dewilkinson to query some devtech folks about the importance of these formats.
  • @lexaknyazev to give us a list of the extra information needed in a KTX2 file, in case we decide to proceed.

@lexaknyazev
Member Author

Vulkan's handling of these formats is very explicit and requires some care to get things right.

First of all,

To be used with VkImageView with subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT, sampler Y’CBCR conversion must be enabled for the following formats.

The sampler Y’CBCR conversion is defined by:

  • Components swizzling

    • Happens before other sampler operations
    • Mutually exclusive with "regular" swizzling
    • There are quite a few restrictions on valid combinations, such as:

      If the format has a _422 or _420 suffix, then components.r must be VK_COMPONENT_SWIZZLE_IDENTITY or VK_COMPONENT_SWIZZLE_B.

  • ycbcrModel

    • RGB (untouched)
    • Y’CBCR (apply only range expansion)
    • Y’CBCR 601/709/2020 (range expansion + to RGB)
  • ycbcrRange

    • full
    • narrow
  • {x,y}ChromaOffset

    • cositedEven
    • midpoint

AFAIU, the DFD can supply this information.
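
For reference, a minimal sketch of setting these parameters through the Vulkan 1.1 API for an 8-bit 3-plane 4:2:0 format (the specific model, range and chroma-offset values are just examples; in a KTX2 reader they would come from the DFD):

```c
#include <vulkan/vulkan.h>

/* Sketch: create a sampler Y'CbCr conversion for an 8-bit 3-plane 4:2:0
 * format. The values below are illustrative only. */
VkSamplerYcbcrConversion createConversion(VkDevice device)
{
    VkSamplerYcbcrConversionCreateInfo info = {
        .sType         = VK_STRUCTURE_TYPE_SAMPLER_YCBCR_CONVERSION_CREATE_INFO,
        .format        = VK_FORMAT_G8_B8_R8_3PLANE_420_UNORM,
        .ycbcrModel    = VK_SAMPLER_YCBCR_MODEL_CONVERSION_YCBCR_709,
        .ycbcrRange    = VK_SAMPLER_YCBCR_RANGE_ITU_NARROW,
        .components    = { VK_COMPONENT_SWIZZLE_IDENTITY, VK_COMPONENT_SWIZZLE_IDENTITY,
                           VK_COMPONENT_SWIZZLE_IDENTITY, VK_COMPONENT_SWIZZLE_IDENTITY },
        .xChromaOffset = VK_CHROMA_LOCATION_COSITED_EVEN,
        .yChromaOffset = VK_CHROMA_LOCATION_COSITED_EVEN,
        .chromaFilter  = VK_FILTER_NEAREST,
        .forceExplicitReconstruction = VK_FALSE,
    };
    VkSamplerYcbcrConversion conversion = VK_NULL_HANDLE;
    vkCreateSamplerYcbcrConversion(device, &info, NULL, &conversion);
    return conversion;
}
```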


There are 30 formats in total. They could be grouped as follows:

  • Single plane
    KTX2 should be able to handle them as is (assuming the correct DFD).

    • Single-resolution (2 formats):
      • 10 or 12 bits per color channel with zero-filled extra bits.
    • Multi-resolution (8 formats, 2x1 block, red and blue channels are recorded at half the horizontal resolution):
      • 8/10/12/16 bits per color channel
      • GBGR or BGRG order
  • Multi-planar (20 formats)
    AFAIU, implementations would have to query the runtime about the expected memory location of each plane. For the KTX2 spec we have, I think, only one option: to store the planes sequentially (see the sketch after this list).

    • Full resolution (444, 3 planes)
      • 8/10/12/16 bits per color channel
    • RB at half the horizontal resolution (422)
      • 8/10/12/16 bits per color channel
      • RB can be stored together or separately (G_R_B or G_RB)
    • RB at half the resolution in both dimensions (420)
      • 8/10/12/16 bits per color channel
      • RB can be stored together or separately (G_R_B or G_RB)
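
A minimal sketch of the "store the planes sequentially" option for one mip level of an 8-bit 3-plane 4:2:0 image (my reading of the proposal above; the struct and the tight packing are assumptions, not KTX2 spec language):

```c
#include <stdint.h>

/* Hypothetical sequential layout of a 3-plane 4:2:0 level: plane 0 (G,
 * full res), then plane 1 (B) and plane 2 (R) at half resolution in both
 * dimensions, each tightly packed with 1 byte per sample. */
typedef struct {
    uint64_t offset[3];   /* byte offset of each plane from the level start */
    uint64_t size[3];     /* byte size of each plane                        */
} PlaneLayout;

static PlaneLayout sequential_420_3plane(uint32_t width, uint32_t height)
{
    uint32_t cw = (width + 1) / 2, ch = (height + 1) / 2;
    PlaneLayout l;
    l.size[0] = (uint64_t)width * height;  /* G */
    l.size[1] = (uint64_t)cw * ch;         /* B */
    l.size[2] = (uint64_t)cw * ch;         /* R */
    l.offset[0] = 0;
    l.offset[1] = l.offset[0] + l.size[0];
    l.offset[2] = l.offset[1] + l.size[1];
    return l;
}
```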

@lexaknyazev
Member Author

Interestingly, some of these formats can be somewhat mapped to other APIs.

The following assumptions should be carefully verified before updating the spec.

Metal

  • MTLPixelFormatGBGR422 and MTLPixelFormatBGRG422 look very similar to
    VK_FORMAT_G8B8G8R8_422_UNORM and VK_FORMAT_B8G8R8G8_422_UNORM, although Metal doesn't perform any YUV-to-RGB conversion.

  • MTLPixelFormatBGRA10_XR and MTLPixelFormatBGRA10_XR_sRGB have a layout that is similar to
    VK_FORMAT_R10X6G10X6B10X6A10X6_UNORM_4PACK16 with red and blue channels swapped. There's also a linear mapping to [-0.752941 .. 1.25098].

  • MTLPixelFormatBGR10_XR and MTLPixelFormatBGR10_XR_sRGB also have swapped channels and an additional linear mapping (as above).

Direct3D

  • DXGI_FORMAT_R8G8_B8G8_UNORM and DXGI_FORMAT_G8R8_G8B8_UNORM to VK_FORMAT_G8B8G8R8_422_UNORM and VK_FORMAT_B8G8R8G8_422_UNORM. An interesting comment about D3D usage.
  • DXGI_FORMAT_R10G10B10_XR_BIAS_A2_UNORM looks similar to Metal's MTLPixelFormatBGR10_XR but with 2 bits of alpha and swapped red and blue.

Also, dedicated video formats:

  • DXGI_FORMAT_AYUV -> VK_FORMAT_B8G8R8A8_UNORM.
  • DXGI_FORMAT_Y410 -> swizzled VK_FORMAT_A2B10G10R10_UNORM_PACK32.
  • DXGI_FORMAT_Y416 -> swizzled VK_FORMAT_R16G16B16A16_UNORM.
  • DXGI_FORMAT_NV12 -> VK_FORMAT_G8_B8R8_2PLANE_420_UNORM.
  • DXGI_FORMAT_P010 -> VK_FORMAT_G10X6_B10X6R10X6_2PLANE_420_UNORM_3PACK16
  • DXGI_FORMAT_P016 -> VK_FORMAT_G16_B16R16_2PLANE_420_UNORM
  • DXGI_FORMAT_YUY2 -> VK_FORMAT_G8B8G8R8_422_UNORM
  • FourCC UYVY -> VK_FORMAT_B8G8R8G8_422_UNORM
  • DXGI_FORMAT_Y210 -> VK_FORMAT_G10X6B10X6G10X6R10X6_422_UNORM_4PACK16
  • DXGI_FORMAT_Y216 -> VK_FORMAT_G16B16G16R16_422_UNORM
  • DXGI_FORMAT_NV11 -> no Vulkan support for 4:1:1 subsampling
  • FourCC IMC1/2/3/4 and YV12 -> swizzled VK_FORMAT_G8_B8_R8_3PLANE_420_UNORM
  • DXGI_FORMAT_P208 -> VK_FORMAT_G8_B8R8_2PLANE_422_UNORM
  • FourCC P216 -> VK_FORMAT_G16_B16R16_2PLANE_422_UNORM
  • FourCC P210 -> VK_FORMAT_G10X6_B10X6R10X6_2PLANE_422_UNORM_3PACK16
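
The video-format part of this mapping could be expressed roughly as follows (a sketch only; it assumes both DXGI and Vulkan headers are available and omits the entries above that additionally need a component swizzle):

```c
#include <dxgiformat.h>
#include <vulkan/vulkan.h>

/* Sketch of the DXGI -> Vulkan mapping listed above for a few of the
 * dedicated video formats. Formats that also require a swizzle
 * (e.g. DXGI_FORMAT_AYUV) are left out. */
static VkFormat dxgiVideoToVkFormat(DXGI_FORMAT f)
{
    switch (f) {
    case DXGI_FORMAT_NV12: return VK_FORMAT_G8_B8R8_2PLANE_420_UNORM;
    case DXGI_FORMAT_P010: return VK_FORMAT_G10X6_B10X6R10X6_2PLANE_420_UNORM_3PACK16;
    case DXGI_FORMAT_P016: return VK_FORMAT_G16_B16R16_2PLANE_420_UNORM;
    case DXGI_FORMAT_YUY2: return VK_FORMAT_G8B8G8R8_422_UNORM;
    case DXGI_FORMAT_Y210: return VK_FORMAT_G10X6B10X6G10X6R10X6_422_UNORM_4PACK16;
    case DXGI_FORMAT_Y216: return VK_FORMAT_G16B16G16R16_422_UNORM;
    case DXGI_FORMAT_P208: return VK_FORMAT_G8_B8R8_2PLANE_422_UNORM;
    default:               return VK_FORMAT_UNDEFINED;
    }
}
```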

@MarkCallow
Contributor

MarkCallow commented May 22, 2019

Thanks for the thorough info, @lexaknyazev.

@MarkCallow
Contributor

Some comments from the author of the Vulkan YUV extensions.

These are regarding the extra information needed in a KTX2 file.

Mostly the issue is that, while the DFD doesn't say what the relationship is between the memory for a pixel in one plane and in another plane, it also doesn't say what the relationship is between coordinates and memory for a single-plane image - so in that sense multi-planar images aren't special.

For decoding using a DFD, the DFD may just say (for the easy example of 8-bit planar 4:2:2) "two bytes from the first plane, one byte from the second plane, one byte from the third plane; of these, Y(x,y) is in the first 8 bits of the first two bytes; Y(x+1,y) is in the second 8 bits of the first two bytes, Cb(x+0.5,y) is in the 8 bits from the second plane, Cr(x+0.5,y) is in the 8 bits from the third plane". How you got those three planes' worth of data together is your problem. And of course this is implicit - just because it's described that way doesn't mean that it'll be the best way to implement a decoder.
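
For illustration, that implicit description of 8-bit planar 4:2:2 amounts to something like the following sketch (assuming linear plane addressing and an even x; how the plane pointers and strides were obtained is outside the DFD's scope):

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch: gather the bytes of one 2x1 texel block of 8-bit planar 4:2:2
 * from three separately stored planes, as described above. */
typedef struct { uint8_t y0, y1, cb, cr; } TexelBlock422;

static TexelBlock422 fetch_422_block(const uint8_t *yPlane,  size_t yStride,
                                     const uint8_t *cbPlane, size_t cbStride,
                                     const uint8_t *crPlane, size_t crStride,
                                     uint32_t x, uint32_t y)   /* x is even */
{
    TexelBlock422 b;
    b.y0 = yPlane[y * yStride + x];        /* Y(x,   y)    */
    b.y1 = yPlane[y * yStride + x + 1];    /* Y(x+1, y)    */
    b.cb = cbPlane[y * cbStride + x / 2];  /* Cb(x+0.5, y) */
    b.cr = crPlane[y * crStride + x / 2];  /* Cr(x+0.5, y) */
    return b;
}
```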

Alexey has proposed sequential. Seems reasonable to me.

Absolutely. Vulkan has a user-controlled DISJOINT bit which lets you decide whether you want to have your planes separate, and that should be the problem of the application, not the format (it's outside KTX2's remit). For a file, you get to choose where you want to put them, just as you don't need to specify a memory address for a single-plane image in the file.

Another tidbit:

The DXGI naming convention seems to be low-to-high-bit, the reverse of Microsoft's previous choice (https://docs.microsoft.com/en-us/windows/desktop/direct3d9/d3dformat) - see the definition of DXGI_FORMAT_R9G9B9E5_SHAREDEXP in https://docs.microsoft.com/en-us/windows/desktop/api/dxgiformat/ne-dxgiformat-dxgi_format (because it's not made obvious for most formats until you get to the remarks at the end). On the other hand, Vulkan packed formats (and GL's) use the D3D9 ordering convention.

As such, DXGI_FORMAT_Y410 is described as having a "view format" of DXGI_FORMAT_R10G10B10A2_*, which in Vulkan speak is VK_FORMAT_A2B10G10R10_*_PACK32.

@MarkCallow
Contributor

@dewilkinson to query some devtech folks about the importance of these formats.

@dewilkinson did you get any information? We still need to determine which of these formats, if any, KTX2 should support.

@MarkCallow
Contributor

@lexaknyazev the current spec prohibits *[2-9]PLANE* formats. Can we therefore consider this issue resolved and close it? Or is there a desire to support these formats, in which case we should keep the issue open?

@MarkCallow
Contributor

@lexaknyazev, ping. With multi-plane formats now prohibited and the single-plane YUV formats supported, it looks to me like this issue is resolved. Close it if you agree; if not, explain why.

@lexaknyazev
Member Author

I still think that it would be useful to have a standardized way of serializing multiplanar texture data, e.g., for capture/replay workflows.

That said, it's clear that there's little interest in pursuing and/or implementing this.
