Clarify handling of multi-plane formats #9
Comments
I favor the 3rd option. Please submit a PR. |
The DFD describes the interpretation of memory within "a texel" - where a texel is made up of a sequence of consecutive bytes from some number of planes. (In the case of a 4:2:0 format, this is achieved by pointing two "planes" at the Y plane, with an offset and a two-line stride.) It doesn't cover padding or order of storage of planes (or, indeed, tile swizzling to map coordinates to the bytes contributing to texels). So the layout can be delegated to the DFD (which can also describe the contents of a VkSamplerYcbcrConversionCreateInfo), but KTX2 will still need to describe the stride and location in the file of each plane - where for the sake of the DFD, "plane" is a bit odd. Assuming it can do that, all is good, although the mapping from the Vulkan types isn't all that hard. |
@fluppeteer please describe the 4:2:0 case a bit more. How are the bits arranged in memory, and how do the offset and two-line stride relate to that? How can the DFD not cover the order of storage of the planes? Isn't the offset related to the distance in memory from one plane to the next? |
@MarkCallow there's an example 4:2:0 data format descriptor at the end of the spec - that particular example assumes the U/Cb and V/Cr planes are stored independently rather than interleaved (in FourCC terms, "I420" rather than "NV12"). The expectation is that a typical implementation will have storage of the Y plane as 8-bit values in some location; for the purposes of discussion, let us assume (although this is not necessary) that the Y values are addressed as "Ybase + x + y×Ystride" (i.e. linear). Similarly, there is, somewhere, a U plane addressed as "Ubase + floor(x/2) + floor(y/2)×Ustride" and a V plane addressed as "Vbase + floor(x/2) + floor(y/2)×Vstride".
The data format descriptor does not treat this as having "downsampled planes", because this concept does not extend well to Bayer formats (especially X-Trans); instead it considers a texel block as being a repeating pattern that encompasses some number of coordinates in each axis (currently up to 128; this may be reduced to 16 in a future revision, if there's no counter-example which this would break, so as to allow more precise sub-pixel sample positioning). Typical compressed texel blocks are stored as a consecutive sequence of bytes, covering some area (e.g. 4×4 for the ETC formats). To extend this concept to multi-planar formats, the data format descriptor treats the bytes of each plane addressed at the texel coordinates as though they were concatenated, and then the existing mechanism which applies to RGB formats is used to pluck bits out of the planes as needed. This mechanism allows true bit-planar representations (as supported, for example, by the Amiga). For some proprietary ways to store YUV (such as 4 bytes of 2×2 Y data plus U and V, which is a not-uncommon way to store all the necessary data with good spatial locality) this encoding "just works" in a single plane.
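As a rough illustration of the linear I420 addressing assumed above, here is a minimal Python sketch. The function name and parameter names are hypothetical, and the linear layout is only the assumption made for discussion, not something the data format descriptor requires.

```python
def i420_byte_addresses(x, y, y_base, y_stride, u_base, u_stride, v_base, v_stride):
    """Return the (Y, U, V) byte addresses that contribute to pixel (x, y).

    Assumes 8-bit samples and linear (untiled) planes, as in the discussion.
    """
    y_addr = y_base + x + y * y_stride
    # U and V are subsampled by 2 in both axes, hence floor(x/2) and floor(y/2).
    u_addr = u_base + x // 2 + (y // 2) * u_stride
    v_addr = v_base + x // 2 + (y // 2) * v_stride
    return y_addr, u_addr, v_addr


# Hypothetical 4x4 image with tightly packed planes stored back to back:
# Y occupies bytes 0..15, U bytes 16..19, V bytes 20..23.
addresses = i420_byte_addresses(3, 2, y_base=0, y_stride=4,
                                u_base=16, u_stride=2,
                                v_base=20, v_stride=2)
```

Nothing here depends on the planes being adjacent; the three base addresses are independent, which is the point made about plane storage elsewhere in this thread.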
For a true planar format, we could consider 4:2:2 as encoding a 2×1 texel block, with three planes: 2 bytes in the Y plane, 1 byte of U plane, and 1 byte of V plane. That is, the bytes for the Y plane of 4:2:2 start at:
plane 1 = Ybase + floor(x/2)×2 + y×Ystride
4:2:0 poses a question: not all the bytes in a "plane" that contribute to a texel block are consecutive in memory. Rather than providing a special case for this format, the solution is to describe the Y plane as two "planes" from the data format descriptor's perspective, each of which contains only consecutive bytes. That is, rather than:
plane 1 = Ybase + x + y×Ystride
...we say:
plane 1 = Ybase + floor(x/2)×2 + floor(y/2)×2×Ystride
This is not the conventional view of a "plane" (I freely admit that it's "weird"), but it allows the existing mechanism to be extended to arbitrary YUV alignments - so I don't think weirdness is a reason not to do it. Similarly, YUV 4:1:1 (YYYYUV) is a single plane of Y, but the transposed representation takes four Y planes. Depending on how a 6×6 X-Trans output is stored, this may require six planes, addressed by (floor(y/6)×6 + [0..5])×stride - but there is already a (floor(x/6)×6) term in there, so I don't consider this to be such a reach.
If you have a proprietary mapping between coordinates and bytes (such as Morton order), this complicates the relationship between the planes. But the actual relationship is not defined by the data format descriptor, so that's the user's problem. (And it's not that complicated.)
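The texel-block view of the Y plane can be sketched in Python as follows. The function names are hypothetical. The second 4:2:0 "plane" is assumed to be the first one offset by one row of the Y plane, which matches the earlier description of pointing two "planes" at the Y plane with an offset and a two-line stride, but it is not spelled out as a formula in this comment.

```python
def y_plane_start_422(x, y, y_base, y_stride):
    """Start of the 2 consecutive Y bytes for the 2x1 texel block containing (x, y)."""
    return y_base + (x // 2) * 2 + y * y_stride


def y_plane_starts_420(x, y, y_base, y_stride):
    """Starts of the two descriptor 'planes' holding Y for the 2x2 block containing (x, y).

    Each 'plane' contributes 2 consecutive bytes: the top and bottom rows of the block.
    """
    plane1 = y_base + (x // 2) * 2 + (y // 2) * 2 * y_stride
    # Assumption: the second "plane" is the same memory, one row further on.
    plane2 = plane1 + y_stride
    return plane1, plane2


# Hypothetical linear Y plane with an 8-byte stride; pixel (5, 3) lies in the
# 2x2 block covering x in {4, 5} and y in {2, 3}.
top, bottom = y_plane_starts_420(5, 3, y_base=0, y_stride=8)
block_bytes = [top, top + 1, bottom, bottom + 1]
# The same four bytes, computed with the conventional "Ybase + x + y*Ystride" view.
linear_bytes = [0 + px + py * 8 for py in (2, 3) for px in (4, 5)]
```

Both views address the same four bytes; only the grouping into "planes" differs.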
Because the planes are stored independently in memory. Indeed, many systems allow arbitrary independent Y, U and V planes (which may have been processed separately) to be combined to give a single "YUV" image. The data format descriptor describes "formats"; the memory location of pixels (let alone planes) is independent of this.
And indeed the stride of the planes. I've also met architectures for which the planes are consecutive, but have a defined amount of padding between them (because the different planes are accessed by coordinates, but the data will be given a proprietary tile swizzle before use). Since the "format" doesn't change just because the size of the image changes, this is considered to be outside the remit of the data format descriptor. As mentioned in the 'required concepts not in the "format"' chapter, the intent isn't to provide a complete description required for an image - there's quite enough to worry about with the pure "format". I do intend to provide a slightly more explicit example of all of this in a forthcoming spec revision - the questions in this discussion help to guide that, so thank you. |
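To make concrete what a container would have to record on top of the "format", here is a small Python sketch that places planes consecutively with alignment padding between them. It is purely illustrative: `layout_planes` and its `alignment` parameter are hypothetical and do not correspond to anything defined by KTX2 or the data format descriptor.

```python
def layout_planes(plane_sizes, alignment=1):
    """Place planes one after another, padding each start up to `alignment` bytes.

    Returns (offsets, total_size). This is the per-image information that lives
    outside the format description: it changes with the image size and padding.
    """
    offsets = []
    cursor = 0
    for size in plane_sizes:
        # Round the cursor up to the next multiple of `alignment`.
        cursor = -(-cursor // alignment) * alignment
        offsets.append(cursor)
        cursor += size
    return offsets, cursor


# Hypothetical 4x4 I420 image: a 16-byte Y plane and 4-byte U and V planes,
# with every plane aligned to 8 bytes.
offsets, total = layout_planes([16, 4, 4], alignment=8)
```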
This is still a blocking issue. From the spec's
and Table 1. Prohibited Formats doesn't list these formats. |
CTTF TSG Telecon 5/20/19. Given that no hardware correctly supports the transfer functions needed for YUV, etc., and that hardware returns RGB to shaders during sampling, are the multi-plane formats really useful as texture storage formats? The main use is for texturing video into a scene, and it is generally recommended to use nearest filtering due to the transfer function issues. Actions:
|
Vulkan's handling of these formats is very explicit and requires some care to get things right. First of all,
The sampler Y’CBCR conversion is defined by:
AFAIU, the DFD can supply this information. There are 30 formats in total. They could be grouped like:
|
Interestingly, some of these formats can be roughly mapped to other APIs. The following assumptions should be carefully verified before updating the spec.
Metal
Direct3D
Also, dedicated video formats:
|
Thanks for the thorough info, @lexaknyazev. |
Some comments from the author of the Vulkan YUV extensions. These are regarding the extra information needed in a KTX2 file.
Another tidbit:
|
@dewilkinson did you get any information? We still need to determine which of these formats, if any, KTX2 should support. |
@lexaknyazev the current spec prohibits |
@lexaknyazev, ping. With multi-plane formats now prohibited and the single-plane YUV formats supported it looks to me like this issue is resolved. Close it, if you agree. If not, explain why. |
I still think that it would be useful to have a standardized way of serializing multiplanar texture data, e.g., for capture/replay workflows. That said, it's clear that there's little interest in pursuing and/or implementing this. |
Vulkan 1.1 has introduced multi-planar formats that need special layout. Namely, they consist of 1-3 planes that don't have to have the same dimensions across components. For example:
VK_FORMAT_G10X6_B10X6_R10X6_3PLANE_420_UNORM_3PACK16
Each plane is a one-component image with pixel data stored in the top 10 bits of each 16-bit word; the bottom 6 bits are set to 0.
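A minimal Python sketch of this layout, for illustration only. The helper names are hypothetical, and the plane extents assume even image dimensions with the second and third planes subsampled by 2 in both axes, as the `_420` suffix indicates.

```python
def pack_10x6(value10):
    """Store a 10-bit sample in the top 10 bits of a 16-bit word (low 6 bits zero)."""
    if not 0 <= value10 <= 0x3FF:
        raise ValueError("sample does not fit in 10 bits")
    return value10 << 6


def unpack_10x6(word16):
    """Recover the 10-bit sample from a 16-bit word, ignoring the low 6 bits."""
    return (word16 >> 6) & 0x3FF


def plane_extents_420(width, height):
    """Extents in samples of the three planes; assumes even width and height."""
    return [(width, height), (width // 2, height // 2), (width // 2, height // 2)]


word = pack_10x6(0x3FF)              # all ten sample bits set
extents = plane_extents_420(8, 4)    # hypothetical 8x4 image
```

With one 16-bit word per sample, the first plane of this 8x4 example needs 8 × 4 × 2 = 64 bytes and each of the other two needs 4 × 2 × 2 = 16 bytes, before any row or plane padding.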
KTX2 must do one of: