NV12 image formats in memory

I fully understand the size of the NV12 format, as described in this question:

NV12 format and UV plane

I have now read two sources on how the UV plane is stored in this format. The first is MSDN: https://msdn.microsoft.com/en-us/library/windows/desktop/dd206750(v=vs.85).aspx

NV12

All of the Y samples appear first in memory as an array of unsigned char values with an even number of lines. The Y plane is followed immediately by an array of unsigned char values that contains packed U (Cb) and V (Cr) samples. When the combined U-V array is addressed as an array of little-endian WORD values, the LSBs contain the U values, and the MSBs contain the V values. NV12 is the preferred 4:2:0 pixel format for DirectX VA. It is expected to be an intermediate-term requirement for DirectX VA accelerators supporting 4:2:0 video. The following illustration shows the Y plane and the array that contains packed U and V samples.

My understanding: in the UV plane, each U sample and each V sample is stored in one byte.

Then I read about this on Wikipedia: https://wiki.videolan.org/YUV#NV12

It says:

NV12

Related to I420, NV12 has one luma ("luminance") plane Y and one plane with U and V values interleaved. In NV12, the chroma planes (blue and red) are subsampled in both the horizontal and vertical dimensions by a factor of 2. For a 2x2 group of pixels, you have 4 Y samples and 1 U and 1 V sample. It can be helpful to think of NV12 as I420 with the U and V planes interleaved. Here is a graphical representation of NV12. Each letter represents one bit:

For 1 NV12 pixel: YYYYYYYY UVUV
For a 2-pixel NV12 frame: YYYYYYYYYYYYYYYY UVUVUVUV
For a 50-pixel NV12 frame: Y * 8 * 50 (UV) * 2 * 50
For an n-pixel NV12 frame: Y * 8 * n (UV) * 2 * n

My understanding here: U and V alternate bit by bit, so each byte of the UV plane would contain 4 U bits and 4 V bits interleaved.

Can anyone clarify which description is correct?

1 answer

TL;DR: MSDN is correct.
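As a rough sketch of what the MSDN layout implies for addressing: the helper name `nv12_sample` is made up for illustration, and the code assumes a tightly packed frame (stride equal to width) with even dimensions, which real decoders do not always guarantee.

```python
import struct

def nv12_sample(buf: bytes, width: int, height: int, x: int, y: int):
    """Return (Y, U, V) for pixel (x, y) of a tightly packed NV12 frame.

    Hypothetical helper; assumes stride == width and even width/height.
    """
    y_size = width * height              # one Y byte per pixel
    luma = buf[y * width + x]
    # One (U, V) byte pair per 2x2 pixel block; a chroma row is `width` bytes.
    uv = y_size + (y // 2) * width + (x // 2) * 2
    return luma, buf[uv], buf[uv + 1]    # U is the first byte, V the second

# 4x2 frame: 8 Y bytes, then one chroma row holding two (U, V) pairs.
frame = bytes([10, 11, 12, 13, 14, 15, 16, 17, 0x55, 0xAA, 0x66, 0xBB])
print(nv12_sample(frame, 4, 2, 3, 1))    # (17, 102, 187)

# Read as a little-endian WORD, the low byte is U and the high byte is V.
(word,) = struct.unpack_from("<H", frame, 8)
print(hex(word & 0xFF), hex(word >> 8))  # 0x55 0xaa
```

Every U and V sample occupies a whole byte here; nothing is split at the bit level.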

To check which description is right (or at least to make sure there is no bit-level interleaving), you can use ffmpeg, a widely used video tool. I did the following experiment:

  1. Create a file containing some text (I used the Lorem Ipsum sample text)
  2. Tell ffmpeg to read it as a small I420 video frame
  3. Tell ffmpeg to convert it to NV12 format
  4. Print the result

Here is an example command line for (2) and (3):

 ffmpeg -s 96x4 -i example_i420.yuv -pix_fmt nv12 example_nv12.yuv 

Here is what I got in the output:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, s utnett uirn acduilppias cqiunig oeflfiitc,i as edde sdeor uenitu smmooldl itte mapnoirm iindc iedsitd ulnatb ourtu ml.a bLoorree me ti pdsoulmo rdeo lmoarg nsai ta laimqeuta,. cUotn seenci

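The chroma samples (U and V) are the trailing part of the output, starting at "utnett". These are clearly the same byte values (ASCII letters) as in the input text, only in a different order: the U and V bytes alternate. If any bit-wise interleaving had been done, I would have gotten different values.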
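The same reordering can be reproduced without ffmpeg. Below is a minimal Python sketch, assuming a tightly packed frame with even width and height; the function name `i420_to_nv12` is made up for illustration and is not an ffmpeg API.

```python
def i420_to_nv12(frame: bytes, width: int, height: int) -> bytes:
    """Convert one tightly packed I420 frame to NV12 (hypothetical helper)."""
    y_size = width * height
    c_size = (width // 2) * (height // 2)        # size of each chroma plane
    y = frame[:y_size]
    u = frame[y_size:y_size + c_size]
    v = frame[y_size + c_size:y_size + 2 * c_size]

    # NV12 keeps the Y plane as-is and interleaves whole U and V bytes.
    uv = bytearray(2 * c_size)
    uv[0::2] = u
    uv[1::2] = v
    return y + bytes(uv)

# 4x2 frame: 8 Y bytes, then a 2-byte U plane and a 2-byte V plane.
i420 = b"ABCDEFGH" + b"ux" + b"vy"
nv12 = i420_to_nv12(i420, 4, 2)
print(nv12)                                      # b'ABCDEFGHuvxy'
print(sorted(nv12) == sorted(i420))              # True: same bytes, reordered
```

Feeding ASCII text through it gives the same kind of result as above: readable letters in the Y part, then U and V letters alternating one byte at a time.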

Therefore, the description on the VLC wiki (BTW, it is not Wikipedia) is incorrect. Someone named "Edwardw" added an "illustration" in which each letter represented a pixel, and later changed "pixel" to "bit". I hope someone changes this to be less misleading (the wiki requires registration, so I can't edit it).
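To see why the word "bit" matters, here is a small Python comparison of byte-level interleaving with a purely hypothetical bit-level interleaving. The bit ordering in `interleave_bits` is my own assumption for the sake of the demonstration; no real format is claimed to work this way.

```python
def interleave_bytes(u: bytes, v: bytes) -> bytes:
    """Byte-level interleave, as NV12 does: U0 V0 U1 V1 ..."""
    out = bytearray()
    for a, b in zip(u, v):
        out += bytes((a, b))
    return bytes(out)

def interleave_bits(u: bytes, v: bytes) -> bytes:
    """Hypothetical bit-level interleave: u7 v7 u6 v6 ... u0 v0 per byte pair."""
    out = bytearray()
    for a, b in zip(u, v):
        word = 0
        for bit in range(7, -1, -1):             # most significant bit first
            word = (word << 2) | (((a >> bit) & 1) << 1) | ((b >> bit) & 1)
        out += word.to_bytes(2, "big")
    return bytes(out)

u, v = b"unt", b"tet"                            # first U and V bytes from the experiment
print(interleave_bytes(u, v))                    # b'utnett'
print(interleave_bits(u, v) == interleave_bytes(u, v))   # False
```

Byte-level interleaving yields exactly the "utnett" seen in the ffmpeg output, while the bit-level variant produces different byte values.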
