What is the structure of the video stream?

Question

My ultimate goal is to process RGB video data.

I am trying to read the bytes of a file that I created using ffmpeg.

 ffmpeg -video_size 100x100 -framerate 20 -f x11grab -i :0.0 \
   -c:v rawvideo -pix_fmt rgb24 video.nut

I wrote a Node script to make the binary data easier to read, if you need it. The output for my current file is:

 Hex  Binary    Row
 47   01000111  0
 40   01000000  1
 11   00010001  2
 10   00010000  3
 03   00000011  4
 00   00000000  5
 00   00000000  6
 00   00000000  7
 68   01101000  8
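For readers without the Node script, here is a minimal Python sketch that prints the same Hex / Binary / Row columns for the first bytes of a file. The function name `dump_bytes` and the script name are hypothetical, and "Row" is assumed to mean the 0-based byte offset.

```python
import sys


def dump_bytes(data: bytes, count: int = 9) -> list[str]:
    """Format the first `count` bytes as 'Hex Binary Row' lines (Row = byte offset)."""
    lines = ["Hex  Binary    Row"]
    for row, value in enumerate(data[:count]):
        # 02X: two-digit uppercase hex; 08b: zero-padded 8-bit binary
        lines.append(f"{value:02X}   {value:08b}  {row}")
    return lines


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python dump_bytes.py video.nut
    with open(sys.argv[1], "rb") as f:
        print("\n".join(dump_bytes(f.read(64))))
```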

I have looked at the specification for .nut, but I cannot figure it out. I would like to parse the RGB data of each frame so that I end up with an RGB matrix for each "image" in the video stream. Thanks!

ffmpeg video video-processing

Matt Jun 16 '16 at 19:48

2 answers

Personally, I like @Mulvya's answer; the .rgb format is much simpler. However, if you ever pass this file on, you will always have to supply notes with it (for example, the expected width, height, frame rate, etc.); otherwise it is just a sea of bytes with no indication of where one frame stops and the next begins.

As for the .nut format you originally asked about...

Every video frame will be flagged as a keyframe (since each one is an uncompressed, complete image).

First, find the data section for each keyframe. Look for this start code sequence:
4E 4B E4 AD EE CA 45 69
To confirm that this is a keyframe data section, check the next 8 bytes, which are always:
06 00 00 00 00 00 00 03
Next come the bytes holding the flags and the total byte count for this keyframe (i.e. 30,000 for a 100 x 100 x 3 image). This part is tricky because you have to read values at the bit level, not just the byte level... The short version (for a 100 x 100 image) is simply to skip the next 4 bytes, which should be 00 81 EA 30, to reach the 30,000 bytes of RGB data. So:
(a) For the first frame, skip 4 bytes (they should be 00 81 EA 30), then read the next 30,000 bytes.
(b) This brings you to the next keyframe start code: 4E 4B E4 AD EE CA 45 69. Skip the following 15 bytes (the last of which is 30), and you get 30,000 bytes of RGB image data.
(c) For every remaining frame, repeat step (b): skip the 8-byte start code, skip the next 15 bytes, then take the next 30,000 bytes as the image. Repeat until the end of the file.
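Steps (a) to (c) can be sketched in Python as follows. This is only a transcription of the recipe, not a general NUT parser: the skip counts (8 + 4 bytes for the first frame, 15 bytes afterwards) and the 30,000-byte frame size are the values observed for this particular 100x100 rgb24 capture, and the name `extract_frames` is hypothetical.

```python
# Assumed values, valid only for the 100x100 rgb24 capture discussed here:
SYNCPOINT = bytes.fromhex("4E 4B E4 AD EE CA 45 69")  # 8-byte keyframe start code
FRAME_SIZE = 100 * 100 * 3   # 30,000 bytes of RGB per frame
FIRST_SKIP = 8 + 4           # step (a): 06 00 .. 03, then 00 81 EA 30
NEXT_SKIP = 15               # step (b): 15 bytes, the last of which is 0x30


def extract_frames(data: bytes) -> list[bytes]:
    """Find each start code, skip its header bytes, and take FRAME_SIZE bytes."""
    frames = []
    pos = data.find(SYNCPOINT)
    while pos != -1:
        skip = FIRST_SKIP if not frames else NEXT_SKIP
        start = pos + len(SYNCPOINT) + skip
        frame = data[start:start + FRAME_SIZE]
        if len(frame) < FRAME_SIZE:
            break  # truncated data: no complete frame left
        frames.append(frame)
        # Resume searching after this frame's pixels so that pixel values which
        # happen to match the start code are not mistaken for a new keyframe.
        pos = data.find(SYNCPOINT, start + FRAME_SIZE)
    return frames
```

Because the offsets are hard-coded, this breaks as soon as the image size (and therefore the size fields) changes; a robust reader would decode those fields instead of skipping a fixed number of bytes.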
PS: As a final note... these bytes number only 4 because that is all that is needed to encode the flags, size, etc. of a 100 x 100 image. A larger image will use more bits. In that case you must parse the individual bits; the bits just before the frame data always give the number of bytes needed to extract the image. Let me know if you need this information.
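The bit-level parsing mentioned in the PS can be illustrated with the NUT specification's variable-length `v` integer, in which each byte carries 7 bits of the value and a set high bit (0x80) means another byte follows. Assuming the size field uses that encoding (my own reading of the bytes, not something stated in the answer), the bytes 81 EA 30 decode to exactly 30,000, and a larger frame simply needs more bytes. `read_v` is a hypothetical helper name.

```python
def read_v(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one NUT-style 'v' value starting at data[pos].

    Each byte contributes its low 7 bits; a set high bit means another
    byte follows. Returns (value, position just after the value).
    Raises IndexError if the data ends in the middle of a value.
    """
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


# Skip the leading 00 and decode 81 EA 30: 1, then 1*128 + 106 = 234, then 234*128 + 48.
print(read_v(bytes.fromhex("0081EA30"), 1))  # (30000, 4)
```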

OLD ANSWER

"I can't understand what format the payload is in. I checked Wikipedia for uncompressed video formats, but that didn't help..."

-f mpegts means "force the output format to MPEG-TS", regardless of the file extension.
So what you actually have is MPEG-TS; it was never a raw format. The three letters "raw" in your file name are misleading.

I cannot be sure which format you need when you only say "uncompressed video". Is it RGB that you want? The only formats I know of that support RGB frames are AVI and FLV (MOV may be able to do it too, but I have never tried). In any case, you will need a container for your RGB frame data.

AVI container:

 ffmpeg -video_size 1920x1080 -framerate 30 -f x11grab -i :0.0 -c:v rawvideo -pix_fmt rgb24 video.avi

FLV container:

 ffmpeg -video_size 1920x1080 -framerate 30 -f x11grab -i :0.0 -c:v flashsv -pix_fmt rgb24 video.flv

PS: The information in this answer may help you choose the output format and container.

Vc.one Jun 17 '16 at 10:48

Mulvya · Accepted Answer · 2016-07-08T14:31:59+0000

If you're fine with raw video parsing, use

 ffmpeg -video_size 100x100 -framerate 20 -f x11grab -i :0.0 -c:v rawvideo -pix_fmt rgb24 -f rawvideo video.rgb

The output structure will be

 RGBRGBRGB ...

So each 3-byte triplet represents one pixel, in scan order: left to right, top to bottom. For a 100x100 frame, the 301st byte is the R value of the first pixel of the second row, and the 30,000th byte is the B value of the bottom-right pixel. The next 30,000 bytes then represent the next frame, and so on.
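The offset arithmetic can be checked with a small helper. This is a sketch assuming packed rgb24 with no padding between rows or frames, as the answer describes; `byte_offset` is a hypothetical name, and all offsets are 0-based, so the "301st byte" is offset 300.

```python
def byte_offset(x: int, y: int, channel: int, frame: int = 0,
                width: int = 100, height: int = 100) -> int:
    """Return the 0-based offset of one channel value in a packed rgb24 stream.

    x counts pixels left to right, y counts rows top to bottom,
    channel is 0 for R, 1 for G and 2 for B.
    """
    frame_size = width * height * 3  # 30,000 bytes for 100x100
    row_size = width * 3             # 300 bytes per row
    return frame * frame_size + y * row_size + x * 3 + channel


# The two examples from the answer, as 0-based offsets:
print(byte_offset(x=0, y=1, channel=0))    # 300   -> the 301st byte
print(byte_offset(x=99, y=99, channel=2))  # 29999 -> the 30,000th byte
```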

This is a raw stream, so there is no container metadata or frame encapsulation, just an undifferentiated stream of pixel channel values.
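Since the stream carries no metadata, a reader has to be given the frame dimensions up front. A minimal Python sketch, assuming you pass the same width and height used in the capture command, that yields each frame as a matrix (rows of (R, G, B) tuples); `read_frames` is a hypothetical name.

```python
import io
from typing import BinaryIO, Iterator

Pixel = tuple[int, int, int]


def read_frames(stream: BinaryIO, width: int, height: int) -> Iterator[list[list[Pixel]]]:
    """Yield each frame of a raw rgb24 stream as a height x width matrix of (R, G, B)."""
    frame_size = width * height * 3
    while True:
        buf = stream.read(frame_size)
        if len(buf) < frame_size:
            return  # end of stream; a trailing partial frame is ignored
        frame = []
        for y in range(height):
            row_start = y * width * 3
            row = [
                (buf[i], buf[i + 1], buf[i + 2])
                for i in range(row_start, row_start + width * 3, 3)
            ]
            frame.append(row)
        yield frame


if __name__ == "__main__":
    # Synthetic 2x1 stream: frame 1 is red + blue, frame 2 is green + green.
    demo = io.BytesIO(bytes([255, 0, 0, 0, 0, 255, 0, 255, 0, 0, 255, 0]))
    for frame in read_frames(demo, width=2, height=1):
        print(frame)
```

For the capture above you would open `video.rgb` with `open("video.rgb", "rb")` and call `read_frames(f, 100, 100)`. Building Python tuples is slow for large frames; `numpy.frombuffer(buf, dtype=numpy.uint8).reshape(height, width, 3)` is a common faster alternative.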
