Video manipulation using stdio and FFmpeg

2015-05-11

Since my FFmpeg recipes post, I've been using FFmpeg to process videos recorded at H.A.C.K. talks and workshops, and I needed an easy way to inject my own code into the pixel pipeline. For such tasks, I prefer stdio: every sane programming language has an API for it, and the OS handles the producer–consumer problem, including parallelization and buffer management, out of the box, while making it simple to tap into the streams and/or replace them with files for debugging.

As it turned out, FFmpeg can be used as both a decoder and an encoder in this regard. In the case of the former, the input is a video file (in my case, raw DV), and FFmpeg outputs raw RGB triplets, from left to right, then from top to bottom, advancing from frame to frame. The relevant command line switches are the following.

  • -pix_fmt rgb24 sets the pixel format to 24-bit (3 × 8 bit) RGB
  • -vcodec rawvideo sets the video codec to raw, resulting in raw pixels
  • -f rawvideo sets the container format to raw, i.e. no wrapping
  • - (a single dash) as the last parameter sends output to stdout

A simple example with 2 frames of 2×2 pixels each; the raw output is shown as a hex dump below each frame's pixel colors:

  Frame 1 (red, yellow / green, blue):
      ff 00 00  ff ff 00  00 ff 00  00 00 ff
  Frame 2 (black, dark gray / light gray, white):
      00 00 00  55 55 55  aa aa aa  ff ff ff
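Since the stream is just a flat byte sequence, the offset of any pixel can be computed directly; the following sketch spells out the layout (the helper name and coordinate convention are mine, not from the original toolchain):

def pixel_offset(frame, x, y, width=2, height=2, bpp=3):
    # Frames are concatenated; within a frame, rows run top to bottom
    # and pixels left to right, with 3 bytes (R, G, B) per pixel.
    return ((frame * height + y) * width + x) * bpp

# The blue pixel sits at (x=1, y=1) of frame 0, so its R byte is at offset 9
assert pixel_offset(0, 1, 1) == 9
# The white pixel sits at (x=1, y=1) of frame 1, at offset 21
assert pixel_offset(1, 1, 1) == 21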

The simplest way to test this is to redirect the output of a video with solid colors to hd, as can be seen below (input.mkv is the input file).

$ ffmpeg -i input.mkv -vcodec rawvideo -pix_fmt rgb24 \
    -f rawvideo - | hd | head

Such raw image data can be imported into GIMP by selecting Raw image data in the Select File Type list of the Open dialog; since no metadata is supplied, every consumer must know at least the width and the pixel format of the image. While GIMP is great for debugging such data, imaging libraries can also read it easily; for example, PIL offers the Image.frombytes method that takes the pixel format and the size (as a tuple) as parameters.

For example, Image.frombytes('RGB', (320, 240), binary_data) returns an Image object if binary_data contains the necessary 320 × 240 × 3 bytes produced by FFmpeg in rgb24 mode. If you only need grayscale, 'RGB' can be replaced with 'L' and rgb24 with gray, like we did in our editor.
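Putting the decoder and PIL together, a frame-reading loop might look like the following sketch; the loop structure and the per-frame handling are mine for illustration, only the FFmpeg switches and the 320×240 size come from the post:

import subprocess
from PIL import Image

WIDTH, HEIGHT = 320, 240          # must be known in advance
FRAME_SIZE = WIDTH * HEIGHT * 3   # bytes per rgb24 frame

# FFmpeg as a decoder, writing raw rgb24 frames to stdout
decoder = subprocess.Popen(['ffmpeg', '-i', 'input.mkv', '-vcodec', 'rawvideo',
        '-pix_fmt', 'rgb24', '-f', 'rawvideo', '-'], stdout=subprocess.PIPE)

while True:
    binary_data = decoder.stdout.read(FRAME_SIZE)
    if len(binary_data) < FRAME_SIZE:
        break                     # end of stream (or truncated frame)
    frame = Image.frombytes('RGB', (WIDTH, HEIGHT), binary_data)
    # ... process the frame here, e.g. frame.save('debug.png') while testing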

FFmpeg can also be used as an encoder; in this scenario, the input consists of raw RGB triplets in the same order as described above, and the output is a video-only file. The relevant command line switches are the following.

  • -r 25 defines the number of frames per second (should match the original)
  • -s 320x240 defines the size of a frame
  • -f rawvideo -pix_fmt rgb24 are the same as above
  • -i - sets stdin as input

The simplest way to test it is redirecting /dev/urandom, which results in white noise, as can be seen below (4 seconds in the example).

$ dd if=/dev/urandom bs=$((320 * 240 * 3)) count=100 | ffmpeg -r 25 \
    -s 320x240 -f rawvideo -pix_fmt rgb24 -i - output.mkv

Below is an example of the result played in MPlayer.

[Screenshot: 4 seconds of RGB white noise in MPlayer]
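Instead of /dev/urandom, frames can of course be generated programmatically and piped into the same encoder command line; here is a minimal sketch along those lines, with the color, duration, and output file name chosen arbitrarily:

import subprocess

WIDTH, HEIGHT, FPS = 320, 240, 25

# FFmpeg as an encoder, reading raw rgb24 frames from stdin
encoder = subprocess.Popen(['ffmpeg', '-r', str(FPS), '-s',
        '%dx%d' % (WIDTH, HEIGHT), '-f', 'rawvideo', '-pix_fmt', 'rgb24',
        '-i', '-', 'output.mkv'], stdin=subprocess.PIPE)

red_frame = b'\xff\x00\x00' * (WIDTH * HEIGHT)
for _ in range(FPS * 4):          # 4 seconds of solid red
    encoder.stdin.write(red_frame)

encoder.stdin.close()
encoder.wait()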

Having a working encoder and decoder pipeline makes it possible not only to generate arbitrary output (that's how we generated our intro) but also to merge slides with the video recording of the talk. In that case, pixels can be “forwarded” without modification from the output of the decoder to the input of the encoder by reading stdin into and writing stdout from the same buffer, so overlaying rectangular regions of video doesn't even require image libraries.
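As a rough sketch of such a filter (the frame size and the overlay rectangle are assumptions for illustration), the loop below copies every frame from stdin to stdout while painting a rectangle white, using nothing but byte slicing:

import sys

WIDTH, HEIGHT = 320, 240
FRAME_SIZE = WIDTH * HEIGHT * 3

# Binary stdin/stdout (on Python 3, the .buffer attribute is the byte stream)
stdin = getattr(sys.stdin, 'buffer', sys.stdin)
stdout = getattr(sys.stdout, 'buffer', sys.stdout)

while True:
    frame = bytearray(stdin.read(FRAME_SIZE))
    if len(frame) < FRAME_SIZE:
        break
    # Paint a 100x80 pixel rectangle at (10, 10) white, one row at a time
    for y in range(10, 90):
        start = (y * WIDTH + 10) * 3
        frame[start:start + 100 * 3] = b'\xff' * (100 * 3)
    stdout.write(bytes(frame))

Such a filter slots between the two FFmpeg invocations shown above: ffmpeg -i input.mkv ... -f rawvideo - | python filter.py | ffmpeg -r 25 -s 320x240 -f rawvideo -pix_fmt rgb24 -i - output.mkv (filter.py being a hypothetical name for the script above).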
