Applying OpenGL Shaders w/ FFmpeg

Walkthrough of a simple video filter for shading frames with arbitrary GLSL.

Motivation

A while back, I had a task which required performing geometric transformations on high resolution videos1 as efficiently as practically possible2. The development of the geometric portion was ongoing, involved, and cross-platform (cloud, Android, web, etc.), so it made sense to attempt to use a single source artifact - an OpenGL fragment shader - to do the lifting.

In the cloud processing case, an FFmpeg filter seemed like the most sensible way to apply the shader, while abstracting most of the I/O & transcoding details.

1 Disparate codecs, containers, etc.
2 Incl. utilising hardware-accelerated decoding/encoding, platform permitting.

Filter Introduction

I’ve removed everything interesting (domain-specific details) and almost everything helpful (error checking, speed optimisations, etc.) from the filter I ended up writing, and am focusing on the minimum viable GLSL/FFmpeg integration - read a frame, upload the image data to the GPU & emit the shaded output in a clone of the input frame (identical resolution; audio, PTS, etc. left intact).

As FFmpeg is monolithically architected, the filter - genericshader - will be incorporated into its build process, and compiled into an ffmpeg binary, no different from any of the stock video filters (scale, rotate, etc.).

The code below has been tested on OS X 10.11.5 and Ubuntu 16.04, with a variety of recent (last ~6 months) master commits of FFmpeg.

What’s Not Included

  • FFmpeg has an excellent guide to writing filters, so I’m going to avoid explaining much of anything that it covers, and focus on the intersection with OpenGL.
  • The version presented is optimised for terseness only - there are relatively obvious performance and sanity (error handling, logging, etc.) improvements to be made, but they don’t dramatically change the shape of things. If there’s any interest, or I otherwise run out of ideas, I’ll cover optimisation in a subsequent post.
N.B. I am by no means positioning myself as an expert in OpenGL, GLSL or FFmpeg.

Building

In addition to OpenGL, which FFmpeg’s build process can compile & link against, unmodified (--enable-opengl), we’re introducing dependencies on GLFW (context creation) and (on Linux) GLEW (unified presentation of both core & extension OpenGL functionality).

Assuming our filter source lives in libavfilter/vf_genericshader.c, and the filter is registered (sketched below), something like:

FFmpeg$ ./configure --enable-filter=genericshader --enable-opengl \
    --extra-libs='-lGLEW -lglfw'

Would be a minimal build configuration - obviously you’ll want to include whatever codecs you need (--enable-libx264 etc.), but that stuff’s all orthogonal to the filter.

N.B. On OS X, -lGLEW is not necessary, and --cc=clang may make sense.
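
As for the registration assumed above: in FFmpeg checkouts of this vintage, it amounts to two one-line additions - a sketch, assuming the stock source layout:

# libavfilter/Makefile
OBJS-$(CONFIG_GENERICSHADER_FILTER) += vf_genericshader.o

// libavfilter/allfilters.c
REGISTER_FILTER(GENERICSHADER, genericshader, vf);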

Annotated Filter Source

GitHub repository.

#include "libavutil/opt.h"
#include "internal.h"

#ifdef __APPLE__
#include <OpenGL/gl3.h>
#else
#include <GL/glew.h>
#endif

#include <GLFW/glfw3.h>

static const float position[12] = {
  -1.0f, -1.0f, 1.0f, -1.0f, -1.0f, 1.0f,
  -1.0f, 1.0f, 1.0f, -1.0f, 1.0f, 1.0f};

const float position holds the buffer data for the position vertex attribute, visible below in the passthrough vertex shader - two triangles covering the entire viewport.

static const GLchar *v_shader_source =
  "attribute vec2 position;\n"
  "varying vec2 texCoord;\n"
  "void main(void) {\n"
  "  gl_Position = vec4(position, 0, 1);\n"
  "  texCoord = position;\n"
  "}\n";

static const GLchar *f_shader_source =
  "uniform sampler2D tex;\n"
  "varying vec2 texCoord;\n"
  "void main() {\n"
  "  gl_FragColor = texture2D(tex, texCoord * 0.5 + 0.5);\n"
  "}\n";

These shaders do nothing interesting - the vertex shader passes the positions through (reusing them to derive texture coordinates), and the fragment shader samples the input texture unmodified.

As written, the filter doesn’t take any parameters - reading the shader source from filesystem paths supplied at runtime (e.g. -vf genericshader=frag=x.glsl:vert=y.glsl) is an obvious real-world enhancement.
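
For a sense of what that might look like: a sketch of the option declarations, assuming hypothetical vert_path/frag_path char * fields on the filter’s private context struct (defined below), and reusing the FLAGS define which appears further down. The shader files could then be read from disk during init.

#define OFFSET(x) offsetof(GenericShaderContext, x)

static const AVOption genericshader_options[] = {
  {"vert", "path to vertex shader source",   OFFSET(vert_path),
   AV_OPT_TYPE_STRING, {.str = NULL}, 0, 0, FLAGS},
  {"frag", "path to fragment shader source", OFFSET(frag_path),
   AV_OPT_TYPE_STRING, {.str = NULL}, 0, 0, FLAGS},
  {NULL}};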


#define PIXEL_FORMAT GL_RGB

To avoid repeating it, we’re defining the value we’ll use as the external pixel format - when creating the frame texture, uploading frame data, and reading pixels back.

GL_RGB makes sense for a trivial example because we can declare the filter to accept AV_PIX_FMT_RGB24 (“packed RGB 8:8:8, 24bpp, RGBRGB…”), and supply each input frame’s data to OpenGL (as a texture image) without performing any transformations.

typedef struct {
  const AVClass *class;
  GLuint        program;
  GLuint        frame_tex;
  GLFWwindow    *window;
  GLuint        pos_buf;
} GenericShaderContext;

This structure is used to hold inter-frame state, e.g. resources we may later want to free. We’ve a single OpenGL program and texture, and post-setup, the filter assumes that the shader program remains active, and that the active texture unit has the frame texture bound to the GL_TEXTURE_2D target.


#define FLAGS AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_VIDEO_PARAM
static const AVOption genericshader_options[] = {{NULL}};

AVFILTER_DEFINE_CLASS(genericshader);

static GLuint build_shader(AVFilterContext *ctx,
                           const GLchar *shader_source,
                           GLenum type) {
  GLuint shader = glCreateShader(type);
  if (!(shader && glIsShader(shader))) {
    return 0;
  }
  glShaderSource(shader, 1, &shader_source, 0);
  glCompileShader(shader);
  GLint status;
  glGetShaderiv(shader, GL_COMPILE_STATUS, &status);
  return status == GL_TRUE ? shader : 0;
}

I Had To Pick a Side

We’re following FFmpeg’s function naming convention, rather than OpenGL’s. In general, we’re also using FFmpeg’s notion of failure (< 0), with the exception of GLuint-returning functions responsible for the allocation of OpenGL resource IDs, where 0 signals failure.

After some filter boilerplate, we have the function we’ll use to compile the shaders. Passing around the AVFilterContext allows us to use FFmpeg’s structured logging mechanism.

The parameter type, above, corresponds to e.g. GL_VERTEX_SHADER.
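
For illustration, the logging that the ctx parameter enables - a sketch of how build_shader’s status check might grow error output (buffer size arbitrary):

  GLint status;
  glGetShaderiv(shader, GL_COMPILE_STATUS, &status);
  if (status != GL_TRUE) {
    char log[512];
    glGetShaderInfoLog(shader, sizeof(log), NULL, log);
    av_log(ctx, AV_LOG_ERROR, "Failed to compile shader: %s\n", log);
    return 0;
  }
  return shader;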

static void vbo_setup(GenericShaderContext *gs) {
  glGenBuffers(1, &gs->pos_buf);
  glBindBuffer(GL_ARRAY_BUFFER, gs->pos_buf);
  glBufferData(GL_ARRAY_BUFFER, sizeof(position), position, GL_STATIC_DRAW);

  GLint loc = glGetAttribLocation(gs->program, "position");
  glEnableVertexAttribArray(loc);
  glVertexAttribPointer(loc, 2, GL_FLOAT, GL_FALSE, 0, 0);
}

Here we’re configuring the Vertex Buffer Object. As such, this function is going to be specific to the structure of your shader.

static void tex_setup(AVFilterLink *inlink) {
  AVFilterContext     *ctx = inlink->dst;
  GenericShaderContext *gs = ctx->priv;

  glGenTextures(1, &gs->frame_tex);
  glActiveTexture(GL_TEXTURE0);

  glBindTexture(GL_TEXTURE_2D, gs->frame_tex);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);

  glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, inlink->w, inlink->h, 0, PIXEL_FORMAT, GL_UNSIGNED_BYTE, NULL);

  glUniform1i(glGetUniformLocation(gs->program, "tex"), 0);
}

This function depends on an AVFilterLink describing the input link, which we won’t have access to until our config_props (below) callback is invoked. We generate a single texture of the same dimensions as the input video, bind it to the GL_TEXTURE_2D target of texture unit 0, and set that unit as the value of our shader program’s tex uniform. Our plan is to upload each input frame to this texture, in order to expose the image data to the shader.

static int build_program(AVFilterContext *ctx) {
  GLuint v_shader, f_shader;
  GenericShaderContext *gs = ctx->priv;

  if (!((v_shader = build_shader(
          ctx, v_shader_source, GL_VERTEX_SHADER)) &&
        (f_shader = build_shader(
          ctx, f_shader_source, GL_FRAGMENT_SHADER)))) {
    return -1;
  }

  gs->program = glCreateProgram();
  glAttachShader(gs->program, v_shader);
  glAttachShader(gs->program, f_shader);
  glLinkProgram(gs->program);

  GLint status;
  glGetProgramiv(gs->program, GL_LINK_STATUS, &status);
  return status == GL_TRUE ? 0 : -1;
}

AVFilterContext contains our filter-specific GenericShaderContext (the struct we defined at the top of the file), on which we’re storing the GL program ID.

static av_cold int init(AVFilterContext *ctx) {
  return glfwInit() ? 0 : -1;
}

Our first libav callback. At this point, we don’t know anything about the input video, so there’s not a huge amount of setup to do.

static int config_props(AVFilterLink *inlink) {
  AVFilterContext     *ctx = inlink->dst;
  GenericShaderContext *gs = ctx->priv;

  glfwWindowHint(GLFW_VISIBLE, GLFW_FALSE);
  gs->window = glfwCreateWindow(inlink->w, inlink->h, "", NULL, NULL);

  glfwMakeContextCurrent(gs->window);

  #ifndef __APPLE__
  glewExperimental = GL_TRUE;
  glewInit();
  #endif

  glViewport(0, 0, inlink->w, inlink->h);

  av_log(ctx, AV_LOG_VERBOSE, "Using GL: %s\n\nGLSL %s\n",
         glGetString(GL_VERSION), glGetString(GL_SHADING_LANGUAGE_VERSION));

The libav callback in which we do input-dependent setup - we use an invisible GLFW window to get a handle on a GL rendering context, and set the viewport size to the input video’s dimensions.

  int ret;
  if((ret = build_program(ctx)) < 0) {
    return ret;
  }

  glUseProgram(gs->program);
  vbo_setup(gs);
  tex_setup(gs);

  return 0;
}

static int filter_frame(AVFilterLink *inlink, AVFrame *in) {
  AVFilterContext *ctx     = inlink->dst;
  AVFilterLink    *outlink = ctx->outputs[0];

  AVFrame *out = ff_get_video_buffer(
    outlink, outlink->w, outlink->h);
  if (!out) {
    av_frame_free(&in);
    return AVERROR(ENOMEM);
  }
  av_frame_copy_props(out, in);

Called per-frame. We leave the input frame untouched, and acquire an AVFrame pointer (out) to convey the shaded image data.

  glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, inlink->w, inlink->h,
    0, PIXEL_FORMAT, GL_UNSIGNED_BYTE, in->data[0]);

Here we synchronously transfer the input frame data to the GPU. Using Pixel Buffer Objects for both reading and writing may significantly speed up this function, if done correctly (the upload half is sketched after this function, the read half under Enhancing).

  glDrawArrays(GL_TRIANGLES, 0, 6);

Render to the framebuffer. This call is specific to your shader/vertex configuration.

  glReadPixels(0, 0, outlink->w, outlink->h, PIXEL_FORMAT,
    GL_UNSIGNED_BYTE, (GLvoid *)out->data[0]);

Synchronously1 read the shaded pixels from the GPU and store them in the output frame. In our example, the input and output dimensions are the same, but that needn’t necessarily be true.

  av_frame_free(&in);
  return ff_filter_frame(outlink, out);
}

Free the input frame, and hand over the output frame to libav2.

1 Very, uh, synchronously, depending on your hardware & image resolution.
2 Note that ff_filter_frame is the point at which the filter yields to the encoder (and any filters in between) - if we were transferring pixel data asynchronously (per the PBO note, above), we could still be getting work done inside the filter.

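To give a flavour of the upload half: a minimal PBO sketch for the glTexImage2D call in filter_frame, assuming a hypothetical pbo field on the context (buffer generated during setup, gs in scope). A single buffer like this doesn’t buy asynchrony by itself - that requires alternating between two, per Enhancing, below.

  // Rows assumed tightly packed, as in the glTexImage2D call above
  const size_t size = 3 * inlink->w * inlink->h;
  glBindBuffer(GL_PIXEL_UNPACK_BUFFER, gs->pbo);
  glBufferData(GL_PIXEL_UNPACK_BUFFER, size, NULL, GL_STREAM_DRAW);
  void *mapped = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
  memcpy(mapped, in->data[0], size);
  glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
  // With a non-zero unpack buffer bound, the data argument to
  // glTexImage2D is an offset into the PBO, not a client pointer
  glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, inlink->w, inlink->h,
    0, PIXEL_FORMAT, GL_UNSIGNED_BYTE, 0);
  glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
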
static av_cold void uninit(AVFilterContext *ctx) {
  GenericShaderContext *gs = ctx->priv;
  glDeleteTextures(1, &gs->frame_tex);
  glDeleteProgram(gs->program);
  glDeleteBuffers(1, &gs->pos_buf);
  glfwDestroyWindow(gs->window);
}

In our libav teardown callback, we free the resources attached to our filter’s context struct.

static int query_formats(AVFilterContext *ctx) {
  static const enum AVPixelFormat formats[] = {AV_PIX_FMT_RGB24, AV_PIX_FMT_NONE};
  return ff_set_common_formats(ctx, ff_make_format_list(formats));
}

static const AVFilterPad genericshader_inputs[] = {
  {.name = "default",
   .type = AVMEDIA_TYPE_VIDEO,
   .config_props = config_props,
   .filter_frame = filter_frame},
  {NULL}};

static const AVFilterPad genericshader_outputs[] = {
  {.name = "default", .type = AVMEDIA_TYPE_VIDEO}, {NULL}};

AVFilter ff_vf_genericshader = {
  .name          = "genericshader",
  .description   = NULL_IF_CONFIG_SMALL("Generic OpenGL shader filter"),
  .priv_size     = sizeof(GenericShaderContext),
  .init          = init,
  .uninit        = uninit,
  .query_formats = query_formats,
  .inputs        = genericshader_inputs,
  .outputs       = genericshader_outputs,
  .priv_class    = &genericshader_class,
  .flags         = AVFILTER_FLAG_SUPPORT_TIMELINE_GENERIC};

Our filter definition - which amounts to a structure holding references to some of the static callbacks we defined above.
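
Once built, the filter is invoked like any stock video filter - for example (file names hypothetical):

FFmpeg$ ./ffmpeg -i input.mp4 -vf genericshader output.mp4

Conversion to and from AV_PIX_FMT_RGB24 is negotiated automatically, courtesy of query_formats.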

Enhancing

  • For desktop development, you may want to make the GLFW window visible, in order to view the shader output while the video is being processed.
  • Use of pixel buffers for asynchronous data transfer (i.e. passing ff_filter_frame the previous frame’s data, while transferring the pixels for the current frame from the GPU) - a rough sketch below.
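
A sketch of the read half of that idea, replacing the tail of filter_frame - assuming hypothetical pbos[2], cur and last_out fields on the context (first-frame bookkeeping, buffer sizing and error handling omitted):

  glBindBuffer(GL_PIXEL_PACK_BUFFER, gs->pbos[gs->cur]);
  // With a pack buffer bound, glReadPixels returns without blocking;
  // the transfer proceeds in the background
  glReadPixels(0, 0, outlink->w, outlink->h, PIXEL_FORMAT,
    GL_UNSIGNED_BYTE, 0);

  // Map the other buffer - the previous frame's pixels, which have
  // (hopefully) finished transferring - and emit that frame instead
  glBindBuffer(GL_PIXEL_PACK_BUFFER, gs->pbos[1 - gs->cur]);
  void *done = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
  memcpy(gs->last_out->data[0], done, 3 * outlink->w * outlink->h);
  glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
  glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

  int ret = ff_filter_frame(outlink, gs->last_out);
  gs->last_out = out;
  gs->cur = 1 - gs->cur;
  return ret;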