[vlc-devel] RFC: OpenGL filter design overview

Sat Aug 29 19:58:55 CEST 2020

Hi,

The following mail is here to describe the work we've done
internally at Videolabs regarding the OpenGL refactor and
introduction of OpenGL filters.

This summary encompasses part of my work, but also part of the
work of Maxime Meisson and Romain Vimont.

The whole project has been separated into multiple steps.
Some of them have already been merged, sometimes with issues
that other parts are really meant to solve in the long term,
and the rest of them are starting to reach a level of stability
which will allow us to submit them on the mailing list.

Though the work has been thoroughly tested on the many platforms
it involves (which is basically «what VLC supports»), there have
been pain points and untested code paths in the refactoring parts
of this work. Thus, we are also exploring ways to test all those
parts correctly, in an automated way, on all those platforms.

For convenience, I split this summary into different axes,
each of which first lists its goals and then gives a short
text digging more into the details.

1/ Preliminary refactor in OpenGL (merged)

 - Make import and sampling separated from the opengl video output code
     so as to reuse it elsewhere, and in particular between filters.
 - Split the renderer from the vout code: the vout display is mostly in
     charge of setting up the rendering environment and handling the
     additional queries that we add in the end, like format change.
     - the format change support is not merged yet.
 - Turn the core OpenGL client code into reusable libraries,
     `libvlc_opengl` and `libvlc_opengles`.

This is mainly a step towards cleaning and clearly separating
everything that has been introduced in 3.0 regarding OpenGL, while
allowing experiments with new features (libplacebo upscaling for
example) to start.

2/ Introduce `opengl filter` capability

 - New pipeline tailored only for OpenGL code, with a single OpenGL
     context and potentially multiple framebuffers.
 - Provide a more open design scope for high performance pipeline
     without copies between stages, especially regarding blending.
 - Easier to reuse pieces in different locations, like tonemappers,
     encoders, converters... We use it in particular in the MediaCodec
     encoder, in a different work, to fit the encoder within the
     current state of the push model, in the OpenGL display obviously,
     and in the `filter_chain` (see later points).
 - Output can be directly redirected to a framebuffer tailored for
     display/encoder or to a framebuffer with multisampling enabled,
     and the MSAA resolve is automatically done if needed.
 - Independence from the design of video context / decoder device and
     from other current rework in the core, which makes it much easier
     to develop in parallel, with much less code to write for a filter,
     while fitting the resource model of OpenGL.

As the capability exposed in this axis is different to `video context`,
it does not provide the capability to be run in the usual filter_chain
pipeline but this will be tackled later (6/).

The main goal of this axis is to refactor the drawing code into steps
and to be able to compose them, while taking care only of the OpenGL
parts (DMABuf, EGLImage, vaapi picture, etc. are only textures within
the OpenGL resource model, and the link between them is made by the
opengl interop introduced in the first part).

3/ OpenGL offscreen provider

 - New capability for OpenGL provider: `opengl offscreen` for OpenGL and
    `opengl es2 offscreen` for OpenGL ES 2. This is a capability for
    picture buffer producer, splitting the buffer management and the
    rendering itself, much like OpenGL in general.
 - Path leading to a `filter_t` capable of running an OpenGL pipeline,
     without taking care of the details regarding how OpenGL is
     provided. This is not entirely true given how OpenGL linking is
     done currently but will be with later work.
 - Change the `gl->swap` prototype to return a `picture_t *`;
     non-offscreen implementations return NULL.
 - This is reusable for filters, but also for tests, tonemapper, etc.

The different OpenGL offscreen providers are generating pictures
at each `swap`, which are bound to the provider `video context`
and probably tied to a `picture_context_t` describing its opaque
content. During the design, it was expected that `decoder device`
would give the correct clue regarding which implementation should
be used when calling for `any` provider.
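
To make the `swap` contract above more concrete, here is a minimal
sketch of a swap callback returning a picture. Every name in it
(`sketch_gl`, `sketch_picture`, the two-buffer array) is hypothetical
and only stands in for the real `vlc_gl_t` / `picture_t` types; it is
an illustration under those assumptions, not the actual implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for picture_t and the OpenGL provider object. */
struct sketch_picture {
    int index;
};

struct sketch_gl {
    /* Returns the finished picture, or NULL for on-screen providers. */
    struct sketch_picture *(*swap)(struct sketch_gl *gl);
    struct sketch_picture buffers[2];
    int back; /* index of the buffer currently being drawn into */
};

/* On-screen provider: presenting has no picture to hand back. */
static struct sketch_picture *onscreen_swap(struct sketch_gl *gl)
{
    (void)gl;
    return NULL;
}

/* Offscreen provider: return the buffer that was just drawn and
 * immediately switch to the other one, since drawing may continue. */
static struct sketch_picture *offscreen_swap(struct sketch_gl *gl)
{
    struct sketch_picture *done = &gl->buffers[gl->back];
    gl->back = (gl->back + 1) % 2;
    return done;
}

/* Thin wrapper, in the spirit of a vtable call. */
static struct sketch_picture *sketch_gl_swap(struct sketch_gl *gl)
{
    assert(gl->swap != NULL);
    return gl->swap(gl);
}
```

A caller only has to check the returned pointer: NULL means the frame
went to a window, anything else is a picture to push downstream.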

This axis has some design flaws regarding the `swap` function. In
particular, `swap` means we can still draw, because the context is
still active and it must have set up a new buffer, so it needs at least
two buffers (if we can provide the filters with the number of pictures
they must allocate, it would be this number + 1). A better design scope
has not really been explored, but it could be replaced by some matching
`BeginPicture()` and `EndPicture()` calls for example.
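
As an illustration of what such matching calls could look like, here is
a small sketch. It has not been explored in the actual work: the
`sketch_*` names, the fixed pool size and the `in_use` flag are all
assumptions made for the example. The point it tries to show is that a
buffer is only claimed when drawing starts, so no spare buffer has to
be set up eagerly after each frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_POOL_SIZE 2 /* hypothetical pool size */

struct sketch_picture {
    int  id;
    bool in_use; /* held by the renderer or by a downstream consumer */
};

struct sketch_provider {
    struct sketch_picture pool[SKETCH_POOL_SIZE];
    struct sketch_picture *current; /* render target between begin/end */
};

static void sketch_provider_init(struct sketch_provider *p)
{
    for (int i = 0; i < SKETCH_POOL_SIZE; i++) {
        p->pool[i].id = i;
        p->pool[i].in_use = false;
    }
    p->current = NULL;
}

/* Claim a free buffer as render target.
 * Returns false when every buffer is still held downstream. */
static bool sketch_begin_picture(struct sketch_provider *p)
{
    assert(p->current == NULL); /* begin/end calls must be paired */
    for (int i = 0; i < SKETCH_POOL_SIZE; i++) {
        if (!p->pool[i].in_use) {
            p->pool[i].in_use = true;
            p->current = &p->pool[i];
            return true;
        }
    }
    return false;
}

/* Hand the rendered picture to the caller; nothing is bound afterwards. */
static struct sketch_picture *sketch_end_picture(struct sketch_provider *p)
{
    assert(p->current != NULL);
    struct sketch_picture *pic = p->current;
    p->current = NULL;
    return pic;
}

/* Called by the consumer once it no longer needs the picture. */
static void sketch_picture_release(struct sketch_picture *pic)
{
    pic->in_use = false;
}
```

With this shape, running out of buffers becomes an explicit failure of
the begin call rather than a hidden requirement of `swap`.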

It has been established this way to quickly develop the different
provider implementations, including today:
 - Android surfacetexture
 - EGL pixel buffer (doing CPU <-> GPU transfer)
 - EGL GBM buffers
 - iOS CVPixelBuffer buffers through EAGL
 - MacOSX CVPixelBuffer buffers through CGL

EGL pixel buffer would still work on Windows but wouldn't be a
correct high performance solution; it could be replaced either
by the EGL ANGLE implementation or by NVidia / AMD specific interop
APIs to share the buffer between D3D and OpenGL.

The WASM implementation is almost done too, given that the `opengl`
implementation already follows the same building blocks as this
construction, meaning sending `ImageBitmap` objects (towards either
the canvas that will display it or the future `opengl interop` that
will bind it to an OpenGL texture).

All those implementations make it quite compatible with every
display we have and allow a clear separation between the filter
that provides the features and the `opengl offscreen` provider
that provides the platform integration bits.

4/ Rework OpenGL to link OpenGL in providers

 - Currently, OpenGL providers are linking the provider libraries like
     EGL, GLX, or CoreVideo, but OpenGL is already sometimes needed in
     those kinds of modules. In the general case though, OpenGL / OpenGL ES
     must be linked to the client modules using the VLC OpenGL vtable,
     because they need to load it themselves and it's not always done in
     a dynamic way.
 - Instead, link OpenGL/ES to the provider modules. In particular, on
     Apple systems, it avoids linking the OpenGL framework to every
     filter, and on other systems it allows removing the OpenGL
     dependency from `libvlc_opengl` and `libvlc_opengles`.
 - Enable automatic testing of shaders, even though they are
     generated dynamically at runtime, thanks to a mocked OpenGL
     virtual table.

This part also fixes the current issues on Windows, where the
functions are dynamically loaded through WGL and not found instead of
being dynamically linked, and makes the filters more independent of
OpenGL, the last pain point being the includes themselves, which should
either be generated from the Khronos repository or at least replaced by
Khronos generic includes whenever possible.
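
To give an idea of the shader testing mentioned in the list above, here
is a minimal sketch of a mocked virtual table. The two-entry
`sketch_gl_vt` structure and the shader builder are hypothetical and
much smaller than the real VLC OpenGL vtable; the only real value is
the `GL_FRAGMENT_SHADER` enum. The mock records the source it receives,
so a test can inspect a dynamically generated shader without any GPU or
OpenGL library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_FRAGMENT_SHADER 0x8B30u /* same value as GL_FRAGMENT_SHADER */

/* Hypothetical two-entry subset of an OpenGL virtual table. */
struct sketch_gl_vt {
    unsigned (*CreateShader)(unsigned type);
    void (*ShaderSource)(unsigned shader, const char *source);
};

/* Mock state: what the client code asked the "driver" to do. */
static char mock_last_source[256];
static unsigned mock_shader_count;

static unsigned mock_CreateShader(unsigned type)
{
    assert(type == SKETCH_FRAGMENT_SHADER);
    return ++mock_shader_count;
}

static void mock_ShaderSource(unsigned shader, const char *source)
{
    assert(shader == mock_shader_count);
    snprintf(mock_last_source, sizeof(mock_last_source), "%s", source);
}

static const struct sketch_gl_vt mock_vt = {
    .CreateShader = mock_CreateShader,
    .ShaderSource = mock_ShaderSource,
};

/* Client code under test: it generates its shader at runtime and only
 * talks to OpenGL through the table it is given. */
static unsigned sketch_build_invert_shader(const struct sketch_gl_vt *vt,
                                           int with_alpha)
{
    char source[256];
    snprintf(source, sizeof(source),
             "uniform sampler2D tex;\n"
             "varying vec2 uv;\n"
             "void main() {\n"
             "  vec4 c = texture2D(tex, uv);\n"
             "  gl_FragColor = vec4(1.0 - c.rgb, %s);\n"
             "}\n",
             with_alpha ? "c.a" : "1.0");

    unsigned shader = vt->CreateShader(SKETCH_FRAGMENT_SHADER);
    vt->ShaderSource(shader, source);
    return shader;
}
```

Because the filter never links against OpenGL itself, swapping the
provider's table for `mock_vt` is enough to exercise the generation
code on any build machine.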

5/ Handle OpenGL/ES at runtime

 - Merge both `libvlc_opengl` and `libvlc_opengles` into a single runtime
     and expose the difference through a runtime parameter.
 - No need to build filters once for OpenGL and once for OpenGL ES.
 - Continue on the path of dynamic generation of shaders according to
     GLSL version, kind and capabilities, for better-performing shaders.

This can have some «unwanted» side effects, like forcing us to compile
some OpenGL provider modules twice because they should link and provide
both OpenGL and OpenGL ES, but it will avoid compiling the filter
modules twice so it's still a win.
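
As a rough sketch of what the runtime parameter could drive, the
function below picks the GLSL preamble from an API enum instead of a
compile-time switch. The enum and function names are hypothetical; the
two preambles are the usual ones for GLSL 1.20 (desktop OpenGL 2.1) and
GLSL ES 1.00 (OpenGL ES 2), and the choice of `mediump` is only an
assumption for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical runtime parameter: not an existing VLC type. */
enum sketch_gl_api {
    SKETCH_API_OPENGL,
    SKETCH_API_OPENGL_ES,
};

/* Write the API-specific preamble followed by the shared shader body.
 * Returns 0 on success, -1 if the output buffer is too small. */
static int sketch_build_fragment(enum sketch_gl_api api, const char *body,
                                 char *out, size_t size)
{
    assert(body != NULL && out != NULL);

    const char *preamble = (api == SKETCH_API_OPENGL_ES)
        /* GLSL ES 1.00 has no default float precision in fragment shaders. */
        ? "#version 100\nprecision mediump float;\n"
        /* GLSL 1.20 matches desktop OpenGL 2.1. */
        : "#version 120\n";

    int len = snprintf(out, size, "%s%s", preamble, body);
    return (len >= 0 && (size_t)len < size) ? 0 : -1;
}
```

The same filter binary can then serve both APIs, the only difference
being the value passed at runtime.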

6/ Add a filter_t module `opengl`

 - The module is able to receive a list of OpenGL filters and execute
     them in a single pipeline.
 - Uses the previous work on filters, interop, samplers and OpenGL
     offscreen to link the OpenGL pipeline with the `filter_t` pipeline.
 - Exposes a first user-facing, though not user-friendly, way to run the
     filters: it is OpenGL-centered and does not add the correct filters
     automatically in the pipeline. It starts fixing the flaws from 2/
     but needs what follows to fully implement the OpenGL pipeline in a
     transparent way.
 - As a `filter_t`, it can be used everywhere a `filter_t` is possible.

This is the first step towards OpenGL filters in general, though
still primitive at this stage since OpenGL filters need to be
specified in a non-direct way in comparison with usual filters,
basically meaning for example:

    --video-filter=opengl{filters=adjust}

However, this brings a reusable component for other filters which
will be able to expose themselves as a usual `filter_t` without
this alienating syntax, and this filter is actually «more powerful»
in this case since it could be able to do things like adjust and
logo filters without additional copies.
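
The "no additional copies" idea can be sketched as a chain that
ping-pongs between two render targets. This is only a CPU simulation
under stated assumptions: plain float arrays stand in for textures and
framebuffers, and the `sketch_filter` type and the `halve` / `invert`
stages are hypothetical placeholders for real OpenGL filters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical filter: reads one "texture", draws into one "framebuffer". */
struct sketch_filter {
    const char *name;
    void (*draw)(const float *in, float *out, size_t count);
};

static void draw_halve(const float *in, float *out, size_t count)
{
    for (size_t i = 0; i < count; i++)
        out[i] = in[i] * 0.5f;
}

static void draw_invert(const float *in, float *out, size_t count)
{
    for (size_t i = 0; i < count; i++)
        out[i] = 1.0f - in[i];
}

/* Run every filter in order. Each stage reads the previous stage's
 * target directly and draws into the other one, so nothing is copied
 * between stages. Returns the buffer holding the final result, which
 * is the input itself when the chain is empty. */
static const float *sketch_chain_run(const struct sketch_filter *filters,
                                     size_t filter_count,
                                     const float *input,
                                     float *ping, float *pong, size_t count)
{
    float *targets[2] = { ping, pong };
    const float *src = input;

    for (size_t i = 0; i < filter_count; i++) {
        float *dst = targets[i % 2];
        assert(dst != src); /* never draw into the buffer being sampled */
        filters[i].draw(src, dst, count);
        src = dst;
    }
    return src;
}
```

Only the last target would need to be exported as a picture, which is
where the offscreen provider comes in.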

We didn't explore the design scope for a different filter_t pipeline
that would be able to automatically gather filters that don't need
a copy, since it would probably require much bigger architectural
changes.

7/ Add pf_close / rework virtual table for `filter_t`

 - Encapsulate the `filter_t` implementation as an object, like other
     parts of the pipeline (see `vout_window_t` for instance) instead of
     as a module.
 - Enable wrapping a `filter_t` into another in a simple way: you can
     open any `filter_t` implementation in the Open of your module to
     provide a specialization of a specified filter.

This is basically the additional missing building block to use the
previously defined `opengl` filter to expose a `video filter`
capability from OpenGL filters, like we describe in the next axis.
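
A minimal sketch of that wrapping pattern follows, assuming a
hypothetical operations table with a `close` entry. None of the
`sketch_*`, `engine_*` or `wrapper_*` names exist in VLC, and pictures
are reduced to integers to keep the example short. The wrapper opens
the inner filter from its own Open function and releases it from its
own close callback.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_filter;

/* Hypothetical virtual table, including the close callback. */
struct sketch_filter_ops {
    int  (*filter)(struct sketch_filter *f, int picture);
    void (*close)(struct sketch_filter *f);
};

struct sketch_filter {
    const struct sketch_filter_ops *ops;
    void *sys; /* implementation private data */
};

static int sketch_live_filters; /* bookkeeping for the example only */

/* Inner, generic filter (think of the `opengl` filter_t module). */
static int engine_filter(struct sketch_filter *f, int picture)
{
    (void)f;
    return picture + 1;
}

static void engine_close(struct sketch_filter *f)
{
    (void)f;
    sketch_live_filters--;
}

static const struct sketch_filter_ops engine_ops = { engine_filter, engine_close };

static int engine_open(struct sketch_filter *f)
{
    f->ops = &engine_ops;
    f->sys = NULL;
    sketch_live_filters++;
    return 0;
}

/* Wrapper: a specialization that forwards everything to the inner filter. */
static int wrapper_filter(struct sketch_filter *f, int picture)
{
    struct sketch_filter *inner = f->sys;
    return inner->ops->filter(inner, picture);
}

static void wrapper_close(struct sketch_filter *f)
{
    struct sketch_filter *inner = f->sys;
    inner->ops->close(inner); /* the object knows how to close itself */
    free(inner);
    sketch_live_filters--;
}

static const struct sketch_filter_ops wrapper_ops = { wrapper_filter, wrapper_close };

static int wrapper_open(struct sketch_filter *f)
{
    struct sketch_filter *inner = malloc(sizeof(*inner));
    if (inner == NULL)
        return -1;
    if (engine_open(inner) != 0) {
        free(inner);
        return -1;
    }
    f->ops = &wrapper_ops;
    f->sys = inner;
    sketch_live_filters++;
    return 0;
}
```

The owner of the outer filter never needs to know that a second filter
lives inside it: calling `ops->close` on the wrapper tears down both.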

8/ Expose OpenGL filters as `filter_t` through the `opengl` filter_t
    module

 - Add the `video filter` capability in OpenGL filters modules, whose
     `Activate` function will parse the filter parameters, define the
     VLC variable `opengl-filter` and open the `opengl` filter to run
     the `opengl filter` implementation.
 - Allows exposing some modules to the pipeline in an automatic way,
     while keeping other modules for internal purposes.
 - This execution mode uses the OpenGL offscreen implementation to export
     the rendered image, with the offscreen video context, thus matching
     the push model completely, and is not a barrier for any future filter
     pipelining.

We currently exposed `deinterlace` (`yadif` and `yadif2x`) and `adjust`
filters with great success and little complexity, while still parsing
the module option config like every other module. Thanks to the GSoC
student Vatsin, we also have filters like `zoom`, which can also
showcase the usage of the mouse, though it has not been added yet.

This also, and mainly, allows exposing the OpenGL filters like any
other filters within the interface, without changes to the interfaces.

9/ Add support for plane filters

 - Allow filtering picture planes separately instead of filtering the whole
     picture at once (typically for deinterlace filters).
 - If a filter sets the "filter_plane" flag, then the filter engine
     generates one output texture per plane and calls the filter once for each
     plane.
 - The sampler of the following (normal) filter will take care of the chroma
     conversion, exposing each resulting pixel as a single RGBA vector.

This is a building block for the yadif/2x filter we made, which allows
reducing bandwidth and integrating with following filters through
vlc_gl_sampler. It mainly bends the sampler object, which usually gathers
the different input textures behind a single `vlc_texture` call, and is the
only current way to process non-RGB input into non-RGB output.
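
The dispatch described in the list above can be sketched as follows.
The structure, the draw log and the engine function are hypothetical;
only the "filter_plane" flag name comes from the actual work. The
engine either calls the filter once for the whole picture, or once per
plane with one output texture per plane.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_PLANES 4

/* Records the draw calls so the dispatch can be observed. */
struct sketch_draw_log {
    unsigned calls;
    unsigned planes_seen[SKETCH_MAX_PLANES];
};

/* Hypothetical OpenGL filter description. */
struct sketch_gl_filter {
    bool filter_plane; /* ask the engine to call draw once per plane */
    void (*draw)(struct sketch_draw_log *log, unsigned plane);
};

static void record_draw(struct sketch_draw_log *log, unsigned plane)
{
    assert(log->calls < SKETCH_MAX_PLANES);
    log->planes_seen[log->calls++] = plane;
}

/* Returns the number of output textures produced for one picture. */
static unsigned sketch_engine_run(const struct sketch_gl_filter *filter,
                                  unsigned plane_count,
                                  struct sketch_draw_log *log)
{
    assert(plane_count >= 1 && plane_count <= SKETCH_MAX_PLANES);

    if (!filter->filter_plane) {
        /* Normal filter: one call, one RGBA output for the whole picture. */
        filter->draw(log, 0);
        return 1;
    }

    /* Plane filter: one call and one output texture per input plane. */
    for (unsigned plane = 0; plane < plane_count; plane++)
        filter->draw(log, plane);
    return plane_count;
}
```

For a three-plane input such as I420, a plane filter is therefore drawn
three times and leaves three textures for the next filter's sampler to
combine.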

The thread is a bit dense, but I hope it's also short enough considering
the real size of the work, which mostly started one year ago with
points 2/ and 3/.

Feel free to provide any feedback regarding points that might have been
left unaddressed or misaddressed, or request more details regarding
each axis. I'd invite you to open one subthread from the main topic for
each point you'd want to comment though, so as to ease the discussion
process and avoid confusion.

I hope it fills the need for a global picture over this work.

Regards,
--
Alexandre Janniaux
Videolabs