[vlc-devel] [RFC] video filters, hw acceleration and 10bits

Rémi Denis-Courmont remi at remlab.net
Sat May 20 10:05:34 CEST 2017

On Friday 19 May 2017, 12:37:20 EEST, Jean-Baptiste Kempf wrote:
> In order to activate those hardware decoders, we need to have at least:
>  - hw decoding
>  + display with scaling
>  + subtitles blending in GPU,
>  - hw deinterlacing,
>  - snapshot,

- colour space conversion matrices,
- orientation (at least the 4 direct ones),
- HD/SD deinterlacing.
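For illustration, here is a minimal C sketch of what the "4 direct" orientations amount to. It assumes this means the four pure rotations (0, 90, 180 and 270 degrees) as opposed to the four mirrored ones. The enum and function names are hypothetical and are not VLC's own orientation types. Each output pixel is mapped back to its source pixel, which is also how a shader would sample the texture.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical enum: the four rotation-only ("direct") orientations,
 * all clockwise. */
enum rotation { ROT_0, ROT_90, ROT_180, ROT_270 };

/* Map pixel (x, y) of the displayed picture back to the decoded picture,
 * which is w pixels wide and h pixels high. */
static void map_rotated(enum rotation r, unsigned w, unsigned h,
                        unsigned x, unsigned y,
                        unsigned *src_x, unsigned *src_y)
{
    switch (r) {
    case ROT_90:  *src_x = y;         *src_y = h - 1 - x; break;
    case ROT_180: *src_x = w - 1 - x; *src_y = h - 1 - y; break;
    case ROT_270: *src_x = w - 1 - y; *src_y = x;         break;
    case ROT_0:
    default:      *src_x = x;         *src_y = y;         break;
    }
}

/* Rotate one tightly packed 8-bit plane into dst (w * h bytes).
 * For ROT_90 and ROT_270 the output is h pixels wide and w pixels high. */
static void rotate_plane(enum rotation r, const uint8_t *src,
                         unsigned w, unsigned h, uint8_t *dst)
{
    int swap = (r == ROT_90 || r == ROT_270);
    unsigned dw = swap ? h : w;
    unsigned dh = swap ? w : h;

    for (unsigned y = 0; y < dh; y++)
        for (unsigned x = 0; x < dw; x++) {
            unsigned sx, sy;
            map_rotated(r, w, h, x, y, &sx, &sy);
            dst[(size_t)y * dw + x] = src[(size_t)sy * w + sx];
        }
}
```

For example, rotating the 3x2 plane {1,2,3, 4,5,6} by ROT_90 gives the 2x3 plane {4,1, 5,2, 6,3}.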

>  - hw adjust filter, aka gamma correction.

Yes, but this is covered by colour matrices. Likewise, transform is covered
by orientation support.
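The following minimal C sketch shows why adjust folds into the colour matrix. It assumes normalised YCbCr in [0, 1] with chroma centred on 0.5. The type, the function names and the parameterisation are hypothetical and are not VLC's adjust filter. Contrast, brightness and saturation are affine in YCbCr, so they compose with the YCbCr-to-RGB matrix that the display already applies, at no extra per-pixel cost. Hue would add a 2x2 rotation of the (Cb, Cr) pair. A gamma curve is non-linear and is not covered here.

```c
#include <string.h>

/* Affine colour transform on (Y, Cb, Cr):
 * out[i] = m[i][0]*Y + m[i][1]*Cb + m[i][2]*Cr + m[i][3]. */
typedef struct { float m[3][4]; } colour_matrix;

/* Hypothetical adjust parameters: contrast scales luma, brightness offsets
 * it, saturation scales chroma around its neutral value 0.5. */
static colour_matrix adjust_matrix(float contrast, float brightness,
                                   float saturation)
{
    colour_matrix a;
    memset(&a, 0, sizeof(a));
    a.m[0][0] = contrast;
    a.m[0][3] = brightness;
    a.m[1][1] = saturation;
    a.m[1][3] = 0.5f * (1.0f - saturation);
    a.m[2][2] = saturation;
    a.m[2][3] = 0.5f * (1.0f - saturation);
    return a;
}

/* Compose two affine transforms: the result applies b first, then a. */
static colour_matrix compose(const colour_matrix *a, const colour_matrix *b)
{
    colour_matrix r;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 4; j++) {
            float sum = (j == 3) ? a->m[i][3] : 0.0f;
            for (int k = 0; k < 3; k++)
                sum += a->m[i][k] * b->m[k][j];
            r.m[i][j] = sum;
        }
    return r;
}

/* Apply a transform to one pixel (what the shader would do per pixel). */
static void apply(const colour_matrix *c, const float in[3], float out[3])
{
    for (int i = 0; i < 3; i++)
        out[i] = c->m[i][0] * in[0] + c->m[i][1] * in[1]
               + c->m[i][2] * in[2] + c->m[i][3];
}
```

In a GPU display, compose(&yuv_to_rgb, &adj) would yield the single matrix to upload as the shader uniform, where yuv_to_rgb stands for whatever conversion matrix the display already uses.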

> All of these features are done or in progress for all the platforms we
> care about.

> 2/ The issue: CPU filters
> The issue we have is CPU filters: they do not work with hw decoding,
> and we can't port all of them to GPU in a timely fashion (some might
> even be impossible to port).

Which filters? SPU sources (incl. logo and marquee) and SPU filters should
still work, provided that SPU blending is supported. Most actual video filters are irrelevant.

> At the same time, we're seeing more and more content with
> chromas other than 8-bit YUV, notably with HEVC, and very few filters work
> in those cases either.

> Users expect VLC filters to work and don't understand why some filters
> work and others don't (with 10-bit/RGB content or when using hw).

I actually believe that most users don't use filters, or only use transform,
deinterlace, maybe adjust, and rotate (mistaking it for transform).

> 3/ Solutions
> There are a few solutions, but not all of them have the same User
> eXperience.
> a) It's very difficult to restart the decoder and the vout when
> a CPU filter is selected: there are glitches because a different vout is
> used, we lose input and output frames, we need to wait for an I-frame,
> and we're not sure we can go back to the same state/position.
> We already have this difficulty with VT and MediaCodec when we restart
> them.
> And this also does not solve the non-8-bit chroma issue.

Right, although I don't see a reason to restart the vout.

> b) Another solution, the most seamless one we can do, is to copy the
> video buffer back to the CPU, filter it, and copy it back to the GPU when a CPU
> filter is requested. This is quite friendly for the user, but it is of
> course slow.

It might tip the video memory usage over the edge and break completely, as it 
increases the number of YUV surfaces in the pipeline as a whole.

I would also not be surprised if some pipelines simply did not allow this
change dynamically, and ended up requiring (a) instead.

> This approach is much slower than full hardware rendering but,
> if you have SSE4.1 (MOVNTDQA), it uses less CPU than full CPU
> decoding, which is a gain compared to what we have now.
> Every CPU from the last 10 years has SSE4.1.

SSE would help untiling performance a lot while copying the data back to the CPU.
But that is entirely dependent on the device drivers including properly 
optimized untiling algorithms. VLC has no influence there. And then, it also 
depends on the decoding HAL exposing a function for that purpose.

VDPAU has VdpVideoSurfaceGetBitsYCbCr and obviously NVIDIA implements it. But 
IIRC, VA-API has no such thing. You cannot safely assume that SSE or a DMA
makes copying to the CPU fast, in general.
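To illustrate the part that VLC does control, here is a minimal C sketch of copying one plane of a mapped surface to system memory. It assumes that the driver already exposes the surface linearly (untiled), with its own pitch. The function name is hypothetical. The inner memcpy is where an SSE4.1 copy based on MOVNTDQA (_mm_stream_load_si128) would go when the mapping is uncached write-combining memory. That variant is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy one plane of a mapped, already untiled surface to system memory,
 * row by row. Both pitches may exceed the visible row size (row_bytes),
 * so a single memcpy of the whole plane would be wrong in general. */
static void copy_plane(uint8_t *dst, size_t dst_pitch,
                       const uint8_t *src, size_t src_pitch,
                       size_t row_bytes, unsigned rows)
{
    for (unsigned y = 0; y < rows; y++)
        memcpy(dst + y * dst_pitch, src + y * src_pitch, row_bytes);
}
```

An NV12 surface would need two such calls: one for the luma plane and one for the half-height interleaved chroma plane.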

> We've benchmarked that on Linux (VA-API), Windows (D3D), macOS (VT), and we
> always get faster results than full CPU decoding, notably in full HD.
> However, it starts to be worse with 4K videos: a single thread (the vout
> thread) can't cope with two GPU/CPU copies.
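The following gives a rough order of magnitude for the 4K case. It assumes NV12 (8-bit, 1 byte per sample) for 1080p, P010 (10-bit in 16-bit words) for 2160p, and two full-frame copies per picture (GPU to CPU and back). The helper names are hypothetical. This counts raw bytes only and ignores untiling and cache effects.

```c
#include <stdint.h>

/* Bytes in one 4:2:0 picture: a full-size luma plane plus two
 * quarter-size chroma planes, i.e. 1.5 samples per pixel. */
static uint64_t frame_bytes_420(uint64_t width, uint64_t height,
                                unsigned bytes_per_sample)
{
    return width * height * 3 / 2 * bytes_per_sample;
}

/* Memory traffic per second for `copies` full-frame copies per picture. */
static uint64_t copy_bytes_per_second(uint64_t frame_bytes, unsigned fps,
                                      unsigned copies)
{
    return frame_bytes * fps * copies;
}
```

Under these assumptions, a 1080p 8-bit frame is 3,110,400 bytes and a 2160p 10-bit frame is 24,883,200 bytes, eight times more. Two copies at 60 fps come to 2,985,984,000 bytes/s (about 3.0 GB/s), compared with 186,624,000 bytes/s for 1080p 8-bit at 30 fps.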
> c) We could also wait for all the filters to be in GPU/shaders, but
> that's unrealistic for this release; it should be a major goal of the next one.

Probably. However I believe that a visual computing "layman" would find 
writing shaders easier than x86 SIMD.

> d) Or just do nothing.

Let's get things straight. We do not currently have any filter other than
deinterlace with 10-bit SIMD support, and very few that support 10-bit (in C)
at all.
- Most filters do not work in high depth at all, not even in software.
- Then the few that do support high-depth chromas (at least with software
decoding) are not accelerated and so will not operate at usable speed anyway,
at least not in 4K.
- And last, the deinterlacer seems premature to consider for 4K, and I
consider it a requirement for enabled-by-default hardware acceleration in HD
and lower resolutions.

So the choices are between:
- failing completely, and
- operating too slowly and with 8-bit downsampling.
Frankly, I don't care either way. But spending a lot of development and testing
effort only to make filters work at unusable performance in 4K 10-bit would
seem disingenuous.
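Here is the 8-bit downsampling option in its simplest form, as a C sketch. It assumes LSB-aligned 10-bit samples stored in 16-bit words, and rounds each sample to 8 bits before an 8-bit filter runs. The function name is hypothetical. There is no dithering, so banding on smooth gradients can be expected. For MSB-aligned P010 data, a plain right shift by 8 would be used instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduce LSB-aligned 10-bit samples (0..1023 in 16-bit words) to 8 bits
 * with rounding; results above 255 (inputs 1022 and 1023) are clamped. */
static void downsample_10_to_8(uint8_t *dst, const uint16_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned v = (src[i] + 2u) >> 2;
        dst[i] = (uint8_t)(v > 255 ? 255 : v);
    }
}
```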

> 4/ Necessary user interface improvements
> However, whatever we do, we need to mention this issue to the users.
> Notably, the filters dialog must be reworked to mention when we're using
> hw decoding, and we must not save the VLC filters by default. (#6873)
> Maybe we can open the correct settings when the user clicks on the
> message, so that the "hardware decoder" configuration is easy to reach.

Sounds about right.

> 5/ Opinions?
> In my opinion, it's better to insert CPU filters, even if they could be
> slow, than to do nothing. With enough warnings, though.
> This is by far the simplest for our users and our support.
> Moreover, this would solve the different chromas issue (#14037, #13066,
> #16466 for example). We should probably use I420 as the middle-man, though.
> Of course this solution is not perfect, and we need, for the future VLC
> releases, to focus on getting way more GPU filters.
> But this will not be ready for 3.0.
> What are your opinions on the matter?
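If I420 is the middle-man, the copy-back path also needs a chroma conversion, because hardware decoders typically output semi-planar NV12 (interleaved CbCr) rather than planar I420. Below is a minimal C sketch of that step, with a hypothetical function name. The luma plane is the same in both formats and only needs a plain copy.

```c
#include <stddef.h>
#include <stdint.h>

/* Split the interleaved CbCr plane of an NV12 picture into the separate
 * U and V planes of I420. width and height are the luma dimensions
 * (assumed even); chroma is subsampled by 2 in both directions. */
static void nv12_chroma_to_i420(const uint8_t *uv, size_t uv_pitch,
                                uint8_t *u, uint8_t *v, size_t out_pitch,
                                unsigned width, unsigned height)
{
    for (unsigned y = 0; y < height / 2; y++) {
        const uint8_t *row = uv + y * uv_pitch;
        for (unsigned x = 0; x < width / 2; x++) {
            u[y * out_pitch + x] = row[2 * x];     /* Cb */
            v[y * out_pitch + x] = row[2 * x + 1]; /* Cr */
        }
    }
}
```

The upload path after filtering would be the same loop with source and destination swapped, interleaving U and V back into one plane.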

With the sole exception of the disastrous bilateral violation of the code 
freeze to merge the resume-seek feature, VLC has not delivered any features 
for over 3 years. Some major new features will have been held back for more
than 2 years as things already stand.

Fixing this issue properly is impossible in the short term. I refuse to make
any architectural changes to the core or filter chain that would further push
back the release, especially not for the (almost) sole sake of playing
with the puzzle filter...

So I don't really care whether the filters are added or not, but I don't want to
alter the core either way.

