[vlc-devel] [PATCH] copy: add conversions to I420 and NV12

Tue Apr 28 15:39:11 CEST 2015

On Tue, Apr 28, 2015 at 3:20 PM, Jean-Baptiste Kempf <jb at videolan.org> wrote:
> On 28 Apr, Steve Lhomme wrote :
>> ---
>>  modules/video_chroma/copy.c | 117 ++++++++++++++++++++++++++++++++++++++++++++
>>  modules/video_chroma/copy.h |  12 +++++
>>  2 files changed, 129 insertions(+)
>>
>> diff --git a/modules/video_chroma/copy.c b/modules/video_chroma/copy.c
>> index cc98c92..8481d37 100644
>> --- a/modules/video_chroma/copy.c
>> +++ b/modules/video_chroma/copy.c
>> @@ -348,6 +348,59 @@ static void SSE_CopyFromYv12(picture_t *dst,
>>      }
>>      asm volatile ("emms");
>>  }
>> +static void SSE_CopyFromNv12ToI420(picture_t *dst,
>> +                             uint8_t *src[2], size_t src_pitch[2],
>> +                             unsigned width, unsigned height,
>> +                             copy_cache_t *cache, unsigned cpu)
>> +{
>> +    SSE_CopyPlane(dst->p[0].p_pixels, dst->p[0].i_pitch,
>> +                  src[0], src_pitch[0],
>> +                  cache->buffer, cache->size,
>> +                  width, height, cpu);
>> +    SSE_SplitPlanes(dst->p[1].p_pixels, dst->p[1].i_pitch,
>> +                    dst->p[2].p_pixels, dst->p[2].i_pitch,
>> +                    src[1], src_pitch[1],
>> +                    cache->buffer, cache->size,
>> +                    (width+1)/2, (height+1)/2, cpu);
>> +    asm volatile ("emms");
>> +}
>
> This one is exactly SSE_CopyFromNv12 with p[1] and p[2] inverted.
> I fear it is wrong since Nv12 and I420 are inverted wrt U and V order.

I did not test all the routines (no YV12 for ex) but this is the one
that's really needed and tested.

>> +static void SSE_CopyFromYv12ToI420(picture_t *dst,
>> +                             uint8_t *src[3], size_t src_pitch[3],
>> +                             unsigned width, unsigned height,
>> +                             copy_cache_t *cache, unsigned cpu)
>> +{
>> +    SSE_CopyPlane(dst->p[0].p_pixels, dst->p[0].i_pitch,
>> +                  src[0], src_pitch[0],
>> +                  cache->buffer, cache->size,
>> +                  width, height, cpu);
>> +    SSE_CopyPlane(dst->p[1].p_pixels, dst->p[1].i_pitch,
>> +                  src[2], src_pitch[2],
>> +                  cache->buffer, cache->size,
>> +                  width / 2, height / 2, cpu);
>> +    SSE_CopyPlane(dst->p[2].p_pixels, dst->p[2].i_pitch,
>> +                  src[1], src_pitch[1],
>> +                  cache->buffer, cache->size,
>> +                  width / 2, height / 2, cpu);
>> +    asm volatile ("emms");
>> +}
>
> This is not useful, as a function.
> First, having a video decoder that outputs Yv12 is rare, but then
> copying from Yv12 to Yv12 is trivial and converting from Yv12 to I420 is
> just a pointer swap.

All the YV12 routines may never be used as DXVA2 is supposed to output
NV12 by default. But the old code had support for YV12 and IMC3
surfaces, so I kept it.

When using D3D11 or DirectDraw the first filter output tried is RV32
and then I420, I422 and RV24. YV12 is never an option for the filter
so I have to be able to output I420.

See http://git.videolan.org/?p=vlc.git;a=blob;f=modules/video_chroma/chain.c;h=acf8b30b4b4450ead64bfeca2086f2fef04e57a2;hb=HEAD#l63

Or we could add YV12 in that list, at list on Windows where that's the
actual I420 variant of the system.

>> +static void SSE_CopyFromNv12ToNv12(picture_t *dst,
>> +                             uint8_t *src[2], size_t src_pitch[2],
>> +                             unsigned width, unsigned height,
>> +                             copy_cache_t *cache, unsigned cpu)
>> +{
>> +    SSE_CopyPlane(dst->p[0].p_pixels, dst->p[0].i_pitch,
>> +                  src[0], src_pitch[0],
>> +                  cache->buffer, cache->size,
>> +                  width, height, cpu);
>> +    SSE_CopyPlane(dst->p[1].p_pixels, dst->p[1].i_pitch,
>> +                  src[1], src_pitch[1],
>> +                  cache->buffer, cache->size,
>> +                  width, height/2, cpu);
>> +    asm volatile ("emms");
>> +}
>
> This looks very similar to SSE_CopyFromYv12, with n = 2.

Yes, the U and V planes are combined so they are twice the width of a
similar YV12 plane.

> Moreover, please be careful about (width+1) and (height+1)
>
>> +void CopyFromNv12ToI420(picture_t *dst, uint8_t *src[2], size_t src_pitch[2],
>> +                  unsigned width, unsigned height,
>> +                  copy_cache_t *cache)
>> +{
>> +#ifdef CAN_COMPILE_SSE2
>> +    unsigned cpu = vlc_CPU();
>> +    if (vlc_CPU_SSE2())
>> +        return SSE_CopyFromNv12ToI420(dst, src, src_pitch, width, height,
>> +                                cache, cpu);
>> +#else
>> +    (void) cache;
>> +#endif
>> +
>> +    CopyPlane(dst->p[0].p_pixels, dst->p[0].i_pitch,
>> +              src[0], src_pitch[0],
>> +              width, height);
>> +    SplitPlanes(dst->p[1].p_pixels, dst->p[1].i_pitch,
>> +                dst->p[2].p_pixels, dst->p[2].i_pitch,
>> +                src[1], src_pitch[1],
>> +                width/2, height/2);
>> +}
>
> Same as above. CopyFromNv12 does the same. And a swap is trivial.
>
>> +void CopyFromNv12ToNv12(picture_t *dst, uint8_t *src[2], size_t src_pitch[2],
>> +                  unsigned width, unsigned height,
>> +                  copy_cache_t *cache)
>> +{
>> +#ifdef CAN_COMPILE_SSE2
>> +    unsigned cpu = vlc_CPU();
>> +    if (vlc_CPU_SSE2())
>> +        return SSE_CopyFromNv12ToNv12(dst, src, src_pitch, width, height,
>> +                                cache, cpu);
>> +#else
>> +    (void) cache;
>> +#endif
>> +
>> +    CopyPlane(dst->p[0].p_pixels, dst->p[0].i_pitch,
>> +              src[0], src_pitch[0],
>> +              width, height);
>> +    CopyPlane(dst->p[1].p_pixels, dst->p[1].i_pitch,
>> +              src[1], src_pitch[1],
>> +              width, height/2);
>> +}
>
> This is quite similar to CopyFromYv12

Which is using 3 planes while here there are 2 with double width.

>>  void CopyFromYv12(picture_t *dst, uint8_t *src[3], size_t src_pitch[3],
>>                    unsigned width, unsigned height,
>>                    copy_cache_t *cache)
>> @@ -420,3 +516,24 @@ void CopyFromYv12(picture_t *dst, uint8_t *src[3], size_t src_pitch[3],
>>       CopyPlane(dst->p[2].p_pixels, dst->p[2].i_pitch,
>>                 src[2], src_pitch[2], width / 2, height / 2);
>>  }
>> +
>> +void CopyFromYv12ToI420(picture_t *dst, uint8_t *src[3], size_t src_pitch[3],
>> +                  unsigned width, unsigned height,
>> +                  copy_cache_t *cache)
>> +{
>> +#ifdef CAN_COMPILE_SSE2
>> +    unsigned cpu = vlc_CPU();
>> +    if (vlc_CPU_SSE2())
>> +        return SSE_CopyFromYv12ToI420(dst, src, src_pitch, width, height,
>> +                                cache, cpu);
>> +#else
>> +    (void) cache;
>> +#endif
>> +
>> +     CopyPlane(dst->p[0].p_pixels, dst->p[0].i_pitch,
>> +               src[0], src_pitch[0], width, height);
>> +     CopyPlane(dst->p[1].p_pixels, dst->p[1].i_pitch,
>> +               src[2], src_pitch[2], width / 2, height / 2);
>> +     CopyPlane(dst->p[2].p_pixels, dst->p[2].i_pitch,
>> +               src[1], src_pitch[1], width / 2, height / 2);
>> +}
>
> CopyFromYv12 is quite identical.
>
> I think we can do way better than this.
>
> --
> Jean-Baptiste Kempf
> http://www.jbkempf.com/ - +33 672 704 734
> Sent from my Electronic Device
> _______________________________________________
> vlc-devel mailing list
> To unsubscribe or modify your subscription options:
> https://mailman.videolan.org/listinfo/vlc-devel