[vlc-devel] Bug with xvideo's copy64 assembler "movntdqa" instruction

Jeroen Ost jeroen.ost at gmail.com
Wed Apr 18 12:07:36 CEST 2012


I found an issue in the xvideo output with hardware acceleration on Linux
(tested with both VIA and NVIDIA graphics cards).
Apparently it's a bug in modules/codec/avcodec/copy.c, function "CopyFromUswc".
There is some code there to detect source pointers that are not
aligned to a 16-byte address. It first copies some bytes manually, up to
the next 16-byte multiple.
However, in this case COPY64(&dst[x], &src[x], "movntdqa", "movdqu");
ALWAYS segfaults.
It's easy to reproduce: you need an H264-encoded video with a size of
e.g. 852x480 (852 is not a multiple of 16, so even if the first
scanline starts at a 16-byte multiple, the second one will not).
So far the only workaround I've been able to implement is to comment
out that copy64 line (in the unaligned case) so it gets copied byte by
byte. That obviously has a serious performance impact, but it fixes
the segfault.
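
The alignment arithmetic can be checked with a small C sketch. The helper
names are made up for illustration (they are not VLC functions). The base
address 0x7fffe41600e0 and the pitch of 852 used in the note after the code
are the src and src_pitch values from frames #1 and #0 of the gdb trace below.
Whether copy.c computes its head count this way is not shown in this report.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes that must be copied one at a time before `addr`
 * reaches the next 16-byte boundary (0 if it is already aligned).
 * Hypothetical helper, not part of VLC. */
static unsigned head_bytes_to_align16(uintptr_t addr)
{
    return (16u - (unsigned)(addr & 0x0f)) & 0x0f;
}

/* Misalignment (address modulo 16) of the first byte of row `y` in a
 * plane that starts at `base` and uses `pitch` bytes per row.
 * Hypothetical helper, not part of VLC. */
static unsigned row_misalignment(uintptr_t base, size_t pitch, unsigned y)
{
    return (unsigned)((base + (uintptr_t)pitch * y) & 0x0f);
}
```

With the values from the trace, row_misalignment(0x7fffe41600e0, 852, 0) is 0
and row_misalignment(0x7fffe41600e0, 852, 1) is 4: the second scanline starts
at 0x7fffe4160434, which is the src seen in the crashing frame. From there,
head_bytes_to_align16 gives 12 bytes to copy before a 16-byte aligned load is
legal.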

vlc --ffmpeg-hw -I dummy -V xv file.h264

My assembler knowledge is not good enough (especially for those SSE4.1
instructions) to fix this properly.

Could someone advise how to properly implement the patch?
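
A possible shape for such a fix, as a plain C sketch rather than a tested
patch: movntdqa needs a 16-byte aligned memory operand, so the byte-wise head
loop has to run until src itself is aligned before any 64-byte block is
loaded. Both function names are hypothetical, and copy64_block is only a
memcpy stand-in for the real COPY64 macro, whose body is not shown in this
report.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for COPY64: copies one 64-byte block. The real macro would
 * use SSE loads and stores; an aligned load such as movntdqa requires
 * `src` to be 16-byte aligned. Hypothetical, not VLC code. */
static void copy64_block(uint8_t *dst, const uint8_t *src)
{
    memcpy(dst, src, 64);
}

/* Copy one row of `width` bytes in three phases:
 *   1. single bytes until src is 16-byte aligned (the head),
 *   2. 64-byte blocks from the now aligned source,
 *   3. single bytes for whatever is left (the tail).
 * Returns the number of head bytes copied singly. */
static size_t copy_row_aligned_src(uint8_t *dst, const uint8_t *src,
                                   size_t width)
{
    size_t head = (16u - ((uintptr_t)src & 0x0f)) & 0x0f;
    if (head > width)
        head = width;

    size_t x = 0;
    for (; x < head; x++)
        dst[x] = src[x];

    /* &src[x] is 16-byte aligned here; &dst[x] may not be, so the
     * store side still has to tolerate unaligned addresses. */
    for (; x + 63 < width; x += 64)
        copy64_block(&dst[x], &src[x]);

    for (; x < width; x++)
        dst[x] = src[x];

    return head;
}
```

For a source pointer that is 4 bytes past a 16-byte boundary, the head is 12
bytes, after which every block load starts on an aligned address.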

Full gdb trace below

Jeroen Ost

Starting program: /usr/bin/vlc --ffmpeg-hw -V xv -f 2-854-480.ts
[Thread debugging using libthread_db enabled]
VLC media player 2.0.1 Twoflower (revision 2.0.0+git20120304+r133)
[New Thread 0x7ffff7fcb700 (LWP 28517)]
[New Thread 0x7ffff2c07700 (LWP 28518)]
04-18-2012  12:05:38.342007 [0x7ffff79a1480] main libvlc: Running vlc
with the default interface. Use 'cvlc' to use vlc without interface.
[New Thread 0x7fffeb5d5700 (LWP 28519)]
[New Thread 0x7fffeb0c1700 (LWP 28520)]
04-18-2012  12:05:38.392469 [0x7ffff088041a] [cli] lua interface:
Listening on host "*console".
VLC media player 2.0.1 Twoflower
Command Line Interface initialized. Type `help' for help.
> [Thread 0x7ffff2c07700 (LWP 28518) exited]
[New Thread 0x7fffe0538700 (LWP 28521)]
[New Thread 0x7fffdfd37700 (LWP 28522)]
[New Thread 0x7fffdf536700 (LWP 28523)]
[New Thread 0x7fffded35700 (LWP 28524)]
[New Thread 0x7ffff2c07700 (LWP 28525)]
[New Thread 0x7fffe8147700 (LWP 28526)]
VA H264 Video acceleration
[New Thread 0x7fffdceed700 (LWP 28527)]
libva: libva version 0.32.0
Xlib:  extension "XFree86-DRI" missing on display ":0.0".
libva: va_getDriverName() returns 0
libva: Trying to open /usr/lib/dri/nvidia_drv_video.so
libva: va_openDriver() returns 0
04-18-2012  12:05:39.217516 [0x7fffe2b7f29a] avcodec decoder: Using VA
API version 0.32 for hardware decoding.
[New Thread 0x7fffd1510700 (LWP 28528)]
[New Thread 0x7fffd0bf6700 (LWP 28529)]
[New Thread 0x7fffd08f2700 (LWP 28530)]
 format test 852 480 id=32595559
 format test 852 480 id=32315659
Findformat returns answer 313
xv autopaint colorkey: 0
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff2c07700 (LWP 28525)]
0x00007fffe2b7a8f5 in CopyFromUswc (dst=0x7fffe41f60a0
"\020\020\020\020", dst_pitch=864,
    src=0x7fffe4160434 '\020' <repeats 104 times>,
src_pitch=852, width=852, height=18, cpu=4072) at copy.c:86
86	                    COPY64(&dst[x], &src[x], "movntdqa", "movdqu");
(gdb) bt
#0  0x00007fffe2b7a8f5 in CopyFromUswc (dst=0x7fffe41f60a0
"\020\020\020\020", dst_pitch=864,
    src=0x7fffe4160434 '\020' <repeats 104 times>,
src_pitch=852, width=852, height=18, cpu=4072) at copy.c:86
#1  0x00007fffe2b7ac89 in CopyPlane (dst=0xae6e90
"\350IK\367\377\177", dst_pitch=864, src=0x7fffe41600e0 '\020'
<repeats 102 times>, "\021\022\022\022\b#K^NNOONNMMML", 'M' <repeats
30 times>, "NNOOOOO", 'N' <repeats 29 times>, "OOPPQQQQPPOONN"...,
    cache=0x7fffe41f5d40 '\020' <repeats 102 times>,
"\021\022\022\022\b#K^NNOONNMMML", 'M' <repeats 30 times>, "NNOOOOO",
'N' <repeats 29 times>, "OOPPQQQQPPOONN"..., cache_size=<optimized
out>, width=852, height=480, cpu=4072) at copy.c:243
#2  0x00007fffe2b7b719 in CopyFromNv12 (dst=0x872260,
src=0x7ffff2c06b70, src_pitch=0x7ffff2c06b90, width=<optimized out>,
height=480, cache=0x7fffe400cd10) at copy.c:315
#3  0x00007fffe2b79b94 in Extract (p_external=0x7fffe400cbf0,
p_picture=0x872260, p_ff=<optimized out>) at vaapi.c:432
#4  0x00007fffe2b77484 in vlc_va_Extract (va=<optimized out>,
src=0x637860, dst=0x872260) at va.h:54
#5  ffmpeg_CopyPicture (p_ff_pic=0x637860, p_pic=0x872260,
p_dec=0x7fffe400b478) at video.c:923
#6  DecodeVideo (p_dec=0x7fffe400b478, pp_block=<optimized out>) at video.c:751
#7  0x00007ffff791c087 in DecoderDecodeVideo (p_dec=0x7fffe400b478,
p_block=0x7fffe446cae0) at input/decoder.c:1517
#8  0x00007ffff791baff in DecoderProcessVideo (b_flush=false,
p_block=0x67b7d0, p_dec=0x7fffe400b478) at input/decoder.c:1872
#9  DecoderProcess (p_dec=0x7fffe400b478, p_block=<optimized out>) at
#10 0x00007ffff791bd0b in DecoderThread (p_data=0x7fffe400b478) at
#11 0x00007ffff76c4efc in start_thread (arg=0x7ffff2c07700) at
#12 0x00007ffff71fb59d in clone () at
#13 0x0000000000000000 in ?? ()
