<html>
  <head>

    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <font face="Verdana"><small>I thought I'd give x264 a blast on the
        PandaBoard ES (see <a class="moz-txt-link-freetext" href="http://pandaboard.org">http://pandaboard.org</a> for details; it is
        essentially a dual-core Cortex-A9 with NEON). I compiled it with
        no special options, even though the default build sets the
        target machine to Cortex-A8 rather than Cortex-A9.<br>
        <br>
        It encodes for a short while (5-10 seconds' worth of video) and
        then hits a segmentation fault. After recompiling with --debug and
        unwinding the stack, I can see that it blows up in</small> </font><tt>mc_luma_neon</tt><small><font
        face="Verdana"> because the weight </font></small><tt>struct</tt><small><font
        face="Verdana"> pointer passed in from </font></small><tt>x264_me_refine_qpel_rd</tt><font
      face="Verdana"> <small>via the </small></font><tt>COST_MV_SATD(
      bmx, bmy, bsatd, 0 )</tt><font face="Verdana"><small> macro called
        at encoder/me.c:1210 is invalid (interestingly
        0x53366970, which spells S6ip in ASCII; the backtrace below
        shows 0x53366a70). This weight parameter
        comes from the m structure in </small></font><tt>x264_me_refine_qpel_rd</tt><font
      face="Verdana"><small>, which holds the same corrupted value.<br>
        <br>
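        A pointer whose bytes are all printable ASCII can be a hint that
        some other byte data has been written over it, so it is worth
        decoding such values carefully. Here is a minimal Python sketch
        (not part of x264; the helper name pointer_as_ascii is made up
        for illustration) that prints both byte orders for the value
        quoted above and for the one shown in the backtrace:<br>
        <br>

```python
def pointer_as_ascii(value, byteorder="big"):
    """Render a 32-bit value as four characters, using '.' for non-printable bytes."""
    raw = value.to_bytes(4, byteorder)
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


# The value quoted in the text and the one that appears in the gdb backtrace.
for value in (0x53366970, 0x53366A70):
    print(hex(value),
          pointer_as_ascii(value, "big"),      # bytes in the order the hex is written
          pointer_as_ascii(value, "little"))   # bytes in little-endian memory order
```

        <br>
        <br>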
        The command line was: </small></font><small><tt>x264 -o </tt></small><small><tt>test.264
        --preset slower --input-res 720x576 test.yuv</tt></small><font
      face="Verdana"><small><br>
        <br>
        If I perform the same run with --no-asm it is amazingly slow but
        does not appear to crash. This would appear to indicate some
        problem with the NEON code. Running it single threaded doesn't
        help either.<br>
        <br>
        What does help is replacing the </small></font><small><tt>--preset
        slower</tt><font face="Verdana"> with </font></small><small><tt>--preset
        slow</tt><font face="Verdana">. <i>Now that's interesting!</i>
        That narrows down quite a lot what could be causing the problem.</font></small><br>
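    <font face="Verdana"><small><br>
        One way to narrow this further would be to start from --preset
        slow and add the settings of --preset slower one at a time until
        the crash reappears. The rough Python sketch below only prints the
        candidate command lines. The list of settings that differ between
        the two presets is an assumption written from memory and should
        be checked against the output of x264 --fullhelp for the build
        in use:<br>
    </small></font>

```python
import shlex

# Same command line as in the report, but starting from the preset that works.
BASE = ["x264", "-o", "test.264", "--preset", "slow", "--input-res", "720x576"]
INPUT = "test.yuv"

# ASSUMPTION: settings believed to differ between "slow" and "slower".
# Verify against `x264 --fullhelp` before relying on this list.
SLOWER_OVERRIDES = [
    ("--subme", "9"),
    ("--ref", "8"),
    ("--partitions", "all"),
    ("--trellis", "2"),
    ("--rc-lookahead", "60"),
]


def bisect_commands(base, overrides, input_file):
    """Return one command per override: the slow preset plus a single 'slower' setting."""
    return [base + [flag, value, input_file] for flag, value in overrides]


for cmd in bisect_commands(BASE, SLOWER_OVERRIDES, INPUT):
    print(shlex.join(cmd))
```

    <font face="Verdana"><small>Each printed command can then be run by
        hand; whichever single option brings the crash back is the one
        to look at first.</small></font><br>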
    <font face="Verdana"><small><br>
        Any ideas on this one? More specifically, what can I do to help
        with debugging this? I've avoided sending 126MB core files or
        detailed backtraces but I'm very happy to do whatever is needed
        to help track this down.<br>
        <br>
        Following the instructions on the web page, here is the gdb
        output:<br>
        <br>
      </small></font><small><tt>(gdb) bt<br>
        #0  mc_luma_neon (dst=0x36ac78 "\345\331\325\327\331\343\344",
        &lt;incomplete sequence \345&gt;, i_dst_stride=32,
        src=0xb5317254, i_src_stride=784, <br>
            mvx=0, mvy=0, i_width=8, i_height=8, weight=0x53366a70) at
        common/arm/mc-c.c:146<br>
        #1  0x0005e0c6 in x264_me_refine_qpel_rd (h=0x365c60,
        m=0xb5317240, i_lambda2=2322, i4=12, i_list=0) at
        encoder/me.c:1210<br>
        #2  0x000563a4 in x264_macroblock_analyse (h=0x365c60) at
        encoder/analyse.c:3362<br>
        #3  0x0001f9b2 in x264_slice_write (h=0x365c60) at
        encoder/encoder.c:2309<br>
        #4  0x0002057a in x264_slices_write (h=0x365c60) at
        encoder/encoder.c:2625<br>
        #5  0x0002539e in x264_threadpool_thread (pool=0x3853a0) at
        common/threadpool.c:69<br>
        #6  0xb6e4fed2 in start_thread () from
        /lib/arm-linux-gnueabihf/libpthread.so.0<br>
        #7  0xb6de6058 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6<br>
        #8  0xb6de6058 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6<br>
        Backtrace stopped: previous frame identical to this frame
        (corrupt stack?)<br>
        (gdb) disass $pc-32,$pc+32<br>
        Dump of assembler code from 0x71ff4 to 0x72034:<br>
           0x00071ff4 &lt;mc_luma_neon+132&gt;:       mov     r0, r5<br>
           0x00071ff6 &lt;mc_luma_neon+134&gt;:       mov     r1, r4<br>
           0x00071ff8 &lt;mc_luma_neon+136&gt;:       blx     r7<br>
           0x00071ffa &lt;mc_luma_neon+138&gt;:       ldr     r3, [r6,
        #44]   ; 0x2c<br>
           0x00071ffc &lt;mc_luma_neon+140&gt;:       cbz     r3,
        0x7204a &lt;mc_luma_neon+218&gt;<br>
           0x00071ffe &lt;mc_luma_neon+142&gt;:       str     r6, [sp,
        #0]<br>
           0x00072000 &lt;mc_luma_neon+144&gt;:       str.w   r8, [sp,
        #4]<br>
           0x00072004 &lt;mc_luma_neon+148&gt;:       ldr.w   r6, [r3,
        r9, lsl #2]<br>
           0x00072008 &lt;mc_luma_neon+152&gt;:       mov     r0, r5<br>
           0x0007200a &lt;mc_luma_neon+154&gt;:       mov     r1, r4<br>
           0x0007200c &lt;mc_luma_neon+156&gt;:       mov     r2, r5<br>
           0x0007200e &lt;mc_luma_neon+158&gt;:       mov     r3, r4<br>
           0x00072010 &lt;mc_luma_neon+160&gt;:       blx     r6<br>
           0x00072012 &lt;mc_luma_neon+162&gt;:       b.n     0x7204a
        &lt;mc_luma_neon+218&gt;<br>
        => 0x00072014 &lt;mc_luma_neon+164&gt;:       ldr     r1,
        [r6, #44]   ; 0x2c<br>
           0x00072016 &lt;mc_luma_neon+166&gt;:       cbz     r1,
        0x7202e &lt;mc_luma_neon+190&gt;<br>
           0x00072018 &lt;mc_luma_neon+168&gt;:       mov.w   r9, r9,
        asr #2<br>
           0x0007201c &lt;mc_luma_neon+172&gt;:       str     r6, [sp,
        #0]<br>
           0x0007201e &lt;mc_luma_neon+174&gt;:       str.w   r8, [sp,
        #4]<br>
           0x00072022 &lt;mc_luma_neon+178&gt;:       ldr.w   r6, [r1,
        r9, lsl #2]<br>
           0x00072026 &lt;mc_luma_neon+182&gt;:       mov     r0, r5<br>
           0x00072028 &lt;mc_luma_neon+184&gt;:       mov     r1, r4<br>
           0x0007202a &lt;mc_luma_neon+186&gt;:       blx     r6<br>
           0x0007202c &lt;mc_luma_neon+188&gt;:       b.n     0x7204a
        &lt;mc_luma_neon+218&gt;<br>
           0x0007202e &lt;mc_luma_neon+190&gt;:       movw    r1,
        #42212      ; 0xa4e4<br>
           0x00072032 &lt;mc_luma_neon+194&gt;:       movt    r1, #8<br>
        End of assembler dump.<br>
        (gdb) info all-registers<br>
        r0             0x0      0<br>
        r1             0xb5317254       3039916628<br>
        r2             0xb4b53172       3031773554<br>
        r3             0x310    784<br>
        r4             0x20     32<br>
        r5             0x36ac78 3583096<br>
        r6             0x53366a70       1396075120<br>
        r7             0x0      0<br>
        r8             0x8      8<br>
        r9             0x8      8<br>
        r10            0x3      3<br>
        r11            0x0      0<br>
        r12            0x0      0<br>
        sp             0xb5316a48       0xb5316a48<br>
        lr             0x0      0<br>
        pc             0x72014  0x72014 &lt;mc_luma_neon+164&gt;<br>
        cpsr           0x40000130       1073742128<br>
      </tt></small><font face="Verdana"><small><tt><br>
        </tt><font face="Verdana">I've chopped off the rest of the
          registers as the list is very long and doesn't seem to contain any
          relevant information.</font><tt><br>
        </tt><br>
        Cheers,<br>
        <br>
        Jim.<br>
      </small></font>
  </body>
</html>