<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    I have some more information on this problem. Something is broken
    with the p4x4 partition with NEON.<br>
    <br>
    I can enable every option that --preset slower sets, with the single
    exception of the p4x4 partition. In other words, the command line:<br>
    <br>
    x264 -o test.264 --preset slow --rc-lookahead 60 --ref 8 --subme 9
    --trellis 2 --partitions b8x8,i8x8,i4x4,p8x8 --input-res 720x576
    test.yuv<br>
    <br>
    works, but the command line:<br>
    <br>
    x264 -o test.264 --preset slow --rc-lookahead 60 --ref 8 --subme 9
    --trellis 2 --partitions p4x4,p8x8 --input-res 720x576 test.yuv<br>
    <br>
    doesn't (note p8x8 is needed for p4x4, even though they're called
    16x16 and 8x8 in the code).<br>
    <br>
    Also the command line:<br>
    <br>
    x264 -o test.264 --preset slow --partitions p4x4,p8x8 --input-res
    720x576 test.yuv<br>
    <br>
    appears to fail even sooner!<br>
    <br>
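    The comparison above could be scripted roughly as follows. This is
    only a sketch: it assumes x264 is on the PATH and test.yuv is in the
    current directory, the helper names (build_cmd, crashed, compare) are
    made up for illustration, and it uses only options that appear in the
    command lines above.<br>
    <br>

```python
import subprocess

# Options taken from the command lines above; only --partitions varies.
BASE = ["x264", "-o", "test.264", "--preset", "slow",
        "--input-res", "720x576"]


def build_cmd(partitions, source="test.yuv"):
    """Return the x264 argument list for one partition setting."""
    return BASE + ["--partitions", partitions, source]


def crashed(returncode):
    """On POSIX, subprocess reports death by signal as a negative code."""
    return returncode < 0


def compare(partition_sets, runner=subprocess.run):
    """Run each setting and map it to True if the encoder died on a signal."""
    results = {}
    for partitions in partition_sets:
        proc = runner(build_cmd(partitions))
        results[partitions] = crashed(proc.returncode)
    return results
```

    If the behaviour described above is reproducible, calling
    compare(["b8x8,i8x8,i4x4,p8x8", "p4x4,p8x8"]) should flag only the
    second entry as crashed.<br>
    <br>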
    Any ideas, guys?<br>
    <br>
    Jim.<br>
    <br>
    <br>
    On 11/09/12 18:15, Jim Darby wrote:
    <blockquote cite="mid:504F71AB.9080304@gmail.com" type="cite">
      <meta http-equiv="content-type" content="text/html;
        charset=ISO-8859-1">
      <font face="Verdana"><small>I thought I'd give x264 a blast on the
          PandaBoard ES (see <a moz-do-not-send="true"
            class="moz-txt-link-freetext" href="http://pandaboard.org">http://pandaboard.org</a>
          for details but essentially a dual-core Cortex-A9 with NEON).
          I compiled it with no special options (even though it sets the
          target machine to Cortex-A8).<br>
          <br>
          It encodes for a short while (5-10 seconds' worth of video) and
          then gets a segmentation fault. Recompiling with --debug and
          unwinding the stack I can see that it blows up in</small> </font><tt>mc_luma_neon</tt><small><font
          face="Verdana"> because the weight </font></small><tt>struct</tt><small><font
          face="Verdana"> passed in from </font></small><tt>x264_me_refine_qpel_rd</tt><font
        face="Verdana"> <small>via the </small></font><tt>COST_MV_SATD( bmx, bmy, bsatd, 0 )</tt><font face="Verdana"><small> macro
          called at encoder/me.c:1210 is an invalid pointer
          (interestingly 0x53366970, which reads as "S6ip" in ASCII). This
          weight parameter comes from the m structure in </small></font><tt>x264_me_refine_qpel_rd</tt><font
        face="Verdana"><small> which has the same corrupted value.<br>
          <br>
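          A quick way to check what such a pointer value looks like as
          text is sketched below (an illustration only; the helper name
          as_ascii is made up). A pointer consisting entirely of
          printable characters often means that some other data was
          written over it, though that is a guess rather than a
          diagnosis.<br>
          <br>

```python
import struct


def as_ascii(value, byteorder="big"):
    """Render a 32-bit value as four characters, '.' for non-printables."""
    fmt = ">I" if byteorder == "big" else "<I"
    raw = struct.pack(fmt, value)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# The value quoted in the text, read most significant byte first.
print(as_ascii(0x53366970))            # S6ip
# The same value in the byte order it would have in little-endian memory.
print(as_ascii(0x53366970, "little"))  # pi6S
# The value gdb reports for weight in frame #0 of the backtrace.
print(as_ascii(0x53366a70))            # S6jp
```

          <br>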
          The command line was: </small></font><small><tt>x264 -o test.264
          --preset slower --input-res 720x576 test.yuv</tt></small><font
        face="Verdana"><small><br>
          <br>
          If I perform the same run with --no-asm it is amazingly slow
          but does not appear to crash, which suggests a problem in the
          NEON code. Running the asm build single-threaded still
          crashes.<br>
          <br>
          What does help is replacing the </small></font><small><tt>--preset slower</tt><font
          face="Verdana"> with </font></small><small><tt>--preset slow</tt><font
          face="Verdana">. <i>Now that's interesting!</i>
          That considerably narrows down what could be causing the problem.</font></small><br>
      <font face="Verdana"><small><br>
          Any ideas on this one? More specifically, what can I do to
          help with debugging this? I've avoided sending 126MB core
          files or detailed backtraces but I'm very happy to do whatever
          is needed to help track this down.<br>
          <br>
          As per the information on the web page, here is the gdb
          information:<br>
          <br>
        </small></font><small><tt>(gdb) bt<br>
          #0  mc_luma_neon (dst=0x36ac78 "\345\331\325\327\331\343\344",
          &lt;incomplete sequence \345&gt;, i_dst_stride=32,
          src=0xb5317254, i_src_stride=784, <br>
              mvx=0, mvy=0, i_width=8, i_height=8, weight=0x53366a70) at
          common/arm/mc-c.c:146<br>
          #1  0x0005e0c6 in x264_me_refine_qpel_rd (h=0x365c60,
          m=0xb5317240, i_lambda2=2322, i4=12, i_list=0) at
          encoder/me.c:1210<br>
          #2  0x000563a4 in x264_macroblock_analyse (h=0x365c60) at
          encoder/analyse.c:3362<br>
          #3  0x0001f9b2 in x264_slice_write (h=0x365c60) at
          encoder/encoder.c:2309<br>
          #4  0x0002057a in x264_slices_write (h=0x365c60) at
          encoder/encoder.c:2625<br>
          #5  0x0002539e in x264_threadpool_thread (pool=0x3853a0) at
          common/threadpool.c:69<br>
          #6  0xb6e4fed2 in start_thread () from
          /lib/arm-linux-gnueabihf/libpthread.so.0<br>
          #7  0xb6de6058 in ?? () from
          /lib/arm-linux-gnueabihf/libc.so.6<br>
          #8  0xb6de6058 in ?? () from
          /lib/arm-linux-gnueabihf/libc.so.6<br>
          Backtrace stopped: previous frame identical to this frame
          (corrupt stack?)<br>
          (gdb) disass $pc-32,$pc+32<br>
          Dump of assembler code from 0x71ff4 to 0x72034:<br>
             0x00071ff4 &lt;mc_luma_neon+132&gt;:       mov     r0, r5<br>
             0x00071ff6 &lt;mc_luma_neon+134&gt;:       mov     r1, r4<br>
             0x00071ff8 &lt;mc_luma_neon+136&gt;:       blx     r7<br>
             0x00071ffa &lt;mc_luma_neon+138&gt;:       ldr     r3, [r6,
          #44]   ; 0x2c<br>
             0x00071ffc &lt;mc_luma_neon+140&gt;:       cbz     r3,
          0x7204a &lt;mc_luma_neon+218&gt;<br>
             0x00071ffe &lt;mc_luma_neon+142&gt;:       str     r6, [sp,
          #0]<br>
             0x00072000 &lt;mc_luma_neon+144&gt;:       str.w   r8, [sp,
          #4]<br>
             0x00072004 &lt;mc_luma_neon+148&gt;:       ldr.w   r6, [r3,
          r9, lsl #2]<br>
             0x00072008 &lt;mc_luma_neon+152&gt;:       mov     r0, r5<br>
             0x0007200a &lt;mc_luma_neon+154&gt;:       mov     r1, r4<br>
             0x0007200c &lt;mc_luma_neon+156&gt;:       mov     r2, r5<br>
             0x0007200e &lt;mc_luma_neon+158&gt;:       mov     r3, r4<br>
             0x00072010 &lt;mc_luma_neon+160&gt;:       blx     r6<br>
             0x00072012 &lt;mc_luma_neon+162&gt;:       b.n     0x7204a
          &lt;mc_luma_neon+218&gt;<br>
          => 0x00072014 &lt;mc_luma_neon+164&gt;:       ldr     r1,
          [r6, #44]   ; 0x2c<br>
             0x00072016 &lt;mc_luma_neon+166&gt;:       cbz     r1,
          0x7202e &lt;mc_luma_neon+190&gt;<br>
             0x00072018 &lt;mc_luma_neon+168&gt;:       mov.w   r9, r9,
          asr #2<br>
             0x0007201c &lt;mc_luma_neon+172&gt;:       str     r6, [sp,
          #0]<br>
             0x0007201e &lt;mc_luma_neon+174&gt;:       str.w   r8, [sp,
          #4]<br>
             0x00072022 &lt;mc_luma_neon+178&gt;:       ldr.w   r6, [r1,
          r9, lsl #2]<br>
             0x00072026 &lt;mc_luma_neon+182&gt;:       mov     r0, r5<br>
             0x00072028 &lt;mc_luma_neon+184&gt;:       mov     r1, r4<br>
             0x0007202a &lt;mc_luma_neon+186&gt;:       blx     r6<br>
             0x0007202c &lt;mc_luma_neon+188&gt;:       b.n     0x7204a
          &lt;mc_luma_neon+218&gt;<br>
             0x0007202e &lt;mc_luma_neon+190&gt;:       movw    r1,
          #42212      ; 0xa4e4<br>
             0x00072032 &lt;mc_luma_neon+194&gt;:       movt    r1, #8<br>
          End of assembler dump.<br>
          (gdb) info all-registers<br>
          r0             0x0      0<br>
          r1             0xb5317254       3039916628<br>
          r2             0xb4b53172       3031773554<br>
          r3             0x310    784<br>
          r4             0x20     32<br>
          r5             0x36ac78 3583096<br>
          r6             0x53366a70       1396075120<br>
          r7             0x0      0<br>
          r8             0x8      8<br>
          r9             0x8      8<br>
          r10            0x3      3<br>
          r11            0x0      0<br>
          r12            0x0      0<br>
          sp             0xb5316a48       0xb5316a48<br>
          lr             0x0      0<br>
          pc             0x72014  0x72014 &lt;mc_luma_neon+164&gt;<br>
          cpsr           0x40000130       1073742128<br>
        </tt></small><font face="Verdana"><small><tt><br>
          </tt><font face="Verdana">I've chopped off the rest of the
            registers as the list is very long and doesn't seem to contain any
            relevant information.</font><tt><br>
          </tt><br>
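          For reference, the address the faulting instruction tries to
          read can be worked out from the dump above. The sketch below is
          only arithmetic on the values shown: the instruction at the =>
          marker is ldr r1, [r6, #44], and r6 holds the same value as the
          weight argument in frame #0. The function name is made up for
          illustration, and I am assuming (without having checked the
          struct layout) that offset 44 is a member of the weight
          struct.<br>
          <br>

```python
def effective_address(base, offset):
    """Address read by an ARM 'ldr rX, [base, #offset]', wrapped to 32 bits."""
    return (base + offset) & 0xFFFFFFFF


r6 = 0x53366A70   # from 'info all-registers'; equals weight in frame #0
offset = 44       # immediate from 'ldr r1, [r6, #44]'

print(hex(effective_address(r6, offset)))  # 0x53366a9c
```

          So the load goes through the weight pointer itself, 44 bytes
          in, which is consistent with the pointer already being bad
          before mc_luma_neon is entered.<br>
          <br>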
          Cheers,<br>
          <br>
          Jim.<br>
        </small></font> </blockquote>
    <br>
  </body>
</html>