[x264-devel] [PATCH 2/2] arm: Implement x264_mbtree_propagate_{cost, list}_neon
Martin Storsjö
martin at martin.st
Thu Sep 3 09:53:17 CEST 2015
On Thu, 3 Sep 2015, Janne Grunau wrote:
> On 2015-09-03 09:30:44 +0300, Martin Storsjö wrote:
>> The cost function could be simplified to avoid having to clobber
>> q4/q5, but this requires reordering instructions, which increases
>> the total runtime.
>>
>> checkasm timing              Cortex-A7        A8        A9
>> mbtree_propagate_cost_c          63702    155835     62829
>> mbtree_propagate_cost_neon       17199     10454     11106
>
> Any idea why the Cortex-A8 C version is that bad? Different
> compiler/system?
These were all run with the exact same static binary, and the BeagleBone
I tested on is completely idle, so there shouldn't really be any noise.
I can reproduce the numbers as well. No idea what is causing it, though...
>> mbtree_propagate_list_c         104203    108949     84532
>> mbtree_propagate_list_neon       82035     78348     60410
>>
>> ---
>> Applied Janne's suggestions on mbtree_propagate_cost_neon, and squashed
>> his patch for mbtree_propagate_list_neon.
>> ---
>> common/arm/mc-a.S | 119 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>> common/arm/mc-c.c | 9 ++++
>> 2 files changed, 128 insertions(+)
>>
>> diff --git a/common/arm/mc-a.S b/common/arm/mc-a.S
>> index 5e0c117..b06b957 100644
>> --- a/common/arm/mc-a.S
>> +++ b/common/arm/mc-a.S
>> @@ -28,6 +28,11 @@
>>
>> #include "asm.S"
>>
>> +.section .rodata
>> +.align 4
>> +pw_0to15:
>> +.short 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
>> +
>> .text
>>
>> // note: prefetch stuff assumes 64-byte cacheline, true for the Cortex-A8
>> @@ -1760,3 +1765,117 @@ function integral_init8v_neon
>> 2:
>> bx lr
>> endfunc
>> +
>> +function x264_mbtree_propagate_cost_neon
>> + push {r4-r5,lr}
>> + ldrd r4, r5, [sp, #12]
>> + ldr lr, [sp, #20]
>> + vld1.32 {d6[], d7[]}, [r5]
>
> push {r11}
> ldrd r11, r12, [sp, #4]
> vld1.32 {d6[], d7[]}, [r12]
> ldr r12, [sp, #12]
>
> and adapt the rest. Patch OK, no need to change this, it won't make a
> large difference (I'm not even sure if it'll be faster). Just to
> satisfy my OCD.
Hmm, neat. I'll keep that in mind if the patch needs to be remade for some
other reason.
// Martin