[x264-devel] [PATCH 1/1] arm: make the combined x264_pixel_sa8d_satd_16x16_neon faster

Martin Storsjö martin at martin.st
Tue Aug 25 10:51:01 CEST 2015


On Wed, 19 Aug 2015, Janne Grunau wrote:

> On 2015-08-13 23:59:42 +0300, Martin Storsjö wrote:
>> This requires spilling some registers to the stack,
>> contrary to the aarch64 version.
>
> there are barely enough registers to use the same approach as on arm64.
> see below
>
>> checkasm timing                Cortex-A7     A8     A9
>> sa8d_satd_16x16_neon               14393   7427   9146
>> sa8d_satd_16x16_separate_neon      14624   7074   8294
>
> which should make the combined version faster on all three cpus, see my
> cortex-a9 results below.
>
> Feel free to squash this patch.

Thanks, looks good, will squash.

// Martin

