[x264-devel] [PATCH 1/1] arm: make the combined x264_pixel_sa8d_satd_16x16_neon faster
Martin Storsjö
martin at martin.st
Tue Aug 25 10:51:01 CEST 2015
On Wed, 19 Aug 2015, Janne Grunau wrote:
> On 2015-08-13 23:59:42 +0300, Martin Storsjö wrote:
>> This requires spilling some registers to the stack,
>> contrary to the aarch64 version.
>
> there are barely enough registers to use the same approach as on arm64.
> see below
>
>> checkasm timing                Cortex-A7     A8     A9
>> sa8d_satd_16x16_neon               14393   7427   9146
>> sa8d_satd_16x16_separate_neon      14624   7074   8294
>
> which should make the combined version faster on all three cpus, see my
> cortex-a9 results below.
>
> Feel free to squash this patch.
Thanks, looks good, will squash.
// Martin