[x264-devel] [PATCH 16/24] arm: Implement x264_deblock_h_chroma_422_neon
Martin Storsjö
martin at martin.st
Mon Aug 24 21:43:09 CEST 2015
On Thu, 20 Aug 2015, Janne Grunau wrote:
> On 2015-08-13 23:59:37 +0300, Martin Storsjö wrote:
>> checkasm timing            Cortex-A7    A8    A9
>> deblock_h_chroma_422_c          6928  6194  5172
>> deblock_h_chroma_422_neon       3697  2720  2641
>> ---
>> common/arm/deblock-a.S | 19 +++++++++++++++++++
>> common/deblock.c | 4 ++--
>> 2 files changed, 21 insertions(+), 2 deletions(-)
>>
>> diff --git a/common/arm/deblock-a.S b/common/arm/deblock-a.S
>> index 446e678..26e95ed 100644
>> --- a/common/arm/deblock-a.S
>> +++ b/common/arm/deblock-a.S
>> @@ -4,6 +4,7 @@
>> * Copyright (C) 2009-2015 x264 project
>> *
>> * Authors: Mans Rullgard <mans at mansr.com>
>> + * Martin Storsjo <martin at martin.st>
>> *
>> * This program is free software; you can redistribute it and/or modify
>> * it under the terms of the GNU General Public License as published by
>> @@ -261,6 +262,7 @@ function x264_deblock_h_chroma_neon
>> h264_loop_filter_start
>>
>> sub r0, r0, #4
>> +deblock_h_chroma:
>> vld1.8 {d18}, [r0], r1
>> vld1.8 {d16}, [r0], r1
>> vld1.8 {d0}, [r0], r1
>> @@ -290,6 +292,23 @@ function x264_deblock_h_chroma_neon
>> bx lr
>> endfunc
>>
>> +function x264_deblock_h_chroma_422_neon
>
> see the patch I just sent for the arm64 version. h264_loop_filter_start
> should be here since we deblock otherwise every 2nd line even if it
> should be skipped.
Good catch.
>> + ldr ip, [sp]
>> + push {lr}
>> + push {ip}
>
> I think pushing just lr and adding an optional offset for loading tc0 in
> h264_loop_filter_start is cleaner.
That doesn't quite work: once we've pushed something on the stack, the
early return in h264_loop_filter_start won't pop it. But we can just call
h264_loop_filter_start before pushing lr, and then do (about) the same as
in your aarch64 version.
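To make the stack argument concrete, here is a rough C model of it. This is
only a sketch, not x264 code: toy_stack, toy_push,
words_left_on_early_return and the marker values are made up for
illustration. It assumes the tc0 pointer sits at [sp] on entry and that the
early return in h264_loop_filter_start goes back through lr without
adjusting sp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy full-descending stack of 32-bit words (hypothetical model, not x264 code). */
enum { STACK_WORDS = 8 };

typedef struct {
    uint32_t words[STACK_WORDS];
    size_t sp;                      /* index of the current top-of-stack word */
} toy_stack;

static void toy_push(toy_stack *s, uint32_t value)
{
    assert(s->sp > 0);              /* no overflow in the toy model */
    s->words[--s->sp] = value;
}

enum { TC0_PTR = 0x1000, LR_VALUE = 0x2000 };   /* made-up marker values */

/*
 * Simulates entering the 4:2:2 function and hitting the early return of
 * h264_loop_filter_start. Stores the word that a load from [sp] would see
 * in *seen_at_sp and returns how many words are left on the stack beyond
 * the entry state.
 */
static size_t words_left_on_early_return(int push_lr_first, uint32_t *seen_at_sp)
{
    toy_stack s = { {0}, STACK_WORDS };
    toy_push(&s, TC0_PTR);          /* caller: tc0 pointer sits at [sp] on entry */
    size_t entry_sp = s.sp;

    if (push_lr_first)
        toy_push(&s, LR_VALUE);     /* models "push {lr}" before the macro runs */

    *seen_at_sp = s.words[s.sp];    /* what "ldr ip, [sp]" would load */

    /* Early return: control goes back through lr, sp is not adjusted. */
    return entry_sp - s.sp;
}
```

With push_lr_first set, the model leaves one word behind and a plain load
from [sp] sees the saved lr instead of the tc0 pointer. With it cleared, the
stack is balanced and the load sees the tc0 pointer, which is why running
the macro first avoids both problems.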
>> + add r1, r1, r1
>> + bl X(x264_deblock_h_chroma_neon)
>> + ldr ip, [sp]
>> + ldr ip, [ip]
>> + vdup.32 d24, ip
>> + sub r0, r0, r1, lsl #3
>> + add r0, r0, r1, lsr #1
>> + sub r0, r0, #2
>> + bl deblock_h_chroma
>> + pop {ip}
>> + pop {pc}
>
> if you restore the stack before branching to deblock_h_chroma you can
> return from there
Fixed locally.
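For reference, the row bookkeeping behind the two passes can be checked
with a small C model. It is a sketch under stated assumptions, not code from
the patch: it assumes the existing 4:2:0 routine filters 8 rows with one tc0
value per 2 rows, and that the 4:2:2 edge has 16 rows with one tc0 value per
4 rows. The names two_pass_tc0_map and direct_tc0_index are hypothetical.

```c
#include <assert.h>

enum { ROWS_422 = 16, ROWS_PER_PASS = 8 };

/*
 * Models two calls into the 4:2:0 routine with the stride doubled:
 * pass 0 starts at row 0 (even rows), pass 1 starts one original row
 * further down (odd rows). For every row it records which tc0 index the
 * routine would apply. Returns 1 if each of the 16 rows is visited
 * exactly once, 0 otherwise.
 */
static int two_pass_tc0_map(int tc0_index[ROWS_422])
{
    int visits[ROWS_422] = {0};

    for (int pass = 0; pass < 2; pass++) {
        for (int i = 0; i < ROWS_PER_PASS; i++) {
            int row = 2 * i + pass;     /* doubled stride plus start offset */
            tc0_index[row] = i / 2;     /* assumed: one tc0 per 2 rows seen */
            visits[row]++;
        }
    }

    for (int r = 0; r < ROWS_422; r++)
        if (visits[r] != 1)
            return 0;
    return 1;
}

/* Assumed 4:2:2 layout: one tc0 value per 4 consecutive rows. */
static int direct_tc0_index(int row)
{
    return row / 4;
}
```

Under those assumptions, every row gets the same tc0 index from the two-pass
scheme as from the direct 4:2:2 layout, which is why reloading the same tc0
values before the second pass is enough.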
// Martin