[x264-devel] [PATCH 16/24] arm: Implement x264_deblock_h_chroma_422_neon

Martin Storsjö martin at martin.st
Mon Aug 24 21:43:09 CEST 2015


On Thu, 20 Aug 2015, Janne Grunau wrote:

> On 2015-08-13 23:59:37 +0300, Martin Storsjö wrote:
>> checkasm timing       Cortex-A7      A8     A9
>> deblock_h_chroma_422_c       6928    6194   5172
>> deblock_h_chroma_422_neon    3697    2720   2641
>> ---
>>  common/arm/deblock-a.S |   19 +++++++++++++++++++
>>  common/deblock.c       |    4 ++--
>>  2 files changed, 21 insertions(+), 2 deletions(-)
>>
>> diff --git a/common/arm/deblock-a.S b/common/arm/deblock-a.S
>> index 446e678..26e95ed 100644
>> --- a/common/arm/deblock-a.S
>> +++ b/common/arm/deblock-a.S
>> @@ -4,6 +4,7 @@
>>   * Copyright (C) 2009-2015 x264 project
>>   *
>>   * Authors: Mans Rullgard <mans at mansr.com>
>> + *          Martin Storsjo <martin at martin.st>
>>   *
>>   * This program is free software; you can redistribute it and/or modify
>>   * it under the terms of the GNU General Public License as published by
>> @@ -261,6 +262,7 @@ function x264_deblock_h_chroma_neon
>>      h264_loop_filter_start
>>
>>      sub             r0,  r0,  #4
>> +deblock_h_chroma:
>>      vld1.8          {d18}, [r0], r1
>>      vld1.8          {d16}, [r0], r1
>>      vld1.8          {d0},  [r0], r1
>> @@ -290,6 +292,23 @@ function x264_deblock_h_chroma_neon
>>      bx              lr
>>  endfunc
>>
>> +function x264_deblock_h_chroma_422_neon
>
> See the patch I just sent for the arm64 version. h264_loop_filter_start
> should be here, since otherwise we deblock every 2nd line even if it
> should be skipped.

Good catch.

>> +    ldr             ip, [sp]
>> +    push            {lr}
>> +    push            {ip}
>
> I think pushing just lr and adding an optional offset for loading tc0 in
> h264_loop_filter_start is cleaner.

That doesn't quite work: once we've pushed something onto the stack, the
early return in h264_loop_filter_start won't pop it. But we can just call
h264_loop_filter_start before pushing lr, and then do (about) the same as
in your aarch64 version.

>> +    add             r1,  r1,  r1
>> +    bl              X(x264_deblock_h_chroma_neon)
>> +    ldr             ip,  [sp]
>> +    ldr             ip,  [ip]
>> +    vdup.32         d24, ip
>> +    sub             r0,  r0,  r1, lsl #3
>> +    add             r0,  r0,  r1, lsr #1
>> +    sub             r0,  r0,  #2
>> +    bl              deblock_h_chroma
>> +    pop             {ip}
>> +    pop             {pc}
>
> If you restore the stack before branching to deblock_h_chroma, you can
> return from there.

Fixed locally.
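
For anyone following the pointer arithmetic between the two passes in the
quoted version, here is a rough C model of it. This is only a sketch, not
code from the patch: the function names are made up for illustration, and it
assumes that the shared body returns with r0 = (r0 on entry) + 2 + 8*r1,
which is not visible in the quoted hunk. Under that assumption the first
pass covers the even rows and the second pass the odd rows.

```c
#include <assert.h>
#include <stddef.h>

/* Offsets are in bytes relative to pix, the pointer passed in r0.
 * All names here are hypothetical; they only mirror the quoted assembly. */
enum { ROWS_PER_PASS = 8 };

/* First pass starts 4 bytes left of pix: sub r0, r0, #4 */
static ptrdiff_t pass1_start(void)
{
    return -4;
}

/* ASSUMPTION (not shown in the quoted hunk): the shared body rewinds by
 * 8 rows, adds 2, then stores 8 rows with post-increment, so on return
 * r0 = entry_r0 + 2 + 8 * r1. */
static ptrdiff_t r0_after_pass(ptrdiff_t entry_r0, ptrdiff_t r1)
{
    return entry_r0 + 2 + ROWS_PER_PASS * r1;
}

/* Start of the second pass, following the three adjust instructions. */
static ptrdiff_t pass2_start(ptrdiff_t stride)
{
    assert(stride > 0);
    ptrdiff_t r1 = stride + stride;            /* add r1, r1, r1          */
    ptrdiff_t r0 = r0_after_pass(pass1_start(), r1);
    r0 -= r1 * 8;                              /* sub r0, r0, r1, lsl #3  */
    r0 += r1 / 2;                              /* add r0, r0, r1, lsr #1  */
    r0 -= 2;                                   /* sub r0, r0, #2          */
    return r0;
}

/* Picture row read by the i-th load (i in 0..7) of pass 0 or pass 1. */
static int row_touched(int pass, int i, ptrdiff_t stride)
{
    ptrdiff_t r1 = 2 * stride;                 /* doubled stride          */
    ptrdiff_t start = pass == 0 ? pass1_start() : pass2_start(stride);
    return (int)((start + i * r1 + 4) / stride);
}
```

With a stride of 32, pass2_start() gives 28 (i.e. pix + stride - 4), and
row_touched() yields rows 0, 2, ..., 14 for the first pass and 1, 3, ..., 15
for the second.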

// Martin

