[x264-devel] [PATCH 16/24] arm: Implement x264_deblock_h_chroma_422_neon

Janne Grunau janne-x264 at jannau.net
Thu Aug 20 14:01:32 CEST 2015


On 2015-08-13 23:59:37 +0300, Martin Storsjö wrote:
> checkasm timing       Cortex-A7      A8     A9
> deblock_h_chroma_422_c       6928    6194   5172
> deblock_h_chroma_422_neon    3697    2720   2641
> ---
>  common/arm/deblock-a.S |   19 +++++++++++++++++++
>  common/deblock.c       |    4 ++--
>  2 files changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/common/arm/deblock-a.S b/common/arm/deblock-a.S
> index 446e678..26e95ed 100644
> --- a/common/arm/deblock-a.S
> +++ b/common/arm/deblock-a.S
> @@ -4,6 +4,7 @@
>   * Copyright (C) 2009-2015 x264 project
>   *
>   * Authors: Mans Rullgard <mans at mansr.com>
> + *          Martin Storsjo <martin at martin.st>
>   *
>   * This program is free software; you can redistribute it and/or modify
>   * it under the terms of the GNU General Public License as published by
> @@ -261,6 +262,7 @@ function x264_deblock_h_chroma_neon
>      h264_loop_filter_start
>  
>      sub             r0,  r0,  #4
> +deblock_h_chroma:
>      vld1.8          {d18}, [r0], r1
>      vld1.8          {d16}, [r0], r1
>      vld1.8          {d0},  [r0], r1
> @@ -290,6 +292,23 @@ function x264_deblock_h_chroma_neon
>      bx              lr
>  endfunc
>  
> +function x264_deblock_h_chroma_422_neon

See the patch I just sent for the arm64 version. h264_loop_filter_start
should be invoked here; otherwise we deblock every 2nd line even when it
should be skipped.
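
To illustrate the point, here is a rough C model, not code from the patch. It
rests on two assumptions that the quoted diff does not show: that the
early-out in h264_loop_filter_start amounts to "return when all four tc0
entries are negative", and that the first call returns with
r0 = pix - 2 + 8 * r1. The helper names all_tc0_negative and
second_pass_offset are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero when all four tc0 entries are negative.
 * Assumption: this mirrors the early-out of h264_loop_filter_start. */
static int all_tc0_negative(const int8_t tc0[4])
{
    /* Bit 7 survives a bitwise AND only if it is set in every entry. */
    return ((tc0[0] & tc0[1] & tc0[2] & tc0[3]) & 0x80) != 0;
}

/* Byte offset of r0 relative to pix when the wrapper branches to
 * deblock_h_chroma for the second pass. Assumes a positive stride and
 * that the first call leaves r0 at pix - 2 + 8 * r1 (an assumption). */
static ptrdiff_t second_pass_offset(ptrdiff_t stride)
{
    ptrdiff_t r1 = 2 * stride;   /* add r1, r1, r1 */
    ptrdiff_t r0 = -2 + 8 * r1;  /* assumed state after the first call */
    r0 -= r1 * 8;                /* sub r0, r0, r1, lsl #3 */
    r0 += r1 / 2;                /* add r0, r0, r1, lsr #1 */
    r0 -= 2;                     /* sub r0, r0, #2 */
    return r0;                   /* stride - 4: row 1, four bytes left of the edge */
}
```

Under these assumptions the second pass starts on row 1 and, with the doubled
stride, covers rows 1, 3, ..., 15. Since it enters at the deblock_h_chroma
label, it never runs the check modelled by all_tc0_negative, so those rows are
filtered regardless.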

> +    ldr             ip, [sp]
> +    push            {lr}
> +    push            {ip}

I think pushing just lr and adding an optional offset for loading tc0 in 
h264_loop_filter_start is cleaner.

> +    add             r1,  r1,  r1
> +    bl              X(x264_deblock_h_chroma_neon)
> +    ldr             ip,  [sp]
> +    ldr             ip,  [ip]
> +    vdup.32         d24, ip
> +    sub             r0,  r0,  r1, lsl #3
> +    add             r0,  r0,  r1, lsr #1
> +    sub             r0,  r0,  #2
> +    bl              deblock_h_chroma
> +    pop             {ip}
> +    pop             {pc}

If you restore the stack before branching to deblock_h_chroma, you can
return directly from there.


Janne

