[x264-devel] [PATCH 16/24] arm: Implement x264_deblock_h_chroma_422_neon
Janne Grunau
janne-x264 at jannau.net
Thu Aug 20 14:01:32 CEST 2015
On 2015-08-13 23:59:37 +0300, Martin Storsjö wrote:
> checkasm timing              Cortex-A7      A8      A9
> deblock_h_chroma_422_c            6928    6194    5172
> deblock_h_chroma_422_neon         3697    2720    2641
> ---
> common/arm/deblock-a.S | 19 +++++++++++++++++++
> common/deblock.c | 4 ++--
> 2 files changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/common/arm/deblock-a.S b/common/arm/deblock-a.S
> index 446e678..26e95ed 100644
> --- a/common/arm/deblock-a.S
> +++ b/common/arm/deblock-a.S
> @@ -4,6 +4,7 @@
> * Copyright (C) 2009-2015 x264 project
> *
> * Authors: Mans Rullgard <mans at mansr.com>
> + * Martin Storsjo <martin at martin.st>
> *
> * This program is free software; you can redistribute it and/or modify
> * it under the terms of the GNU General Public License as published by
> @@ -261,6 +262,7 @@ function x264_deblock_h_chroma_neon
> h264_loop_filter_start
>
> sub r0, r0, #4
> +deblock_h_chroma:
> vld1.8 {d18}, [r0], r1
> vld1.8 {d16}, [r0], r1
> vld1.8 {d0}, [r0], r1
> @@ -290,6 +292,23 @@ function x264_deblock_h_chroma_neon
> bx lr
> endfunc
>
> +function x264_deblock_h_chroma_422_neon
See the patch I just sent for the arm64 version. h264_loop_filter_start
should be here; otherwise we still deblock every 2nd line even when the
edge should be skipped.
> + ldr ip, [sp]
> + push {lr}
> + push {ip}
I think pushing just lr and adding an optional offset for loading tc0 in
h264_loop_filter_start is cleaner.
> + add r1, r1, r1
> + bl X(x264_deblock_h_chroma_neon)
> + ldr ip, [sp]
> + ldr ip, [ip]
> + vdup.32 d24, ip
> + sub r0, r0, r1, lsl #3
> + add r0, r0, r1, lsr #1
> + sub r0, r0, #2
> + bl deblock_h_chroma
> + pop {ip}
> + pop {pc}
If you restore the stack before branching to deblock_h_chroma, you can
return directly from there.
Janne