[x264-devel] [PATCH 03/24] arm: Simplify x264_predict_8x8c_p_neon

Janne Grunau janne-x264 at jannau.net
Fri Aug 14 08:28:49 CEST 2015


On 2015-08-13 23:59:24 +0300, Martin Storsjö wrote:
> This gets rid of a few unnecessary (and confusing) steps in
> calculating the increment to i00.
> 
> checkasm timing            Cortex-A7    A8    A9
> intra_predict_8x8c_p_c          5525  4732  4755
> intra_predict_8x8c_p_neon       1719  1140  1262  (before)
> intra_predict_8x8c_p_neon       1663  1142  1255  (after)
> ---
>  common/arm/predict-a.S |    7 +------
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/common/arm/predict-a.S b/common/arm/predict-a.S
> index 3343144..7e5d9d3 100644
> --- a/common/arm/predict-a.S
> +++ b/common/arm/predict-a.S
> @@ -535,17 +535,12 @@ function x264_predict_8x8c_p_neon
>      vadd.i16    d16, d16, d0
>      vshl.i16    d2,  d16, #4
>      vsub.i16    d2,  d2,  d3
> -    vshl.i16    d3,  d4,  #3
>      vext.16     q0,  q0,  q0,  #7
> -    vsub.i16    d6,  d5,  d3
>      vmov.16     d0[0], r3
>      vmul.i16    q0,  q0,  d4[0]
>      vdup.16     q1,  d2[0]
> -    vdup.16     q2,  d4[0]
> -    vdup.16     q3,  d6[0]
> -    vshl.i16    q2,  q2,  #3
> +    vdup.16     q3,  d5[0]
>      vadd.i16    q1,  q1,  q0
> -    vadd.i16    q3,  q3,  q2
>      mov         r3,  #8
>  1:
>      vqshrun.s16 d0,  q1,  #5

ok

Janne

