[x264-devel] [PATCH] arm: Load mb_y properly in x264_mbtree_propagate_list_internal_neon

Janne Grunau janne-x264 at jannau.net
Tue Dec 27 10:53:25 CET 2016


On 2016-12-27 00:22:48 +0200, Martin Storsjö wrote:
> The previous version, attempting to load two stack parameters at once,
> only would have worked if they were interpreted and loaded as 32 bit
> elements, not when loading them as 16 bit elements.
> ---
>  common/arm/mc-a.S | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/common/arm/mc-a.S b/common/arm/mc-a.S
> index 165c1fa..8c15191 100644
> --- a/common/arm/mc-a.S
> +++ b/common/arm/mc-a.S
> @@ -1818,13 +1818,14 @@ function x264_mbtree_propagate_cost_neon
>  endfunc
>  
>  function x264_mbtree_propagate_list_internal_neon
> -    vld2.16         {d4[], d5[]}, [sp]      @ bipred_weight, mb_y
> +    vld1.16         {d4[]}, [sp]            @ bipred_weight
>      movrel          r12, pw_0to15
>      vmov.u16        q10, #0xc000
>      vld1.16         {q0},  [r12, :128]      @h->mb.i_mb_x,h->mb.i_mb_y
> +    ldrh            r12,  [sp, #4]
>      vmov.u32        q11, #4
>      vmov.u8         q3,  #32
> -    vdup.u16        q8,  d5[0]              @ mb_y
> +    vdup.u16        q8,  r12                @ mb_y
>      vzip.u16        q0,  q8
>      ldr             r12, [sp, #8]

ok

Janne



More information about the x264-devel mailing list