[x264-devel] [PATCH 1/6] aarch64: Correctly sign extend int parameters in x264_plane_copy_core_neon

Martin Storsjö martin at martin.st
Wed Nov 16 09:48:18 CET 2016


On Tue, 15 Nov 2016, Janne Grunau wrote:

> On 2016-11-14 23:54:48 +0200, Martin Storsjö wrote:
>> ---
>>  common/aarch64/mc-a.S | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/common/aarch64/mc-a.S b/common/aarch64/mc-a.S
>> index 3a99fbe..a7a383d 100644
>> --- a/common/aarch64/mc-a.S
>> +++ b/common/aarch64/mc-a.S
>> @@ -1256,8 +1256,8 @@ endfunc
>>  function x264_plane_copy_core_neon, export=1
>>      add         x8,  x4,  #15
>>      and         x4,  x8,  #~15
>> -    sub         x1,  x1,  x4
>> -    sub         x3,  x3,  x4
>> +    sub         x1,  x1,  w4, sxtw
>> +    sub         x3,  x3,  w4, sxtw
>
> This patch is not very consistent. I'd change it into
>
> add w8, w4, #15 // a 32-bit write clears the upper 32 bits of the register
> and w4, w8, #~15
> // safe use of the full reg since negative width makes no sense
> sub         x1,  x1,  x4
> sub         x3,  x3,  x4

Oh, indeed, I somehow missed that the first two lines also used x4. I'll 
repost with your version of it.
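
For anyone following along, here is a rough C model of the two variants. It is only a sketch: the helper names are made up for illustration and are not part of x264, and registers are modelled as plain uint64_t values. It assumes the AAPCS64 rule that the upper 32 bits of a register carrying an int argument are unspecified on entry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not x264 code. */

/* Unpatched code: 64-bit add/and on x4, so any garbage in the
 * upper 32 bits survives the rounding. */
static uint64_t align16_x(uint64_t x4)
{
    uint64_t x8 = x4 + 15;
    return x8 & ~(uint64_t)15;
}

/* Suggested variant: 32-bit add/and on w4. Writing a w register
 * zeroes the upper 32 bits, so the full x4 is safe to use afterwards
 * (for non-negative widths). */
static uint64_t align16_w(uint64_t x4)
{
    uint32_t w8 = (uint32_t)x4 + 15;
    uint32_t w4 = w8 & ~(uint32_t)15;
    return (uint64_t)w4;
}

/* Model of the "w4, sxtw" operand: sign extend the low 32 bits.
 * The unsigned-to-signed conversion is implementation-defined in C
 * but wraps on the usual two's complement targets. */
static uint64_t sxtw(uint64_t x)
{
    return (uint64_t)(int64_t)(int32_t)(uint32_t)x;
}

/* Width 100 passed in a register whose upper half holds garbage. */
static void demo(void)
{
    uint64_t x4 = 0xDEADBEEF00000064ULL;

    assert(align16_x(x4) == 0xDEADBEEF00000070ULL); /* garbage kept  */
    assert(sxtw(align16_x(x4)) == 112);             /* first patch   */
    assert(align16_w(x4) == 112);                   /* 32-bit ops    */
}
```

For sensible (non-negative) widths both approaches produce the same stride adjustment; the 32-bit version just never looks at the unspecified upper bits in the first place.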

// Martin

