[x264-devel] [PATCH 1/6] aarch64: Correctly sign extend int parameters in x264_plane_copy_core_neon
Martin Storsjö
martin at martin.st
Wed Nov 16 09:48:18 CET 2016
On Tue, 15 Nov 2016, Janne Grunau wrote:
> On 2016-11-14 23:54:48 +0200, Martin Storsjö wrote:
>> ---
>> common/aarch64/mc-a.S | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/common/aarch64/mc-a.S b/common/aarch64/mc-a.S
>> index 3a99fbe..a7a383d 100644
>> --- a/common/aarch64/mc-a.S
>> +++ b/common/aarch64/mc-a.S
>> @@ -1256,8 +1256,8 @@ endfunc
>> function x264_plane_copy_core_neon, export=1
>> add x8, x4, #15
>> and x4, x8, #~15
>> - sub x1, x1, x4
>> - sub x3, x3, x4
>> + sub x1, x1, w4, sxtw
>> + sub x3, x3, w4, sxtw
>
> This patch is not very consistent. I'd change it into
>
> add w8, w4, #15 // a 32-bit write clears the upper 32 bits of the register
> and w4, w8, #~15
> // safe use of the full reg since negative width makes no sense
> sub x1, x1, x4
> sub x3, x3, x4
Oh, indeed, I somehow missed that the first two lines also used x4. I'll
repost with your version of it.
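
For anyone following along, here is a rough C model of the difference. It is only a sketch, not x264 code: the helper names are made up for illustration, and it assumes the usual AArch64 calling convention, where the upper 32 bits of a register holding an int argument are unspecified.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the old code: "add x8, x4, #15; and x4, x8, #~15".
 * The 64-bit operations keep whatever garbage sits in bits 63:32. */
static uint64_t round_up_x(uint64_t x4)
{
    return (x4 + 15) & ~(uint64_t)15;
}

/* Hypothetical model of the suggested code: "add w8, w4, #15; and w4, w8, #~15".
 * Writing a w register zeroes bits 63:32, so the full x4 is clean afterwards. */
static uint64_t round_up_w(uint64_t x4)
{
    uint32_t w4 = (uint32_t)x4;
    uint32_t w8 = w4 + 15;
    return (uint64_t)(w8 & ~(uint32_t)15);
}

/* Model of the "w4, sxtw" operand: sign extend the low 32 bits to 64 bits. */
static int64_t sxtw(uint64_t x4)
{
    return (int64_t)(int32_t)(uint32_t)x4;
}

/* Small self check with an assumed width of 1918 and made-up garbage bits. */
static void self_check(void)
{
    uint64_t clean = 1918;
    uint64_t dirty = 0xdeadbeef00000000ULL | 1918;

    /* With clean upper bits both variants agree. */
    assert(round_up_x(clean) == 1920);
    assert(round_up_w(clean) == 1920);

    /* With garbage upper bits only the 32-bit variant gives a usable x4. */
    assert(round_up_x(dirty) == 0xdeadbeef00000780ULL);
    assert(round_up_w(dirty) == 1920);

    /* The low 32 bits are right either way, which is why sxtw also works. */
    assert(sxtw(round_up_x(dirty)) == 1920);
}
```

Calling self_check() exercises both cases. The 32-bit variant makes the later plain "sub x1, x1, x4" safe for any non-negative width, without needing the sxtw operand.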
// Martin