[x264-devel] [PATCH 09/24] arm: Add x264_nal_escape_neon

Martin Storsjö martin at martin.st
Mon Aug 24 19:27:30 CEST 2015


On Tue, 18 Aug 2015, Janne Grunau wrote:

> On 2015-08-13 23:59:30 +0300, Martin Storsjö wrote:
>> checkasm timing      Cortex-A7      A8      A9
>> nal_escape_c                908338  878032  633692
>> nal_escape_neon             379946  451936  373471
>> ---
>>  Makefile                 |    2 +-
>>  common/arm/bitstream-a.S |   89 ++++++++++++++++++++++++++++++++++++++++++++++
>>  common/bitstream.c       |    4 +++
>>  3 files changed, 94 insertions(+), 1 deletion(-)
>>  create mode 100644 common/arm/bitstream-a.S
>>
>> diff --git a/Makefile b/Makefile
>> index 6193c59..4403a11 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -119,7 +119,7 @@ ifeq ($(SYS_ARCH),ARM)
>>  ifneq ($(AS),)
>>  ASMSRC += common/arm/cpu-a.S common/arm/pixel-a.S common/arm/mc-a.S \
>>            common/arm/dct-a.S common/arm/quant-a.S common/arm/deblock-a.S \
>> -          common/arm/predict-a.S
>> +          common/arm/predict-a.S common/arm/bitstream-a.S
>>  SRCS   += common/arm/mc-c.c common/arm/predict-c.c
>>  OBJASM  = $(ASMSRC:%.S=%.o)
>>  endif
>> diff --git a/common/arm/bitstream-a.S b/common/arm/bitstream-a.S
>> new file mode 100644
>> index 0000000..62f9c96
>> --- /dev/null
>> +++ b/common/arm/bitstream-a.S
>> @@ -0,0 +1,89 @@
>> +/*****************************************************************************
>> + * bitstream-a.S: arm bitstream functions
>> + *****************************************************************************
>> + * Copyright (C) 2014-2015 x264 project
>> + *
>> + * Authors: Janne Grunau <janne-x264 at jannau.net>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02111, USA.
>> + *
>> + * This program is also available under a commercial proprietary license.
>> + * For more information, contact us at licensing at x264.com.
>> + *****************************************************************************/
>> +
>> +#include "asm.S"
>> +
>> +function x264_nal_escape_neon
>> +    push        {r4-r9}
>
> I'm not quite sure you need all those registers. I only used that many
> because arm64 has enough caller-saved registers. Also, lr is usually
> included in the register list when registers are pushed to and popped
> from the stack. The function return then becomes pop {rx-ry, pc}
> instead of pop; bx lr.

Done locally; I got it down to r4-r5,lr.
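
As an illustration of the pattern suggested above, here is a minimal standalone sketch in GNU as syntax. The label example_leaf and the function body are hypothetical placeholders, not the actual x264_nal_escape_neon code; only the push/pop pairing is the point.

```
        .syntax unified
        .arch   armv7-a
        .text
        .global example_leaf            @ hypothetical name, for illustration only
        .type   example_leaf, %function
example_leaf:
        push    {r4-r5, lr}             @ save callee-saved r4-r5 and the return address
        mov     r4, r0                  @ placeholder body: use r4/r5 as scratch
        mov     r5, r1
        add     r0, r4, r5              @ result returned in r0
        pop     {r4-r5, pc}             @ restore r4-r5 and return by loading pc
        .size   example_leaf, . - example_leaf
```

Popping the saved lr value directly into pc restores the registers and returns in one instruction, so no separate bx lr is needed.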

>> +    vpush       {q4-q7}
>
> Please use q8-q15. I know these register-number conflicts are annoying
> when porting NEON between arm and arm64 (I have endured it both ways).

Doh, yes - and this is one of the really simple ones to remap.
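
For reference, a minimal sketch of why the remap removes the vpush: under the 32-bit ARM procedure call standard, d8-d15 (q4-q7) are callee-saved, while q0-q3 and q8-q15 may be clobbered freely. The label and body below are hypothetical and only mark bytes that are <= 3; they are not the patch's actual code.

```
        .syntax unified
        .arch   armv7-a
        .fpu    neon
        .text
        .global example_mark_le3        @ hypothetical name, for illustration only
        .type   example_mark_le3, %function
example_mark_le3:                       @ r0 = dst, r1 = src (16 bytes each)
        vld1.8  {q8}, [r1]              @ load 16 source bytes into caller-saved q8
        vmov.i8 q9, #3                  @ q9 = sixteen copies of the constant 3
        vcge.u8 q10, q9, q8             @ q10 = 0xff where 3 >= byte, else 0x00
        vst1.8  {q10}, [r0]             @ store the mask
        bx      lr                      @ no vpush/vpop needed: only q8-q10 were used
        .size   example_mark_le3, . - example_mark_le3
```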

// Martin

