[x264-devel] AArch64 NEON support
Janne Grunau
janne-x264 at jannau.net
Sat Jul 19 20:57:46 CEST 2014
Hi,
this is my initial set of patches for ARMv8 64-bit support. The patches
port the existing 32-bit ARM NEON asm to AArch64. This is more than a
rewrite: improvements in the AArch64 architecture are used where
appropriate (32 instead of 16 128-bit vector registers, new instructions
and operations across a vector, for example). The differences from the
32-bit architecture are large enough to make supporting both well from a
single source a hard problem at least, if not impossible. The biggest
issue is the change in the SIMD register layout: ARMv7 NEON has
essentially 32 64-bit registers, and adjacent pairs of registers can be
used as one 128-bit register. ARMv8 NEON has 32 128-bit registers;
narrower accesses just operate on the lower bits of a 128-bit register.
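
To make the layout difference concrete, here is a toy Python model of
the two register files. It is only an illustration and not part of the
patches; the class and method names are made up for this sketch. It
assumes little-endian lane order and models the AArch64 rule that a
64-bit write clears the upper half of the 128-bit register.

```python
# Toy model of the NEON register files as flat byte arrays.
# ArmV7Neon / AArch64Neon are hypothetical names used only for illustration.


class ArmV7Neon:
    """32 x 64-bit D registers; Qn aliases the pair D(2n), D(2n+1)."""

    def __init__(self):
        self.mem = bytearray(32 * 8)

    def write_d(self, n, value):
        # Store a 64-bit value into Dn.
        self.mem[n * 8:(n + 1) * 8] = value.to_bytes(8, "little")

    def read_q(self, n):
        # Qn is simply the 16 bytes covering D(2n) and D(2n+1).
        return int.from_bytes(self.mem[n * 16:(n + 1) * 16], "little")


class AArch64Neon:
    """32 x 128-bit V registers; Dn is the low 64 bits of Vn."""

    def __init__(self):
        self.mem = bytearray(32 * 16)

    def write_d(self, n, value):
        # A 64-bit write fills the low half of Vn and zeroes the high half.
        self.mem[n * 16:(n + 1) * 16] = value.to_bytes(8, "little") + bytes(8)

    def read_v(self, n):
        # Read the full 128-bit register Vn.
        return int.from_bytes(self.mem[n * 16:(n + 1) * 16], "little")


if __name__ == "__main__":
    v7, a64 = ArmV7Neon(), AArch64Neon()
    for regs in (v7, a64):
        regs.write_d(2, 0x1111)
        regs.write_d(3, 0x2222)
    # On ARMv7, Q1 sees both D2 and D3; on AArch64, V1 is untouched.
    print(hex(v7.read_q(1)), hex(a64.read_v(1)), hex(a64.read_v(2)))
```

Code that relies on writing d2/d3 and then reading q1 therefore has no
direct AArch64 equivalent, which is why a mechanical translation of the
32-bit asm does not work.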
Performance stats (all on an Apple A7/iPad mini): 4 times faster than
the C code (720p 4:2:0 y4m --profile high --preset medium). The factor
is higher for more expensive settings (larger than 6) and lower for less
expensive settings (~3).
Perhaps more interesting is the comparison to 32-bit code on the same
hardware. The 64-bit code is around 5-8% faster with asm enabled. That
is at the lower end of the expected range. I haven't checked yet whether
something obvious is missing. I'll spend the next weeks writing more asm
and will also look at whether the existing asm can be improved. I don't
have access to the CPU cycle counter, so no micro-optimizations have been
done yet. That will hopefully change soon.
The splitting of the first three build and utility patches is questionable.
The main objective was not to bury them in a large asm patch.
[PATCH 2/9] aarch64: add armv8 and neon cpu flags and test them
The armv8 flag is a little questionable and is mostly there to disable
the asm functions via --asm/--no-asm. Both flags are always on. There is
still no register accessible from user space to check whether the CPU
supports NEON. OTOH Advanced SIMD and floating point support is mandatory
for ARMv8 application processors.
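
Roughly, the flag handling behaves like the following Python sketch.
The flag bit values and function names here are hypothetical and chosen
only for illustration; the real definitions are in the patch itself.

```python
# Hypothetical flag bits; the real values are defined by x264 and may differ.
CPU_ARMV8 = 1 << 0
CPU_NEON = 1 << 1


def cpu_detect_aarch64():
    # No runtime probe: both features are assumed present on ARMv8
    # application processors, so both flags are always reported.
    return CPU_ARMV8 | CPU_NEON


def effective_cpu_flags(asm_enabled=True):
    # Disabling asm clears all flags, so only C functions get selected.
    return cpu_detect_aarch64() if asm_enabled else 0


def pick_impl(flags):
    # Choose the NEON version of a function only when the flag is set.
    return "neon" if flags & CPU_NEON else "c"
```

So the flags carry no detection result; they only give the user a switch
between the asm and C code paths.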
Janne