[libdvbpsi-devel] [Videolan-devel] libdvbcsa - NEON acceleration

Rémi Denis-Courmont remi at remlab.net
Tue Jul 19 12:59:38 CEST 2011


   Hello again,

On Tue, 19 Jul 2011 13:01:56 +0300, Nikolay Nikolaev
<nicknickolaev at gmail.com> wrote:
>> Does this mean that there's no point in trying to do NEON in this code?
> I would still like to see this run on real hardware and see if it's
> worth it at all.

Typically, well-written but non-unrolled NEON assembly code will be a
bit faster than plain ARM code, but a lot slower than properly unrolled
code. So your code _might_ be better than nothing.

Except that...

>> Also, without explicit vector load/store (VLD and VST) instructions,
>> memory transfers to/from the coprocessor may be painfully slow.
>
> This probably means that BS_VAL can be done as VLD, I can't see where
> VST can be used though.

...transferring data between ARM registers and NEON coprocessor registers
is painfully slow according to the ARM documentation. You really need to
use VLD _and_ VST. Otherwise, it might well be slower than plain ARM code.
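
To make the load/store point concrete, here is a minimal sketch of a loop
that brings data into NEON registers with vector loads and writes it back
with vector stores, never passing through ARM core registers. It is not
libdvbcsa code: xor_blocks is a made-up helper, and intrinsics are used
only to keep the sketch short. A scalar fallback is included so it also
builds on targets without NEON.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper, not part of libdvbcsa: dst[i] ^= src[i] for
 * len bytes. len must be a multiple of 16. */
static void xor_blocks(uint8_t *dst, const uint8_t *src, size_t len)
{
    assert(len % 16 == 0);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (size_t i = 0; i < len; i += 16) {
        uint8x16_t a = vld1q_u8(dst + i);  /* vector load (VLD1) */
        uint8x16_t b = vld1q_u8(src + i);  /* vector load (VLD1) */
        vst1q_u8(dst + i, veorq_u8(a, b)); /* XOR, then vector store (VST1) */
    }
#else
    /* Portable fallback so the sketch also builds without NEON. */
    for (size_t i = 0; i < len; i++)
        dst[i] ^= src[i];
#endif
}
```

The data only ever moves between memory and the vector registers; at no
point is a vector built up from values held in ARM core registers.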

In other words, the existing macro structure of libdvbcsa is incompatible
with ARM NEON optimization (and probably some other SIMD sets as well).
This is not a criticism of your code, but rather a design flaw in
libdvbcsa itself. I know those macros look nice, but they just don't fit
with ARM NEON.

>> It might also be worth looking at aligned memory accesses, which are
>> much faster in NEON than unaligned access. But that optimization is only
>> usable with assembly code (inline or out of line), not with intrinsics.

> As I said this is my first NEON experience, so I am not completely sure
> I understand how to do that?

The only way is to use assembly. You can't write really good NEON code
with intrinsic C functions. If you don't want to use assembly files (.S or
.s), then you should learn to use GCC inline assembly.
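
As a rough illustration of what an aligned access looks like in GCC inline
assembly, the sketch below XORs one 16-byte block into another using the
":128" alignment qualifier on VLD1/VST1. It assumes a 32-bit ARM target
built with NEON enabled. xor16_aligned is a hypothetical name, and the
plain C branch is only there so the file compiles elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: XOR the 16 bytes at src into the 16 bytes at dst.
 * Both pointers must be 16-byte aligned. */
static void xor16_aligned(uint8_t *dst, const uint8_t *src)
{
    assert(((uintptr_t)dst & 15) == 0 && ((uintptr_t)src & 15) == 0);
#if defined(__GNUC__) && defined(__arm__) && defined(__ARM_NEON__)
    __asm__ volatile (
        "vld1.8 {q0}, [%[d],:128]\n\t"  /* aligned vector load  */
        "vld1.8 {q1}, [%[s],:128]\n\t"  /* aligned vector load  */
        "veor   q0, q0, q1\n\t"         /* q0 ^= q1             */
        "vst1.8 {q0}, [%[d],:128]\n\t"  /* aligned vector store */
        :
        : [d] "r" (dst), [s] "r" (src)
        : "q0", "q1", "memory");
#else
    /* Fallback for targets other than 32-bit ARM with NEON. */
    for (int i = 0; i < 16; i++)
        dst[i] ^= src[i];
#endif
}
```

The alignment qualifier is a promise to the hardware: if the address is
not actually 16-byte aligned, the access faults, hence the assertion.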

> Now - if the code calls BS_SHL on an unaligned data, what can possibly
> be done?

Point is, the code should do everything it can to align data.
One thing that libdvbcsa does get right is memory alignment, using
posix_memalign().
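
For reference, a minimal sketch of allocating a 16-byte aligned buffer with
posix_memalign(). The helper name alloc_aligned16 and the choice of 16
bytes (one NEON quad register) are assumptions for illustration, not how
libdvbcsa itself is written.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate size bytes on a 16-byte boundary.
 * Returns NULL on failure; release the buffer with free(). */
static void *alloc_aligned16(size_t size)
{
    void *p = NULL;

    /* posix_memalign() returns 0 on success, an error number otherwise. */
    if (posix_memalign(&p, 16, size) != 0)
        return NULL;

    assert(((uintptr_t)p & 15) == 0);
    return p;
}
```

Buffers obtained this way can be handed to aligned vector loads and stores
as long as every offset into them is also a multiple of 16.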

-- 
Rémi Denis-Courmont
http://www.remlab.net/

