[vlc-devel] [PATCH 00/16] libdvbcsa: improve performance

glenvt18 glenvt18 at gmail.com
Fri Jun 26 13:19:54 CEST 2015


Hi folks.

I'm posting this here because there is no libdvbcsa or videolan-devel
mailing list.

This patch series improves libdvbcsa performance considerably (up to
3 times) on both x86 and ARM platforms. It also introduces NEON
support on ARM.

Here are some benchmarks.

x86_64 Intel Celeron 847 @ 1100 MHz:

uint32   81/185
uint64   96/276
sse2    129/348
ssse3     -/386

ffdecsa(sse2)   325

Best new vs. best old: 2.99x faster; 1.19x faster than ffdecsa

x86_64 Intel Atom D425 @ 1800 MHz:

uint32   76/134
uint64  109/183
sse2    117/228
ssse3     -/218

ffdecsa(sse2)   180

Best new vs. best old: 1.95x faster; 1.27x faster than ffdecsa

ARM Cortex-A7 @ 912 MHz:

uint32   32/48
uint64   29/33
neon      -/84

Best new vs. best old: 2.63x faster

Notes:
Line format: bs_word old/new [Mbit/s].
"old" means the latest svn snapshot; "-" means that bs_word is not
available there.
Only decryption benchmarks are shown. Encryption figures are nearly
the same.
Tests were compiled with gcc 4.8.

I also tested 32-bit and 64-bit versions on powerpc (big-endian) with qemu.
The Altivec build is broken. I could fix it, but I have no ppc hardware to
make sure performance won't suffer.

Please review.

glenvt18 (16):
  block cipher: improve performance
  stream cipher: refactoring
  stream cipher: optimizations
  Add ARM NEON support
  bitslice transform: rewrite
  neon: add matrix transpose macros
  block cipher: use one lookup table for sbox and permutation
  neon: add deinterleaving macro
  ssse3: add deinterleaving macro and SSSE3 option
  Add deinterleaving test case.
  stream cipher: use the same buffer for input and output
  Add matrix transpose ops tests
  Fix automake version check
  Fix C++ compilation
  Change compiler options
  Remove unused attribute

 bootstrap                          |   4 +-
 configure.ac                       |  26 ++-
 src/Makefile.am                    |  26 ++-
 src/dvbcsa_algo.c                  |   2 +-
 src/dvbcsa_bs.h                    |  12 +-
 src/dvbcsa_bs_algo.c               |   2 +-
 src/dvbcsa_bs_block.c              | 267 ++++++++++++++++++------
 src/dvbcsa_bs_neon.h               |  95 +++++++++
 src/dvbcsa_bs_sse.h                |  19 ++
 src/dvbcsa_bs_stream.c             | 410 +++----------------------------------
 src/dvbcsa_bs_stream_kernel.h      |  22 ++
 src/dvbcsa_bs_stream_kernel.inc    | 261 +++++++++++++++++++++++
 src/dvbcsa_bs_transpose.c          | 112 ----------
 src/dvbcsa_bs_transpose.h          |  71 +++++++
 src/dvbcsa_bs_transpose128.c       | 209 -------------------
 src/dvbcsa_bs_transpose32.c        | 185 -----------------
 src/dvbcsa_bs_transpose64.c        | 186 -----------------
 src/dvbcsa_bs_transpose_block.c    |  97 +++++++++
 src/dvbcsa_bs_transpose_stream.c   | 231 +++++++++++++++++++++
 src/dvbcsa_bs_transpose_stream32.c | 150 ++++++++++++++
 src/dvbcsa_pv.h                    |  78 ++++++-
 test/testbitslice.c                |   2 +-
 test/testbsops.c                   | 132 ++++++++++++
 test/testdec.c                     |   2 +-
 test/testenc.c                     |   2 +-
 25 files changed, 1421 insertions(+), 1182 deletions(-)
 create mode 100644 src/dvbcsa_bs_neon.h
 create mode 100644 src/dvbcsa_bs_stream_kernel.h
 create mode 100644 src/dvbcsa_bs_stream_kernel.inc
 delete mode 100644 src/dvbcsa_bs_transpose.c
 create mode 100644 src/dvbcsa_bs_transpose.h
 delete mode 100644 src/dvbcsa_bs_transpose128.c
 delete mode 100644 src/dvbcsa_bs_transpose32.c
 delete mode 100644 src/dvbcsa_bs_transpose64.c
 create mode 100644 src/dvbcsa_bs_transpose_block.c
 create mode 100644 src/dvbcsa_bs_transpose_stream.c
 create mode 100644 src/dvbcsa_bs_transpose_stream32.c

-- 
1.9.1



