<div dir="ltr">The same applies to NEON. Runtime autodetection would be great, ideally together with autodetection of unaligned-access support and endianness (not needed on x86). It requires some work and thorough testing, though. For instance, on my Cortex-A7 target a test program sometimes just hangs (no trap) when attempting an unaligned 64-bit access. Other thoughts: 1) these days most targets suitable for real-time, high-bitrate CSA processing have SSSE3 or NEON; 2) this library is only valuable as long as this version of CSA is in use.<br></div><div class="gmail_extra"><br><div class="gmail_quote">2015-07-06 19:14 GMT+03:00 Jean-Baptiste Kempf <span dir="ltr"><<a href="mailto:jb@videolan.org" target="_blank">jb@videolan.org</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Can't we get autodetection at runtime, please?<br>
<br>
On 26 Jun, glenvt18 wrote :<br>
<div><div class="h5">> ---<br>
> <a href="http://configure.ac" rel="noreferrer" target="_blank">configure.ac</a> | 5 +++++<br>
> src/dvbcsa_bs.h | 3 +++<br>
> src/dvbcsa_bs_sse.h | 19 +++++++++++++++++++<br>
> 3 files changed, 27 insertions(+)<br>
><br>
> diff --git a/<a href="http://configure.ac" rel="noreferrer" target="_blank">configure.ac</a> b/<a href="http://configure.ac" rel="noreferrer" target="_blank">configure.ac</a><br>
> index 4dd0726..f978a02 100644<br>
> --- a/<a href="http://configure.ac" rel="noreferrer" target="_blank">configure.ac</a><br>
> +++ b/<a href="http://configure.ac" rel="noreferrer" target="_blank">configure.ac</a><br>
> @@ -13,6 +13,7 @@ AC_ARG_ENABLE(uint32, AC_HELP_STRING(--enable-uint32, [Use native 32 bits intege<br>
> AC_ARG_ENABLE(uint64, AC_HELP_STRING(--enable-uint64, [Use native 64 bits integers for bitslice]), enable_uint64=$enableval, enable_uint64=no)<br>
> AC_ARG_ENABLE(mmx, AC_HELP_STRING(--enable-mmx, [Use MMX for bitslice]), mmx_debug=$enableval, enable_mmx=no)<br>
> AC_ARG_ENABLE(sse2, AC_HELP_STRING(--enable-sse2, [Use SSE2 for bitslice]), sse2_debug=$enableval, enable_sse2=no)<br>
> +AC_ARG_ENABLE(ssse3, AC_HELP_STRING(--enable-ssse3, [Use SSSE3 for bitslice]), ssse3_debug=$enableval, enable_ssse3=no)<br>
> AC_ARG_ENABLE(altivec, AC_HELP_STRING(--enable-altivec, [Use AltiVec for bitslice]), altivec_debug=$enableval, enable_altivec=no)<br>
> AC_ARG_ENABLE(neon, AC_HELP_STRING(--enable-neon, [Use NEON for bitslice]), neon_debug=$enableval, enable_neon=no)<br>
><br>
> @@ -47,6 +48,10 @@ elif test "$enable_sse2" = "yes" ; then<br>
> AC_DEFINE(DVBCSA_USE_SSE, 1, Using SSE2 bitslice.)<br>
> GCC_CFLAGS="$GCC_CFLAGS -msse -msse2"<br>
><br>
> +elif test "$enable_ssse3" = "yes" ; then<br>
> + AC_DEFINE(DVBCSA_USE_SSSE3, 1, Using SSSE3 bitslice.)<br>
> + GCC_CFLAGS="$GCC_CFLAGS -mssse3"<br>
> +<br>
> elif test "$enable_altivec" = "yes" ; then<br>
> AC_DEFINE(DVBCSA_USE_ALTIVEC, 1, Using AltiVec bitslice.)<br>
> GCC_CFLAGS="$GCC_CFLAGS -maltivec -mabi=altivec"<br>
> diff --git a/src/dvbcsa_bs.h b/src/dvbcsa_bs.h<br>
> index 7145048..8162405 100644<br>
> --- a/src/dvbcsa_bs.h<br>
> +++ b/src/dvbcsa_bs.h<br>
> @@ -40,6 +40,9 @@<br>
> #elif defined(DVBCSA_USE_SSE)<br>
> # include "dvbcsa_bs_sse.h"<br>
><br>
> +#elif defined(DVBCSA_USE_SSSE3)<br>
> +# include "dvbcsa_bs_sse.h"<br>
> +<br>
> #elif defined(DVBCSA_USE_ALTIVEC)<br>
> # include "dvbcsa_bs_altivec.h"<br>
><br>
> diff --git a/src/dvbcsa_bs_sse.h b/src/dvbcsa_bs_sse.h<br>
> index f1b0c79..02ecb1b 100644<br>
> --- a/src/dvbcsa_bs_sse.h<br>
> +++ b/src/dvbcsa_bs_sse.h<br>
> @@ -29,6 +29,10 @@<br>
> #include <xmmintrin.h><br>
> #include <emmintrin.h><br>
><br>
> +#ifdef DVBCSA_USE_SSSE3<br>
> +#include <tmmintrin.h><br>
> +#endif<br>
> +<br>
> typedef __m128i dvbcsa_bs_word_t;<br>
><br>
> #define BS_BATCH_SIZE 128<br>
> @@ -54,4 +58,19 @@ typedef __m128i dvbcsa_bs_word_t;<br>
><br>
> #define BS_EMPTY()<br>
><br>
> +#ifdef DVBCSA_USE_SSSE3<br>
> +/* block cipher 2-word load with byte-deinterleaving */<br>
> +#define BS_LOAD_DEINTERLEAVE_8(ptr, var_lo, var_hi) \<br>
> + {\<br>
> + dvbcsa_bs_word_t a, b; \<br>
> + a = _mm_load_si128((ptr)); \<br>
> + b = _mm_load_si128((ptr) + 1); \<br>
> + a = _mm_shuffle_epi8(a, _mm_set_epi8(15,13,11,9,7,5,3,1,14,12,10,8,6,4,2,0)); \<br>
> + b = _mm_shuffle_epi8(b, _mm_set_epi8(15,13,11,9,7,5,3,1,14,12,10,8,6,4,2,0)); \<br>
> + var_lo = _mm_unpacklo_epi64(a, b); \<br>
> + var_hi = _mm_unpackhi_epi64(a, b); \<br>
> + }<br>
> #endif<br>
> +<br>
> +#endif<br>
> +<br>
> --<br>
> 1.9.1<br>
><br>
</div></div>> _______________________________________________<br>
> vlc-devel mailing list<br>
> To unsubscribe or modify your subscription options:<br>
> <a href="https://mailman.videolan.org/listinfo/vlc-devel" rel="noreferrer" target="_blank">https://mailman.videolan.org/listinfo/vlc-devel</a><br>
<span class="HOEnZb"><font color="#888888"><br>
--<br>
With my kindest regards,<br>
<br>
--<br>
Jean-Baptiste Kempf<br>
<a href="http://www.jbkempf.com/" rel="noreferrer" target="_blank">http://www.jbkempf.com/</a> - +33 672 704 734<br>
Sent from my Electronic Device<br>
</font></span></blockquote></div><br></div>
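<div dir="ltr">To make the runtime-autodetection idea from this thread more concrete, here is a minimal sketch in C. It is not part of libdvbcsa: the <code>bs_backend</code> enum and the function names are hypothetical, and it assumes GCC or Clang on x86, where the <code>__builtin_cpu_supports</code> builtin is available. On any other target it simply falls back to the plain integer backend. It also shows an endianness probe and a <code>memcpy</code>-based 64-bit load, which is one common way to sidestep unaligned-access problems like the Cortex-A7 hang mentioned above.</div>

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical backend identifiers for illustration; not libdvbcsa names. */
enum bs_backend { BS_BACKEND_UINT64, BS_BACKEND_SSE2, BS_BACKEND_SSSE3 };

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);  /* lowest-addressed byte of the probe */
    return first == 1;
}

/* 64-bit load that is safe for any alignment: memcpy lets the compiler
 * pick a legal instruction sequence instead of dereferencing a possibly
 * misaligned uint64_t pointer. */
static uint64_t load_u64(const void *p)
{
    uint64_t v;

    memcpy(&v, p, sizeof v);
    return v;
}

/* Picks the widest backend the running CPU supports.
 * Assumption: GCC/Clang on x86; every other target gets the fallback. */
static enum bs_backend pick_backend(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("ssse3"))
        return BS_BACKEND_SSSE3;
    if (__builtin_cpu_supports("sse2"))
        return BS_BACKEND_SSE2;
#endif
    return BS_BACKEND_UINT64;
}
```

<div dir="ltr">A caller would run <code>pick_backend()</code> once at startup and dispatch through function pointers. Note that each SIMD variant would then need to be compiled in its own translation unit with the matching <code>-m</code> flag, which is a larger change than the compile-time switch in the patch quoted above.</div>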