[x265] Fwd: [PATCH] refactorization of the transform/quant path
Steve Borho
steve at borho.org
Wed Nov 19 19:03:51 CET 2014
On 11/19, Praveen Tiwari wrote:
> On 11/18, praveen at multicorewareinc.com wrote:
> > for (int pass = 0; pass < 2; pass++)
> > @@ -1564,7 +1418,7 @@
> > * still somewhat rare on end-user PCs we still compile and link these SSE3
> > * intrinsic SIMD functions */
> > #if !HIGH_BIT_DEPTH
> > - p.idct[IDCT_8x8] = idct8;
> > +// p.idct[IDCT_8x8] = idct8;
> > p.idct[IDCT_16x16] = idct16;
> > p.idct[IDCT_32x32] = idct32;
> > #endif
>
> Getting the intrinsic idct8 re-enabled or coded in assembly should be a
> priority.
>
> [MC] We don't have SSE assembly for IDCT_16x16 and IDCT_32x32, only AVX2
> assembly, which is why the intrinsic versions are enabled. (Since AVX2 is
> still somewhat rare on end-user PCs, we still compile and link these SSE3
> intrinsic SIMD functions.) I will also clean up the disabled idct8
> intrinsic code: we already have SSE and AVX2 assembly for it, so I think
> it is no longer useful.
I tried removing this idct8 intrinsic primitive about two months ago but
there were complaints because the assembly version is SSSE3 and the
intrinsic version is SSE3. There are still many (mostly AMD) CPUs out
there with SSE3 but not SSSE3, so I had to put the intrinsic function
back.
So in order to remove the SSE3 intrinsic function, we would need to make
the assembly version also work on SSE3 CPUs.
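
To illustrate the constraint, here is a minimal sketch of capability-based
primitive selection. It is not x265's actual setup code: the flag names,
the Primitives struct, and the stand-in functions are all hypothetical.
Each stand-in returns a tag instead of performing a transform, so the
selection logic can be checked on its own.

```cpp
#include <cstdint>

// Hypothetical capability bits; the real encoder's flag names and values may differ.
enum CpuFlag : uint32_t
{
    CPU_SSE3  = 1u << 0,
    CPU_SSSE3 = 1u << 1
};

// Tags identifying which implementation ended up in the table.
enum Impl { IMPL_C, IMPL_SSE3_INTRINSIC, IMPL_SSSE3_ASM };

typedef Impl (*idct_t)();   // stand-in signature, not the real idct prototype

static Impl idct8_c()     { return IMPL_C; }               // portable reference
static Impl idct8_sse3()  { return IMPL_SSE3_INTRINSIC; }  // intrinsic version
static Impl idct8_ssse3() { return IMPL_SSSE3_ASM; }       // assembly version

struct Primitives
{
    idct_t idct8;
};

// Later assignments override earlier ones, so the best supported version wins.
static void setupPrimitives(Primitives& p, uint32_t cpuMask)
{
    p.idct8 = idct8_c;
    if (cpuMask & CPU_SSE3)
        p.idct8 = idct8_sse3;    // the only SIMD option on SSE3-only CPUs
    if (cpuMask & CPU_SSSE3)
        p.idct8 = idct8_ssse3;   // assembly replaces the intrinsic when available
}
```

With this layout, a CPU reporting SSE3 but not SSSE3 gets the intrinsic
function. If the SSE3 assignment is removed, as in the commented-out line
of the patch, that CPU falls back to the C reference.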
--
Steve Borho