[x265] [PATCH] refactorization of the transform/quant path

dave dtyx265 at gmail.com
Tue Nov 18 22:33:22 CET 2014


I have been working on an sse2 idct8 assembler primitive.  Currently 
it performs only a little better than the intrinsic version.  It is based 
on the gcc assembler output of the intrinsic.

FYI, at first I simply converted the ssse3 idct8 assembler primitive to 
sse2, since it uses only 3 ssse3 instructions in 5 places and no sse3 
instructions, but that performed poorly compared to the intrinsic 
version.  While the two are similar, it seems the intrinsic algorithm 
performs better than the multiple function and macro setup of the ssse3 
idct8 primitive.
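
To illustrate what such a port involves: this message does not name the 
three ssse3 instructions, so the sketch below simply assumes one of them 
were palignr (an assumption for illustration only).  The helper name 
alignr_sse2 is hypothetical and is not x265 code.  It shows the usual 
sse2 substitute built from two byte shifts and an OR, which also hints at 
why a straight conversion can end up slower: one instruction becomes three.

```cpp
#include <emmintrin.h> // SSE2 intrinsics only
#include <cassert>
#include <cstdint>

// Hypothetical SSE2 stand-in for SSSE3 palignr / _mm_alignr_epi8(hi, lo, N).
// Treats hi:lo as one 32-byte value and returns bytes N..N+15 of it.
template<int N>
static inline __m128i alignr_sse2(__m128i hi, __m128i lo)
{
    static_assert(N > 0 && N < 16, "byte shift must be in 1..15");
    // Low part: drop the first N bytes of lo.
    // High part: move the first N bytes of hi into the vacated top bytes.
    return _mm_or_si128(_mm_srli_si128(lo, N), _mm_slli_si128(hi, 16 - N));
}

// Small self-check: lo holds bytes 0..15, hi holds bytes 16..31,
// so a 4-byte alignment must yield bytes 4..19.
static bool alignr_demo()
{
    uint8_t loBytes[16], hiBytes[16], out[16];
    for (int i = 0; i < 16; i++)
    {
        loBytes[i] = static_cast<uint8_t>(i);
        hiBytes[i] = static_cast<uint8_t>(i + 16);
    }
    __m128i lo = _mm_loadu_si128(reinterpret_cast<const __m128i*>(loBytes));
    __m128i hi = _mm_loadu_si128(reinterpret_cast<const __m128i*>(hiBytes));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), alignr_sse2<4>(hi, lo));

    for (int i = 0; i < 16; i++)
        if (out[i] != i + 4)
            return false;
    return true;
}
```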

I can submit what I have, but there is probably more room for improvement.

On 11/18/2014 10:01 AM, Steve Borho wrote:
> On 11/18, praveen at multicorewareinc.com wrote:
>> # HG changeset patch
>> # User Praveen Tiwari
>> # Date 1416299427 -19800
>> # Node ID 706fa4af912bc1610478de8f09a651ae3e58624c
>> # Parent  2f0062f0791b822fa932712a56e6b0a14e976d91
>> refactorization of the transform/quant path.
>> This patch scales down the DCT/IDCT coefficients from int32_t to int16_t,
>> as they fit in int16_t without introducing any encode error.
>> This allows us to clean up lots of DCT/IDCT intermediate buffers and optimize encode efficiency for different
>> cli options, including noise reduction, by reducing data movement operations and accommodating more
>> coefficients in a single register for SIMD operations. This patch includes all necessary
>> changes for the transform/quant path, including unit test code.
> <snip>
>
>>       for (int pass = 0; pass < 2; pass++)
>> @@ -1564,7 +1418,7 @@
>>        * still somewhat rare on end-user PCs we still compile and link these SSE3
>>        * intrinsic SIMD functions */
>>   #if !HIGH_BIT_DEPTH
>> -    p.idct[IDCT_8x8] = idct8;
>> +//    p.idct[IDCT_8x8] = idct8;
>>       p.idct[IDCT_16x16] = idct16;
>>       p.idct[IDCT_32x32] = idct32;
>>   #endif
> Getting the intrinsic idct8 re-enabled or coded in assembly should be a
> priority.
>
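
On the register-density point in the quoted patch description: a 128-bit 
SSE register holds eight int16_t values but only four int32_t values.  
The following is a minimal sketch, not code from the patch, of narrowing 
eight int32_t coefficients into one register of int16_t with the sse2 
pack instruction.  The helper name narrow8 is hypothetical.  Note that 
_mm_packs_epi32 saturates out-of-range values, whereas the patch states 
that the coefficients already fit in int16_t.

```cpp
#include <emmintrin.h> // SSE2 intrinsics only
#include <cassert>
#include <cstdint>

// Hypothetical helper: narrow 8 int32_t coefficients to 8 int16_t values.
// Two registers of 4 x int32_t become one register of 8 x int16_t.
static inline void narrow8(const int32_t* src, int16_t* dst)
{
    __m128i lo = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
    __m128i hi = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + 4));
    // Signed saturation: values outside [-32768, 32767] are clamped.
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), _mm_packs_epi32(lo, hi));
}

// Small self-check covering in-range values and both saturation limits.
static bool narrow8_demo()
{
    const int32_t src[8] = { 0, 1, -1, 32767, -32768, 40000, -40000, 1234 };
    const int16_t expected[8] = { 0, 1, -1, 32767, -32768, 32767, -32768, 1234 };
    int16_t dst[8];
    narrow8(src, dst);
    for (int i = 0; i < 8; i++)
        if (dst[i] != expected[i])
            return false;
    return true;
}
```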


