[x264-devel] Re: [PATCH] Altivec optimizations for quant4x4, quant4x4dc, quant8x8, sub8x8_dct8, sub16x16_dct8, pixel_sa8d_8x8, pixel_sa8d_16x16

Guillaume POIRIER gpoirier at mplayerhq.hu
Fri Sep 8 00:24:50 CEST 2006


Hi,

Loren Merritt wrote:
> On Wed, 6 Sep 2006, Guillaume POIRIER wrote:
>>
>> Another day, another revision of my patchset. In today's menu:
>> improved all quant routines to yet again shave off a couple of
>> percents of CPU cycles, and some more GCC3.3 fixes.
>>
>> Get it while it's hot, and please test && review,
> 
>> +#ifdef ARCH_PPC
>> +if( cpu&X264_CPU_ALTIVEC )
>> +{
>> +    /* determine the biggest coefficient in all quant8_mf tables */
>> +    for( i = 0; i < 2*6*8*8; i++ )
>> +    {
>> +        int q = h->quant8_mf[0][0][0][i];
>> +        if( maxQ8 < q )
>> +            maxQ8 = q;
>> +    }
> [...]
> 
> Duplicate code. Just move the first copy out of if(mmx).
> 
>> +static const int def_quant4_mf[6][4][4] __attribute__((__aligned__(16))) =
> 
> DECLARE_ALIGNED. yes, it works with assignments too.


OK, the attached patch fixes these problems and adds an 
AltiVec-optimized pixel_sa8d_16x16 routine.
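
For the record, the two fixes boil down to something like the sketch 
below. This is not the code from the patch: the DECLARE_ALIGNED 
definition is an assumed GCC-style one (the real macro in x264 may 
differ), and demo_mf, max_coef, demo_is_aligned and demo_max are 
made-up names with placeholder values.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed GCC-style definition; the real macro in x264 may differ. */
#define DECLARE_ALIGNED( type, var, n ) type var __attribute__((aligned(n)))

/* The macro also works when the declaration carries an initializer.
 * Placeholder values, not a real quant table. */
static DECLARE_ALIGNED( const int, demo_mf[2][4], 16 ) =
{
    { 3, 9, 4, 1 },
    { 7, 2, 8, 5 }
};

/* Largest entry of a table viewed as a flat array.  Written once and
 * called from every CPU path instead of being duplicated per path. */
static int max_coef( const int *tab, size_t n )
{
    int max = 0;
    size_t i;
    for( i = 0; i < n; i++ )
        if( max < tab[i] )
            max = tab[i];
    return max;
}

/* Hypothetical helpers, only there to exercise the sketch. */
static int demo_is_aligned( void )
{
    return ( (uintptr_t)demo_mf & 15 ) == 0;
}

static int demo_max( void )
{
    return max_coef( &demo_mf[0][0], 2*4 );
}
```

The MMX and AltiVec branches would then both read the result of the one 
shared loop rather than each recomputing it.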

Next, I plan to port FFmpeg's START/STOP_TIMER macros so that they can 
be used with PMCs. As noted in the doc 
ffmpeg_powerpc_performance_evaluation_howto.txt, the Time Base registers 
(TBL), which are roughly the equivalent of x86's time stamp counter 
(rdtsc), have the disadvantage that they can't provide an accurate 
measurement: they increment by one every four *bus* cycles. That's too 
coarse for precise measurement.
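
To put a number on "too coarse", here is a small sketch that converts 
Time Base ticks into CPU cycles, taking the one-tick-per-four-bus-cycles 
figure from the howto at face value. The function name and the clock 
values in the note below are illustrative assumptions, not measurements.

```c
#include <stdint.h>

/* CPU cycles covered by `ticks` Time Base ticks, assuming one tick
 * every 4 bus cycles.  cpu_hz and bus_hz are supplied by the caller;
 * nothing here queries the hardware. */
static uint64_t tb_ticks_to_cpu_cycles( uint64_t ticks,
                                        uint64_t cpu_hz,
                                        uint64_t bus_hz )
{
    /* Multiply before dividing to keep integer precision.  Fine for
     * small tick counts; very large ones would overflow 64 bits. */
    return ticks * 4 * cpu_hz / bus_hz;
}
```

With a hypothetical 2 GHz core on a 1 GHz bus, a single tick already 
spans 8 CPU cycles, and the ratio gets worse as the bus gets slower 
relative to the core.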

That's why in practice you want to use PMCs (performance monitor 
counters), which are equivalent to the various event counters available 
on x86 through tools such as Oprofile and Perfmon.
What "sucks" about not having a hard-coded register that counts CPU 
cycles is that you need to initialize it first. Or them: chips such as 
the 970 have 8 PMCs, but due to routing constraints inside the chip, you 
can't monitor just any 8 events at the same time; some of them are 
mutually exclusive.
Anyway, since you can't really know in advance which of the 8 registers 
holds the CPU cycle count, you somehow have to pass this information to 
the START/STOP macros.
The only way I see is to let the user set it with a #define, but if 
someone knows a way to check dynamically which of the PMCs records CPU 
cycles, I'd be glad to hear about it.
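
Roughly what I have in mind for the #define approach, as a sketch only. 
CYCLE_PMC, read_pmc and fake_pmc are hypothetical names, and the counter 
read is a stand-in array so that this builds on any machine; on real 
hardware read_pmc() would read the matching PMC special-purpose register 
instead. The macros are simplified compared to FFmpeg's.

```c
#include <stdint.h>
#include <stdio.h>

/* The user states which PMC was programmed to count CPU cycles. */
#ifndef CYCLE_PMC
#define CYCLE_PMC 1
#endif

/* Stand-in for the 8 hardware counters (assumption: real code would
 * read the corresponding special-purpose register here). */
static uint64_t fake_pmc[8];

static uint64_t read_pmc( int n )
{
    return fake_pmc[n-1];
}

static uint64_t timer_total;   /* cycles accumulated over all runs */
static uint64_t timer_count;   /* number of timed runs */

/* Used FFmpeg-style, without a trailing semicolon after START_TIMER. */
#define START_TIMER uint64_t tstart_ = read_pmc( CYCLE_PMC );
#define STOP_TIMER( id ) \
    do { \
        timer_total += read_pmc( CYCLE_PMC ) - tstart_; \
        timer_count++; \
        printf( "%s: %llu cycles avg over %llu runs\n", (id), \
                (unsigned long long)( timer_total / timer_count ), \
                (unsigned long long)timer_count ); \
    } while( 0 )

/* Demo: time a fake section that "costs" 120 cycles. */
static uint64_t demo_timed_section( void )
{
    START_TIMER
    fake_pmc[CYCLE_PMC-1] += 120;   /* pretend work */
    STOP_TIMER( "quant4x4" );
    return timer_total;
}
```

Building with -DCYCLE_PMC=3, for instance, would make both macros read 
the third counter without touching the call sites.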

Anyway, that will only be present in future patches...

Guillaume
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Altivec_quant+dct+pixel_routines+PMC_11.diff
Type: text/x-patch
Size: 44807 bytes
Desc: not available
Url : http://mailman.videolan.org/pipermail/x264-devel/attachments/20060908/38fb0c21/attachment.bin 