[x264-devel] Re: [PATCH] Altivec optimizations for quant4x4, quant4x4dc, quant8x8, sub8x8_dct8, sub16x16_dct8, pixel_sa8d_8x8, pixel_sa8d_16x16
Guillaume POIRIER
gpoirier at mplayerhq.hu
Fri Sep 8 00:24:50 CEST 2006
Hi,
Loren Merritt wrote:
> On Wed, 6 Sep 2006, Guillaume POIRIER wrote:
>>
>> Another day, another revision of my patchset. In today's menu:
>> improved all quant routines to yet again shave off a couple of
>> percents of CPU cycles, and some more GCC3.3 fixes.
>>
>> Get it while it's hot, and please test && review,
>
>> +#ifdef ARCH_PPC
>> +if( cpu&X264_CPU_ALTIVEC )
>> +{
>> + /* determine the biggest coeffient in all quant8_mf tables */
>> + for( i = 0; i < 2*6*8*8; i++ )
>> + {
>> + int q = h->quant8_mf[0][0][0][i];
>> + if( maxQ8 < q )
>> + maxQ8 = q;
>> + }
> [...]
>
> Duplicate code. Just move the first copy out of if(mmx).
>
>> +static const int def_quant4_mf[6][4][4]
>> __attribute__((__aligned__(16))) =
>
> DECLARE_ALIGNED. yes, it works with assignments too.
OK, the attached patch fixes these problems and adds an
AltiVec-optimized pixel_sa8d_16x16 routine.
Next, I plan to port FFmpeg's START/STOP_TIMER macros so that they can
be used with PMCs. As noted in the doc
ffmpeg_powerpc_performance_evaluation_howto.txt, the Time Base Register
(TBL), which is roughly the equivalent of x86's rdtsc, has the
disadvantage that it can't provide an accurate measurement: it
increments by one only every four *bus* cycles. That's too coarse for
precise measurement.
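To put a rough number on that granularity, here is a minimal sketch. It
assumes only the tick rate mentioned above (one tick per four bus
cycles); the helper name cpu_cycles_per_tb_tick and the clock
frequencies used below are made up for illustration and are not taken
from any particular machine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of CPU cycles that elapse during one
 * time-base tick, assuming one tick per four bus cycles. */
static uint64_t cpu_cycles_per_tb_tick(uint64_t cpu_hz, uint64_t bus_hz)
{
    const uint64_t bus_cycles_per_tick = 4;

    assert(bus_hz != 0);
    /* CPU cycles per bus cycle is cpu_hz / bus_hz; multiply first to
     * keep the integer division from truncating too early. */
    return bus_cycles_per_tick * cpu_hz / bus_hz;
}
```

With an illustrative 2 GHz core on a 250 MHz bus,
cpu_cycles_per_tb_tick(2000000000, 250000000) gives 32, i.e. anything
shorter than a few dozen CPU cycles disappears inside a single tick.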
That's why in practice you want to use PMCs (performance monitor
counters), which are the equivalent of the various event counters
available on x86 through tools such as Oprofile and Perfmon.
What "sucks" about not having a hard-coded register that counts CPU
cycles is that you need to initialize it (or them, as chips such as the
970 have 8 PMCs). However, due to the routing constraints inside the
chip, it's not possible to monitor any 8 events at the same time: some
of them are just mutually exclusive.
Anyway, since you can't really know in advance which of the 8 regs holds
the CPU counter, you somehow have to pass this information to the
START/STOP macros.
The only way I see is to let the user set it with a #define, but if
someone knows of a way to check dynamically which of the PMCs records
CPU cycles, I'd be glad to hear about it.
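As a rough sketch of the #define approach (not the actual patch, and not
FFmpeg's macros): every name here, CYCLE_PMC, read_pmc, PMC_START_TIMER,
PMC_STOP_TIMER and time_region, is hypothetical. The counters are
simulated with a plain array so that the sketch runs anywhere; real code
would replace read_pmc with a platform-specific read of the selected
PMC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical knob: index (1..8) of the PMC that was programmed to
 * count CPU cycles. Override with -DCYCLE_PMC=n at build time. */
#ifndef CYCLE_PMC
#define CYCLE_PMC 1
#endif

/* Stand-in for the hardware counters (assumption: 8 PMCs, as on the
 * 970). Real code would read the register instead of this array. */
static uint64_t simulated_pmc[8];

static uint64_t read_pmc(int index)
{
    assert(index >= 1 && index <= 8);
    return simulated_pmc[index - 1];
}

/* Simplified timer macros: both read whichever PMC the #define names. */
#define PMC_START_TIMER \
    uint64_t timer_start = read_pmc(CYCLE_PMC);

#define PMC_STOP_TIMER(elapsed) \
    (elapsed) = read_pmc(CYCLE_PMC) - timer_start;

/* Demo: "time" a region during which the simulated counter advances
 * by `work` cycles, and return the measured cycle count. */
static uint64_t time_region(uint64_t work)
{
    uint64_t elapsed;
    PMC_START_TIMER
    simulated_pmc[CYCLE_PMC - 1] += work; /* measured code would go here */
    PMC_STOP_TIMER(elapsed)
    return elapsed;
}
```

The downside is the one described above: the choice is fixed at compile
time, so the build has to agree with however the counters were actually
configured.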
Anyway, that will only be present in future patches...
Guillaume
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Altivec_quant+dct+pixel_routines+PMC_11.diff
Type: text/x-patch
Size: 44807 bytes
Desc: not available
Url : http://mailman.videolan.org/pipermail/x264-devel/attachments/20060908/38fb0c21/attachment.bin