[x264-devel] [PATCH] Add all remaining 16x16 predict Altivec routines
gpoirier at mplayerhq.hu
Tue Jan 13 23:50:25 CET 2009
Antoine Gerschenfeld wrote:
> I got the following numbers from checkasm by calling the
> mach_absolute_time() function (counts nanoseconds) on MacOSX instead
> of rdtsc.
> I don't know how accurate they are : it seems you can't access the PPC
> performance counters on Darwin without a driver.
Well, I wrote a patch a while back to add proper support for the
PPC970's PMC counters, based on some FFmpeg code, but I never tried to
make it part of x264's source tree.
The one thing that sucked about that approach is that you had to be
root to set one of the PMCs to monitor clock cycles.
> intra_predict_16x16_dc_c: 25
> intra_predict_16x16_dc_altivec: 16
> intra_predict_16x16_dc8_c: 17
> intra_predict_16x16_dc8_altivec: 9
> intra_predict_16x16_dcl_c: 23
> intra_predict_16x16_dcl_altivec: 13
> intra_predict_16x16_dct_c: 23
> intra_predict_16x16_dct_altivec: 13
> intra_predict_16x16_h_c: 17
> intra_predict_16x16_h_altivec: 54
> intra_predict_16x16_p_c: 290
> intra_predict_16x16_p_altivec: 26
> intra_predict_16x16_v_c: 17
> intra_predict_16x16_v_altivec: 11
> With the exception of intra_predict_16x16_h, all new functions seem to
> be faster than their C equivalents.
> This was on a PPC970 (quad G5). For reference, here is the checkasm
> patch I used :
I don't get exactly the same numbers over here (PPC970MP with GCC 4.2
on Leopard), but they're close enough.
I guess I'll have to drop intra_predict_16x16_h_altivec since I don't
know how to make it faster with Altivec, even after some unrolling.
However, it looks like doing some pseudo-64-bit SIMD with general
purpose registers makes this code go faster on that machine.
I'll experiment more with that later on.
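To make the pseudo-SIMD idea concrete, here is a minimal sketch, not
the code from the patch. It assumes horizontal prediction simply
replicates each row's left neighbour across the 16 pixels of that row.
The function name and the stride parameter are made up for
illustration. Multiplying the pixel by 0x0101010101010101 copies it
into all eight bytes of a 64-bit register, so each row takes two
64-bit stores.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not x264's actual function.
 * src points at the top-left pixel of the 16x16 block; src[-1] on each
 * row is assumed to be the already-decoded left neighbour. */
static void predict_16x16_h_swar(uint8_t *src, int stride)
{
    for (int y = 0; y < 16; y++) {
        /* Broadcast the left pixel into all 8 bytes of a 64-bit word. */
        uint64_t v = 0x0101010101010101ULL * src[-1];
        /* memcpy keeps the stores alignment-safe; compilers usually
         * turn each fixed-size copy into a single 64-bit store. */
        memcpy(src, &v, 8);
        memcpy(src + 8, &v, 8);
        src += stride;
    }
}
```

Since every byte of the word is identical, the result does not depend
on endianness. Whether this actually beats the plain C loop or the
Altivec version on a given CPU still has to be measured.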
Thanks for the patch, thanks for your benchmark numbers.