[x264-devel] [PATCH] Add all remaining 16x16 predict Altivec routines

Alexander Strange astrange at ithinksw.com
Tue Jan 13 22:27:09 CET 2009


On Jan 13, 2009, at 4:06 PM, Antoine Gerschenfeld wrote:

> Hello,
>
> I got the following numbers from checkasm by calling the
> mach_absolute_time() function (counts nanoseconds) on MacOSX instead
> of rdtsc.
> I don't know how accurate they are : it seems you can't access the PPC
> performance counters on Darwin without a driver.

It's accurate enough - PPC isn't severely out-of-order like x86, and the
units aren't CPU ticks, so they won't change meaning if the CPU speedsteps.
But mach_absolute_time() doesn't necessarily count in nanoseconds; you have
to call mach_timebase_info() to get the conversion factor:
http://developer.apple.com/qa/qa2004/qa1398.html#LISTMACHTIMEBASEINT
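
Roughly, the conversion could look like the sketch below. mach_timebase_info()
fills in a numer/denom fraction, and nanoseconds = ticks * numer / denom. The
helper names ticks_to_ns() and now_ns() are made up for illustration and are
not part of x264 or checkasm; the Mach calls are only compiled on Apple
platforms.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __APPLE__
#include <mach/mach_time.h>
#endif

/* Hypothetical helper: scale raw timebase ticks to nanoseconds using the
 * numer/denom fraction reported by mach_timebase_info(). */
static inline uint64_t ticks_to_ns(uint64_t ticks, uint32_t numer, uint32_t denom)
{
    assert(denom != 0);
    /* Split the division so that ticks * numer can't overflow 64 bits
     * when the tick count is large. */
    uint64_t whole = (ticks / denom) * numer;
    uint64_t frac  = (ticks % denom) * numer / denom;
    return whole + frac;
}

#ifdef __APPLE__
/* Hypothetical helper: current time in nanoseconds (Apple platforms only). */
static inline uint64_t now_ns(void)
{
    mach_timebase_info_data_t tb;
    mach_timebase_info(&tb);    /* fills tb.numer and tb.denom */
    return ticks_to_ns(mach_absolute_time(), tb.numer, tb.denom);
}
#endif
```

If numer and denom both come back as 1, the raw value is already in
nanoseconds and the conversion is a no-op.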

And for thorough benchmarking you should remember to drop unexpectedly
large times, since they're probably from unexpectedly long context
switches.
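
One possible way to do that, as a sketch only: take the fastest run as the
baseline and ignore any sample more than some multiple of it. The function
name mean_without_outliers() and the max_ratio cutoff are assumptions for
illustration, not something checkasm does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: average the samples, ignoring any that exceed
 * max_ratio times the fastest run (treated as context-switch noise).
 * Assumes max_ratio >= 1.0. Returns 0.0 if there are no samples. */
static inline double mean_without_outliers(const uint32_t *samples, size_t n,
                                           double max_ratio)
{
    if (n == 0)
        return 0.0;

    /* Find the fastest run; it serves as the baseline. */
    uint32_t fastest = samples[0];
    for (size_t i = 1; i < n; i++)
        if (samples[i] < fastest)
            fastest = samples[i];

    double limit = fastest * max_ratio;
    double sum = 0.0;
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (samples[i] <= limit) {
            sum += samples[i];
            kept++;
        }
    }
    /* kept >= 1 here: the fastest sample always passes the cutoff. */
    return sum / kept;
}
```

With samples {10, 11, 10, 500, 12} and a cutoff of 4.0, the 500 is dropped and
the remaining four runs are averaged.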

I don't see any problem with these numbers, though.

Also, remember to check out CHUD and Shark.app if you're really
profiling.

> intra_predict_16x16_dc_c: 25
> intra_predict_16x16_dc_altivec: 16
> intra_predict_16x16_dc8_c: 17
> intra_predict_16x16_dc8_altivec: 9
> intra_predict_16x16_dcl_c: 23
> intra_predict_16x16_dcl_altivec: 13
> intra_predict_16x16_dct_c: 23
> intra_predict_16x16_dct_altivec: 13
> intra_predict_16x16_h_c: 17
> intra_predict_16x16_h_altivec: 54
> intra_predict_16x16_p_c: 290
> intra_predict_16x16_p_altivec: 26
> intra_predict_16x16_v_c: 17
> intra_predict_16x16_v_altivec: 11
>
> With the exception of intra_predict_16x16_h, all new functions seem to
> be faster than their C equivalents.
>
> This was on a PPC970 (quad G5). For reference, here is the checkasm
> patch I used :
>
> diff --git a/tools/checkasm.c b/tools/checkasm.c
> index aeaf5fb..7825b97 100644
> --- a/tools/checkasm.c
> +++ b/tools/checkasm.c
> @@ -30,6 +30,10 @@
>  #include "common/common.h"
>  #include "common/cpu.h"
>
> +#ifdef SYS_MACOSX
> +#include <mach/mach_time.h>
> +#endif
> +
>  /* buf1, buf2: initialised to random data and shouldn't write into them */
>  uint8_t * buf1, * buf2;
>  /* buf3, buf4: used to store output */
> @@ -80,6 +84,8 @@ static inline uint32_t read_time(void)
>      uint32_t a;
>      asm volatile( "rdtsc" :"=a"(a) ::"edx" );
>      return a;
> +#elif defined(SYS_MACOSX)
> +   return mach_absolute_time() & 0xFFFFFFFF;
>  #else
>      return 0;
>  #endif

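The "& 0xFFFFFFFF" looks unnecessary to me: read_time() returns uint32_t,
so the 64-bit value is truncated to its low 32 bits on return anyway.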
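
A small sketch of why the mask is redundant: converting a 64-bit value to
uint32_t is defined in C to keep the low 32 bits, and subtracting two such
readings still gives the right elapsed count across one wraparound. low32()
and elapsed32() are hypothetical stand-ins, not functions from checkasm.c.

```c
#include <stdint.h>

/* Hypothetical stand-in for read_time(): returning a 64-bit counter value
 * as uint32_t keeps only the low 32 bits, same as "& 0xFFFFFFFF". */
static inline uint32_t low32(uint64_t counter)
{
    return (uint32_t)counter;
}

/* Elapsed ticks between two truncated readings; unsigned subtraction is
 * modulo 2^32, so this stays correct across a single wraparound. */
static inline uint32_t elapsed32(uint32_t start, uint32_t end)
{
    return end - start;
}
```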

> [...]
> _______________________________________________
> x264-devel mailing list
> x264-devel at videolan.org
> http://mailman.videolan.org/listinfo/x264-devel


