[x264-devel] [PATCH] Add all remaining 16x16 predict Altivec routines
gerschen at gmail.com
Wed Jan 14 00:03:57 CET 2009
On 13 Jan 09, at 22:27, Alexander Strange wrote:
> It's accurate enough - PPC isn't severely out-of-order like x86, and
> units aren't CPU ticks so won't change meaning if it speedsteps.
> But it doesn't necessarily count in nanoseconds; you have to call
Thanks. Since checkasm is only used for relative comparisons, I guess
I'll leave it like this (it seems the iPhone is the only platform that
uses a non-trivial timebase anyway).
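
For reference, a minimal sketch of what a conversion to nanoseconds could
look like, assuming the usual mach_timebase_info() route. The helper names
ticks_to_ns and elapsed_ns are hypothetical and not part of the patch, and
the 1/1 fallback on non-Apple systems is only there so the sketch compiles
everywhere:

```c
#include <stdint.h>

#ifdef __APPLE__
#include <mach/mach_time.h>
#endif

/* Hypothetical helper (not in the patch): scale raw ticks to nanoseconds
 * with a numer/denom timebase. The multiplication can overflow for very
 * large tick counts, which is acceptable for short benchmark intervals. */
static uint64_t ticks_to_ns( uint64_t ticks, uint32_t numer, uint32_t denom )
{
    return ticks * numer / denom;
}

/* Hypothetical helper: nanoseconds elapsed between two
 * mach_absolute_time() readings. Assumes a 1/1 timebase when the
 * timebase cannot be queried (e.g. when built on a non-Apple system). */
static uint64_t elapsed_ns( uint64_t start, uint64_t end )
{
    uint32_t numer = 1, denom = 1;
#ifdef __APPLE__
    mach_timebase_info_data_t tb;
    if( mach_timebase_info( &tb ) == KERN_SUCCESS && tb.denom != 0 )
    {
        numer = tb.numer;
        denom = tb.denom;
    }
#endif
    return ticks_to_ns( end - start, numer, denom );
}
```

For checkasm none of this matters, since only the ratio between two
timings is printed and the timebase cancels out.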
> The 0xFFFFFFFF looks unnecessary to me.
You're right. I just learned what the default behavior of a uint64_t
to a uint32_t cast is :-).
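
A small illustration of why the mask is redundant, as a sketch with
hypothetical helper names (low32_masked, low32_cast): in C, converting to
an unsigned type reduces the value modulo 2^32, so both functions return
the same low 32 bits.

```c
#include <stdint.h>

/* Hypothetical helper: keep the low 32 bits with an explicit mask. */
static uint32_t low32_masked( uint64_t t )
{
    return (uint32_t)( t & 0xFFFFFFFF );
}

/* Hypothetical helper: rely on the conversion alone. Converting to an
 * unsigned 32-bit type is defined as reduction modulo 2^32, which
 * discards the high bits exactly like the mask above. */
static uint32_t low32_cast( uint64_t t )
{
    return (uint32_t)t;
}
```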
On 13 Jan 09, at 23:50, Guillaume POIRIER wrote:
> I don't exactly have the same numbers over here (PPC970MP with
> GCC4.2 on
> Leopard), but it's close enough.
My own runs exhibit slight variations, on the order of +/- 1 for the
shortest routines and +/- 5 for the longest (intra_predict_16x16_p_c).
Still, the conclusions seem clear.
> I guess I'll have to drop intra_predict_16x16_h_altivec since I don't
> know how to make it faster with Altivec, even after some unrolling.
> However, it looks like doing some pseudo-64bits SIMD with general
> purpose registers allows this code to go faster on that machine.
> I'll experiment more with that later on.
Good luck!
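
In case it helps, here is a rough sketch of the "pseudo-64-bit SIMD" idea
for horizontal prediction: splat the left-neighbour pixel across a 64-bit
general-purpose register with one multiply, then write the row with two
64-bit stores. The function name and the explicit stride parameter are
hypothetical, and this is not the code under discussion; memcpy is used
for the stores so that the sketch does not depend on alignment or
aliasing assumptions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: fill each of the 16 rows of a 16x16 block with
 * the pixel immediately to its left (src[-1] of that row). */
static void predict_16x16_h_gpr64( uint8_t *src, int stride )
{
    for( int i = 0; i < 16; i++ )
    {
        /* Replicate the byte into all 8 lanes of a 64-bit register. */
        const uint64_t v = 0x0101010101010101ULL * src[-1];
        /* Two 8-byte stores cover the 16-pixel row. */
        memcpy( src,     &v, 8 );
        memcpy( src + 8, &v, 8 );
        src += stride;
    }
}
```

Whether this beats the Altivec version will depend on the machine, so it
would need to go through checkasm --bench like everything else.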
gerschen at gmail.com
P.S.: New checkasm patch:
diff --git a/tools/checkasm.c b/tools/checkasm.c
index aeaf5fb..7825b97 100644
@@ -30,6 +30,10 @@
/* buf1, buf2: initialised to random data and shouldn't write into them */
uint8_t * buf1, * buf2;
/* buf3, buf4: used to store output */
@@ -80,6 +84,8 @@ static inline uint32_t read_time(void)
asm volatile( "rdtsc" :"=a"(a) ::"edx" );
+ return mach_absolute_time();
@@ -153,7 +159,8 @@ static void print_bench(void)
/* print sse2slow only if there's also a sse2fast
version of the same func */
b->cpu&X264_CPU_SSE2_IS_SLOW && j<MAX_CPUS &&
b[1].cpu&X264_CPU_SSE2_IS_FAST && !(b[1].cpu&X264_CPU_SSE3) ? "sse2slow" :
b->cpu&X264_CPU_SSE2 ? "sse2" :
- b->cpu&X264_CPU_MMX ? "mmx" : "c",
+ b->cpu&X264_CPU_MMX ? "mmx" :
+ b->cpu&X264_CPU_ALTIVEC ? "altivec" : "c",
b->cpu&X264_CPU_CACHELINE_32 ? "_c32" :
b->cpu&X264_CPU_CACHELINE_64 ? "_c64" :
b->cpu&X264_CPU_SSE_MISALIGN ? "_misalign" :
@@ -1448,7 +1455,7 @@ int main(int argc, char *argv[])
if( argc > 1 && !strncmp( argv[1], "--bench", 7 ) )
-#if !defined(ARCH_X86) && !defined(ARCH_X86_64)
+#if !defined(ARCH_X86) && !defined(ARCH_X86_64) && !defined(SYS_MACOSX)
fprintf( stderr, "no --bench for your cpu until you port rdtsc