[x264-devel] [PATCH] Add all remaining 16x16 predict Altivec routines
Antoine Gerschenfeld
gerschen at gmail.com
Wed Jan 14 00:03:57 CET 2009
On 13 Jan. 09, at 22:27, Alexander Strange wrote:
> It's accurate enough - PPC isn't severely out-of-order like x86, and the
> units aren't CPU ticks so won't change meaning if it speedsteps.
> But it doesn't necessarily count in nanoseconds; you have to call
> mach_timebase_info():
> http://developer.apple.com/qa/qa2004/qa1398.html#LISTMACHTIMEBASEINT
Thanks. Since checkasm is used for relative comparison, I guess I'll
leave it like this (it seems the iPhone is the only platform that uses a
non-trivial timebase anyway).
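For reference, the conversion mentioned above could look roughly like the sketch below. It is not part of the patch: ticks_to_ns() and now_ns() are hypothetical helper names, and the only Mach calls used are mach_absolute_time() and mach_timebase_info(). The scaling is split into quotient and remainder so that the intermediate product does not overflow 64 bits.

```c
#include <stdint.h>
#ifdef __APPLE__
#include <mach/mach_time.h>
#endif

/* Hypothetical helper: scale raw timebase ticks to nanoseconds,
 * ns = ticks * numer / denom, computed without overflowing the
 * intermediate product (remainder < denom < 2^32, numer < 2^32). */
static uint64_t ticks_to_ns(uint64_t ticks, uint32_t numer, uint32_t denom)
{
    uint64_t whole = ticks / denom;   /* full multiples of denom */
    uint64_t rest  = ticks % denom;   /* leftover ticks          */
    return whole * numer + rest * numer / denom;
}

#ifdef __APPLE__
/* Hypothetical helper: current time in nanoseconds on Mac OS X. */
static uint64_t now_ns(void)
{
    mach_timebase_info_data_t tb;
    mach_timebase_info(&tb);          /* fills tb.numer and tb.denom */
    return ticks_to_ns(mach_absolute_time(), tb.numer, tb.denom);
}
#endif
```

For a relative comparison between two routines, the numer/denom factor cancels out of the ratio, which is why the raw tick count is enough for checkasm.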
> The 0xFFFFFFFF looks unnecessary to me.
You're right. I just learned what the default behavior of a uint64_t
to a uint32_t cast is :-).
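A small sketch of why the mask is redundant, assuming nothing beyond standard C: converting a uint64_t to uint32_t reduces the value modulo 2^32, which keeps exactly the low 32 bits. The function names below are made up for illustration.

```c
#include <stdint.h>

/* Conversion to an unsigned 32-bit type keeps the low 32 bits. */
static uint32_t low32_cast(uint64_t t)
{
    return (uint32_t)t;
}

/* Same result, with the explicit (and unnecessary) mask. */
static uint32_t low32_mask(uint64_t t)
{
    return (uint32_t)(t & 0xFFFFFFFF);
}

/* Unsigned subtraction wraps modulo 2^32, so a difference of two
 * truncated timestamps stays correct across a single wrap-around. */
static uint32_t elapsed32(uint32_t start, uint32_t end)
{
    return (uint32_t)(end - start);
}
```

This is also why returning mach_absolute_time() directly from a function declared as uint32_t behaves as intended for short measurements.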
On 13 Jan. 09, at 23:50, Guillaume POIRIER wrote:
> I don't exactly have the same numbers over here (PPC970MP with GCC 4.2 on
> Leopard), but it's close enough.
My own runs exhibit slight variations, on the order of +/- 1 for the shorter
functions and +/- 5 for the longest (intra_predict_16x16_p_c). Still, the
conclusions seem clear enough...
> I guess I'll have to drop intra_predict_16x16_h_altivec since I don't
> know how to make it faster with Altivec, even after some unrolling.
>
> However, it looks like doing some pseudo-64-bit SIMD with general
> purpose registers allows this code to go faster on that machine.
>
> I'll experiment more with that later on.
Good luck!
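One way to read "pseudo-64-bit SIMD with general purpose registers" is sketched below. This is only a guess at the idea, not Guillaume's code: the function name and the stride parameter are hypothetical. Each row's left neighbour byte is replicated into all eight bytes of a 64-bit word by multiplying with 0x0101010101010101, and the 16-pixel row is then written with two 64-bit stores.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of 16x16 horizontal prediction using 64-bit
 * general purpose registers: every pixel of row i takes the value of
 * the pixel just left of that row, src[i*stride - 1]. */
static void predict_16x16_h_swar(uint8_t *src, int stride)
{
    for (int i = 0; i < 16; i++) {
        uint8_t *row = src + i * stride;
        /* Broadcast one byte into all 8 bytes of a 64-bit word. */
        uint64_t v = row[-1] * 0x0101010101010101ULL;
        /* memcpy keeps the stores alignment-safe; compilers usually
         * turn each one into a single 64-bit store. */
        memcpy(row,     &v, 8);
        memcpy(row + 8, &v, 8);
    }
}
```

Because all eight bytes of the word are identical, the result does not depend on endianness.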
Antoine Gerschenfeld
gerschen at gmail.com
P.S.: New checkasm patch:
diff --git a/tools/checkasm.c b/tools/checkasm.c
index aeaf5fb..7825b97 100644
--- a/tools/checkasm.c
+++ b/tools/checkasm.c
@@ -30,6 +30,10 @@
#include "common/common.h"
#include "common/cpu.h"
+#ifdef SYS_MACOSX
+#include <mach/mach_time.h>
+#endif
+
/* buf1, buf2: initialised to random data and shouldn't write into them */
uint8_t * buf1, * buf2;
/* buf3, buf4: used to store output */
@@ -80,6 +84,8 @@ static inline uint32_t read_time(void)
uint32_t a;
asm volatile( "rdtsc" :"=a"(a) ::"edx" );
return a;
+#elif defined(SYS_MACOSX)
+ return mach_absolute_time();
#else
return 0;
#endif
@@ -153,7 +159,8 @@ static void print_bench(void)
/* print sse2slow only if there's also a sse2fast version of the same func */
b->cpu&X264_CPU_SSE2_IS_SLOW && j<MAX_CPUS &&
b[1].cpu&X264_CPU_SSE2_IS_FAST && !(b[1].cpu&X264_CPU_SSE3) ? "sse2slow" :
b->cpu&X264_CPU_SSE2 ? "sse2" :
- b->cpu&X264_CPU_MMX ? "mmx" : "c",
+ b->cpu&X264_CPU_MMX ? "mmx" :
+ b->cpu&X264_CPU_ALTIVEC ? "altivec" : "c",
b->cpu&X264_CPU_CACHELINE_32 ? "_c32" :
b->cpu&X264_CPU_CACHELINE_64 ? "_c64" :
b->cpu&X264_CPU_SSE_MISALIGN ? "_misalign" :
@@ -1448,7 +1455,7 @@ int main(int argc, char *argv[])
if( argc > 1 && !strncmp( argv[1], "--bench", 7 ) )
{
-#if !defined(ARCH_X86) && !defined(ARCH_X86_64)
+#if !defined(ARCH_X86) && !defined(ARCH_X86_64) && !defined(SYS_MACOSX)
fprintf( stderr, "no --bench for your cpu until you port rdtsc\n" );
return 1;
#endif