[x264-devel] [PATCH] Add all remaining 16x16 predict Altivec routines
gpoirier at mplayerhq.hu
Wed Jan 14 11:14:56 CET 2009
Antoine Gerschenfeld wrote:
> On 13 janv. 09, at 23:50, Guillaume POIRIER wrote:
>> I don't exactly have the same numbers over here (PPC970MP with
>> GCC4.2 on Leopard), but it's close enough.
> My own runs exhibit slight variations, on the order of +/- 1 for the
> shorter functions and +/- 5 for the longest (intra_predict_16x16_p_c).
> Still, the conclusions seem clear
Except if I sleep on it and come back with fresh ideas.
>> I guess I'll have to drop intra_predict_16x16_h_altivec since I don't
>> know how to make it faster with Altivec, even after some unrolling.
>> However, it looks like doing some pseudo-64bits SIMD with general
>> purpose registers allows this code to go faster on that machine.
>> I'll experiment more with that later on.
Even if the 64-bit version is indeed faster, it's nowhere near as fast as the new
intra_predict_16x16_h_altivec in the attached patch.
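For anyone wondering what "pseudo-64bits SIMD with general purpose registers" means for the horizontal predictor, here is a minimal sketch in plain C. It is not the code from the patch: the function name predict_16x16_h_gpr64 and the explicit stride parameter are hypothetical, chosen only for illustration. The idea is to replicate the left neighbour of each row into all eight bytes of a 64-bit register with one multiply, then write the row with two 8-byte stores.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration, not taken from the patch: horizontal 16x16
 * prediction done with 64-bit general-purpose registers instead of
 * Altivec vectors. src points at the top-left pixel of the block and
 * stride is the distance in bytes between two rows (assumed parameter). */
static void predict_16x16_h_gpr64(uint8_t *src, int stride)
{
    for (int y = 0; y < 16; y++) {
        /* Multiplying by 0x0101...01 copies the left neighbour
         * src[-1] into every byte of the 64-bit word. */
        uint64_t v = UINT64_C(0x0101010101010101) * src[-1];

        /* Two 8-byte stores fill the 16-pixel row. memcpy avoids
         * alignment and aliasing problems; compilers usually turn it
         * into plain 64-bit stores. */
        memcpy(src, &v, 8);
        memcpy(src + 8, &v, 8);

        src += stride;
    }
}
```

An Altivec version would presumably splat the same byte across one 16-byte vector and store the whole row at once, which would be consistent with it coming out ahead of the two-store approach above.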
Here are the benchmark figures on PPC7450:
Please try on other CPUs if you can, but I believe that the speed-up should be consistent across all.
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 4527 bytes
Desc: not available
Url : http://mailman.videolan.org/pipermail/x264-devel/attachments/20090114/e5d026ae/attachment.bin