[x264-devel] [PATCH] Add all remaining 16x16 predict Altivec routines
Guillaume Poirier
gpoirier at mplayerhq.hu
Wed Jan 14 11:14:56 CET 2009
Hello,
Antoine Gerschenfeld wrote:
> On 13 Jan. 09, at 23:50, Guillaume POIRIER wrote:
>
>
>> I don't get exactly the same numbers over here (PPC970MP with GCC 4.2
>> on Leopard), but it's close enough.
>>
>
> My own runs exhibit slight variations, on the order of +/- 1 for the
> shorter functions and +/- 5 for the longest (intra_predict_16x16_p_c).
> Still, the conclusions seem clear enough...
>
Unless you sleep on it and come back with fresh ideas.
>> I guess I'll have to drop intra_predict_16x16_h_altivec since I don't
>> know how to make it faster with Altivec, even after some unrolling.
>>
>> However, it looks like doing some pseudo-64bits SIMD with general
>> purpose registers allows this code to go faster on that machine.
>>
>> I'll experiment more with that later on.
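For readers unfamiliar with the "pseudo-64-bit SIMD with general-purpose registers" idea, here is a minimal sketch of what it could look like for horizontal 16x16 prediction, next to a plain byte-at-a-time loop. This is not the code from the patch. The function names and SKETCH_STRIDE are hypothetical, and the stride value of 32 is only an illustrative assumption for the row pitch of the prediction buffer. On a 32-bit CPU such as the PPC7450 the compiler has to split the 64-bit value across 32-bit registers, so the trick mainly pays off on 64-bit parts like the PPC970.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: row pitch of the prediction buffer, chosen for illustration. */
#define SKETCH_STRIDE 32

/* Plain reference: each row y of the 16x16 block is filled with its
   left neighbour, src[y * stride - 1], one byte at a time. */
static void predict_16x16_h_bytes(uint8_t *src)
{
    for (int y = 0; y < 16; y++) {
        uint8_t *row = src + y * SKETCH_STRIDE;
        for (int x = 0; x < 16; x++)
            row[x] = row[-1];
    }
}

/* "Pseudo-64-bit SIMD": multiplying the byte by 0x0101010101010101
   replicates it into all 8 bytes of a uint64_t, so each 16-pixel row
   needs only two 8-byte stores. All bytes are equal, so endianness
   does not matter. */
static void predict_16x16_h_gpr64(uint8_t *src)
{
    assert(src != NULL);
    for (int y = 0; y < 16; y++) {
        uint8_t *row = src + y * SKETCH_STRIDE;
        uint64_t splat = row[-1] * UINT64_C(0x0101010101010101);
        memcpy(row, &splat, 8);      /* memcpy avoids alignment/aliasing UB */
        memcpy(row + 8, &splat, 8);
    }
}
```

Both functions expect `src` to point at the top-left pixel of the block, with the left-neighbour column at `src[-1]`, so a caller would pass something like `buf + 1`.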
Even though the 64-bit version is indeed faster, it's nowhere near as fast as
the new intra_predict_16x16_h_altivec in the attached patch.
Here are the benchmark figures on PPC7450:
intra_predict_16x16_dc_c: 43
intra_predict_16x16_dc_altivec: 25
intra_predict_16x16_dc8_c: 25
intra_predict_16x16_dc8_altivec: 12
intra_predict_16x16_dcl_c: 39
intra_predict_16x16_dcl_altivec: 21
intra_predict_16x16_dct_c: 39
intra_predict_16x16_dct_altivec: 21
intra_predict_16x16_h_c: 45
intra_predict_16x16_h_altivec: 22
intra_predict_16x16_p_c: 433
intra_predict_16x16_p_altivec: 65
intra_predict_16x16_v_c: 26
intra_predict_16x16_v_altivec: 21
Please try on other CPUs if you can, but I believe that the speed-up should be consistent across all.
Guillaume
-------------- next part --------------
A non-text attachment was scrubbed...
Name: predict_altivec_16x16.3.diff
Type: text/x-patch
Size: 4527 bytes
Desc: not available
Url : http://mailman.videolan.org/pipermail/x264-devel/attachments/20090114/e5d026ae/attachment.bin