[x264-devel] commit: add AltiVec implementation of x264_pixel_var_16x16 and x264_pixel_var_8x8 (Guillaume Poirier)
Jason Garrett-Glaser
darkshikari at gmail.com
Sun Jan 25 23:47:37 CET 2009
2009/1/25 Guillaume POIRIER <gpoirier at mplayerhq.hu>:
> Hello,
>
> On Sun, Jan 25, 2009 at 12:33 AM, Loren Merritt <lorenm at u.washington.edu> wrote:
>> On Sat, 24 Jan 2009, Guillaume POIRIER wrote:
>
>>> I have no experience in using LUT: what's the right way to compute the
>>> index fast?
>>
>> perm_tab needs to be static const, otherwise it gets written to the stack
>> at every function call.
>
> Damn, I thought that modern compilers were smart enough to catch that.
> Fixed. That brings the overall number of cycles down to 16 cycles as
> previously.
>
>
>> gcc might also be failing to optimize the ! (it's not a simple arithmetic
>> op). Fix that by
>> perm_tab[(((uintptr_t)pix & 8) >> 3) + ((i_stride & 8) >> 2)]
>
> I tried that, and interestingly it doesn't change cycle count. I used
> your expression anyway since it's simpler.
>
>
>> And then promote chroma planes to 16 byte alignment, and change it to
>> perm_tab[((uintptr_t)pix & 8) >> 3]
>
> I implemented this, and it's brilliant! It shaves one more cycle overall.
>
> Attached patch aligns chroma planes on mod16, and has the rest of the
> PPC optimizations you suggested.
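
For reference, the two index expressions quoted above can be sketched in plain C. This is only an illustration: demo_tab is a hypothetical stand-in for perm_tab (the real table holds AltiVec permute vectors, which are not reproduced here), and the function names are made up for the example.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for perm_tab. Declared static const so it lives in
 * read-only data instead of being rebuilt on the stack at every call. */
static const int demo_tab[4] = { 10, 11, 12, 13 };

/* Two-bit index: bit 0 is set when pix sits 8 bytes past a 16-byte boundary,
 * bit 1 is set when the stride is an odd multiple of 8. */
static inline size_t lut_index_pix_stride( const uint8_t *pix, int i_stride )
{
    return (((uintptr_t)pix & 8) >> 3) + ((i_stride & 8) >> 2);
}

/* One-bit index: valid only once the stride is a multiple of 16, so that
 * (i_stride & 8) is always zero and the second term drops out. */
static inline size_t lut_index_pix_only( const uint8_t *pix )
{
    return ((uintptr_t)pix & 8) >> 3;
}

/* Table lookup using the two-bit index (illustration only). */
static inline int demo_lookup( const uint8_t *pix, int i_stride )
{
    return demo_tab[lut_index_pix_stride( pix, i_stride )];
}
```

Both forms use only masks and shifts, so there is no ! for the compiler to turn into a compare or a branch.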
Um... that patch doesn't have any effect on the chroma planes. You're
changing pic_alloc, when you should be changing x264_frame_new.
Furthermore, you're trying to align the chroma planes, but you're not
aligning their stride. You have to change:
    frame->i_stride[i] = i_stride >> !!i;
to
    frame->i_stride[i] = ALIGN( i_stride >> !!i, 16);
I'm not sure if this should be conditional for PPC to save a bit of
memory on x86.
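
A minimal sketch of what the stride change does, assuming ALIGN rounds up to a multiple of a power of two (the macro below is written out for the example and is only assumed to match the one in x264; plane_stride is a hypothetical helper, not an x264 function):

```c
/* Round x up to the next multiple of a, where a is a power of two.
 * Assumed to behave like x264's ALIGN; defined here so the sketch
 * is self-contained. */
#define ALIGN(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/* Hypothetical helper: stride of plane i given the luma stride.
 * Plane 0 is luma; planes 1 and 2 are chroma at half width. */
static inline int plane_stride( int i_stride, int i )
{
    return ALIGN( i_stride >> !!i, 16 );
}
```

With a luma stride of 784, for example, the halved chroma stride would be 392, which is not a multiple of 16; after ALIGN it becomes 400, so every chroma row starts at the same offset from a 16-byte boundary as the first one.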
Dark Shikari