[x264-devel] commit: add AltiVec implementation of x264_pixel_var_16x16 and x264_pixel_var_8x8 (Guillaume Poirier)

Jason Garrett-Glaser darkshikari at gmail.com
Sun Jan 25 23:47:37 CET 2009


2009/1/25 Guillaume POIRIER <gpoirier at mplayerhq.hu>:
> Hello,
>
> On Sun, Jan 25, 2009 at 12:33 AM, Loren Merritt <lorenm at u.washington.edu> wrote:
>> On Sat, 24 Jan 2009, Guillaume POIRIER wrote:
>
>>> I have no experience in using LUT: what's the right way to compute the
>>> index fast?
>>
>> perm_tab needs to be static const, otherwise it gets written to the stack
>> at every function call.
>
> Damn, I thought that modern compilers were smart enough to catch that.
> Fixed. That brings the overall count back down to 16 cycles, as
> before.
>
>
>> gcc might also be failing to optimize the ! (it's not a simple arithmetic
>> op). Fix that by
>> perm_tab[(((uintptr_t)pix & 8) >> 3) + ((i_stride & 8) >> 2)]
>
> I tried that, and interestingly it doesn't change the cycle count. I used
> your expression anyway since it's simpler.
>
>
>> And then promote chroma planes to 16 byte alignment, and change it to
>> perm_tab[((uintptr_t)pix & 8) >> 3]
>
> I implemented this, and it's brilliant! It shaves one more cycle overall.
>
> Attached patch aligns chroma planes on mod16, and has the rest of the
> PPC optimizations you suggested.
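A small plain-C sketch of the table-lookup indexing discussed above, written so it runs without AltiVec. Only the two index expressions come from the thread; the table contents and the helper names perm_index, perm_index_aligned_stride and perm_lookup are hypothetical. It assumes pix and the stride are both multiples of 8, so bit 3 alone tells whether each one is 16-byte aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for perm_tab. In the AltiVec code each entry would be
 * a vector permute mask; plain labels are used here so the sketch runs on any
 * CPU. Declaring it static const keeps it in read-only data instead of having
 * it rebuilt on the stack at every call. */
static const char *const perm_tab[4] = {
    "pix%16==0, stride%16==0",
    "pix%16==8, stride%16==0",
    "pix%16==0, stride%16==8",
    "pix%16==8, stride%16==8",
};

/* Branch-free index: bit 3 of the address gives 0 or 1, bit 3 of the stride
 * gives 0 or 2, so the sum selects one of four entries.
 * Assumes pix and i_stride are both multiples of 8. */
static int perm_index(const uint8_t *pix, int i_stride)
{
    assert(((uintptr_t)pix & 7) == 0 && (i_stride & 7) == 0);
    return (int)((((uintptr_t)pix & 8) >> 3) + ((i_stride & 8) >> 2));
}

/* Once every stride is a multiple of 16, the stride term is always 0 and
 * only the address bit is needed. */
static int perm_index_aligned_stride(const uint8_t *pix)
{
    assert(((uintptr_t)pix & 7) == 0);
    return (int)(((uintptr_t)pix & 8) >> 3);
}

/* Look up the (hypothetical) table entry for a block. */
static const char *perm_lookup(const uint8_t *pix, int i_stride)
{
    return perm_tab[perm_index(pix, i_stride)];
}
```

For a 16-byte-aligned buffer buf, perm_index(buf + 8, 24) selects entry 3. With every stride forced to a multiple of 16, only entries 0 and 1 can ever be selected, which is why the shorter expression is enough.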

Um... that patch doesn't have any effect on the chroma planes.  You're
changing pic_alloc, when you should be changing x264_frame_new.
Furthermore, you're trying to align the chroma planes, but you're not
aligning their stride.  You have to change:

frame->i_stride[i] = i_stride >> !!i;

to

frame->i_stride[i] = ALIGN( i_stride >> !!i, 16);
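A sketch of what that stride change does, assuming ALIGN rounds its first argument up to a multiple of its second (a power of two). The macro definition and the two helper functions below are written for illustration only, and 1360 is just an example luma stride that happens to be a multiple of 16.

```c
#include <assert.h>

/* Round x up to the next multiple of a; a must be a power of two.
 * Written here for illustration, assumed to behave like the ALIGN above. */
#define ALIGN(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/* Plane 0 is luma; planes 1 and 2 are 4:2:0 chroma at half the luma stride.
 * This is the original expression: chroma can end up at 8 mod 16. */
static int plane_stride_unaligned(int i_stride, int i)
{
    return i_stride >> !!i;
}

/* The suggested fix: round each chroma stride up to a multiple of 16. */
static int plane_stride_aligned(int i_stride, int i)
{
    return ALIGN(i_stride >> !!i, 16);
}
```

With a luma stride of 1360, the chroma stride is 680, which is 8 mod 16, so even if the plane base is 16-byte aligned, every other row starts 8 bytes off. The aligned version pads it to 688, keeping every row start on a 16-byte boundary.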

I'm not sure whether this should be made conditional on PPC, to save a
bit of memory on x86.
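A rough way to size that memory cost, as a sketch: the function name and its inputs (the luma stride and the number of rows per chroma plane, padding included) are hypothetical. If the luma stride is already a multiple of 16, half of it is a multiple of 8, so aligning each chroma row costs either 0 or 8 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of a (a power of two); illustrative. */
#define ALIGN(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/* Extra bytes per frame from aligning both 4:2:0 chroma strides to 16.
 * i_stride:   luma stride (hypothetical input)
 * i_lines_c:  rows per chroma plane, including padding (hypothetical input) */
static size_t chroma_align_overhead(int i_stride, int i_lines_c)
{
    int s = i_stride >> 1;                      /* unaligned chroma stride */
    int extra_per_row = ALIGN(s, 16) - s;       /* padding added per row */
    return (size_t)2 * (size_t)extra_per_row * (size_t)i_lines_c;
}
```

For example, a luma stride of 1360 with 300 chroma rows gives 2 planes × 8 bytes × 300 rows = 4800 extra bytes per frame, while a luma stride of 1344 (chroma 672, already a multiple of 16) costs nothing.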

Dark Shikari

