[x264-devel] commit: add AltiVec implementation of x264_pixel_var_16x16 and x264_pixel_var_8x8 (Guillaume Poirier )
darkshikari at gmail.com
Sun Jan 25 23:47:37 CET 2009
2009/1/25 Guillaume POIRIER <gpoirier at mplayerhq.hu>:
> On Sun, Jan 25, 2009 at 12:33 AM, Loren Merritt <lorenm at u.washington.edu> wrote:
>> On Sat, 24 Jan 2009, Guillaume POIRIER wrote:
>>> I have no experience in using LUT: what's the right way to compute the
>>> index fast?
>> perm_tab needs to be static const, otherwise it gets written to the stack
>> at every function call.
> Damn, I thought that modern compilers were smart enough to catch that.
> Fixed. That brings the overall number of cycles down to 16 cycles as
>> gcc might also be failing to optimize the ! (it's not a simple arithmetic
>> op). Fix that by
>> perm_tab[(((uintptr_t)pix & 8) >> 3) + ((i_stride & 8) >> 2)]
> I tried that, and interestingly it doesn't change cycle count. I used
> your expression anyway since it's simpler.
>> And then promote chroma planes to 16 byte alignment, and change it to
>> perm_tab[((uintptr_t)pix & 8) >> 3]
> I implemented this, and it's brilliant! It shaves one more cycle overall.
> Attached patch aligns chroma planes on mod16, and has the rest of the
> PPC optimizations you suggested.
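For reference, the two index expressions quoted above can be sketched as standalone helpers. This is only an illustration: the function names are made up here, and the table contents are left out because they are not shown in this thread.

```c
#include <stdint.h>

/* Hypothetical helper (name is an assumption, not from the patch).
 * Builds a 2-bit table index without using '!':
 *   bit 0 = pointer is offset by 8 bytes from a 16-byte boundary
 *   bit 1 = stride has its 8-byte bit set
 * The result is always in the range 0..3. */
static int perm_index_4(const uint8_t *pix, int i_stride)
{
    return (int)((((uintptr_t)pix & 8) >> 3) + ((i_stride & 8) >> 2));
}

/* Hypothetical helper for the case where every stride is a multiple
 * of 16: the stride bit is always zero, so only the pointer bit is
 * left and the table needs just two entries (index 0 or 1). */
static int perm_index_2(const uint8_t *pix)
{
    return (int)(((uintptr_t)pix & 8) >> 3);
}
```

Both helpers assume the pointer is at least 8-byte aligned, so that bit 3 of the address is the only thing that distinguishes the cases.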
Um... that patch doesn't have any effect on the chroma planes. You're
changing pic_alloc, when you should be changing x264_frame_new.
Furthermore, you're trying to align the chroma planes, but you're not
aligning their stride. You have to change:
frame->i_stride[i] = i_stride >> !!i;
to:
frame->i_stride[i] = ALIGN( i_stride >> !!i, 16);
I'm not sure if this should be conditional for PPC to save a bit of
memory on x86.
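To spell out why the stride matters: each row starts at base + y * stride, so a 16-byte-aligned plane with a stride of, say, 24 still puts row 1 at offset 24, which is 8 bytes off a 16-byte boundary. A rough sketch of the rounded-up stride follows. The ALIGN definition here is an assumption (the usual round-up-to-a-power-of-two macro), and plane_stride is a made-up helper name used only for illustration.

```c
#include <stdio.h>

/* Assumed definition: round x up to the next multiple of a,
 * where a is a power of two. */
#define ALIGN(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/* Hypothetical helper mirroring the per-plane stride line above.
 * Plane 0 is luma; planes 1 and 2 are chroma at half width,
 * hence the shift by !!i (0 for luma, 1 for chroma). */
static int plane_stride(int i_stride, int i)
{
    return ALIGN(i_stride >> !!i, 16);
}
```

With a luma stride of 48, the unaligned chroma stride would be 24; plane_stride(48, 1) rounds that up to 32, so every chroma row starts on a 16-byte boundary as long as the plane base does.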