[x264-devel] commit: add AltiVec implementation of x264_pixel_var_16x16 and x264_pixel_var_8x8 (Guillaume Poirier)
Guillaume POIRIER
gpoirier at mplayerhq.hu
Sun Jan 25 23:39:11 CET 2009
Hello,
On Sun, Jan 25, 2009 at 12:33 AM, Loren Merritt <lorenm at u.washington.edu> wrote:
> On Sat, 24 Jan 2009, Guillaume POIRIER wrote:
>> I have no experience in using LUT: what's the right way to compute the
>> index fast?
>
> perm_tab needs to be static const, otherwise it gets written to the stack
> at every function call.
Damn, I thought that modern compilers were smart enough to catch that.
Fixed. That brings the overall cycle count back down to 16 cycles, as
before.
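For illustration, here is a minimal sketch of the difference, not the code
from the patch. The table contents and the function names lookup_auto and
lookup_static are hypothetical; the real perm_tab holds AltiVec permute
vectors.

```c
#include <stdint.h>

/* Automatic (non-static) table: the array is a local object, so the
 * compiler may copy its initializer onto the stack on every call. */
static int lookup_auto(unsigned idx)
{
    const uint8_t tab[4] = { 0, 8, 16, 24 };
    return tab[idx & 3];
}

/* static const table: a single object with static storage duration,
 * typically placed in read-only data, so nothing is rebuilt per call. */
static int lookup_static(unsigned idx)
{
    static const uint8_t tab[4] = { 0, 8, 16, 24 };
    return tab[idx & 3];
}
```

Both functions return the same values; only where the table lives differs.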
> gcc might also be failing to optimize the ! (it's not a simple arithmetic
> op). Fix that by
> perm_tab[(((uintptr_t)pix & 8) >> 3) + ((i_stride & 8) >> 2)]
I tried that, and interestingly it doesn't change the cycle count. I used
your expression anyway since it's simpler.
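To make the index explicit, here is a small sketch of how that expression
maps onto a 4-entry table. The helper name perm_index is hypothetical and
only wraps the expression quoted above: bit 0 is set when pix sits 8 bytes
past a 16-byte boundary, and bit 1 is set when the stride is an odd multiple
of 8.

```c
#include <stdint.h>

/* Builds a 2-bit table index from the pointer and the stride:
 *   bit 0: (pix & 8) >> 3      -> 1 if pix is offset by 8 from mod16
 *   bit 1: (i_stride & 8) >> 2 -> 2 if the stride is an odd multiple of 8
 * The pointer is only inspected, never dereferenced. */
static unsigned perm_index(const uint8_t *pix, int i_stride)
{
    return (unsigned)((((uintptr_t)pix & 8) >> 3) + ((i_stride & 8) >> 2));
}
```

The result is always in the range 0..3, so it can index the table directly
without a branch.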
> And then promote chroma planes to 16 byte alignment, and change it to
> perm_tab[((uintptr_t)pix & 8) >> 3]
I implemented this, and it's brilliant! It shaves one more cycle overall.
Attached patch aligns chroma planes on mod16, and has the rest of the
PPC optimizations you suggested.
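As a rough sketch of why the shorter index suffices: assuming the mod16
alignment also makes every chroma stride a multiple of 16, (i_stride & 8) is
always 0, so only the pointer bit remains. The helper name perm_index_mod16
is hypothetical and is not taken from the patch.

```c
#include <stdint.h>

/* With strides that are multiples of 16, the stride term of the index
 * is always zero, leaving a 1-bit index that selects between two table
 * entries: 0 for a 16-byte-aligned pix, 1 for pix offset by 8. */
static unsigned perm_index_mod16(const uint8_t *pix)
{
    return (unsigned)(((uintptr_t)pix & 8) >> 3);
}
```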
Please check that I didn't screw something up.
Thanks a lot for all your suggestions!
Guillaume
--
Only a very small fraction of our DNA does anything; the rest is all
comments and ifdefs.
Marilyn Monroe - "It's not true that I had nothing on. I had the radio on."
-------------- next part --------------
A non-text attachment was scrubbed...
Name: variance_optimization.2.diff
Type: application/octet-stream
Size: 4071 bytes
Desc: not available
Url : http://mailman.videolan.org/pipermail/x264-devel/attachments/20090125/6bdd9255/attachment.obj