[x264-devel] Re: Very small optimizations

Michael Niedermayer michaelni at gmx.at
Fri Dec 2 01:54:13 CET 2005


Hi

On Thu, Dec 01, 2005 at 01:52:41PM -0800, Loren Merritt wrote:
> On Thu, 1 Dec 2005, David Pio wrote:
> 
> >encoder/me.c
> >function 'x264_me_search_ref'
> >
> >various search methods use 'i_me_range/2' or 'i_me_range/4' in the
> >conditional part of the looping structure.
> >i_me_range seems not to change within the context of the function, so should
> >those divides be taken out of the looping structure?  say create 2 new
> >variables i_me_range_div2 and i_me_range_div4??  It would save some divide
> >CPU cycles.
> >
> >Or does the compiler optimize this out?
> 
> Division symbol != division instruction. 

yes


> The compiler knows to use shifts.

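and how? (-1)/2 != (-1)>>1 ... signed division truncates toward zero, while
an arithmetic right shift rounds toward minus infinity, so for signed values
a bare shift gives the wrong result for negative odd numbers and the compiler
has to add a fixup. yes, for unsigned the compiler should use shifts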
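
To make the rounding difference concrete, here is a minimal C sketch. The
function names are made up for illustration and are not from x264. It assumes
a compiler that implements >> on a negative int as an arithmetic shift, which
is implementation-defined in C but is what gcc does on x86. The last function
shows the kind of sign-bit fixup a compiler can use to divide a signed int by
2 without a divide instruction; the exact code generated depends on the
compiler and target.

```c
#include <limits.h>

/* Signed division: C99 truncates toward zero, so div2(-1) is 0. */
int div2(int x)
{
    return x / 2;
}

/* Bare shift: assuming an arithmetic shift, this rounds toward
 * minus infinity, so shr1(-1) is -1, not 0. */
int shr1(int x)
{
    return x >> 1;
}

/* Shift with fixup: add the sign bit (1 for negative x, 0 otherwise)
 * before shifting, which matches x / 2 for every int under the same
 * arithmetic-shift assumption. */
int div2_by_shift(int x)
{
    int sign_bit = (int)((unsigned)x >> (sizeof(int) * CHAR_BIT - 1));
    return (x + sign_bit) >> 1;
}
```

For non-negative x all three agree, which is why an unsigned operand needs no
fixup at all.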


> 
> >common/mc.c
> >
> >line 247 and 278:
> > int filter1 = (hpel1x & 1) + ( (hpel1y & 1) << 1 );
> >could be
> > int filter1 = (hpel1x & 1) ^ ( (hpel1y & 1) << 1 );
> >
> >replacing an addition with a bitwise OR, should save some CPU cycles?
> 
> Is there any modern CPU where ADD and OR are not equally fast?

yes, the P4 can execute more adds per cycle than ors, then again some might
argue that the P4 isn't a modern CPU but a pile of shit ...

furthermore, add should be faster on some x86 cpus, as the shift and add in
the above code can be done with a single lea instruction (2*reg1 + reg2)
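
A small C sketch of why the choice of operator does not change the result
here: (hpel1x & 1) can only set bit 0 and ((hpel1y & 1) << 1) can only set
bit 1, so there is never a carry and +, | and ^ all give the same value in
0..3. (The quoted patch writes ^, which is XOR, while its description says
OR; for these operands it makes no difference.) The function names are
invented for this example; only the expressions come from the quoted mc.c
lines. Which instruction is actually emitted for each form is up to the
compiler and target.

```c
/* Addition form, as in the quoted mc.c line. On x86 a compiler may
 * fold the shift and the add into one lea (2*reg1 + reg2). */
int filter_add(int hpel1x, int hpel1y)
{
    return (hpel1x & 1) + ((hpel1y & 1) << 1);
}

/* XOR form, as proposed in the quoted patch. */
int filter_xor(int hpel1x, int hpel1y)
{
    return (hpel1x & 1) ^ ((hpel1y & 1) << 1);
}

/* OR form, as the patch description words it. */
int filter_or(int hpel1x, int hpel1y)
{
    return (hpel1x & 1) | ((hpel1y & 1) << 1);
}
```

Since the results are identical, the only question is which form the target
CPU executes fastest, and the add form is the one that leaves lea available.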

[...]

-- 
Michael
