[x264-devel] Very small optimizations
David Pio
puffpio at gmail.com
Thu Dec 1 14:09:53 CET 2005
Hello,
I'm new to the list, but a longtime user of x264, and thought I could try to
get involved somehow.
I've just started to look through the code to see if there could be any
small optimizations.
Hopefully some of these are helpful. If not, just ignore them. :) Only
trying to help.
---------
encoder/me.c
function 'x264_me_search_ref'
Various search methods use 'i_me_range/2' or 'i_me_range/4' in the
loop condition.
i_me_range does not seem to change within the function, so should those
divides be hoisted out of the loops, say into 2 new variables
i_me_range_div2 and i_me_range_div4? It would save some divide CPU cycles.
Or does the compiler optimize this out?
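To make the question concrete, here is a minimal sketch of the two forms. The function names and loop bodies are hypothetical stand-ins, not the actual code in x264_me_search_ref. An optimizing compiler will typically hoist a loop-invariant expression like this on its own and turn a divide by a constant power of two into a shift (with a small fix-up for signed values), so the two versions may well compile to the same thing.

```c
/* Hypothetical stand-in for a search loop; not the actual x264 code. */

/* Bound written as a divide inside the loop condition. */
int count_steps_inline(int i_me_range)
{
    int steps = 0;
    for (int i = 1; i < i_me_range / 2; i++)
        steps++;
    return steps;
}

/* Same loop with the bound computed once before the loop. */
int count_steps_hoisted(int i_me_range)
{
    const int i_me_range_div2 = i_me_range / 2;
    int steps = 0;
    for (int i = 1; i < i_me_range_div2; i++)
        steps++;
    return steps;
}
```

Comparing the generated assembly for both functions at the optimization level used for the build would settle whether the manual hoist changes anything.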
-------
common/mc.c
lines 247 and 278:
int filter1 = (hpel1x & 1) + ( (hpel1y & 1) << 1 );
could be
int filter1 = (hpel1x & 1) | ( (hpel1y & 1) << 1 );
replacing an addition with a bitwise OR. The two operands never share a set
bit, so the result is the same. Should this save some CPU cycles?
The same applies to lines 255 and 286.
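A quick way to check that the substitution is safe: the first operand only ever occupies bit 0 and the second only bit 1, so addition, OR and XOR all give the same value. The helper names below are made up for illustration and are not part of common/mc.c.

```c
/* Hypothetical helpers; the expressions mirror the quoted filter1 line. */
int filter_index_add(int hpelx, int hpely)
{
    return (hpelx & 1) + ((hpely & 1) << 1);
}

int filter_index_or(int hpelx, int hpely)
{
    return (hpelx & 1) | ((hpely & 1) << 1);
}

int filter_index_xor(int hpelx, int hpely)
{
    return (hpelx & 1) ^ ((hpely & 1) << 1);
}

/* Counts inputs where the three forms disagree; a small range is enough
 * because only the low bit of each argument is used. */
int count_filter_mismatches(void)
{
    int mismatches = 0;
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++) {
            int a = filter_index_add(x, y);
            if (a != filter_index_or(x, y) || a != filter_index_xor(x, y))
                mismatches++;
        }
    return mismatches;
}
```

Whether the OR form is actually faster depends on the target; on many CPUs integer add and bitwise OR have the same cost.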
lines 314 to 317:
const int cA = (8-d8x)*(8-d8y);
const int cB = d8x *(8-d8y);
const int cC = (8-d8x)*d8y;
const int cD = d8x *d8y;
could be rewritten as:
int d8x_times8 = d8x * 8;
int d8y_times8 = d8y * 8;
const int cD = d8x * d8y;
const int cC = d8y_times8 - cD;
const int cB = d8x_times8 - cD;
const int cA = 64 - d8x_times8 - d8y_times8 + cD;
4 subtractions and 4 multiplications are replaced by 1 multiplication, 4
subtractions, 1 addition, and 2 bit shifts (the *8 should be optimized by
the compiler to a 3-bit left shift, right?).
In the SSE version it is accomplished with 2 subtractions and 4
multiplications.
I couldn't find how many cycles a multiplication takes, but additions,
subtractions, and bit shifts take about 1 cycle each, right?
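The rewrite follows from expanding the products: (8-d8x)*(8-d8y) = 64 - 8*d8x - 8*d8y + d8x*d8y, d8x*(8-d8y) = 8*d8x - d8x*d8y, and (8-d8x)*d8y = 8*d8y - d8x*d8y. Below is a small sketch that checks both forms against each other, with d8y_times8 taken as d8y * 8. The function names are hypothetical, and the 0..7 range for d8x and d8y is an assumption made for the check, not something read from the x264 source.

```c
/* Weights as in the quoted lines 314 to 317. Order: cA, cB, cC, cD. */
void chroma_weights_original(int d8x, int d8y, int w[4])
{
    w[0] = (8 - d8x) * (8 - d8y);
    w[1] = d8x * (8 - d8y);
    w[2] = (8 - d8x) * d8y;
    w[3] = d8x * d8y;
}

/* Proposed form: one multiplication, the rest derived from cD. */
void chroma_weights_rewritten(int d8x, int d8y, int w[4])
{
    const int d8x_times8 = d8x * 8;
    const int d8y_times8 = d8y * 8;
    w[3] = d8x * d8y;                            /* cD */
    w[2] = d8y_times8 - w[3];                    /* cC */
    w[1] = d8x_times8 - w[3];                    /* cB */
    w[0] = 64 - d8x_times8 - d8y_times8 + w[3];  /* cA */
}

/* Counts weights that differ between the two forms over the assumed
 * 0..7 offset range. */
int count_weight_mismatches(void)
{
    int mismatches = 0;
    for (int d8y = 0; d8y < 8; d8y++)
        for (int d8x = 0; d8x < 8; d8x++) {
            int a[4], b[4];
            chroma_weights_original(d8x, d8y, a);
            chroma_weights_rewritten(d8x, d8y, b);
            for (int k = 0; k < 4; k++)
                if (a[k] != b[k])
                    mismatches++;
        }
    return mismatches;
}
```

For d8x = 3 and d8y = 5, both forms give cA = 15, cB = 9, cC = 25, cD = 15, which sum to 64 as the four weights should.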