<div>Hello,</div>
<div> </div>
<div>I'm new to the list but a longtime user of x264, and I thought I could try to get involved somehow.</div>
<div>So far I've just started looking through the code to see whether there are any small optimizations to be made.</div>
<div> </div>
<div>Hopefully some of these are helpful. If not, just ignore them. :) I'm only trying to help.</div>
<div> </div>
<div>---------</div>
<div> </div>
<div>encoder/me.c</div>
<div>function 'x264_me_search_ref'</div>
<div> </div>
<div>Various search methods use 'i_me_range/2' or 'i_me_range/4' in the loop condition.</div>
<div>i_me_range does not seem to change within the function, so should those divides be hoisted out of the loops, say into two new variables i_me_range_div2 and i_me_range_div4? That would save some divide CPU cycles.
</div>
<div> </div>
<div>Or does the compiler optimize this out?</div>
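<div> </div>
<div>A minimal sketch of the change I have in mind. The function names and the loop body are hypothetical stand-ins, not the actual code in me.c; only the shape of the loop condition matters here:</div>

```c
#include <assert.h>

/* Hypothetical stand-in: the divide is written in the loop condition,
 * so as written it is evaluated on every iteration. */
int count_inline(int i_me_range)
{
    int n = 0;
    for (int i = 1; i <= i_me_range / 2; i++)
        n++;
    return n;
}

/* Same loop with the divide hoisted into a variable computed once. */
int count_hoisted(int i_me_range)
{
    const int i_me_range_div2 = i_me_range / 2;
    int n = 0;
    for (int i = 1; i <= i_me_range_div2; i++)
        n++;
    return n;
}

/* Both forms must run the same number of iterations. */
void check_hoisting(void)
{
    for (int r = 0; r <= 64; r++)
        assert(count_inline(r) == count_hoisted(r));
}
```

<div>Both versions behave identically, so the only question is speed. Whether the compiler already does the hoisting would need checking in the generated assembly.</div>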
<div> </div>
<div>-------</div>
<div> </div>
<div>common/mc.c</div>
<div> </div>
<div>line 247 and 278:</div>
<div> int filter1 = (hpel1x & 1) + ( (hpel1y & 1) << 1 );</div>
<div>could be</div>
<div> int filter1 = (hpel1x & 1) | ( (hpel1y & 1) << 1 );</div>
<div> </div>
<div>This replaces an addition with a bitwise OR. The two terms never share a bit, so the result is identical. Would that save some CPU cycles?</div>
<div>The same applies to lines 255 and 286.</div>
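<div> </div>
<div>To convince myself the result is unchanged, here is a small check. The helper names are hypothetical and not from mc.c. Since (x & 1) only sets bit 0 and ((y & 1) << 1) only sets bit 1, there is never a carry, so +, | and ^ all give the same value:</div>

```c
#include <assert.h>

/* Index combined with addition, as in the current code. */
int filter_index_add(int hpelx, int hpely)
{
    return (hpelx & 1) + ((hpely & 1) << 1);
}

/* Same index combined with bitwise OR. */
int filter_index_or(int hpelx, int hpely)
{
    return (hpelx & 1) | ((hpely & 1) << 1);
}

/* Same index combined with bitwise XOR. */
int filter_index_xor(int hpelx, int hpely)
{
    return (hpelx & 1) ^ ((hpely & 1) << 1);
}

/* The terms occupy different bits, so all three forms must agree. */
void check_filter_index(void)
{
    for (int x = 0; x < 8; x++)
        for (int y = 0; y < 8; y++) {
            assert(filter_index_add(x, y) == filter_index_or(x, y));
            assert(filter_index_add(x, y) == filter_index_xor(x, y));
        }
}
```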
<div> </div>
<div>lines 314 to 317:</div>
<div> const int cA = (8-d8x)*(8-d8y);<br> const int cB = d8x *(8-d8y);<br> const int cC = (8-d8x)*d8y;<br> const int cD = d8x *d8y;</div>
<div> </div>
<div>could be rewritten as:</div>
<div> int d8x_times8 = d8x * 8;</div>
<div> int d8y_times8 = d8y * 8;</div>
<div> const int cD = d8x * d8y;</div>
<div> const int cC = d8y_times8 - cD;</div>
<div> const int cB = d8x_times8 - cD;</div>
<div> const int cA = 64 - d8x_times8 - d8y_times8 + cD;</div>
<div> </div>
<div>The original 4 subtractions and 4 multiplications are replaced by 1 multiplication, 4 subtractions, 1 addition, and 2 bit shifts (the *8 should be optimized by the compiler to a left shift by 3 bits, right?).</div>
<div>The SSE version does this with 2 subtractions and 4 multiplications.</div>
<div>I couldn't find how many cycles a multiplication takes, but additions, subtractions, and bit shifts take about 1 cycle each, right?</div>
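<div> </div>
<div>A quick sketch to confirm the rewritten coefficients match the original products. The struct and function names are hypothetical helpers for this check, and the 0..7 range for d8x and d8y is my assumption about the eighth-pel offsets (the algebra holds for any integers):</div>

```c
#include <assert.h>

/* Hypothetical helper type holding the four bilinear weights. */
typedef struct { int cA, cB, cC, cD; } coeffs_t;

/* Coefficients as computed by the four original products. */
coeffs_t coeffs_original(int d8x, int d8y)
{
    coeffs_t c;
    c.cA = (8 - d8x) * (8 - d8y);
    c.cB = d8x * (8 - d8y);
    c.cC = (8 - d8x) * d8y;
    c.cD = d8x * d8y;
    return c;
}

/* Coefficients from the expanded form: one multiply, two *8 (shifts). */
coeffs_t coeffs_rewritten(int d8x, int d8y)
{
    const int d8x_times8 = d8x * 8;
    const int d8y_times8 = d8y * 8;
    coeffs_t c;
    c.cD = d8x * d8y;
    c.cC = d8y_times8 - c.cD;
    c.cB = d8x_times8 - c.cD;
    c.cA = 64 - d8x_times8 - d8y_times8 + c.cD;
    return c;
}

/* Compare both forms over the assumed 0..7 range of each offset. */
void check_coeffs(void)
{
    for (int x = 0; x < 8; x++)
        for (int y = 0; y < 8; y++) {
            coeffs_t a = coeffs_original(x, y);
            coeffs_t b = coeffs_rewritten(x, y);
            assert(a.cA == b.cA && a.cB == b.cB);
            assert(a.cC == b.cC && a.cD == b.cD);
        }
}
```

<div>For example, d8x = 3 and d8y = 5 give cA = 15, cB = 9, cC = 25, cD = 15 in both forms, and the four weights still sum to 64.</div>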
<div> </div>
<div> </div>