Correction to my previous patch. The line:

+ for( x = 0; width - x >= 2; x++ )\

should be:

+ for( ; width - x >= 2; x++ )\

A quick test of the code yields:

SSSE3: 2.23M clocks
SSE2: 3.36M clocks
MMX: 7.46M clocks
Scalar/C: 7.5M clocks

for 1280x720 input on a Core 2 Merom.

Dark Shikari
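To illustrate the loop correction, here is a minimal sketch that assumes the fix matters because `x` is already set before the loop is reached, so `x = 0` would throw that position away. The helper `sum_row`, its `start` parameter, and the `x += 2` stride are hypothetical and not taken from the patch. They only show the pattern of a loop with an empty initializer picking up where earlier code left off.

```c
#include <stdint.h>

/* Hypothetical helper (not from the patch): sums a row of pixels,
 * two at a time, starting from a position that was set earlier. */
static int sum_row( const uint8_t *src, int width, int start )
{
    int x = start;  /* assumption: x already holds a valid position */
    int sum = 0;

    /* Empty initializer: writing "x = 0" here would discard `start`
     * and reprocess pixels [0, start). */
    for( ; width - x >= 2; x += 2 )
        sum += src[x] + src[x+1];

    /* Scalar tail for a leftover pixel. */
    for( ; x < width; x++ )
        sum += src[x];

    return sum;
}
```

With a five-pixel row holding 1 through 5, `sum_row( row, 5, 1 )` skips the first pixel and processes the rest. With `x = 0` in the loop header, the `start` value would be silently ignored.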