<div style="line-height:1.7;color:#000000;font-size:14px;font-family:arial"><div> </div><pre><br>At 2014-08-27 12:57:19,dnyaneshwar@multicorewareinc.com wrote:
># HG changeset patch
># User Dnyaneshwar G <dnyaneshwar@multicorewareinc.com>
># Date 1409115349 -19800
># Wed Aug 27 10:25:49 2014 +0530
># Node ID f49ed93e3daff100903e5fd7aa1bd874b9e79caf
># Parent 32891b95f6693a39afbdf7929e12e3e0c6e990d1
>asm: optimize dct4, replaced pshufd(latency 4-6)+pshufhw(latency 2) instructions with pshufb(latency 1)
In Agner Fog's instruction tables, pshufd and pshufb both have a latency of 1.</pre><pre>Your patch also introduces some memory accesses, which is dangerous: a single cache miss costs more cycles than the whole function.</pre><pre>I think that in asm code, throughput matters most, since modern CPUs have out-of-order execution engines.</pre></div>
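The memory-access point can be illustrated with a minimal, hypothetical C sketch using SSE intrinsics. It assumes GCC or Clang on x86-64. The word permutation below is invented for illustration and is not the actual shuffle from the dct4 patch. The names `shuffle_two_ops`, `shuffle_pshufb` and `shuffles_agree` are also hypothetical. pshufd and pshufhw encode their shuffle control as an immediate byte. pshufb instead takes a 16-byte mask in a register, and compilers typically materialize a constant mask like this one with a load from read-only data.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <emmintrin.h>  /* SSE2: pshufd (_mm_shuffle_epi32), pshufhw (_mm_shufflehi_epi16) */
#include <tmmintrin.h>  /* SSSE3: pshufb (_mm_shuffle_epi8) */

/* Two-instruction version: both controls are imm8 values, so no memory operand. */
static __m128i shuffle_two_ops(__m128i v)
{
    __m128i t = _mm_shuffle_epi32(v, 0x4E);  /* pshufd: swap the two 64-bit halves */
    return _mm_shufflehi_epi16(t, 0x1B);     /* pshufhw: reverse the upper four words */
}

/* One-instruction version: the control is a 16-byte register operand.
   The target attribute (GCC/Clang) enables SSSE3 for this function only. */
__attribute__((target("ssse3")))
static __m128i shuffle_pshufb(__m128i v)
{
    /* Byte i of the result takes source byte mask[i]; this constant is
       usually emitted to .rodata and loaded, i.e. an extra memory access. */
    const __m128i mask = _mm_setr_epi8(8, 9, 10, 11, 12, 13, 14, 15,
                                       6, 7, 4, 5, 2, 3, 0, 1);
    return _mm_shuffle_epi8(v, mask);
}

/* Returns 1 if both versions produce the words {4,5,6,7,3,2,1,0} from {0..7}. */
static int shuffles_agree(void)
{
    const __m128i v = _mm_setr_epi16(0, 1, 2, 3, 4, 5, 6, 7);
    const int16_t expected[8] = {4, 5, 6, 7, 3, 2, 1, 0};
    int16_t a[8], b[8];
    _mm_storeu_si128((__m128i *)a, shuffle_two_ops(v));
    _mm_storeu_si128((__m128i *)b, shuffle_pshufb(v));
    return memcmp(a, b, sizeof a) == 0 && memcmp(a, expected, sizeof a) == 0;
}
```

Both functions compute the same permutation. The single pshufb saves one shuffle instruction, but it trades an immediate operand for a mask that has to come from memory.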