[x265] [PATCH] asm: optimize dct4, replaced pshufd(latency 4-6)+pshufhw(latency 2) instructions with pshufb(latency 1)

chen chenm003 at 163.com
Wed Aug 27 17:57:59 CEST 2014


 

At 2014-08-27 12:57:19, dnyaneshwar at multicorewareinc.com wrote:
># HG changeset patch
># User Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
># Date 1409115349 -19800
>#      Wed Aug 27 10:25:49 2014 +0530
># Node ID f49ed93e3daff100903e5fd7aa1bd874b9e79caf
># Parent  32891b95f6693a39afbdf7929e12e3e0c6e990d1
>asm: optimize dct4, replaced pshufd(latency 4-6)+pshufhw(latency 2) instructions with pshufb(latency 1)

In Agner's documents, pshufd and pshufb have the same latency, 1.
Your patch also adds some memory access operands, which is dangerous: a cache miss costs more cycles than the whole function.
I think that in asm code, throughput is the most important thing, since modern CPUs have an out-of-order engine.
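
To illustrate where the extra memory access comes from, here is a rough portable C model of the two shuffles (a sketch only, not code from the patch; all function names are made up for illustration). pshufd takes its selector as an 8-bit immediate inside the instruction, while pshufb needs a 16-byte mask operand, which in asm usually means loading a constant from memory. The model builds the pshufb mask that reproduces a given pshufd immediate and checks that both give the same result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of pshufd: destination dword i is source dword
 * ((imm >> 2*i) & 3). The selector is an immediate, so no data load. */
void model_pshufd(uint8_t dst[16], const uint8_t src[16], uint8_t imm)
{
    uint8_t tmp[16];
    for (int i = 0; i < 4; i++) {
        int sel = (imm >> (2 * i)) & 3;
        memcpy(tmp + 4 * i, src + 4 * sel, 4);
    }
    memcpy(dst, tmp, 16);
}

/* Scalar model of pshufb: destination byte i is zero if bit 7 of
 * mask[i] is set, otherwise source byte (mask[i] & 15).
 * The 16-byte mask is a separate operand (register or memory). */
void model_pshufb(uint8_t dst[16], const uint8_t src[16],
                  const uint8_t mask[16])
{
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++)
        tmp[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
    memcpy(dst, tmp, 16);
}

/* Hypothetical helper: expand a pshufd immediate into the byte mask
 * that makes pshufb perform the same dword shuffle. */
void pshufd_imm_to_pshufb_mask(uint8_t mask[16], uint8_t imm)
{
    for (int i = 0; i < 4; i++) {
        int sel = (imm >> (2 * i)) & 3;
        for (int b = 0; b < 4; b++)
            mask[4 * i + b] = (uint8_t)(4 * sel + b);
    }
}

/* Checks that both models agree for every possible immediate.
 * Returns the number of immediates checked. */
int verify_all_immediates(void)
{
    uint8_t src[16], a[16], b[16], mask[16];
    int checked = 0;

    for (int i = 0; i < 16; i++)
        src[i] = (uint8_t)(0x10 + i);

    for (int imm = 0; imm < 256; imm++) {
        model_pshufd(a, src, (uint8_t)imm);
        pshufd_imm_to_pshufb_mask(mask, (uint8_t)imm);
        model_pshufb(b, src, mask);
        assert(memcmp(a, b, 16) == 0);
        checked++;
    }
    return checked;
}
```

The model says nothing about latency or throughput; it only shows that the pshufb form carries 16 bytes of mask data that the pshufd form does not need, and that data is what can miss the cache.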
 