[x265] [PATCH] testbench: added new optimized c primitive for psyCost_pp, suitable to write asm code

Steve Borho steve at borho.org
Mon Dec 15 18:17:44 CET 2014


On 12/15, dnyaneshwar at multicorewareinc.com wrote:
> # HG changeset patch
> # User Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
> # Date 1418633185 -19800
> #      Mon Dec 15 14:16:25 2014 +0530
> # Node ID ff352d647f4b3a8f0c249fc7a8f4eb3645aaa974
> # Parent  6ba7be7b169783db1d667d1140e51b68ff4b64fb
> testbench: added new optimized c primitive for psyCost_pp, suitable to write asm code
> 
> in new primitive, combined sa8d_8x8 and sad_8x8 together to save redundant loads, removed unnecessary zeroBuffer
> testbench checks old c vs new c code correctness
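The following is a minimal sketch of the fusion described in the quoted patch. It is not the patch's code. It loads each 8x8 block once, accumulating the block's SAD against zero while filling the Hadamard buffer used for sa8d, so the pixels are not read a second time and no zero buffer is needed. Several assumptions are not stated in this thread and are labeled in the comments:

- pixels are 8-bit;
- sa8d is the sum of absolute 8x8 Hadamard coefficients, rounded with `(sum + 2) >> 2`;
- a block's energy is sa8d minus SAD/4.

`hadamard8`, `block_energy_8x8` and `psy_cost_sketch` are hypothetical names, and the 4x4 case is left out.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint8_t pixel; /* assumption: 8-bit build */

/* In-place 8-point Walsh-Hadamard butterflies on v[0], v[step], ..., v[7*step]. */
static void hadamard8(int *v, int step)
{
    for (int len = 1; len < 8; len <<= 1)
        for (int i = 0; i < 8; i += 2 * len)
            for (int j = i; j < i + len; j++)
            {
                int a = v[j * step], b = v[(j + len) * step];
                v[j * step] = a + b;
                v[(j + len) * step] = a - b;
            }
}

/* Hypothetical fused kernel: a single load pass feeds both the SAD-vs-zero
 * accumulator and the transform buffer. Returns the block's assumed AC energy. */
static int block_energy_8x8(const pixel *p, intptr_t stride)
{
    int m[64];
    int sad = 0;
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
        {
            int v = p[y * stride + x];
            m[y * 8 + x] = v; /* input to the sa8d transform */
            sad += v;         /* |v - 0| == v for unsigned pixels */
        }

    for (int y = 0; y < 8; y++)
        hadamard8(&m[y * 8], 1); /* row transforms */
    for (int x = 0; x < 8; x++)
        hadamard8(&m[x], 8);     /* column transforms */

    int sum = 0;
    for (int i = 0; i < 64; i++)
        sum += abs(m[i]);
    int sa8d = (sum + 2) >> 2;   /* assumed sa8d rounding */

    return sa8d - (sad >> 2);    /* assumed energy: sa8d minus DC term (SAD / 4) */
}

/* Psy cost of a dim x dim block (dim a multiple of 8): the sum over 8x8
 * sub-blocks of |energy(source) - energy(recon)|. */
static int psy_cost_sketch(const pixel *src, intptr_t sstride,
                           const pixel *rec, intptr_t rstride, int dim)
{
    int total = 0;
    for (int i = 0; i < dim; i += 8)
        for (int j = 0; j < dim; j += 8)
            total += abs(block_energy_8x8(src + i * sstride + j, sstride) -
                         block_energy_8x8(rec + i * rstride + j, rstride));
    return total;
}
```

A flat block has zero energy under these assumptions, because all of its Hadamard energy sits in the DC coefficient that the SAD term cancels. A block that is zero except for a single pixel of value 4 has energy 63. The sketch makes one call per 8x8 block per image instead of separate sa8d and SAD calls, which matches the reduction in function calls that the reply credits for the speedup.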

Queued. 

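As a test, I wired up the two C reference implementations to be run by the
speed tests. The results are interesting. The new C functions are faster at
every block size except psycost_ss[16x16], where old and new are at parity
(1.00x). The gain comes primarily from making fewer function calls. I look
forward to the assembly code.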

             primitive  speedup  new C       old C
       psycost_pp[4x4]  1.56x    957.33      1489.44
       psycost_ss[4x4]  1.09x    1216.87     1323.21
       psycost_pp[8x8]  1.47x    3876.76     5689.62
       psycost_ss[8x8]  1.17x    4575.75     5364.46
     psycost_pp[16x16]  1.23x    25199.46    30884.86
     psycost_ss[16x16]  1.00x    21506.01    21432.01
     psycost_pp[32x32]  1.14x    81989.49    93471.70
     psycost_ss[32x32]  1.94x    81693.79    158855.92
     psycost_pp[64x64]  1.46x    263514.53   385280.00
     psycost_ss[64x64]  1.02x    339000.50   344397.19
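The speedup column is consistent with the old C time divided by the new C time. This small sketch recomputes every row from the two timing columns and shows that the printed rounding holds. The `bench_row` struct, `speedup` and `report` are illustrative names and are not testbench code. It also shows that psycost_ss[16x16] is the only row where the new code is marginally slower: about 0.997x, which prints as 1.00x.

```c
#include <stdio.h>

/* One row of the speed-test table: new-C and old-C timings as printed. */
struct bench_row { const char *name; double new_c; double old_c; };

/* Speedup of the new primitive over the old one: old time / new time. */
static double speedup(double new_c, double old_c) { return old_c / new_c; }

static const struct bench_row rows[] = {
    {"psycost_pp[4x4]",   957.33,    1489.44},
    {"psycost_ss[4x4]",   1216.87,   1323.21},
    {"psycost_pp[8x8]",   3876.76,   5689.62},
    {"psycost_ss[8x8]",   4575.75,   5364.46},
    {"psycost_pp[16x16]", 25199.46,  30884.86},
    {"psycost_ss[16x16]", 21506.01,  21432.01},
    {"psycost_pp[32x32]", 81989.49,  93471.70},
    {"psycost_ss[32x32]", 81693.79,  158855.92},
    {"psycost_pp[64x64]", 263514.53, 385280.00},
    {"psycost_ss[64x64]", 339000.50, 344397.19},
};

/* Print each row's recomputed speedup and return how many rows are slower. */
static int report(void)
{
    int slower = 0;
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++)
    {
        double s = speedup(rows[i].new_c, rows[i].old_c);
        printf("%-20s %5.3fx%s\n", rows[i].name, s, s < 1.0 ? "  (new C slower)" : "");
        if (s < 1.0)
            slower++;
    }
    return slower;
}
```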

My hacks:

diff -r be5ab1a2a3fa source/test/pixelharness.cpp
--- a/source/test/pixelharness.cpp  Mon Dec 15 15:10:27 2014 +0530
+++ b/source/test/pixelharness.cpp  Mon Dec 15 11:11:30 2014 -0600
@@ -1695,6 +1695,18 @@
             HEADER("copy_cnt[%dx%d]", 4 << i, 4 << i);
             REPORT_SPEEDUP(opt.copy_cnt[i], ref.copy_cnt[i], sbuf1, sbuf2, STRIDE);
         }
+
+        if (ref.psy_cost_pp[i])
+        {
+            HEADER("psycost_pp[%dx%d]", 4 << i, 4 << i);
+            REPORT_SPEEDUP(ref.psy_cost_pp[i + NUM_SQUARE_BLOCKS], ref.psy_cost_pp[i], pbuf1, STRIDE, pbuf2, STRIDE);
+        }
+
+        if (ref.psy_cost_ss[i])
+        {
+            HEADER("psycost_ss[%dx%d]", 4 << i, 4 << i);
+            REPORT_SPEEDUP(ref.psy_cost_ss[i + NUM_SQUARE_BLOCKS], ref.psy_cost_ss[i], sbuf1, STRIDE, sbuf2, STRIDE);
+        }
     }
 
     if (opt.weight_pp)

-- 
Steve Borho