[x265] [PATCH] psyCost_pp: C code optimization, suitable for ASM conversion

chen chenm003 at 163.com
Fri Dec 12 19:09:12 CET 2014


I suggest you make a new function and keep origin code.
After that, we make a new primivite item and put your new code there.
Third, write a testbench to verify origin and new code is matched.

At 2014-12-12 17:33:51,praveen at multicorewareinc.com wrote:
># HG changeset patch
># User Praveen Tiwari
># Date 1418376812 -19800
># Node ID ca805794c7519ef698776d0f717fd66b0d3bc626
># Parent  b1c2ef980dfe59c486454a8838c2c1bb74bf4d32
>psyCost_pp: C code optimization, suitable for ASM conversion
>
>psy_sum_std_4x4, is combined version of satd_4x4 and sad<4, 4>, eliminating unnecessary load operations, extra arguments(zeroBuf and 0).
>It also replaces SAD operation with low cost sum operation as the sencod pixel buffer is zero.
>
>diff -r b1c2ef980dfe -r ca805794c751 source/common/pixel.cpp
>--- a/source/common/pixel.cpp	Thu Dec 11 16:52:06 2014 -0600
>+++ b/source/common/pixel.cpp	Fri Dec 12 15:03:32 2014 +0530
>@@ -801,6 +801,38 @@
> #pragma warning(disable: 4127) // conditional expression is constant
> #endif
> 
>+int psy_sum_std_4x4(const pixel* pix1, intptr_t stride_pix1)
>+{
>+    sum2_t tmp[4][2];
>+    sum2_t a0, a1, a2, a3, b0, b1;
>+    sum2_t sum = 0;
>+    int sum0 = 0;
>+    uint32_t sum1 = 0;
>+
>+    for (int i = 0; i < 4; i++, pix1 += stride_pix1)
>+    {
>+        a0 = pix1[0];
>+        a1 = pix1[1];
>+        a2 = pix1[2];
>+        a3 = pix1[3];
>+        sum1 = sum1 + (a0 + a1 + a2 + a3);
>+        b0 = (a0 + a1) + ((a0 - a1) << BITS_PER_SUM);
>+        b1 = (a2 + a3) + ((a2 - a3) << BITS_PER_SUM);
>+        tmp[i][0] = b0 + b1;
>+        tmp[i][1] = b0 - b1;
>+    }
>+
>+    for (int i = 0; i < 2; i++)
>+    {
>+        HADAMARD4(a0, a1, a2, a3, tmp[0][i], tmp[1][i], tmp[2][i], tmp[3][i]);
>+        a0 = abs2(a0) + abs2(a1) + abs2(a2) + abs2(a3);
>+        sum += ((sum_t)a0) + (a0 >> BITS_PER_SUM);
>+    }
>+
>+    sum0 = (int)(sum >> 1);
>+    return (int)(sum0 - (sum1 >> 2));
>+}
>+
> template<int size>
> int psyCost_pp(const pixel* source, intptr_t sstride, const pixel* recon, intptr_t rstride)
> {
>@@ -828,8 +860,8 @@
>     else
>     {
>         /* 4x4 is too small for sa8d */
>-        int sourceEnergy = satd_4x4(source, sstride, zeroBuf, 0) - (sad<4, 4>(source, sstride, zeroBuf, 0) >> 2);
>-        int reconEnergy = satd_4x4(recon, rstride, zeroBuf, 0) - (sad<4, 4>(recon, rstride, zeroBuf, 0) >> 2);
>+        int sourceEnergy = psy_sum_std_4x4(source, sstride);
>+        int reconEnergy = psy_sum_std_4x4(recon, rstride);
>         return abs(sourceEnergy - reconEnergy);
>     }
> }
>_______________________________________________
>x265-devel mailing list
>x265-devel at videolan.org
>https://mailman.videolan.org/listinfo/x265-devel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.videolan.org/pipermail/x265-devel/attachments/20141213/e026e1a5/attachment.html>


More information about the x265-devel mailing list