[x265-commits] [x265] asm: the pixel value in blockcopy_ps is saturation by cal...

Thu Nov 7 20:44:55 CET 2013

details:   http://hg.videolan.org/x265/rev/0a1b379be359
branches:  
changeset: 4918:0a1b379be359
user:      Min Chen <chenm003 at 163.com>
date:      Thu Nov 07 18:17:52 2013 +0800
description:
asm: the pixel value in blockcopy_ps is saturation by calcRecon, so asm can use packuswb
Subject: [x265] tcompicyuv: improvement for Extend the right if width is not multiple of min CU size

details:   http://hg.videolan.org/x265/rev/85002898f5b4
branches:  
changeset: 4919:85002898f5b4
user:      Gopu Govindaswamy <gopu at multicorewareinc.com>
date:      Thu Nov 07 14:31:05 2013 +0530
description:
tcompicyuv: improvement for Extend the right if width is not multiple of min CU size
Subject: [x265] asm: assembly code for pixel_sad_x3_48x64

details:   http://hg.videolan.org/x265/rev/74682dfe5342
branches:  
changeset: 4920:74682dfe5342
user:      Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date:      Thu Nov 07 12:25:14 2013 +0530
description:
asm: assembly code for pixel_sad_x3_48x64
Subject: [x265] asm: assembly code for pixel_sad_x4_48x64

details:   http://hg.videolan.org/x265/rev/96f1bb63b747
branches:  
changeset: 4921:96f1bb63b747
user:      Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date:      Thu Nov 07 13:07:18 2013 +0530
description:
asm: assembly code for pixel_sad_x4_48x64
Subject: [x265] asm: assembly code for pixel_sad_x3_64xN

details:   http://hg.videolan.org/x265/rev/d6644a32e6bc
branches:  
changeset: 4922:d6644a32e6bc
user:      Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date:      Thu Nov 07 16:40:20 2013 +0530
description:
asm: assembly code for pixel_sad_x3_64xN
Subject: [x265] asm: assembly code for pixel_sad_x4_64xN

details:   http://hg.videolan.org/x265/rev/dc31fc1daf42
branches:  
changeset: 4923:dc31fc1daf42
user:      Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date:      Thu Nov 07 17:29:52 2013 +0530
description:
asm: assembly code for pixel_sad_x4_64xN
Subject: [x265] pixel: remove last remaining intrinsic SAD primitives

details:   http://hg.videolan.org/x265/rev/536db32fc253
branches:  
changeset: 4924:536db32fc253
user:      Steve Borho <steve at borho.org>
date:      Thu Nov 07 12:01:09 2013 -0600
description:
pixel: remove last remaining intrinsic SAD primitives
Subject: [x265] aq: bug fix, extend right and bot of TComPic::m_origPicYuv to a multiple of 16

details:   http://hg.videolan.org/x265/rev/93a4f88844f1
branches:  
changeset: 4925:93a4f88844f1
user:      Aarthi Thirumalai
date:      Thu Nov 07 16:46:57 2013 +0530
description:
aq: bug fix, extend right and bot of TComPic::m_origPicYuv to a multiple of 16
Subject: [x265] tcompicyuv: add right boundary padding while applying bottom row padding.

details:   http://hg.videolan.org/x265/rev/397a201b0ea3
branches:  
changeset: 4926:397a201b0ea3
user:      Aarthi Thirumalai
date:      Thu Nov 07 17:22:26 2013 +0530
description:
tcompicyuv: add right boundary padding while applying bottom row padding.
Subject: [x265] asm: the pixel value in blockcopy_ps is saturation by calcRecon, so asm can use packuswb

details:   http://hg.videolan.org/x265/rev/b572831429ec
branches:  
changeset: 4927:b572831429ec
user:      Min Chen <chenm003 at 163.com>
date:      Thu Nov 07 18:17:52 2013 +0800
description:
asm: the pixel value in blockcopy_ps is saturation by calcRecon, so asm can use packuswb
Subject: [x265] cleanup: remove unused blockcpy_sc

details:   http://hg.videolan.org/x265/rev/db7752a46693
branches:  
changeset: 4928:db7752a46693
user:      Min Chen <chenm003 at 163.com>
date:      Thu Nov 07 18:18:16 2013 +0800
description:
cleanup: remove unused blockcpy_sc
Subject: [x265] Bug fix for luma vpp asm routines. Also incorporated review comment changes.

details:   http://hg.videolan.org/x265/rev/9ba49b482a1e
branches:  
changeset: 4929:9ba49b482a1e
user:      Nabajit Deka
date:      Thu Nov 07 21:10:38 2013 +0530
description:
Bug fix for luma vpp asm routines. Also incorporated review comment changes.
Subject: [x265] asm: enable luma_vpp block MC functions

details:   http://hg.videolan.org/x265/rev/4d9aac4f0985
branches:  
changeset: 4930:4d9aac4f0985
user:      Steve Borho <steve at borho.org>
date:      Thu Nov 07 12:31:34 2013 -0600
description:
asm: enable luma_vpp block MC functions
Subject: [x265] unit test code for blockfill_s_c function

details:   http://hg.videolan.org/x265/rev/12ec248f7390
branches:  
changeset: 4931:12ec248f7390
user:      Praveen Tiwari
date:      Thu Nov 07 18:16:22 2013 +0530
description:
unit test code for blockfill_s_c function
Subject: [x265] asm code for blockfill_s, 4x4

details:   http://hg.videolan.org/x265/rev/29d208555299
branches:  
changeset: 4932:29d208555299
user:      Praveen Tiwari
date:      Thu Nov 07 18:26:36 2013 +0530
description:
asm code for blockfill_s, 4x4
Subject: [x265] asm code for blockfill_s, 8x8

details:   http://hg.videolan.org/x265/rev/7d3e461312a5
branches:  
changeset: 4933:7d3e461312a5
user:      Praveen Tiwari
date:      Thu Nov 07 18:59:28 2013 +0530
description:
asm code for blockfill_s, 8x8
Subject: [x265] asm code for blockfill_s, 16x16

details:   http://hg.videolan.org/x265/rev/a8df8123e9ab
branches:  
changeset: 4934:a8df8123e9ab
user:      Praveen Tiwari
date:      Thu Nov 07 19:40:51 2013 +0530
description:
asm code for blockfill_s, 16x16
Subject: [x265] asm code for blockfill_s, 32x32

details:   http://hg.videolan.org/x265/rev/b4993b1fef7c
branches:  
changeset: 4935:b4993b1fef7c
user:      Praveen Tiwari
date:      Thu Nov 07 20:06:56 2013 +0530
description:
asm code for blockfill_s, 32x32
Subject: [x265] rename: pixelsub_sp to pixelsub_ps, because it sub two Pixel and result is Short

details:   http://hg.videolan.org/x265/rev/cb24ed71905d
branches:  
changeset: 4936:cb24ed71905d
user:      Min Chen <chenm003 at 163.com>
date:      Thu Nov 07 13:13:47 2013 +0800
description:
rename: pixelsub_sp to pixelsub_ps, because it sub two Pixel and result is Short

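Several of the changesets above (4919, 4925, 4926) concern extending the right and bottom edges of the source picture so that its dimensions reach a multiple of 16. A minimal sketch of that idea follows, for a single 8-bit plane. The helper names padToMultipleOf16 and copyAndExtend are hypothetical and do not exist in x265, and the padding is computed directly from the raw dimension, which is a simplification and not the exact arithmetic used in TComPicYuv::copyFromPicture.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of x265): number of extra samples needed to
// round `size` up to the next multiple of 16; 0 when it is already aligned.
static int padToMultipleOf16(int size)
{
    int rem = size & 15;
    return rem ? 16 - rem : 0;
}

// Hypothetical helper: copy a width x height 8-bit plane into `dst`, then
// replicate the last column padx times and the last padded row pady times.
// `dst` must hold (height + pady) rows of dstStride samples, and
// dstStride must be at least width + padx.
static void copyAndExtend(uint8_t* dst, int dstStride,
                          const uint8_t* src, int srcStride,
                          int width, int height, int padx, int pady)
{
    assert(width > 0 && height > 0 && dstStride >= width + padx);

    for (int r = 0; r < height; r++)
    {
        uint8_t* row = dst + r * dstStride;
        std::memcpy(row, src + r * srcStride, width);
        // extend the right edge by repeating the last real sample
        std::memset(row + width, row[width - 1], padx);
    }

    // extend the bottom by repeating the last row, including its right padding
    const uint8_t* last = dst + (height - 1) * dstStride;
    for (int i = 1; i <= pady; i++)
        std::memcpy(dst + (height - 1 + i) * dstStride, last, width + padx);
}
```

With these assumptions, a 1920x1080 luma plane needs no right padding, and padToMultipleOf16(1080) gives 8 replicated rows at the bottom. Copying width + padx samples per bottom row mirrors the intent of changeset 4926, which adds the right boundary padding while the bottom rows are filled.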
diffstat:

 source/Lib/TLibCommon/TComPicYuv.cpp     |   239 +--
 source/Lib/TLibCommon/TComPrediction.cpp |     2 +-
 source/common/TShortYUV.cpp              |     6 +-
 source/common/lowres.cpp                 |    25 -
 source/common/pixel.cpp                  |    21 +-
 source/common/primitives.h               |     5 +-
 source/common/vec/blockcopy-sse3.cpp     |     5 +-
 source/common/vec/pixel-sse41.cpp        |  1795 +-----------------------------
 source/common/x86/asm-primitives.cpp     |    26 +
 source/common/x86/blockcopy8.asm         |   149 ++
 source/common/x86/blockcopy8.h           |     5 +
 source/common/x86/ipfilter8.asm          |   232 +-
 source/common/x86/sad-a.asm              |   898 +++++++++++++++
 source/encoder/frameencoder.cpp          |     5 +-
 source/encoder/motion.cpp                |     2 +-
 source/test/pixelharness.cpp             |    85 +-
 source/test/pixelharness.h               |     4 +-
 17 files changed, 1378 insertions(+), 2126 deletions(-)

diffs (truncated from 4181 to 300 lines):

diff -r ed1b1a7b0b38 -r cb24ed71905d source/Lib/TLibCommon/TComPicYuv.cpp

--- a/source/Lib/TLibCommon/TComPicYuv.cpp	Thu Nov 07 13:05:53 2013 +0530
+++ b/source/Lib/TLibCommon/TComPicYuv.cpp	Thu Nov 07 13:13:47 2013 +0800
@@ -323,18 +323,36 @@ void TComPicYuv::dump(char* pFileName, b
 //! \}
 
 /* Copy pixels from an input picture (C structure) into internal TComPicYuv instance
- * Upscale pixels from 8bits to 16 bits when required, but do not modify pixels.
- * This new routine is GPL
- */
+ * Upscale pixels from 8bits to 16 bits when required, but do not modify
+ * pixels. */
 void TComPicYuv::copyFromPicture(const x265_picture& pic, int32_t *pad)
 {
     Pel *Y = getLumaAddr();
     Pel *U = getCbAddr();
     Pel *V = getCrAddr();
 
+    // m_picWidth is the width that is being encoded, padx indicates how many
+    // of those pixels are padding to reach multiple of MinCU(4) size.
+    //
+    // Internally, we need to extend rows out to a multiple of 16 for lowres
+    // downscale and other operations. But those padding pixels are never
+    // encoded.
+    //
+    // The same applies to m_picHeight and pady
+
     int padx = pad[0];
     int pady = pad[1];
 
+    /* width and height - without padsize (input picture raw width and height) */
+    int width = m_picWidth - padx;
+    int height = m_picHeight - pady;
+
+    /* internal pad to multiple of 16x16 blocks */
+    uint8_t rem = m_picWidth & 15;
+    padx = rem ? 16 - rem : padx;
+    rem = m_picHeight & 15;
+    pady = rem ? 16 - rem : pady;
+
 #if HIGH_BIT_DEPTH
     if (pic.bitDepth > 8)
     {
@@ -342,10 +360,6 @@ void TComPicYuv::copyFromPicture(const x
         uint16_t *u = (uint16_t*)pic.planes[1];
         uint16_t *v = (uint16_t*)pic.planes[2];
 
-        /* width and height - without padsize */
-        int width = m_picWidth - padx;
-        int height = m_picHeight - pady;
-
         // Manually copy pixels to up-size them
         for (int r = 0; r < height; r++)
         {
@@ -354,6 +368,10 @@ void TComPicYuv::copyFromPicture(const x
                 Y[c] = (Pel)y[c];
             }
 
+            for (int x = 0; x < padx; x++)
+            {
+                Y[width + x] = Y[width - 1];
+            }
             Y += getStride();
             y += pic.stride[0];
         }
@@ -366,73 +384,42 @@ void TComPicYuv::copyFromPicture(const x
                 V[c] = (Pel)v[c];
             }
 
+            for (int x = 0; x < padx >> m_hChromaShift; x++)
+            {
+                U[(width >> m_hChromaShift) + x] = U[(width >> m_hChromaShift) - 1];
+                V[(width >> m_hChromaShift) + x] = V[(width >> m_hChromaShift) - 1];
+            }
             U += getCStride();
             V += getCStride();
             u += pic.stride[1];
             v += pic.stride[2];
         }
 
-        /* Extend the right if width is not multiple of minimum CU size */
-
-        if (padx)
-        {
-            Y = getLumaAddr();
-            U = getCbAddr();
-            V = getCrAddr();
-
-            for (int r = 0; r < height; r++)
-            {
-                for (int x = 0; x < padx; x++)
-                {
-                    Y[width + x] = Y[width - 1];
-                }
-
-                Y += getStride();
-            }
-
-            for (int r = 0; r < height >> m_vChromaShift; r++)
-            {
-                for (int x = 0; x < padx >> m_hChromaShift; x++)
-                {
-                    U[(width >> m_hChromaShift) + x] = U[(width >> m_hChromaShift) - 1];
-                    V[(width >> m_hChromaShift) + x] = V[(width >> m_hChromaShift) - 1];
-                }
-
-                U += getCStride();
-                V += getCStride();
-            }
-        }
-
         /* extend the bottom if height is not multiple of the minimum CU size */
         if (pady)
         {
-            width = m_picWidth;
             Y = getLumaAddr() + (height - 1) * getStride();
             U = getCbAddr() + ((height >> m_vChromaShift) - 1) * getCStride();
             V = getCrAddr() + ((height >> m_vChromaShift) - 1) * getCStride();
 
             for (uint32_t i = 1; i <= pady; i++)
             {
-                memcpy(Y + i * getStride(), Y, width * sizeof(Pel));
+                memcpy(Y + i * getStride(), Y, (width + padx) * sizeof(Pel));
             }
 
             for (uint32_t j = 1; j <= pady >> m_vChromaShift; j++)
             {
-                memcpy(U + j * getCStride(), U, (width >> m_hChromaShift) * sizeof(Pel));
-                memcpy(V + j * getCStride(), V, (width >> m_hChromaShift) * sizeof(Pel));
+                memcpy(U + j * getCStride(), U, ((width + padx) >> m_hChromaShift) * sizeof(Pel));
+                memcpy(V + j * getCStride(), V, ((width + padx) >> m_hChromaShift) * sizeof(Pel));
             }
         }
     }
-    else if(pic.bitDepth == 8)
+    else
     {
         uint8_t *y = (uint8_t*)pic.planes[0];
         uint8_t *u = (uint8_t*)pic.planes[1];
         uint8_t *v = (uint8_t*)pic.planes[2];
 
-        /* width and height - without padsize */
-        int width = m_picWidth - padx;
-        int height = m_picHeight - pady;
-
         // Manually copy pixels to up-size them
         for (int r = 0; r < height; r++)
         {
@@ -441,6 +428,10 @@ void TComPicYuv::copyFromPicture(const x
                 Y[c] = (Pel)y[c];
             }
 
+            for (int x = 0; x < padx; x++)
+            {
+                Y[width + x] = Y[width - 1];
+            }
             Y += getStride();
             y += pic.stride[0];
         }
@@ -453,97 +444,10 @@ void TComPicYuv::copyFromPicture(const x
                 V[c] = (Pel)v[c];
             }
 
-            U += getCStride();
-            V += getCStride();
-            u += pic.stride[1];
-            v += pic.stride[2];
-        }
-
-        /* Extend the right if width is not multiple of minimum CU size */
-
-        if (padx)
-        {
-            Y = getLumaAddr();
-            U = getCbAddr();
-            V = getCrAddr();
-
-            for (int r = 0; r < height; r++)
+            for (int x = 0; x < padx >> m_hChromaShift; x++)
             {
-                for (int x = 0; x < padx; x++)
-                {
-                    Y[width + x] = Y[width - 1];
-                }
-
-                Y += getStride();
-            }
-
-            for (int r = 0; r < height >> m_vChromaShift; r++)
-            {
-                for (int x = 0; x < padx >> m_hChromaShift; x++)
-                {
-                    U[(width >> m_hChromaShift) + x] = U[(width >> m_hChromaShift) - 1];
-                    V[(width >> m_hChromaShift) + x] = V[(width >> m_hChromaShift) - 1];
-                }
-
-                U += getCStride();
-                V += getCStride();
-            }
-        }
-
-        /* extend the bottom if height is not multiple of the minimum CU size */
-        if (pady)
-        {
-            width = m_picWidth;
-            Y = getLumaAddr() + (height - 1) * getStride();
-            U = getCbAddr() + ((height >> m_vChromaShift) - 1) * getCStride();
-            V = getCrAddr() + ((height >> m_vChromaShift) - 1) * getCStride();
-
-            for (uint32_t i = 1; i <= pady; i++)
-            {
-                memcpy(Y + i * getStride(), Y, width * sizeof(Pel));
-            }
-
-            for (uint32_t j = 1; j <= pady >> m_vChromaShift; j++)
-            {
-                memcpy(U + j * getCStride(), U, (width >> m_hChromaShift) * sizeof(Pel));
-                memcpy(V + j * getCStride(), V, (width >> m_hChromaShift) * sizeof(Pel));
-            }
-        }
-    }
-    else
-#endif // if HIGH_BIT_DEPTH
-    {
-        uint8_t *y = (uint8_t*)pic.planes[0];
-        uint8_t *u = (uint8_t*)pic.planes[1];
-        uint8_t *v = (uint8_t*)pic.planes[2];
-
-        /* width and height - without padsize */
-        int width = (m_picWidth * (pic.bitDepth > 8 ? 2 : 1)) - padx;
-        int height = m_picHeight - pady;
-
-        // copy pixels by row into encoder's buffer
-        for (int r = 0; r < height; r++)
-        {
-            memcpy(Y, y, width);
-
-            /* extend the right if width is not multiple of the minimum CU size */
-            if (padx)
-                ::memset(Y + width, Y[width - 1], padx);
-
-            Y += getStride();
-            y += pic.stride[0];
-        }
-
-        for (int r = 0; r < height >> m_vChromaShift; r++)
-        {
-            memcpy(U, u, width >> m_hChromaShift);
-            memcpy(V, v, width >> m_hChromaShift);
-
-            /* extend the right if width is not multiple of the minimum CU size */
-            if (padx)
-            {
-                ::memset(U + (width >> m_hChromaShift), U[(width >> m_hChromaShift) - 1], padx >> m_hChromaShift);
-                ::memset(V + (width >> m_hChromaShift), V[(width >> m_hChromaShift) - 1], padx >> m_hChromaShift);
+                U[(width >> m_hChromaShift) + x] = U[(width >> m_hChromaShift) - 1];
+                V[(width >> m_hChromaShift) + x] = V[(width >> m_hChromaShift) - 1];
             }
 
             U += getCStride();
@@ -555,21 +459,74 @@ void TComPicYuv::copyFromPicture(const x
         /* extend the bottom if height is not multiple of the minimum CU size */
         if (pady)
         {
-            width = m_picWidth;
             Y = getLumaAddr() + (height - 1) * getStride();
             U = getCbAddr() + ((height >> m_vChromaShift) - 1) * getCStride();
             V = getCrAddr() + ((height >> m_vChromaShift) - 1) * getCStride();
 
             for (uint32_t i = 1; i <= pady; i++)
             {
-                memcpy(Y + i * getStride(), Y, width * sizeof(pixel));
+                memcpy(Y + i * getStride(), Y, (width + padx) * sizeof(Pel));
             }
 
             for (uint32_t j = 1; j <= pady >> m_vChromaShift; j++)
             {
-                memcpy(U + j * getCStride(), U, (width >> m_hChromaShift) * sizeof(pixel));
-                memcpy(V + j * getCStride(), V, (width >> m_hChromaShift) * sizeof(pixel));
+                memcpy(U + j * getCStride(), U, ((width + padx) >> m_hChromaShift) * sizeof(Pel));
+                memcpy(V + j * getCStride(), V, ((width + padx) >> m_hChromaShift) * sizeof(Pel));
             }
         }
     }
+#else // if HIGH_BIT_DEPTH
+    uint8_t *y = (uint8_t*)pic.planes[0];
+    uint8_t *u = (uint8_t*)pic.planes[1];
+    uint8_t *v = (uint8_t*)pic.planes[2];
+
+    for (int r = 0; r < height; r++)
+    {
+        memcpy(Y, y, width);
+
+        /* extend the right if width is not multiple of the minimum CU size */
+        if (padx)
+            ::memset(Y + width, Y[width - 1], padx);
+
+        Y += getStride();
+        y += pic.stride[0];
+    }