[x265-commits] [x265] analysis: re-order RD 5/6 analysis to do splits before ME...
Ashok Kumar Mishra
ashok at multicorewareinc.com
Wed Jul 15 20:05:55 CEST 2015
details: http://hg.videolan.org/x265/rev/42e55c6eafb0
branches:
changeset: 10816:42e55c6eafb0
user: Ashok Kumar Mishra<ashok at multicorewareinc.com>
date: Thu May 21 19:16:28 2015 +0530
description:
analysis: re-order RD 5/6 analysis to do splits before ME or intra
This commit changes outputs because splits used to be avoided when an inter or
intra mode was chosen without residual coding. This recursion early-out is no
longer possible. Only merge without residual (aka skip) can abort recursion.
This commit changes the order of analysis such that the four split blocks are
analyzed prior to attempting any ME or intra modes. Future commits will use
the knowledge learned during split analysis to avoid unlikely work at the
current depth (reducing motion references and avoiding unlikely intra,
rectangular, asymmetric, and lossless modes).
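A rough sketch of the idea described above, not the x265 implementation: once
the four split blocks have been analyzed, their results can be folded into
hints that prune work at the current depth. Every name here (ToyCU,
SplitHints, gatherSplitHints, refsToSearch, shouldTryIntra) is hypothetical
and exists only for illustration.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical toy model of an analyzed CU; not an x265 type.
struct ToyCU
{
    uint32_t refMask;            // bit i set => reference i was chosen in this CU
    bool usedIntra;              // true if this CU ended up intra coded
    std::vector<ToyCU> children; // empty, or the four split blocks
};

// What the parent depth learns from its already-analyzed split blocks.
struct SplitHints
{
    uint32_t refMask = 0;
    bool anyIntra = false;
};

// Fold the results of the split blocks into one set of hints.
SplitHints gatherSplitHints(const ToyCU& cu)
{
    SplitHints h;
    for (const ToyCU& c : cu.children)
    {
        h.refMask |= c.refMask;
        h.anyIntra = h.anyIntra || c.usedIntra;
    }
    return h;
}

// References worth searching at the current depth. Without split analysis
// every reference is kept; with it, only those some split block selected.
std::vector<int> refsToSearch(const SplitHints& h, int numRefs, bool splitAnalyzed)
{
    std::vector<int> out;
    for (int r = 0; r < numRefs; r++)
        if (!splitAnalyzed || ((h.refMask >> r) & 1))
            out.push_back(r);
    return out;
}

// Intra is attempted unless split analysis ran and no split block chose it.
bool shouldTryIntra(const SplitHints& h, bool splitAnalyzed)
{
    return !splitAnalyzed || h.anyIntra;
}
```

With four inter-coded split blocks that together selected references 0 and 2,
refsToSearch() keeps only those two and shouldTryIntra() returns false. The
pruning is only available because the split blocks are analyzed first.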
Subject: [x265] analysis: at RD 5/6 avoid motion references if not used by split blocks
details: http://hg.videolan.org/x265/rev/c19a4ae5cf7d
branches:
changeset: 10817:c19a4ae5cf7d
user: Ashok Kumar Mishra<ashok at multicorewareinc.com>
date: Tue Jun 23 20:35:07 2015 +0530
description:
analysis: at RD 5/6 avoid motion references if not used by split blocks
Subject: [x265] analysis: skip intra in RD 5/6 if split was analyzed and no split CUs used intra
details: http://hg.videolan.org/x265/rev/ef9cd36f3672
branches:
changeset: 10818:ef9cd36f3672
user: Ashok Kumar Mishra<ashok at multicorewareinc.com>
date: Tue Jun 23 20:35:10 2015 +0530
description:
analysis: skip intra in RD 5/6 if split was analyzed and no split CUs used intra
Subject: [x265] stats: RD 5/6 profile effectiveness of avoiding intra if split CUs did not select it
details: http://hg.videolan.org/x265/rev/af57c28db2ff
branches:
changeset: 10819:af57c28db2ff
user: Ashok Kumar Mishra<ashok at multicorewareinc.com>
date: Tue Jun 23 20:35:13 2015 +0530
description:
stats: RD 5/6 profile effectiveness of avoiding intra if split CUs did not select it
Subject: [x265] analysis: respect X265_REF_LIMIT_DEPTH with RD 5/6
details: http://hg.videolan.org/x265/rev/98bfdc49b66e
branches:
changeset: 10820:98bfdc49b66e
user: Ashok Kumar Mishra<ashok at multicorewareinc.com>
date: Tue Jun 23 20:35:17 2015 +0530
description:
analysis: respect X265_REF_LIMIT_DEPTH with RD 5/6
Subject: [x265] analysis: model the effectiveness of --limit-ref with RD 5/6
details: http://hg.videolan.org/x265/rev/63f8b338f2be
branches:
changeset: 10821:63f8b338f2be
user: Ashok Kumar Mishra<ashok at multicorewareinc.com>
date: Tue Jun 23 20:35:20 2015 +0530
description:
analysis: model the effectiveness of --limit-ref with RD 5/6
Subject: [x265] Regression Test: added new command line --ref-limits for RD-5/6 in regression-tests.txt
details: http://hg.videolan.org/x265/rev/8521e8d7a477
branches:
changeset: 10822:8521e8d7a477
user: Ashok Kumar Mishra<ashok at multicorewareinc.com>
date: Tue Jun 23 20:35:24 2015 +0530
description:
Regression Test: added new command line --ref-limits for RD-5/6 in regression-tests.txt
Subject: [x265] analysis: removed switch-case to read the best ref index
details: http://hg.videolan.org/x265/rev/19f3f98b5c73
branches:
changeset: 10823:19f3f98b5c73
user: Ashok Kumar Mishra<ashok at multicorewareinc.com>
date: Tue Jun 23 20:35:28 2015 +0530
description:
analysis: removed switch-case to read the best ref index
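For readers following the diff, the getBestRefIdx() helper added to cudata.h
appears to pack the chosen references into one 32-bit mask: list 0 in the low
16 bits, list 1 in the high 16 bits. Below is a standalone sketch of that
packing. packRefMask is a hypothetical name, and the explicit branches (so an
unused list never contributes a shift) are a choice made for this sketch, not
taken from the commit.

```cpp
#include <cstdint>

// Hypothetical standalone version of the packing seen in getBestRefIdx().
// interDir bit 0 => list 0 is used, bit 1 => list 1 is used.
// refIdxL0 / refIdxL1 are only read when the matching list is used.
uint32_t packRefMask(uint32_t interDir, int refIdxL0, int refIdxL1)
{
    uint32_t mask = 0;
    if (interDir & 1)
        mask |= 1u << refIdxL0;        // list 0 occupies bits 0..15
    if (interDir & 2)
        mask |= 1u << (refIdxL1 + 16); // list 1 occupies bits 16..31
    return mask;
}
```

A bi-predicted PU using L0 reference 1 and L1 reference 3 yields 0x00080002.
OR-ing these masks across PUs gives the set of references a block used.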
Subject: [x265] analysis: used CUData helper function to get number of PUs and offset
details: http://hg.videolan.org/x265/rev/a850ecb0895b
branches:
changeset: 10824:a850ecb0895b
user: Ashok Kumar Mishra<ashok at multicorewareinc.com>
date: Tue Jun 23 20:35:34 2015 +0530
description:
analysis: used CUData helper function to get number of PUs and offset
Subject: [x265] entropy: removed g_puOffset table
details: http://hg.videolan.org/x265/rev/35029d6001c5
branches:
changeset: 10825:35029d6001c5
user: Ashok Kumar Mishra<ashok at multicorewareinc.com>
date: Tue Jun 23 20:35:38 2015 +0530
description:
entropy: removed g_puOffset table
Subject: [x265] dither: fix bitdepth check
details: http://hg.videolan.org/x265/rev/8fc3ea4894c2
branches:
changeset: 10826:8fc3ea4894c2
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Wed Jul 15 11:54:24 2015 +0530
description:
dither: fix bitdepth check
Subject: [x265] cli: add 12-bit to showHelp
details: http://hg.videolan.org/x265/rev/54689cbd2e01
branches:
changeset: 10827:54689cbd2e01
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Wed Jul 15 12:24:46 2015 +0530
description:
cli: add 12-bit to showHelp
Subject: [x265] asm: fix intra_pred_dc_sse2 in Main12
details: http://hg.videolan.org/x265/rev/8efce8620ae2
branches:
changeset: 10828:8efce8620ae2
user: Min Chen <chenm003 at 163.com>
date: Tue Jul 14 16:29:46 2015 -0700
description:
asm: fix intra_pred_dc_sse2 in Main12
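The commit message does not say why the psraw instructions become psrlw in
the diff. One plausible reading is that with 12-bit samples a DC sum can
exceed 32767, so an arithmetic shift on a 16-bit lane treats it as negative.
The sketch below emulates a single 16-bit lane in C++. psrlwLane, psrawLane
and dc8Sum are hypothetical helpers written for this note, and the 16-sample
sum is an assumption about the 8x8 DC case.

```cpp
#include <cstdint>

// One 16-bit lane of psrlw: logical right shift, zeros shifted in.
uint16_t psrlwLane(uint16_t x, unsigned n)
{
    return static_cast<uint16_t>(x >> n);
}

// One 16-bit lane of psraw: arithmetic right shift, sign bit replicated.
// Intended for shift counts 0 < n < 16.
uint16_t psrawLane(uint16_t x, unsigned n)
{
    uint16_t r = static_cast<uint16_t>(x >> n);
    if (x & 0x8000u)
        r |= static_cast<uint16_t>(0xFFFFu << (16 - n));
    return r;
}

// Rounded sum for an 8x8 DC predictor when all 16 neighbours share one
// value (assumption: 8 top + 8 left samples, rounding offset 8).
uint16_t dc8Sum(uint16_t sample)
{
    return static_cast<uint16_t>(16u * sample + 8u);
}
```

For a 10-bit maximum of 1023 both shifts by 4 return 1023. For a 12-bit
maximum of 4095 the sum is 65528 (0xFFF8): the logical shift returns 4095,
while the arithmetic shift returns 0xFFFF. Sums over 32 or more samples no
longer fit in 16 bits at all, which would be consistent with the dc16 and
dc32 paths switching to dword sums (paddd, psrld) in the diff.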
diffstat:
source/common/cudata.cpp | 32 ----
source/common/cudata.h | 37 ++++-
source/common/x86/intrapred16.asm | 61 +++----
source/encoder/analysis.cpp | 281 ++++++++++++++++++++++---------------
source/encoder/analysis.h | 4 +-
source/encoder/entropy.cpp | 7 +-
source/encoder/entropy.h | 2 -
source/encoder/search.cpp | 2 +-
source/test/regression-tests.txt | 24 +-
source/x265-extras.cpp | 4 +-
source/x265cli.h | 2 +-
11 files changed, 247 insertions(+), 209 deletions(-)
diffs (truncated from 930 to 300 lines):
diff -r 8023786c5247 -r 8efce8620ae2 source/common/cudata.cpp
--- a/source/common/cudata.cpp Mon Jul 13 17:38:02 2015 -0700
+++ b/source/common/cudata.cpp Tue Jul 14 16:29:46 2015 -0700
@@ -112,38 +112,6 @@ inline MV scaleMv(MV mv, int scale)
return MV((int16_t)mvx, (int16_t)mvy);
}
-// Partition table.
-// First index is partitioning mode. Second index is partition index.
-// Third index is 0 for partition sizes, 1 for partition offsets. The
-// sizes and offsets are encoded as two packed 4-bit values (X,Y).
-// X and Y represent 1/4 fractions of the block size.
-const uint32_t partTable[8][4][2] =
-{
- // XY
- { { 0x44, 0x00 }, { 0x00, 0x00 }, { 0x00, 0x00 }, { 0x00, 0x00 } }, // SIZE_2Nx2N.
- { { 0x42, 0x00 }, { 0x42, 0x02 }, { 0x00, 0x00 }, { 0x00, 0x00 } }, // SIZE_2NxN.
- { { 0x24, 0x00 }, { 0x24, 0x20 }, { 0x00, 0x00 }, { 0x00, 0x00 } }, // SIZE_Nx2N.
- { { 0x22, 0x00 }, { 0x22, 0x20 }, { 0x22, 0x02 }, { 0x22, 0x22 } }, // SIZE_NxN.
- { { 0x41, 0x00 }, { 0x43, 0x01 }, { 0x00, 0x00 }, { 0x00, 0x00 } }, // SIZE_2NxnU.
- { { 0x43, 0x00 }, { 0x41, 0x03 }, { 0x00, 0x00 }, { 0x00, 0x00 } }, // SIZE_2NxnD.
- { { 0x14, 0x00 }, { 0x34, 0x10 }, { 0x00, 0x00 }, { 0x00, 0x00 } }, // SIZE_nLx2N.
- { { 0x34, 0x00 }, { 0x14, 0x30 }, { 0x00, 0x00 }, { 0x00, 0x00 } } // SIZE_nRx2N.
-};
-
-// Partition Address table.
-// First index is partitioning mode. Second index is partition address.
-const uint32_t partAddrTable[8][4] =
-{
- { 0x00, 0x00, 0x00, 0x00 }, // SIZE_2Nx2N.
- { 0x00, 0x08, 0x08, 0x08 }, // SIZE_2NxN.
- { 0x00, 0x04, 0x04, 0x04 }, // SIZE_Nx2N.
- { 0x00, 0x04, 0x08, 0x0C }, // SIZE_NxN.
- { 0x00, 0x02, 0x02, 0x02 }, // SIZE_2NxnU.
- { 0x00, 0x0A, 0x0A, 0x0A }, // SIZE_2NxnD.
- { 0x00, 0x01, 0x01, 0x01 }, // SIZE_nLx2N.
- { 0x00, 0x05, 0x05, 0x05 } // SIZE_nRx2N.
-};
-
}
cubcast_t CUData::s_partSet[NUM_FULL_DEPTH] = { NULL, NULL, NULL, NULL, NULL };
diff -r 8023786c5247 -r 8efce8620ae2 source/common/cudata.h
--- a/source/common/cudata.h Mon Jul 13 17:38:02 2015 -0700
+++ b/source/common/cudata.h Tue Jul 14 16:29:46 2015 -0700
@@ -121,6 +121,38 @@ typedef void(*cubcast_t)(uint8_t* dst, u
// Partition count table, index represents partitioning mode.
const uint32_t nbPartsTable[8] = { 1, 2, 2, 4, 2, 2, 2, 2 };
+// Partition table.
+// First index is partitioning mode. Second index is partition index.
+// Third index is 0 for partition sizes, 1 for partition offsets. The
+// sizes and offsets are encoded as two packed 4-bit values (X,Y).
+// X and Y represent 1/4 fractions of the block size.
+const uint32_t partTable[8][4][2] =
+{
+ // XY
+ { { 0x44, 0x00 }, { 0x00, 0x00 }, { 0x00, 0x00 }, { 0x00, 0x00 } }, // SIZE_2Nx2N.
+ { { 0x42, 0x00 }, { 0x42, 0x02 }, { 0x00, 0x00 }, { 0x00, 0x00 } }, // SIZE_2NxN.
+ { { 0x24, 0x00 }, { 0x24, 0x20 }, { 0x00, 0x00 }, { 0x00, 0x00 } }, // SIZE_Nx2N.
+ { { 0x22, 0x00 }, { 0x22, 0x20 }, { 0x22, 0x02 }, { 0x22, 0x22 } }, // SIZE_NxN.
+ { { 0x41, 0x00 }, { 0x43, 0x01 }, { 0x00, 0x00 }, { 0x00, 0x00 } }, // SIZE_2NxnU.
+ { { 0x43, 0x00 }, { 0x41, 0x03 }, { 0x00, 0x00 }, { 0x00, 0x00 } }, // SIZE_2NxnD.
+ { { 0x14, 0x00 }, { 0x34, 0x10 }, { 0x00, 0x00 }, { 0x00, 0x00 } }, // SIZE_nLx2N.
+ { { 0x34, 0x00 }, { 0x14, 0x30 }, { 0x00, 0x00 }, { 0x00, 0x00 } } // SIZE_nRx2N.
+};
+
+// Partition Address table.
+// First index is partitioning mode. Second index is partition address.
+const uint32_t partAddrTable[8][4] =
+{
+ { 0x00, 0x00, 0x00, 0x00 }, // SIZE_2Nx2N.
+ { 0x00, 0x08, 0x08, 0x08 }, // SIZE_2NxN.
+ { 0x00, 0x04, 0x04, 0x04 }, // SIZE_Nx2N.
+ { 0x00, 0x04, 0x08, 0x0C }, // SIZE_NxN.
+ { 0x00, 0x02, 0x02, 0x02 }, // SIZE_2NxnU.
+ { 0x00, 0x0A, 0x0A, 0x0A }, // SIZE_2NxnD.
+ { 0x00, 0x01, 0x01, 0x01 }, // SIZE_nLx2N.
+ { 0x00, 0x05, 0x05, 0x05 } // SIZE_nRx2N.
+};
+
// Holds part data for a CU of a given size, from an 8x8 CU to a CTU
class CUData
{
@@ -222,8 +254,11 @@ public:
void getNeighbourMV(uint32_t puIdx, uint32_t absPartIdx, InterNeighbourMV* neighbours) const;
void getIntraTUQtDepthRange(uint32_t tuDepthRange[2], uint32_t absPartIdx) const;
void getInterTUQtDepthRange(uint32_t tuDepthRange[2], uint32_t absPartIdx) const;
+ uint32_t getBestRefIdx(uint32_t subPartIdx) const { return ((m_interDir[subPartIdx] & 1) << m_refIdx[0][subPartIdx]) |
+ (((m_interDir[subPartIdx] >> 1) & 1) << (m_refIdx[1][subPartIdx] + 16)); }
+ uint32_t getPUOffset(uint32_t puIdx, uint32_t absPartIdx) const { return (partAddrTable[(int)m_partSize[absPartIdx]][puIdx] << (g_unitSizeDepth - m_cuDepth[absPartIdx]) * 2) >> 4; }
- uint32_t getNumPartInter() const { return nbPartsTable[(int)m_partSize[0]]; }
+ uint32_t getNumPartInter(uint32_t absPartIdx) const { return nbPartsTable[(int)m_partSize[absPartIdx]]; }
bool isIntra(uint32_t absPartIdx) const { return m_predMode[absPartIdx] == MODE_INTRA; }
bool isInter(uint32_t absPartIdx) const { return !!(m_predMode[absPartIdx] & MODE_INTER); }
bool isSkipped(uint32_t absPartIdx) const { return m_predMode[absPartIdx] == MODE_SKIP; }
diff -r 8023786c5247 -r 8efce8620ae2 source/common/x86/intrapred16.asm
--- a/source/common/x86/intrapred16.asm Mon Jul 13 17:38:02 2015 -0700
+++ b/source/common/x86/intrapred16.asm Tue Jul 14 16:29:46 2015 -0700
@@ -142,7 +142,7 @@ cglobal intra_pred_dc4, 5,6,2
test r4d, r4d
paddw m0, [pw_4]
- psraw m0, 3
+ psrlw m0, 3
; store DC 4x4
movh [r0], m0
@@ -161,7 +161,7 @@ cglobal intra_pred_dc4, 5,6,2
; filter top
movh m1, [r2 + 2]
paddw m1, m0
- psraw m1, 2
+ psrlw m1, 2
movh [r0], m1 ; overwrite top-left pixel, we will update it later
; filter top-left
@@ -176,7 +176,7 @@ cglobal intra_pred_dc4, 5,6,2
; filter left
movu m1, [r2 + 20]
paddw m1, m0
- psraw m1, 2
+ psrlw m1, 2
movd r3d, m1
mov [r0 + r1 * 2], r3w
shr r3d, 16
@@ -202,7 +202,7 @@ cglobal intra_pred_dc8, 5, 8, 2
pmaddwd m0, [pw_1]
paddw m0, [pw_8]
- psraw m0, 4 ; sum = sum / 16
+ psrlw m0, 4 ; sum = sum / 16
pshuflw m0, m0, 0
pshufd m0, m0, 0 ; m0 = word [dc_val ...]
@@ -235,7 +235,7 @@ cglobal intra_pred_dc8, 5, 8, 2
; filter top
movu m0, [r2 + 2]
paddw m0, m1
- psraw m0, 2
+ psrlw m0, 2
movu [r0], m0
; filter top-left
@@ -250,7 +250,7 @@ cglobal intra_pred_dc8, 5, 8, 2
; filter left
movu m0, [r2 + 36]
paddw m0, m1
- psraw m0, 2
+ psrlw m0, 2
movh r3, m0
mov [r0 + r1 * 2], r3w
shr r3, 16
@@ -284,14 +284,10 @@ cglobal intra_pred_dc16, 5, 10, 4
paddw m0, m1
paddw m2, m3
paddw m0, m2
- movhlps m1, m0
- paddw m0, m1
- pshuflw m1, m0, 0x6E
- paddw m0, m1
- pmaddwd m0, [pw_1]
-
- paddw m0, [pw_16]
- psraw m0, 5
+ HADDUW m0, m1
+ paddd m0, [pd_16]
+ psrld m0, 5
+
movd r5d, m0
pshuflw m0, m0, 0 ; m0 = word [dc_val ...]
pshufd m0, m0, 0
@@ -347,11 +343,11 @@ cglobal intra_pred_dc16, 5, 10, 4
; filter top
movu m2, [r2 + 2]
paddw m2, m1
- psraw m2, 2
+ psrlw m2, 2
movu [r0], m2
movu m3, [r2 + 18]
paddw m3, m1
- psraw m3, 2
+ psrlw m3, 2
movu [r0 + 16], m3
; filter top-left
@@ -366,7 +362,7 @@ cglobal intra_pred_dc16, 5, 10, 4
; filter left
movu m2, [r3 + 2]
paddw m2, m1
- psraw m2, 2
+ psrlw m2, 2
movq r2, m2
pshufd m2, m2, 0xEE
@@ -388,7 +384,7 @@ cglobal intra_pred_dc16, 5, 10, 4
movu m3, [r3 + 18]
paddw m3, m1
- psraw m3, 2
+ psrlw m3, 2
movq r3, m3
pshufd m3, m3, 0xEE
@@ -423,20 +419,19 @@ cglobal intra_pred_dc32, 3, 4, 6
paddw m0, m1
paddw m2, m3
paddw m0, m2
+ HADDUWD m0, m1
+
movu m1, [r2]
- movu m3, [r2 + 16]
- movu m4, [r2 + 32]
- movu m5, [r2 + 48]
+ movu m2, [r2 + 16]
+ movu m3, [r2 + 32]
+ movu m4, [r2 + 48]
+ paddw m1, m2
+ paddw m3, m4
paddw m1, m3
- paddw m4, m5
- paddw m1, m4
- paddw m0, m1
- movhlps m1, m0
- paddw m0, m1
- pshuflw m1, m0, 0x6E
- paddw m0, m1
- pmaddwd m0, [pw_1]
-
+ HADDUWD m1, m2
+
+ paddd m0, m1
+ HADDD m0, m1
paddd m0, [pd_32] ; sum = sum + 32
psrld m0, 6 ; sum = sum / 64
pshuflw m0, m0, 0
@@ -487,7 +482,7 @@ cglobal intra_pred_dc16, 3, 9, 4
phaddw xm0, xm0
pmaddwd xm0, [pw_1]
paddd xm0, [pd_16]
- psrad xm0, 5
+ psrld xm0, 5
movd r5d, xm0
vpbroadcastw m0, xm0
@@ -527,7 +522,7 @@ cglobal intra_pred_dc16, 3, 9, 4
; filter top
movu m2, [r2 + 2]
paddw m2, m1
- psraw m2, 2
+ psrlw m2, 2
movu [r0], m2
; filter top-left
@@ -542,7 +537,7 @@ cglobal intra_pred_dc16, 3, 9, 4
; filter left
movu m2, [r2 + 68]
paddw m2, m1
- psraw m2, 2
+ psrlw m2, 2
vextracti128 xm3, m2, 1
movq r3, xm2
diff -r 8023786c5247 -r 8efce8620ae2 source/encoder/analysis.cpp
--- a/source/encoder/analysis.cpp Mon Jul 13 17:38:02 2015 -0700
+++ b/source/encoder/analysis.cpp Tue Jul 14 16:29:46 2015 -0700
@@ -385,10 +385,10 @@ void Analysis::processPmode(PMODE& pmode
/* perform Mode task, repeat until no more work is available */
do
{
+ uint32_t refMasks[2] = { 0, 0 };
+
if (m_param->rdLevel <= 4)
{
- uint32_t refMasks[2] = { 0, 0 };
-
switch (pmode.modes[task])
{
case PRED_INTRA:
@@ -443,7 +443,7 @@ void Analysis::processPmode(PMODE& pmode
break;
case PRED_2Nx2N:
- slave.checkInter_rd5_6(md.pred[PRED_2Nx2N], pmode.cuGeom, SIZE_2Nx2N);
+ slave.checkInter_rd5_6(md.pred[PRED_2Nx2N], pmode.cuGeom, SIZE_2Nx2N, refMasks);
md.pred[PRED_BIDIR].rdCost = MAX_INT64;
if (m_slice->m_sliceType == B_SLICE)
{
@@ -454,27 +454,27 @@ void Analysis::processPmode(PMODE& pmode
break;
case PRED_Nx2N:
- slave.checkInter_rd5_6(md.pred[PRED_Nx2N], pmode.cuGeom, SIZE_Nx2N);
+ slave.checkInter_rd5_6(md.pred[PRED_Nx2N], pmode.cuGeom, SIZE_Nx2N, refMasks);
break;
case PRED_2NxN:
- slave.checkInter_rd5_6(md.pred[PRED_2NxN], pmode.cuGeom, SIZE_2NxN);
+ slave.checkInter_rd5_6(md.pred[PRED_2NxN], pmode.cuGeom, SIZE_2NxN, refMasks);
More information about the x265-commits
mailing list