[x265-commits] [x265] remove unused recon[] from assembly code
Min Chen
chenm003 at 163.com
Thu Apr 3 07:33:38 CEST 2014
details: http://hg.videolan.org/x265/rev/a6930bfbd908
branches:
changeset: 6643:a6930bfbd908
user: Min Chen <chenm003 at 163.com>
date: Tue Apr 01 16:41:59 2014 -0700
description:
remove unused recon[] from assembly code
Subject: [x265] calcQpForCU: remove m_pic input parameter.
details: http://hg.videolan.org/x265/rev/03bad90e94ad
branches:
changeset: 6644:03bad90e94ad
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Wed Apr 02 06:51:35 2014 +0530
description:
calcQpForCU: remove m_pic input parameter.
Subject: [x265] Backed out changeset: a6930bfbd908
details: http://hg.videolan.org/x265/rev/d0b5ea32525b
branches:
changeset: 6645:d0b5ea32525b
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Wed Apr 02 15:58:19 2014 +0530
description:
Backed out changeset: a6930bfbd908
This changeset causes crashes. Needs to be re-examined.
Subject: [x265] frameencoder: removing assign qp inconsistencies which were triggered for unreferenced P frames
details: http://hg.videolan.org/x265/rev/606da0b6bc58
branches: stable
changeset: 6646:606da0b6bc58
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Wed Apr 02 17:05:08 2014 +0530
description:
frameencoder: removing assign qp inconsistencies which were triggered for unreferenced P frames
Subject: [x265] Merge from stable
details: http://hg.videolan.org/x265/rev/3f27daf35506
branches:
changeset: 6647:3f27daf35506
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Wed Apr 02 17:08:25 2014 +0530
description:
Merge from stable
Subject: [x265] param: set aq strength to 0 in CQP
details: http://hg.videolan.org/x265/rev/dc887415f6df
branches:
changeset: 6648:dc887415f6df
user: Aarthi Thirumalai
date: Wed Apr 02 17:19:38 2014 +0530
description:
param: set aq strength to 0 in CQP
Subject: [x265] param: fix typo in if-check.
details: http://hg.videolan.org/x265/rev/261b3c2e788e
branches:
changeset: 6649:261b3c2e788e
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Wed Apr 02 17:37:58 2014 +0530
description:
param: fix typo in if-check.
Subject: [x265] frameencoder: fix white-space nit, add comment
details: http://hg.videolan.org/x265/rev/67c0aa70a125
branches:
changeset: 6650:67c0aa70a125
user: Steve Borho <steve at borho.org>
date: Wed Apr 02 15:45:14 2014 -0500
description:
frameencoder: fix white-space nit, add comment
Subject: [x265] weight: properly reset weights when no-residual early-out is taken
details: http://hg.videolan.org/x265/rev/e03388e98ecc
branches: stable
changeset: 6651:e03388e98ecc
user: Steve Borho <steve at borho.org>
date: Wed Apr 02 22:51:49 2014 -0500
description:
weight: properly reset weights when no-residual early-out is taken
This fixes a hash mismatch seen with a Main10 encode of sintel-480p
Subject: [x265] cleanup m_cuColocated[]
details: http://hg.videolan.org/x265/rev/ccb2b7c26bb6
branches:
changeset: 6652:ccb2b7c26bb6
user: Satoshi Nakagawa <nakagawa424 at oki.com>
date: Wed Apr 02 17:09:23 2014 +0900
description:
cleanup m_cuColocated[]
Subject: [x265] remove unused parameter *recon from assembly code
details: http://hg.videolan.org/x265/rev/fdfad9734231
branches:
changeset: 6653:fdfad9734231
user: Min Chen <chenm003 at 163.com>
date: Wed Apr 02 13:12:50 2014 -0700
description:
remove unused parameter *recon from assembly code
Subject: [x265] testbench: use different stride on calcrecon
details: http://hg.videolan.org/x265/rev/89af57686794
branches:
changeset: 6654:89af57686794
user: Min Chen <chenm003 at 163.com>
date: Wed Apr 02 13:13:05 2014 -0700
description:
testbench: use different stride on calcrecon
Subject: [x265] dpb: Allow two L1 refs when b-pyramid is enabled [CHANGES OUTPUTS]
details: http://hg.videolan.org/x265/rev/d815c4a8fa74
branches:
changeset: 6655:d815c4a8fa74
user: Gopu Govindaswamy
date: Wed Apr 02 14:07:54 2014 +0530
description:
dpb: Allow two L1 refs when b-pyramid is enabled [CHANGES OUTPUTS]
Consider this common case: five consecutive frames (in display order) are
determined by the lookahead to be P1-B1-B2-B3-P2. When b-pyramid is enabled,
the middle B is encoded first and used as a reference by the two following
B frames, so the encode order is P1-P2-B2ref-B1-B3:
    frame   L0         L1
    P1
    P2      P1
    B2ref   P1         P2
    B1      P1         P2 B2ref
    B3      B2ref P1   P2
When B1 is encoded, both B2ref and P2 should be available as L1 references.
This improves compression efficiency when b-pyramid is enabled
(closes #12)
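
The reference-list layout described in this commit message can be illustrated with a small standalone sketch. It is not the x265 DPB code: the Frame and RefLists types, the buildRefLists function and the maxL1 parameter are hypothetical names introduced here, and ordering each list nearest-first by POC is an assumption (the commit message does not state the order within a list).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical frame record for illustration; not an x265 type.
struct Frame
{
    std::string name;
    int  poc;    // position in display order
    bool isRef;  // true if later frames may reference it
};

// Hypothetical container for the two reference lists of one frame.
struct RefLists
{
    std::vector<std::string> l0; // frames before 'cur' in display order
    std::vector<std::string> l1; // frames after 'cur' in display order
};

// Build L0/L1 for 'cur' from the frames already encoded ('dpb').
// maxL1 caps the number of L1 references (assumed knob for this sketch).
RefLists buildRefLists(const Frame& cur, const std::vector<Frame>& dpb, std::size_t maxL1)
{
    std::vector<Frame> past, future;
    for (const Frame& f : dpb)
    {
        if (!f.isRef)
            continue;
        if (f.poc < cur.poc)
            past.push_back(f);
        else if (f.poc > cur.poc)
            future.push_back(f);
    }

    // Assumption: nearest frame first in each list.
    std::sort(past.begin(), past.end(),
              [](const Frame& a, const Frame& b) { return a.poc > b.poc; });
    std::sort(future.begin(), future.end(),
              [](const Frame& a, const Frame& b) { return a.poc < b.poc; });
    if (future.size() > maxL1)
        future.resize(maxL1);

    RefLists lists;
    for (const Frame& f : past)
        lists.l0.push_back(f.name);
    for (const Frame& f : future)
        lists.l1.push_back(f.name);
    return lists;
}
```

With a DPB holding P1 (POC 0), P2 (POC 4) and B2ref (POC 2), calling buildRefLists for B1 (POC 1) with maxL1 = 2 gives L0 = {P1} and L1 = {B2ref, P2}. With maxL1 = 1, only B2ref remains in L1.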
Subject: [x265] Merge with stable
details: http://hg.videolan.org/x265/rev/c0362b478e23
branches:
changeset: 6656:c0362b478e23
user: Steve Borho <steve at borho.org>
date: Wed Apr 02 22:52:19 2014 -0500
description:
Merge with stable
diffstat:
source/Lib/TLibCommon/TComDataCU.cpp | 24 -
source/Lib/TLibCommon/TComDataCU.h | 3 -
source/Lib/TLibEncoder/TEncSearch.cpp | 4 +-
source/common/pixel.cpp | 4 +-
source/common/primitives.h | 2 +-
source/common/x86/pixel-util.h | 12 +-
source/common/x86/pixel-util8.asm | 652 +++++++++++++--------------------
source/encoder/compress.cpp | 10 +-
source/encoder/dpb.h | 2 +-
source/encoder/encoder.cpp | 6 +
source/encoder/frameencoder.cpp | 18 +-
source/encoder/frameencoder.h | 2 +-
source/encoder/weightPrediction.cpp | 3 +
source/test/pixelharness.cpp | 41 +-
14 files changed, 315 insertions(+), 468 deletions(-)
diffs (truncated from 1206 to 300 lines):
diff -r 0206822d9fea -r c0362b478e23 source/Lib/TLibCommon/TComDataCU.cpp
--- a/source/Lib/TLibCommon/TComDataCU.cpp Tue Apr 01 23:28:32 2014 +0530
+++ b/source/Lib/TLibCommon/TComDataCU.cpp Wed Apr 02 22:52:19 2014 -0500
@@ -91,8 +91,6 @@ TComDataCU::TComDataCU()
m_cuAboveRight = NULL;
m_cuAbove = NULL;
m_cuLeft = NULL;
- m_cuColocated[0] = NULL;
- m_cuColocated[1] = NULL;
m_mvpIdx[0] = NULL;
m_mvpIdx[1] = NULL;
m_chromaFormat = 0;
@@ -280,9 +278,6 @@ void TComDataCU::initCU(TComPic* pic, ui
m_cuAboveLeft = NULL;
m_cuAboveRight = NULL;
- m_cuColocated[0] = NULL;
- m_cuColocated[1] = NULL;
-
uint32_t uiWidthInCU = pic->getFrameWidthInCU();
if (m_cuAddr % uiWidthInCU)
{
@@ -303,16 +298,6 @@ void TComDataCU::initCU(TComPic* pic, ui
{
m_cuAboveRight = pic->getCU(m_cuAddr - uiWidthInCU + 1);
}
-
- if (getSlice()->getNumRefIdx(REF_PIC_LIST_0) > 0)
- {
- m_cuColocated[0] = getSlice()->getRefPic(REF_PIC_LIST_0, 0)->getCU(m_cuAddr);
- }
-
- if (getSlice()->getNumRefIdx(REF_PIC_LIST_1) > 0)
- {
- m_cuColocated[1] = getSlice()->getRefPic(REF_PIC_LIST_1, 0)->getCU(m_cuAddr);
- }
}
/** initialize prediction data with enabling sub-LCU-level delta QP
@@ -457,9 +442,6 @@ void TComDataCU::initSubCU(TComDataCU* c
m_cuAbove = cu->getCUAbove();
m_cuAboveLeft = cu->getCUAboveLeft();
m_cuAboveRight = cu->getCUAboveRight();
-
- m_cuColocated[0] = cu->getCUColocated(REF_PIC_LIST_0);
- m_cuColocated[1] = cu->getCUColocated(REF_PIC_LIST_1);
}
// initialize Sub partition
@@ -526,9 +508,6 @@ void TComDataCU::initSubCU(TComDataCU* c
m_cuAbove = cu->getCUAbove();
m_cuAboveLeft = cu->getCUAboveLeft();
m_cuAboveRight = cu->getCUAboveRight();
-
- m_cuColocated[0] = cu->getCUColocated(REF_PIC_LIST_0);
- m_cuColocated[1] = cu->getCUColocated(REF_PIC_LIST_1);
}
@@ -620,9 +599,6 @@ void TComDataCU::copyPartFrom(TComDataCU
m_cuAbove = cu->getCUAbove();
m_cuLeft = cu->getCULeft();
- m_cuColocated[0] = cu->getCUColocated(REF_PIC_LIST_0);
- m_cuColocated[1] = cu->getCUColocated(REF_PIC_LIST_1);
-
m_cuMvField[0].copyFrom(cu->getCUMvField(REF_PIC_LIST_0), cu->getTotalNumPart(), offset);
m_cuMvField[1].copyFrom(cu->getCUMvField(REF_PIC_LIST_1), cu->getTotalNumPart(), offset);
diff -r 0206822d9fea -r c0362b478e23 source/Lib/TLibCommon/TComDataCU.h
--- a/source/Lib/TLibCommon/TComDataCU.h Tue Apr 01 23:28:32 2014 +0530
+++ b/source/Lib/TLibCommon/TComDataCU.h Wed Apr 02 22:52:19 2014 -0500
@@ -129,7 +129,6 @@ private:
TComDataCU* m_cuAboveRight; ///< pointer of above-right CU
TComDataCU* m_cuAbove; ///< pointer of above CU
TComDataCU* m_cuLeft; ///< pointer of left CU
- TComDataCU* m_cuColocated[2]; ///< pointer of temporally colocated CU's for both directions
// -------------------------------------------------------------------------------------------------------------------
// coding tool information
@@ -387,8 +386,6 @@ public:
TComDataCU* getCUAboveRight() { return m_cuAboveRight; }
- TComDataCU* getCUColocated(int picList) { return m_cuColocated[picList]; }
-
TComDataCU* getPULeft(uint32_t& lPartUnitIdx,
uint32_t curPartUnitIdx,
bool bEnforceSliceRestriction = true,
diff -r 0206822d9fea -r c0362b478e23 source/Lib/TLibEncoder/TEncSearch.cpp
--- a/source/Lib/TLibEncoder/TEncSearch.cpp Tue Apr 01 23:28:32 2014 +0530
+++ b/source/Lib/TLibEncoder/TEncSearch.cpp Wed Apr 02 22:52:19 2014 -0500
@@ -465,7 +465,7 @@ void TEncSearch::xIntraCodingLumaBlk(TCo
assert(width <= 32);
//===== reconstruction =====
- primitives.calcrecon[size](pred, residual, 0, reconQt, reconIPred, stride, MAX_CU_SIZE, reconIPredStride);
+ primitives.calcrecon[size](pred, residual, reconQt, reconIPred, stride, MAX_CU_SIZE, reconIPredStride);
//===== update distortion =====
outDist += primitives.sse_sp[part](reconQt, MAX_CU_SIZE, fenc, stride);
}
@@ -587,7 +587,7 @@ void TEncSearch::xIntraCodingChromaBlk(T
assert(((intptr_t)residual & (width - 1)) == 0);
assert(width <= 32);
//===== reconstruction =====
- primitives.calcrecon[size](pred, residual, 0, reconQt, reconIPred, stride, reconQtStride, reconIPredStride);
+ primitives.calcrecon[size](pred, residual, reconQt, reconIPred, stride, reconQtStride, reconIPredStride);
//===== update distortion =====
uint32_t dist = primitives.sse_sp[part](reconQt, reconQtStride, fenc, stride);
if (ttype == TEXT_CHROMA_U)
diff -r 0206822d9fea -r c0362b478e23 source/common/pixel.cpp
--- a/source/common/pixel.cpp Tue Apr 01 23:28:32 2014 +0530
+++ b/source/common/pixel.cpp Wed Apr 02 22:52:19 2014 -0500
@@ -460,9 +460,7 @@ void getResidual(pixel *fenc, pixel *pre
}
template<int blockSize>
-void calcRecons(pixel* pred, int16_t* residual,
- pixel*,
- int16_t* recqt, pixel* recipred, int stride, int qtstride, int ipredstride)
+void calcRecons(pixel* pred, int16_t* residual, int16_t* recqt, pixel* recipred, int stride, int qtstride, int ipredstride)
{
for (int y = 0; y < blockSize; y++)
{
diff -r 0206822d9fea -r c0362b478e23 source/common/primitives.h
--- a/source/common/primitives.h Tue Apr 01 23:28:32 2014 +0530
+++ b/source/common/primitives.h Wed Apr 02 22:52:19 2014 -0500
@@ -125,7 +125,7 @@ typedef void (*cvt32to16_shr_t)(int16_t
typedef void (*dct_t)(int16_t *src, int32_t *dst, intptr_t stride);
typedef void (*idct_t)(int32_t *src, int16_t *dst, intptr_t stride);
typedef void (*calcresidual_t)(pixel *fenc, pixel *pred, int16_t *residual, intptr_t stride);
-typedef void (*calcrecon_t)(pixel* pred, int16_t* residual, pixel* recon, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred);
+typedef void (*calcrecon_t)(pixel* pred, int16_t* residual, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred);
typedef void (*transpose_t)(pixel* dst, pixel* src, intptr_t stride);
typedef uint32_t (*quant_t)(int32_t *coef, int32_t *quantCoeff, int32_t *deltaU, int32_t *qCoef, int qBits, int add, int numCoeff, int32_t* lastPos);
typedef void (*dequant_scaling_t)(const int32_t* src, const int32_t *dequantCoef, int32_t* dst, int num, int mcqp_miper, int shift);
diff -r 0206822d9fea -r c0362b478e23 source/common/x86/pixel-util.h
--- a/source/common/x86/pixel-util.h Tue Apr 01 23:28:32 2014 +0530
+++ b/source/common/x86/pixel-util.h Wed Apr 02 22:52:19 2014 -0500
@@ -24,12 +24,12 @@
#ifndef X265_PIXEL_UTIL_H
#define X265_PIXEL_UTIL_H
-void x265_calcRecons4_sse2(pixel* pred, int16_t* residual, pixel* recon, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred);
-void x265_calcRecons8_sse2(pixel* pred, int16_t* residual, pixel* recon, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred);
-void x265_calcRecons16_sse2(pixel* pred, int16_t* residual, pixel* recon, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred);
-void x265_calcRecons32_sse2(pixel* pred, int16_t* residual, pixel* recon, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred);
-void x265_calcRecons16_sse4(pixel* pred, int16_t* residual, pixel* recon, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred);
-void x265_calcRecons32_sse4(pixel* pred, int16_t* residual, pixel* recon, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred);
+void x265_calcRecons4_sse2(pixel* pred, int16_t* residual, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred);
+void x265_calcRecons8_sse2(pixel* pred, int16_t* residual, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred);
+void x265_calcRecons16_sse2(pixel* pred, int16_t* residual, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred);
+void x265_calcRecons32_sse2(pixel* pred, int16_t* residual, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred);
+void x265_calcRecons16_sse4(pixel* pred, int16_t* residual, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred);
+void x265_calcRecons32_sse4(pixel* pred, int16_t* residual, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred);
void x265_getResidual4_sse2(pixel *fenc, pixel *pred, int16_t *residual, intptr_t stride);
void x265_getResidual8_sse2(pixel *fenc, pixel *pred, int16_t *residual, intptr_t stride);
diff -r 0206822d9fea -r c0362b478e23 source/common/x86/pixel-util8.asm
--- a/source/common/x86/pixel-util8.asm Tue Apr 01 23:28:32 2014 +0530
+++ b/source/common/x86/pixel-util8.asm Wed Apr 02 22:52:19 2014 -0500
@@ -58,590 +58,452 @@ cextern pw_2000
cextern pw_pixel_max
;-----------------------------------------------------------------------------
-; void calcrecon(pixel* pred, int16_t* residual, pixel* recon, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred)
+; void calcrecon(pixel* pred, int16_t* residual, int16_t* reconqt, pixel *reconipred, int stride, int strideqt, int strideipred)
;-----------------------------------------------------------------------------
INIT_XMM sse2
-cglobal calcRecons4
%if HIGH_BIT_DEPTH
%if ARCH_X86_64 == 1
- DECLARE_REG_TMP 0,1,2,3,4,5,6,7,8
- PROLOGUE 6,9,6
+cglobal calcRecons4, 5,8,4
+ %define t7b r7b
%else
- DECLARE_REG_TMP 0,1,2,3,4,5
- PROLOGUE 6,7,6
- %define t6 r6m
- %define t6d r6d
- %define t7 r7m
- %define t8d r6d
+cglobal calcRecons4, 5,7,4,0-1
+ %define t7b byte [rsp]
%endif
-
- mov t6d, r6m
-%if ARCH_X86_64 == 0
- add t6d, t6d
- mov r6m, t6d
-%else
+ mov r4d, r4m
mov r5d, r5m
- mov r7d, r7m
- add t6d, t6d
- add t7, t7
-%endif
+ mov r6d, r6m
+ add r4d, r4d
+ add r5d, r5d
+ add r6d, r6d
pxor m4, m4
mova m5, [pw_pixel_max]
- add t5, t5
- mov t8d, 4/2
+ mov t7b, 4/2
.loop:
- movh m0, [t0]
- movh m1, [t0 + t5]
+ movh m0, [r0]
+ movh m1, [r0 + r4]
punpcklqdq m0, m1
- movh m2, [t1]
- movh m3, [t1 + t5]
+ movh m2, [r1]
+ movh m3, [r1 + r4]
punpcklqdq m2, m3
paddw m0, m2
CLIPW m0, m4, m5
- ; store recon[] and recipred[]
- movh [t4], m0
-%if ARCH_X86_64 == 0
- add t4, t7
- add t4, t7
- movhps [t4], m0
- add t4, t7
- add t4, t7
+ ; store recipred[]
+ movh [r3], m0
+ movhps [r3 + r6], m0
+
+ ; store recqt[]
+ movh [r2], m0
+ movhps [r2 + r5], m0
+
+ lea r0, [r0 + r4 * 2]
+ lea r1, [r1 + r4 * 2]
+ lea r2, [r2 + r5 * 2]
+ lea r3, [r3 + r6 * 2]
+
+ dec t7b
+ jnz .loop
+ RET
+%else ;HIGH_BIT_DEPTH
+
+%if ARCH_X86_64 == 1
+cglobal calcRecons4, 5,8,4
+ %define t7b r7b
%else
- movhps [t4 + t7], m0
- lea t4, [t4 + t7 * 2]
+cglobal calcRecons4, 5,7,4,0-1
+ %define t7b byte [rsp]
%endif
-
- ; store recqt[]
- movh [t3], m0
- add t3, t6
- movhps [t3], m0
- add t3, t6
-
- lea t0, [t0 + t5 * 2]
- lea t1, [t1 + t5 * 2]
-
- dec t8d
- jnz .loop
-
-%else ;HIGH_BIT_DEPTH
-%if ARCH_X86_64 == 1
- DECLARE_REG_TMP 0,1,2,3,4,5,6,7,8
- PROLOGUE 6,9,4
-%else
- DECLARE_REG_TMP 0,1,2,3,4,5
- PROLOGUE 6,7,4
- %define t6 r6m
- %define t6d r6d
- %define t7 r7m
- %define t8d r6d
-%endif
-
- mov t6d, r6m
-%if ARCH_X86_64 == 0
- add t6d, t6d
- mov r6m, t6d
-%else
+ mov r4d, r4m
mov r5d, r5m
- mov r7d, r7m
- add t6d, t6d
-%endif
+ mov r6d, r6m
+ add r5d, r5d
pxor m0, m0
- mov t8d, 4/2
+ mov t7b, 4/2
.loop:
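
For readers following the calcrecon signature change in this diff, the primitive's behaviour after the unused recon pointer is dropped can be sketched in scalar C++. This is a hedged illustration based on the signature and the add/clip/store pattern visible in the assembly above, not the exact code from pixel.cpp: the function name calcReconsSketch is hypothetical, pixel is assumed to be 8-bit (HIGH_BIT_DEPTH builds use a wider type and clip to pw_pixel_max), and the const qualifiers are added for the sketch.

```cpp
#include <cstdint>

// Assumption: 8-bit build, so a pixel is one byte and the clip range is 0..255.
typedef uint8_t pixel;

// Scalar sketch: reconstruct = clip(pred + residual), written to two outputs
// that may each use a different stride.
template<int blockSize>
void calcReconsSketch(const pixel* pred, const int16_t* residual,
                      int16_t* recqt, pixel* recipred,
                      int stride, int qtstride, int ipredstride)
{
    for (int y = 0; y < blockSize; y++)
    {
        for (int x = 0; x < blockSize; x++)
        {
            int v = pred[x] + residual[x];
            v = v < 0 ? 0 : (v > 255 ? 255 : v); // clip to the 8-bit pixel range
            recqt[x]    = (int16_t)v;            // 16-bit copy
            recipred[x] = (pixel)v;              // pixel copy
        }

        // pred and residual share one stride; each output has its own.
        pred     += stride;
        residual += stride;
        recqt    += qtstride;
        recipred += ipredstride;
    }
}
```

Using three different stride values when exercising such a function, as the testbench changeset in this digest does for calcrecon, helps catch a stride being applied to the wrong buffer.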