[x265-commits] [x265] no-rdo early exit: giving weightage to the cost of all CU...
Sumalatha at videolan.org
Wed Nov 13 01:42:33 CET 2013
details: http://hg.videolan.org/x265/rev/dc5c51ff542f
branches:
changeset: 5037:dc5c51ff542f
user: Sumalatha Polureddy
date: Tue Nov 12 10:45:56 2013 +0530
description:
no-rdo early exit: giving weightage to the cost of all CU's and neighbour CU's for early exit
Early exit is done when the CU cost at depth "n" is less than the sum of 60% of the avgcost of all CU's
and 40% of the avgcost of neighbour CU's at the same depth.
Subject: [x265] asm: pixel_avg_32x(64,32,24,8)
details: http://hg.videolan.org/x265/rev/5b0e1731f776
branches:
changeset: 5038:5b0e1731f776
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Tue Nov 12 10:25:21 2013 +0530
description:
asm: pixel_avg_32x(64,32,24,8)
Subject: [x265] asm: pixel_avg_64x(64,48,16)
details: http://hg.videolan.org/x265/rev/9c92947860e0
branches:
changeset: 5039:9c92947860e0
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Tue Nov 12 11:03:42 2013 +0530
description:
asm: pixel_avg_64x(64,48,16)
Subject: [x265] asm: asm: pixel_avg_24x32
details: http://hg.videolan.org/x265/rev/56642525d09e
branches:
changeset: 5040:56642525d09e
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Tue Nov 12 11:44:58 2013 +0530
description:
asm: asm: pixel_avg_24x32
Subject: [x265] asm: pixel_avg_48x64, pixel_avg_8x32
details: http://hg.videolan.org/x265/rev/4a4fd61e98e6
branches:
changeset: 5041:4a4fd61e98e6
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Tue Nov 12 11:56:18 2013 +0530
description:
asm: pixel_avg_48x64, pixel_avg_8x32
Subject: [x265] cleanup: hardcoded m_qtTempTComYuv[qtLayer].m_width to MAX_CU_SIZE
details: http://hg.videolan.org/x265/rev/12053d6bf759
branches:
changeset: 5042:12053d6bf759
user: Min Chen <chenm003 at 163.com>
date: Tue Nov 12 16:14:09 2013 +0800
description:
cleanup: hardcoded m_qtTempTComYuv[qtLayer].m_width to MAX_CU_SIZE
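As a rough illustration of this kind of cleanup, the sketch below passes a compile-time constant as the row stride and keeps an assert so that debug builds still catch a buffer whose width no longer matches. `TempYuv` and `fillBlock` are hypothetical stand-ins, and the value 64 for `MAX_CU_SIZE` is assumed here for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed value for illustration; the real constant is defined by the encoder.
const uint32_t MAX_CU_SIZE = 64;

// Hypothetical stand-in for a temporary buffer with a fixed row stride.
struct TempYuv
{
    uint32_t m_width;
    std::vector<int16_t> m_luma;

    TempYuv() : m_width(MAX_CU_SIZE), m_luma(MAX_CU_SIZE * MAX_CU_SIZE, 0) {}
};

// Fill a size x size block with `value`, stepping rows by the constant stride.
void fillBlock(TempYuv& buf, uint32_t size, int16_t value)
{
    // Debug-build check that the hardcoded stride still matches the buffer.
    assert(buf.m_width == MAX_CU_SIZE);
    assert(size <= MAX_CU_SIZE);

    int16_t* ptr = buf.m_luma.data();
    for (uint32_t y = 0; y < size; y++)
        for (uint32_t x = 0; x < size; x++)
            ptr[y * MAX_CU_SIZE + x] = value;
}
```

The point of the pattern is that the stride becomes a constant the compiler can fold, while the assert documents the invariant the constant relies on.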
Subject: [x265] Backout: Causing non-determinism in rd 0 and 1. Needs to be further investigated.
details: http://hg.videolan.org/x265/rev/ab0968b4b65d
branches:
changeset: 5043:ab0968b4b65d
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Tue Nov 12 16:36:54 2013 +0530
description:
Backout: Causing non-determinism in rd 0 and 1. Needs to be further investigated.
Subject: [x265] TEncSearch: use luma block copy (luma part size) if bChromaSame
details: http://hg.videolan.org/x265/rev/ea4f939478ed
branches:
changeset: 5044:ea4f939478ed
user: Steve Borho <steve at borho.org>
date: Mon Nov 11 22:29:22 2013 -0600
description:
TEncSearch: use luma block copy (luma part size) if bChromaSame
Subject: [x265] compress: fix shadow warning from GCC
details: http://hg.videolan.org/x265/rev/58bdb05da194
branches:
changeset: 5045:58bdb05da194
user: Steve Borho <steve at borho.org>
date: Mon Nov 11 22:30:32 2013 -0600
description:
compress: fix shadow warning from GCC
Subject: [x265] asm: assembly code for pixel_satd_32x24 and rearranged the functions
details: http://hg.videolan.org/x265/rev/085d5c625c53
branches:
changeset: 5046:085d5c625c53
user: Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date: Tue Nov 12 12:19:28 2013 +0530
description:
asm: assembly code for pixel_satd_32x24 and rearranged the functions
Subject: [x265] asm: assembly code for pixel_satd_16x12
details: http://hg.videolan.org/x265/rev/2baf62a8e47d
branches:
changeset: 5047:2baf62a8e47d
user: Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date: Tue Nov 12 12:51:37 2013 +0530
description:
asm: assembly code for pixel_satd_16x12
Subject: [x265] asm: assembly code for pixel_satd_16x4
details: http://hg.videolan.org/x265/rev/7818f5b7cc25
branches:
changeset: 5048:7818f5b7cc25
user: Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date: Tue Nov 12 13:16:19 2013 +0530
description:
asm: assembly code for pixel_satd_16x4
Subject: [x265] asm: assembly code for satd_16x32, satd_16x64, satd_8x32
details: http://hg.videolan.org/x265/rev/d636952ed093
branches:
changeset: 5049:d636952ed093
user: Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date: Tue Nov 12 16:34:37 2013 +0530
description:
asm: assembly code for satd_16x32, satd_16x64, satd_8x32
Subject: [x265] asm: assembly code for pixel_satd_12x16
details: http://hg.videolan.org/x265/rev/c56ce77dc081
branches:
changeset: 5050:c56ce77dc081
user: Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date: Tue Nov 12 19:26:01 2013 +0530
description:
asm: assembly code for pixel_satd_12x16
Subject: [x265] TComYuv::copyToPicLuma, blockcopy_pp asm code integration
details: http://hg.videolan.org/x265/rev/04c28af13c4d
branches:
changeset: 5051:04c28af13c4d
user: Praveen Tiwari
date: Tue Nov 12 14:14:04 2013 +0530
description:
TComYuv::copyToPicLuma, blockcopy_pp asm code integration
Subject: [x265] TComYuv::copyFromPicLuma, blockcopy_pp luma asm code integration
details: http://hg.videolan.org/x265/rev/c56ea57ce3ab
branches:
changeset: 5052:c56ea57ce3ab
user: Praveen Tiwari
date: Tue Nov 12 17:07:14 2013 +0530
description:
TComYuv::copyFromPicLuma, blockcopy_pp luma asm code integration
Subject: [x265] TComYuv.cpp, use new blockcopy_pp luma primitives where feasible
details: http://hg.videolan.org/x265/rev/8708689dcca2
branches:
changeset: 5053:8708689dcca2
user: Praveen Tiwari
date: Tue Nov 12 17:41:21 2013 +0530
description:
TComYuv.cpp, use new blockcopy_pp luma primitives where feasible
Subject: [x265] TComYuv.cpp, use new luma_copy_ps asm primitives where feasible
details: http://hg.videolan.org/x265/rev/31528c277c64
branches:
changeset: 5054:31528c277c64
user: Praveen Tiwari
date: Tue Nov 12 17:58:13 2013 +0530
description:
TComYuv.cpp, use new luma_copy_ps asm primitives where feasible
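The copy primitives named in these changesets are called through a table indexed by partition, as in `primitives.luma_copy_pp[part](dst, dststride, src, srcstride)`. A minimal sketch of that dispatch pattern follows, assuming an 8-bit pixel type. The shortened `Partition` enum, `partitionFromSizesSketch` and `copyLuma` are hypothetical; only the call shape is taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

typedef uint8_t pixel; // assumes an 8-bit build

// Signature modelled on the calls in the patch: (dst, dstStride, src, srcStride).
typedef void (*copy_pp_t)(pixel* dst, intptr_t dstStride,
                          const pixel* src, intptr_t srcStride);

// Block dimensions are template parameters so the loop bounds are constants.
template<int W, int H>
void blockcopy_pp_c(pixel* dst, intptr_t dstStride,
                    const pixel* src, intptr_t srcStride)
{
    for (int y = 0; y < H; y++)
    {
        for (int x = 0; x < W; x++)
            dst[x] = src[x];
        dst += dstStride;
        src += srcStride;
    }
}

// Hypothetical, shortened partition list; the encoder's own table is larger.
enum Partition { PART_4x4, PART_8x8, PART_16x16, NUM_PARTS };

// Hypothetical lookup standing in for the partitionFromSizes() call in the patch.
int partitionFromSizesSketch(int width, int height)
{
    if (width == 4 && height == 4) return PART_4x4;
    if (width == 8 && height == 8) return PART_8x8;
    if (width == 16 && height == 16) return PART_16x16;
    return -1; // size not covered by this sketch
}

// One function pointer per partition; an asm version could overwrite an entry.
copy_pp_t luma_copy_pp[NUM_PARTS] = {
    blockcopy_pp_c<4, 4>,
    blockcopy_pp_c<8, 8>,
    blockcopy_pp_c<16, 16>,
};

// Copy a width x height block via the table; returns false for unsupported sizes.
bool copyLuma(pixel* dst, intptr_t dstStride,
              const pixel* src, intptr_t srcStride, int width, int height)
{
    assert(dst && src);
    int part = partitionFromSizesSketch(width, height);
    if (part < 0)
        return false;
    luma_copy_pp[part](dst, dstStride, src, srcStride);
    return true;
}
```

Compared with a single routine that takes width and height at run time, a per-partition table lets each entry be specialised for one block size, which is presumably what makes hand-written assembly per size practical.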
Subject: [x265] asm: assembly code for x265_pixel_avg_12x16
details: http://hg.videolan.org/x265/rev/d0f80f375c3b
branches:
changeset: 5055:d0f80f375c3b
user: Min Chen <chenm003 at 163.com>
date: Tue Nov 12 19:27:06 2013 +0800
description:
asm: assembly code for x265_pixel_avg_12x16
Subject: [x265] Adding function pointer array and initializations for chroma vsp filter functions.
details: http://hg.videolan.org/x265/rev/e676cbd86238
branches:
changeset: 5056:e676cbd86238
user: Nabajit Deka
date: Tue Nov 12 16:07:05 2013 +0530
description:
Adding function pointer array and initializations for chroma vsp filter functions.
Subject: [x265] Adding test bench code for chroma vsp filter functions.
details: http://hg.videolan.org/x265/rev/ed8a6cd4d8ec
branches:
changeset: 5057:ed8a6cd4d8ec
user: Nabajit Deka
date: Tue Nov 12 16:16:14 2013 +0530
description:
Adding test bench code for chroma vsp filter functions.
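A test bench for an optimised primitive typically runs it against a reference C version on random input and compares the outputs byte for byte. The sketch below shows that idea in a generic form and is not the harness from this changeset: a rounded pixel average stands in for the filter, and every name in it (`avg_ref`, `avg_bitwise`, `avg_truncating`, `checkPrimitive`) is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

typedef void (*avg_t)(uint8_t* dst, const uint8_t* a, const uint8_t* b, int n);

// Reference version: rounded average of two pixel rows.
void avg_ref(uint8_t* dst, const uint8_t* a, const uint8_t* b, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
}

// Stand-in for an optimised version: same result via a bitwise identity.
void avg_bitwise(uint8_t* dst, const uint8_t* a, const uint8_t* b, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = (uint8_t)((a[i] | b[i]) - ((a[i] ^ b[i]) >> 1));
}

// Deliberately wrong variant (truncates instead of rounding).
void avg_truncating(uint8_t* dst, const uint8_t* a, const uint8_t* b, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = (uint8_t)((a[i] + b[i]) >> 1);
}

// Run both versions on random rows; report whether every output matched.
bool checkPrimitive(avg_t ref, avg_t opt, unsigned seed, int iterations)
{
    assert(iterations > 0);
    const int N = 64;
    uint8_t a[N], b[N], outRef[N], outOpt[N];

    std::srand(seed);
    for (int it = 0; it < iterations; it++)
    {
        for (int i = 0; i < N; i++)
        {
            a[i] = (uint8_t)((std::rand() >> 4) & 255);
            b[i] = (uint8_t)((std::rand() >> 4) & 255);
        }
        // Pre-fill outputs so untouched bytes also have to agree.
        std::memset(outRef, 0xCD, N);
        std::memset(outOpt, 0xCD, N);

        ref(outRef, a, b, N);
        opt(outOpt, a, b, N);
        if (std::memcmp(outRef, outOpt, N) != 0)
            return false;
    }
    return true;
}
```

Calling `checkPrimitive(avg_ref, avg_bitwise, 1, 100)` should report a match, while swapping in `avg_truncating` should be flagged as soon as a pair of inputs has an odd sum.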
Subject: [x265] asm: routines for chroma vsp filter functions for all block sizes.
details: http://hg.videolan.org/x265/rev/4844849073b7
branches:
changeset: 5058:4844849073b7
user: Nabajit Deka
date: Tue Nov 12 16:21:30 2013 +0530
description:
asm: routines for chroma vsp filter functions for all block sizes.
Subject: [x265] Adding asm function declarations for chroma vsp filter functions.
details: http://hg.videolan.org/x265/rev/8fe8d8f9f7cb
branches:
changeset: 5059:8fe8d8f9f7cb
user: Nabajit Deka
date: Tue Nov 12 16:23:13 2013 +0530
description:
Adding asm function declarations for chroma vsp filter functions.
Subject: [x265] Adding function pointer initializations for asm chroma vsp functions.
details: http://hg.videolan.org/x265/rev/028b911ae623
branches:
changeset: 5060:028b911ae623
user: Nabajit Deka
date: Tue Nov 12 16:25:54 2013 +0530
description:
Adding function pointer initializations for asm chroma vsp functions.
Subject: [x265] Adding function pointer array and initializations for chroma hps filter functions.
details: http://hg.videolan.org/x265/rev/8a8b967500e5
branches:
changeset: 5061:8a8b967500e5
user: Nabajit Deka
date: Tue Nov 12 17:30:35 2013 +0530
description:
Adding function pointer array and initializations for chroma hps filter functions.
Subject: [x265] Adding test bench code for chroma hps filter functions.
details: http://hg.videolan.org/x265/rev/e6d26209c45f
branches:
changeset: 5062:e6d26209c45f
user: Nabajit Deka
date: Tue Nov 12 17:34:19 2013 +0530
description:
Adding test bench code for chroma hps filter functions.
Subject: [x265] asm: routines for chroma hps filter functions for 2xN, 4xN, 6x8 and 12x16 block sizes.
details: http://hg.videolan.org/x265/rev/533bca3ec7e9
branches:
changeset: 5063:533bca3ec7e9
user: Nabajit Deka
date: Tue Nov 12 20:24:34 2013 +0530
description:
asm: routines for chroma hps filter functions for 2xN, 4xN, 6x8 and 12x16 block sizes.
Subject: [x265] Adding function pointer array and C primitive initializations for chroma vps filter functions.
details: http://hg.videolan.org/x265/rev/1ddacfd89112
branches:
changeset: 5064:1ddacfd89112
user: Nabajit Deka
date: Tue Nov 12 20:51:20 2013 +0530
description:
Adding function pointer array and C primitive initializations for chroma vps filter functions.
Subject: [x265] Adding test bench code for chroma vps filter functions.
details: http://hg.videolan.org/x265/rev/2185b81ae35b
branches:
changeset: 5065:2185b81ae35b
user: Nabajit Deka
date: Tue Nov 12 20:52:13 2013 +0530
description:
Adding test bench code for chroma vps filter functions.
Subject: [x265] Adding initialisation for ssd/sum values for lowress frame
details: http://hg.videolan.org/x265/rev/a19ba09c1fd7
branches:
changeset: 5066:a19ba09c1fd7
user: Shazeb Nawaz Khan <shazeb at multicorewareinc.com>
date: Tue Nov 12 17:06:03 2013 +0530
description:
Adding initialisation for ssd/sum values for lowress frame
Subject: [x265] Bug fix : In ipfilter for 10 bit yuv support
details: http://hg.videolan.org/x265/rev/90c2763ee027
branches:
changeset: 5067:90c2763ee027
user: sagarkotecha
date: Tue Nov 12 16:55:09 2013 +0530
description:
Bug fix : In ipfilter for 10 bit yuv support
diffstat:
source/Lib/TLibCommon/TComYuv.cpp | 19 +-
source/Lib/TLibEncoder/TEncSearch.cpp | 111 +++--
source/common/TShortYUV.cpp | 4 -
source/common/ipfilter.cpp | 11 +-
source/common/primitives.h | 3 +
source/common/x86/asm-primitives.cpp | 74 ++-
source/common/x86/ipfilter8.asm | 627 ++++++++++++++++++++++++++++++++++
source/common/x86/ipfilter8.h | 33 +
source/common/x86/mc-a.asm | 103 +++++-
source/common/x86/pixel-a.asm | 300 +++++++++++++--
source/common/x86/pixel.h | 11 +
source/encoder/compress.cpp | 2 +-
source/encoder/ratecontrol.cpp | 5 +
source/test/ipfilterharness.cpp | 108 +++++-
source/test/ipfilterharness.h | 2 +
15 files changed, 1268 insertions(+), 145 deletions(-)
diffs (truncated from 2113 to 300 lines):
diff -r 1ca01c82609f -r 90c2763ee027 source/Lib/TLibCommon/TComYuv.cpp
--- a/source/Lib/TLibCommon/TComYuv.cpp Mon Nov 11 15:46:00 2013 +0530
+++ b/source/Lib/TLibCommon/TComYuv.cpp Tue Nov 12 16:55:09 2013 +0530
@@ -111,13 +111,15 @@ void TComYuv::copyToPicLuma(TComPicYuv*
width = m_width >> partDepth;
height = m_height >> partDepth;
+ int part = partitionFromSizes(width, height);
+
Pel* src = getLumaAddr(partIdx, width);
Pel* dst = destPicYuv->getLumaAddr(cuAddr, absZOrderIdx);
uint32_t srcstride = getStride();
uint32_t dststride = destPicYuv->getStride();
- primitives.blockcpy_pp(width, height, dst, dststride, src, srcstride);
+ primitives.luma_copy_pp[part](dst, dststride, src, srcstride);
}
void TComYuv::copyToPicChroma(TComPicYuv* destPicYuv, uint32_t cuAddr, uint32_t absZOrderIdx, uint32_t partDepth, uint32_t partIdx)
@@ -153,7 +155,8 @@ void TComYuv::copyFromPicLuma(TComPicYuv
uint32_t dststride = getStride();
uint32_t srcstride = srcPicYuv->getStride();
- primitives.blockcpy_pp(m_width, m_height, dst, dststride, src, srcstride);
+ int part = partitionFromSizes(m_width, m_height);
+ primitives.luma_copy_pp[part](dst, dststride, src, srcstride);
}
void TComYuv::copyFromPicChroma(TComPicYuv* srcPicYuv, uint32_t cuAddr, uint32_t absZOrderIdx)
@@ -184,7 +187,8 @@ void TComYuv::copyToPartLuma(TComYuv* ds
uint32_t srcstride = getStride();
uint32_t dststride = dstPicYuv->getStride();
- primitives.blockcpy_pp(m_width, m_height, dst, dststride, src, srcstride);
+ int part = partitionFromSizes(m_width, m_height);
+ primitives.luma_copy_pp[part](dst, dststride, src, srcstride);
}
void TComYuv::copyToPartChroma(TComYuv* dstPicYuv, uint32_t uiDstPartIdx)
@@ -218,7 +222,8 @@ void TComYuv::copyPartToLuma(TComYuv* ds
uint32_t height = dstPicYuv->getHeight();
uint32_t width = dstPicYuv->getWidth();
- primitives.blockcpy_pp(width, height, dst, dststride, src, srcstride);
+ int part = partitionFromSizes(width, height);
+ primitives.luma_copy_pp[part](dst, dststride, src, srcstride);
}
void TComYuv::copyPartToChroma(TComYuv* dstPicYuv, uint32_t partIdx)
@@ -264,7 +269,8 @@ void TComYuv::copyPartToPartLuma(TComYuv
uint32_t srcstride = getStride();
uint32_t dststride = dstPicYuv->getStride();
- primitives.blockcpy_pp(width, height, dst, dststride, src, srcstride);
+ int part = partitionFromSizes(width, height);
+ primitives.luma_copy_pp[part](dst, dststride, src, srcstride);
}
void TComYuv::copyPartToPartLuma(TShortYUV* dstPicYuv, uint32_t partIdx, uint32_t width, uint32_t height)
@@ -275,7 +281,8 @@ void TComYuv::copyPartToPartLuma(TShortY
uint32_t srcstride = getStride();
uint32_t dststride = dstPicYuv->m_width;
- primitives.blockcpy_sp(width, height, dst, dststride, src, srcstride);
+ int part = partitionFromSizes(width, height);
+ primitives.luma_copy_ps[part](dst, dststride, src, srcstride);
}
void TComYuv::copyPartToPartChroma(TComYuv* dstPicYuv, uint32_t partIdx, uint32_t width, uint32_t height)
diff -r 1ca01c82609f -r 90c2763ee027 source/Lib/TLibEncoder/TEncSearch.cpp
--- a/source/Lib/TLibEncoder/TEncSearch.cpp Mon Nov 11 15:46:00 2013 +0530
+++ b/source/Lib/TLibEncoder/TEncSearch.cpp Tue Nov 12 16:55:09 2013 +0530
@@ -436,7 +436,7 @@ void TEncSearch::xIntraCodingLumaBlk(TCo
TCoeff* coeff = m_qtTempCoeffY[qtLayer] + numCoeffPerInc * absPartIdx;
int16_t* reconQt = m_qtTempTComYuv[qtLayer].getLumaAddr(absPartIdx);
- uint32_t reconQtStride = m_qtTempTComYuv[qtLayer].m_width;
+ assert(m_qtTempTComYuv[qtLayer].m_width == MAX_CU_SIZE);
uint32_t zorder = cu->getZorderIdxInCU() + absPartIdx;
Pel* reconIPred = cu->getPic()->getPicYuvRec()->getLumaAddr(cu->getAddr(), zorder);
@@ -502,7 +502,7 @@ void TEncSearch::xIntraCodingLumaBlk(TCo
}
//===== reconstruction =====
- primitives.calcrecon[size](pred, residual, recon, reconQt, reconIPred, stride, reconQtStride, reconIPredStride);
+ primitives.calcrecon[size](pred, residual, recon, reconQt, reconIPred, stride, MAX_CU_SIZE, reconIPredStride);
//===== update distortion =====
outDist += primitives.sse_pp[part](fenc, stride, recon, stride);
@@ -548,7 +548,7 @@ void TEncSearch::xIntraCodingChromaBlk(T
uint32_t numCoeffPerInc = (cu->getSlice()->getSPS()->getMaxCUWidth() * cu->getSlice()->getSPS()->getMaxCUHeight() >> (cu->getSlice()->getSPS()->getMaxCUDepth() << 1)) >> 2;
TCoeff* coeff = (chromaId > 0 ? m_qtTempCoeffCr[qtlayer] : m_qtTempCoeffCb[qtlayer]) + numCoeffPerInc * absPartIdx;
int16_t* reconQt = (chromaId > 0 ? m_qtTempTComYuv[qtlayer].getCrAddr(absPartIdx) : m_qtTempTComYuv[qtlayer].getCbAddr(absPartIdx));
- uint32_t reconQtStride = m_qtTempTComYuv[qtlayer].m_cwidth;
+ assert(m_qtTempTComYuv[qtlayer].m_cwidth == MAX_CU_SIZE / 2);
uint32_t zorder = cu->getZorderIdxInCU() + absPartIdx;
Pel* reconIPred = (chromaId > 0 ? cu->getPic()->getPicYuvRec()->getCrAddr(cu->getAddr(), zorder) : cu->getPic()->getPicYuvRec()->getCbAddr(cu->getAddr(), zorder));
@@ -636,7 +636,7 @@ void TEncSearch::xIntraCodingChromaBlk(T
}
//===== reconstruction =====
- primitives.calcrecon[size](pred, residual, recon, reconQt, reconIPred, stride, reconQtStride, reconIPredStride);
+ primitives.calcrecon[size](pred, residual, recon, reconQt, reconIPred, stride, MAX_CU_SIZE / 2, reconIPredStride);
//===== update distortion =====
uint32_t dist = primitives.sse_pp[part](fenc, stride, recon, stride);
@@ -954,24 +954,24 @@ void TEncSearch::xRecurIntraCodingQT(TCo
uint32_t qtLayer = cu->getSlice()->getSPS()->getQuadtreeTULog2MaxSize() - trSizeLog2;
uint32_t zorder = cu->getZorderIdxInCU() + absPartIdx;
int16_t* src = m_qtTempTComYuv[qtLayer].getLumaAddr(absPartIdx);
- uint32_t srcstride = m_qtTempTComYuv[qtLayer].m_width;
+ assert(m_qtTempTComYuv[qtLayer].m_width == MAX_CU_SIZE);
Pel* dst = cu->getPic()->getPicYuvRec()->getLumaAddr(cu->getAddr(), zorder);
uint32_t dststride = cu->getPic()->getPicYuvRec()->getStride();
- primitives.blockcpy_ps(width, height, dst, dststride, src, srcstride);
+ primitives.blockcpy_ps(width, height, dst, dststride, src, MAX_CU_SIZE);
if (!bLumaOnly)
{
width >>= 1;
height >>= 1;
src = m_qtTempTComYuv[qtLayer].getCbAddr(absPartIdx);
- srcstride = m_qtTempTComYuv[qtLayer].m_cwidth;
+ assert(m_qtTempTComYuv[qtLayer].m_cwidth == MAX_CU_SIZE / 2);
dst = cu->getPic()->getPicYuvRec()->getCbAddr(cu->getAddr(), zorder);
dststride = cu->getPic()->getPicYuvRec()->getCStride();
- primitives.blockcpy_ps(width, height, dst, dststride, src, srcstride);
+ primitives.blockcpy_ps(width, height, dst, dststride, src, MAX_CU_SIZE / 2);
src = m_qtTempTComYuv[qtLayer].getCrAddr(absPartIdx);
dst = cu->getPic()->getPicYuvRec()->getCrAddr(cu->getAddr(), zorder);
- primitives.blockcpy_ps(width, height, dst, dststride, src, srcstride);
+ primitives.blockcpy_ps(width, height, dst, dststride, src, MAX_CU_SIZE / 2);
}
}
@@ -1134,10 +1134,10 @@ void TEncSearch::xLoadIntraResultQT(TCom
Pel* reconIPred = cu->getPic()->getPicYuvRec()->getLumaAddr(cu->getAddr(), zOrder);
uint32_t reconIPredStride = cu->getPic()->getPicYuvRec()->getStride();
int16_t* reconQt = m_qtTempTComYuv[qtlayer].getLumaAddr(absPartIdx);
- uint32_t reconQtStride = m_qtTempTComYuv[qtlayer].m_width;
+ assert(m_qtTempTComYuv[qtlayer].m_width == MAX_CU_SIZE);
uint32_t width = cu->getWidth(0) >> trDepth;
uint32_t height = cu->getHeight(0) >> trDepth;
- primitives.blockcpy_ps(width, height, reconIPred, reconIPredStride, reconQt, reconQtStride);
+ primitives.blockcpy_ps(width, height, reconIPred, reconIPredStride, reconQt, MAX_CU_SIZE);
if (!bLumaOnly && !bSkipChroma)
{
@@ -1146,12 +1146,12 @@ void TEncSearch::xLoadIntraResultQT(TCom
reconIPred = cu->getPic()->getPicYuvRec()->getCbAddr(cu->getAddr(), zOrder);
reconIPredStride = cu->getPic()->getPicYuvRec()->getCStride();
reconQt = m_qtTempTComYuv[qtlayer].getCbAddr(absPartIdx);
- reconQtStride = m_qtTempTComYuv[qtlayer].m_cwidth;
- primitives.blockcpy_ps(width, height, reconIPred, reconIPredStride, reconQt, reconQtStride);
+ assert(m_qtTempTComYuv[qtlayer].m_cwidth == MAX_CU_SIZE / 2);
+ primitives.blockcpy_ps(width, height, reconIPred, reconIPredStride, reconQt, MAX_CU_SIZE / 2);
reconIPred = cu->getPic()->getPicYuvRec()->getCrAddr(cu->getAddr(), zOrder);
reconQt = m_qtTempTComYuv[qtlayer].getCrAddr(absPartIdx);
- primitives.blockcpy_ps(width, height, reconIPred, reconIPredStride, reconQt, reconQtStride);
+ primitives.blockcpy_ps(width, height, reconIPred, reconIPredStride, reconQt, MAX_CU_SIZE / 2);
}
}
@@ -1255,20 +1255,20 @@ void TEncSearch::xLoadIntraResultChromaQ
uint32_t zorder = cu->getZorderIdxInCU() + absPartIdx;
uint32_t width = cu->getWidth(0) >> (trDepth + 1);
uint32_t height = cu->getHeight(0) >> (trDepth + 1);
- uint32_t reconQtStride = m_qtTempTComYuv[qtlayer].m_cwidth;
+ assert(m_qtTempTComYuv[qtlayer].m_cwidth == MAX_CU_SIZE / 2);
uint32_t reconIPredStride = cu->getPic()->getPicYuvRec()->getCStride();
if (stateU0V1Both2 == 0 || stateU0V1Both2 == 2)
{
Pel* reconIPred = cu->getPic()->getPicYuvRec()->getCbAddr(cu->getAddr(), zorder);
int16_t* reconQt = m_qtTempTComYuv[qtlayer].getCbAddr(absPartIdx);
- primitives.blockcpy_ps(width, height, reconIPred, reconIPredStride, reconQt, reconQtStride);
+ primitives.blockcpy_ps(width, height, reconIPred, reconIPredStride, reconQt, MAX_CU_SIZE / 2);
}
if (stateU0V1Both2 == 1 || stateU0V1Both2 == 2)
{
Pel* reconIPred = cu->getPic()->getPicYuvRec()->getCrAddr(cu->getAddr(), zorder);
int16_t* reconQt = m_qtTempTComYuv[qtlayer].getCrAddr(absPartIdx);
- primitives.blockcpy_ps(width, height, reconIPred, reconIPredStride, reconQt, reconQtStride);
+ primitives.blockcpy_ps(width, height, reconIPred, reconIPredStride, reconQt, MAX_CU_SIZE / 2);
}
}
}
@@ -1809,11 +1809,17 @@ void TEncSearch::estIntraPredQT(TComData
dststride = cu->getPic()->getPicYuvRec()->getCStride();
src = reconYuv->getCbAddr(partOffset);
srcstride = reconYuv->getCStride();
- primitives.chroma_copy_pp[part](dst, dststride, src, srcstride);
+ if (bChromaSame)
+ primitives.luma_copy_pp[part](dst, dststride, src, srcstride);
+ else
+ primitives.chroma_copy_pp[part](dst, dststride, src, srcstride);
dst = cu->getPic()->getPicYuvRec()->getCrAddr(cu->getAddr(), zorder);
src = reconYuv->getCrAddr(partOffset);
- primitives.chroma_copy_pp[part](dst, dststride, src, srcstride);
+ if (bChromaSame)
+ primitives.luma_copy_pp[part](dst, dststride, src, srcstride);
+ else
+ primitives.chroma_copy_pp[part](dst, dststride, src, srcstride);
}
}
@@ -3182,10 +3188,10 @@ void TEncSearch::xEstimateResidualQT(TCo
int scalingListType = 3 + g_eTTable[(int)TEXT_LUMA];
assert(scalingListType < 6);
- m_trQuant->invtransformNxN(cu->getCUTransquantBypass(absPartIdx), REG_DCT, curResiY, m_qtTempTComYuv[qtlayer].m_width, coeffCurY, trWidth, trHeight, scalingListType, false, lastPosY); //this is for inter mode only
-
- const uint32_t nonZeroDistY = primitives.sse_ss[partSize](resiYuv->getLumaAddr(absTUPartIdx), resiYuv->m_width, m_qtTempTComYuv[qtlayer].getLumaAddr(absTUPartIdx),
- m_qtTempTComYuv[qtlayer].m_width);
+ assert(m_qtTempTComYuv[qtlayer].m_width == MAX_CU_SIZE);
+ m_trQuant->invtransformNxN(cu->getCUTransquantBypass(absPartIdx), REG_DCT, curResiY, MAX_CU_SIZE, coeffCurY, trWidth, trHeight, scalingListType, false, lastPosY); //this is for inter mode only
+
+ const uint32_t nonZeroDistY = primitives.sse_ss[partSize](resiYuv->getLumaAddr(absTUPartIdx), resiYuv->m_width, m_qtTempTComYuv[qtlayer].getLumaAddr(absTUPartIdx), MAX_CU_SIZE);
if (cu->isLosslessCoded(0))
{
distY = nonZeroDistY;
@@ -3227,10 +3233,10 @@ void TEncSearch::xEstimateResidualQT(TCo
if (!absSumY)
{
int16_t *ptr = m_qtTempTComYuv[qtlayer].getLumaAddr(absTUPartIdx);
- const uint32_t stride = m_qtTempTComYuv[qtlayer].m_width;
+ assert(m_qtTempTComYuv[qtlayer].m_width == MAX_CU_SIZE);
assert(trWidth == trHeight);
- primitives.blockfill_s[(int)g_convertToBit[trWidth]](ptr, stride, 0);
+ primitives.blockfill_s[(int)g_convertToBit[trWidth]](ptr, MAX_CU_SIZE, 0);
}
uint32_t distU = 0;
@@ -3254,11 +3260,12 @@ void TEncSearch::xEstimateResidualQT(TCo
int scalingListType = 3 + g_eTTable[(int)TEXT_CHROMA_U];
assert(scalingListType < 6);
- m_trQuant->invtransformNxN(cu->getCUTransquantBypass(absPartIdx), REG_DCT, pcResiCurrU, m_qtTempTComYuv[qtlayer].m_cwidth, coeffCurU, trWidthC, trHeightC, scalingListType, false, lastPosU);
+ assert(m_qtTempTComYuv[qtlayer].m_cwidth == MAX_CU_SIZE / 2);
+ m_trQuant->invtransformNxN(cu->getCUTransquantBypass(absPartIdx), REG_DCT, pcResiCurrU, MAX_CU_SIZE / 2, coeffCurU, trWidthC, trHeightC, scalingListType, false, lastPosU);
uint32_t dist = primitives.sse_ss[partSizeC](resiYuv->getCbAddr(absTUPartIdxC), resiYuv->m_cwidth,
m_qtTempTComYuv[qtlayer].getCbAddr(absTUPartIdxC),
- m_qtTempTComYuv[qtlayer].m_cwidth);
+ MAX_CU_SIZE / 2);
const uint32_t nonZeroDistU = m_rdCost->scaleChromaDistCb(dist);
if (cu->isLosslessCoded(0))
@@ -3301,10 +3308,10 @@ void TEncSearch::xEstimateResidualQT(TCo
if (!absSumU)
{
int16_t *ptr = m_qtTempTComYuv[qtlayer].getCbAddr(absTUPartIdxC);
- const uint32_t stride = m_qtTempTComYuv[qtlayer].m_cwidth;
+ assert(m_qtTempTComYuv[qtlayer].m_cwidth == MAX_CU_SIZE / 2);
assert(trWidthC == trHeightC);
- primitives.blockfill_s[(int)g_convertToBit[trWidthC]](ptr, stride, 0);
+ primitives.blockfill_s[(int)g_convertToBit[trWidthC]](ptr, MAX_CU_SIZE / 2, 0);
}
distV = m_rdCost->scaleChromaDistCr(primitives.sse_sp[partSizeC](resiYuv->getCrAddr(absTUPartIdxC), resiYuv->m_cwidth, m_tempPel, trWidthC));
@@ -3320,11 +3327,12 @@ void TEncSearch::xEstimateResidualQT(TCo
int scalingListType = 3 + g_eTTable[(int)TEXT_CHROMA_V];
assert(scalingListType < 6);
- m_trQuant->invtransformNxN(cu->getCUTransquantBypass(absPartIdx), REG_DCT, curResiV, m_qtTempTComYuv[qtlayer].m_cwidth, coeffCurV, trWidthC, trHeightC, scalingListType, false, lastPosV);
+ assert(m_qtTempTComYuv[qtlayer].m_cwidth == MAX_CU_SIZE / 2);
+ m_trQuant->invtransformNxN(cu->getCUTransquantBypass(absPartIdx), REG_DCT, curResiV, MAX_CU_SIZE / 2, coeffCurV, trWidthC, trHeightC, scalingListType, false, lastPosV);
uint32_t dist = primitives.sse_ss[partSizeC](resiYuv->getCrAddr(absTUPartIdxC), resiYuv->m_cwidth,
m_qtTempTComYuv[qtlayer].getCrAddr(absTUPartIdxC),
- m_qtTempTComYuv[qtlayer].m_cwidth);
+ MAX_CU_SIZE / 2);
const uint32_t nonZeroDistV = m_rdCost->scaleChromaDistCr(dist);
if (cu->isLosslessCoded(0))
@@ -3367,10 +3375,10 @@ void TEncSearch::xEstimateResidualQT(TCo
if (!absSumV)
{
int16_t *ptr = m_qtTempTComYuv[qtlayer].getCrAddr(absTUPartIdxC);
- const uint32_t stride = m_qtTempTComYuv[qtlayer].m_cwidth;
+ assert(m_qtTempTComYuv[qtlayer].m_cwidth == MAX_CU_SIZE / 2);
assert(trWidthC == trHeightC);
- primitives.blockfill_s[(int)g_convertToBit[trWidthC]](ptr, stride, 0);
+ primitives.blockfill_s[(int)g_convertToBit[trWidthC]](ptr, MAX_CU_SIZE / 2, 0);
}
}
cu->setCbfSubParts(absSumY ? setCbf : 0, TEXT_LUMA, absPartIdx, depth);
@@ -3387,7 +3395,7 @@ void TEncSearch::xEstimateResidualQT(TCo
UInt64 singleCostY = MAX_INT64;
int16_t *curResiY = m_qtTempTComYuv[qtlayer].getLumaAddr(absTUPartIdx);