[x265-commits] [x265] 16bpp primitives: disabling dct/idct/dst/idst primitives
Deepthi Nandakumar
deepthi at multicorewareinc.com
Tue Nov 12 04:01:01 CET 2013
details: http://hg.videolan.org/x265/rev/8ca334701a92
branches:
changeset: 4992:8ca334701a92
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Mon Nov 11 14:34:27 2013 +0530
description:
16bpp primitives: disabling dct/idct/dst/idst primitives
Subject: [x265] Adding function pointer type & array definition for luma vsp filter functions.
details: http://hg.videolan.org/x265/rev/8d496292dd1d
branches:
changeset: 4993:8d496292dd1d
user: Nabajit Deka
date: Mon Nov 11 11:10:32 2013 +0530
description:
Adding function pointer type & array definition for luma vsp filter functions.
Subject: [x265] Adding C primitive for luma vsp filter functions.
details: http://hg.videolan.org/x265/rev/d2b3aefb522e
branches:
changeset: 4994:d2b3aefb522e
user: Nabajit Deka
date: Mon Nov 11 11:15:01 2013 +0530
description:
Adding C primitive for luma vsp filter functions.
Subject: [x265] Adding test bench code for luma vsp filter functions.
details: http://hg.videolan.org/x265/rev/51358e3422b7
branches:
changeset: 4995:51358e3422b7
user: Nabajit Deka
date: Mon Nov 11 11:20:09 2013 +0530
description:
Adding test bench code for luma vsp filter functions.
Subject: [x265] added blockcopy_ps c primitive and function pointers
details: http://hg.videolan.org/x265/rev/7f3164f16551
branches:
changeset: 4996:7f3164f16551
user: Praveen Tiwari
date: Mon Nov 11 11:41:51 2013 +0530
description:
added blockcopy_ps c primitive and function pointers
Subject: [x265] unit test code for block_copy_ps function
details: http://hg.videolan.org/x265/rev/eab2cd89e813
branches:
changeset: 4997:eab2cd89e813
user: Praveen Tiwari
date: Mon Nov 11 12:30:32 2013 +0530
description:
unit test code for block_copy_ps function
Subject: [x265] asm code for blockcopy_ps_8x2
details: http://hg.videolan.org/x265/rev/11b09a9fa32f
branches:
changeset: 4998:11b09a9fa32f
user: Praveen Tiwari
date: Mon Nov 11 13:07:57 2013 +0530
description:
asm code for blockcopy_ps_8x2
Subject: [x265] asm code for blockcopy_ps_8x4
details: http://hg.videolan.org/x265/rev/25300bdf7bbe
branches:
changeset: 4999:25300bdf7bbe
user: Praveen Tiwari
date: Mon Nov 11 13:35:11 2013 +0530
description:
asm code for blockcopy_ps_8x4
Subject: [x265] re-enable asm code for pixel_avg, the problem was a missing EMMS
details: http://hg.videolan.org/x265/rev/a1577003ee96
branches:
changeset: 5000:a1577003ee96
user: Min Chen <chenm003 at 163.com>
date: Mon Nov 11 16:21:00 2013 +0800
description:
re-enable asm code for pixel_avg, the problem was a missing EMMS
Subject: [x265] bugfix: PixelHarness::check_pixelavg_pp() output buffer was not initialized
details: http://hg.videolan.org/x265/rev/9642b5b6500b
branches:
changeset: 5001:9642b5b6500b
user: Min Chen <chenm003 at 163.com>
date: Mon Nov 11 17:41:32 2013 +0800
description:
bugfix: PixelHarness::check_pixelavg_pp() output buffer was not initialized
Subject: [x265] TEncCu: cleanup xComputeCostIntraInInter to use 32x32 logic for 64x64
details: http://hg.videolan.org/x265/rev/2e90d81098af
branches:
changeset: 5002:2e90d81098af
user: Mahesh Doijade <maheshdoijade at multicorewareinc.com>
date: Mon Nov 11 13:16:52 2013 +0530
description:
TEncCu: cleanup xComputeCostIntraInInter to use 32x32 logic for 64x64
Subject: [x265] compress: white-space nits
details: http://hg.videolan.org/x265/rev/c94d51359a5f
branches:
changeset: 5003:c94d51359a5f
user: Steve Borho <steve at borho.org>
date: Mon Nov 11 17:46:48 2013 -0600
description:
compress: white-space nits
Subject: [x265] asm code for blockcopy_ps_8x6
details: http://hg.videolan.org/x265/rev/1fbaef13feb7
branches:
changeset: 5004:1fbaef13feb7
user: Praveen Tiwari
date: Mon Nov 11 14:36:21 2013 +0530
description:
asm code for blockcopy_ps_8x6
Subject: [x265] asm code for blockcopy_ps, 8x6, 8x16 and 8x32
details: http://hg.videolan.org/x265/rev/7d74ee88f3fe
branches:
changeset: 5005:7d74ee88f3fe
user: Praveen Tiwari
date: Mon Nov 11 14:58:09 2013 +0530
description:
asm code for blockcopy_ps, 8x6, 8x16 and 8x32
Subject: [x265] asm code for blockcopy_ps_16x4
details: http://hg.videolan.org/x265/rev/cb378330b31b
branches:
changeset: 5006:cb378330b31b
user: Praveen Tiwari
date: Mon Nov 11 16:00:59 2013 +0530
description:
asm code for blockcopy_ps_16x4
Subject: [x265] asm code for blockcopy_ps, 16x8, 16x12, 16x16, 16x32
details: http://hg.videolan.org/x265/rev/e5567a4eeec5
branches:
changeset: 5007:e5567a4eeec5
user: Praveen Tiwari
date: Mon Nov 11 16:29:44 2013 +0530
description:
asm code for blockcopy_ps, 16x8, 16x12, 16x16, 16x32
Subject: [x265] eliminated register copy from BLOCKCOPY_PS_W16_H4 macro
details: http://hg.videolan.org/x265/rev/7a0afcd7c4c9
branches:
changeset: 5008:7a0afcd7c4c9
user: Praveen Tiwari
date: Mon Nov 11 16:44:45 2013 +0530
description:
eliminated register copy from BLOCKCOPY_PS_W16_H4 macro
Subject: [x265] blockcopy_ps_16x4, asm code is now sse4
details: http://hg.videolan.org/x265/rev/1365b796a75e
branches:
changeset: 5009:1365b796a75e
user: Praveen Tiwari
date: Mon Nov 11 16:54:27 2013 +0530
description:
blockcopy_ps_16x4, asm code is now sse4
Subject: [x265] asm code for blockcopy_ps_32xN
details: http://hg.videolan.org/x265/rev/badcc7920c91
branches:
changeset: 5010:badcc7920c91
user: Praveen Tiwari
date: Mon Nov 11 17:13:25 2013 +0530
description:
asm code for blockcopy_ps_32xN
Subject: [x265] asm code for blockcopy_ps_12x16
details: http://hg.videolan.org/x265/rev/c09ba17002c0
branches:
changeset: 5011:c09ba17002c0
user: Praveen Tiwari
date: Mon Nov 11 17:50:45 2013 +0530
description:
asm code for blockcopy_ps_12x16
Subject: [x265] asm code for blockcopy_ps_4x2
details: http://hg.videolan.org/x265/rev/4c45ee313c3c
branches:
changeset: 5012:4c45ee313c3c
user: Praveen Tiwari
date: Mon Nov 11 18:01:16 2013 +0530
description:
asm code for blockcopy_ps_4x2
Subject: [x265] asm code for blockcopy_ps_4x4
details: http://hg.videolan.org/x265/rev/953fe27840b6
branches:
changeset: 5013:953fe27840b6
user: Praveen Tiwari
date: Mon Nov 11 18:10:26 2013 +0530
description:
asm code for blockcopy_ps_4x4
Subject: [x265] asm code for blockcopy_ps_4x8
details: http://hg.videolan.org/x265/rev/332793211a8d
branches:
changeset: 5014:332793211a8d
user: Praveen Tiwari
date: Mon Nov 11 18:23:10 2013 +0530
description:
asm code for blockcopy_ps_4x8
Subject: [x265] asm code for blockcopy_ps_24x32
details: http://hg.videolan.org/x265/rev/c8e0d150b111
branches:
changeset: 5015:c8e0d150b111
user: Praveen Tiwari
date: Mon Nov 11 17:34:06 2013 +0530
description:
asm code for blockcopy_ps_24x32
Subject: [x265] asm code for blockcopy_ps_2x4
details: http://hg.videolan.org/x265/rev/cf089f73913d
branches:
changeset: 5016:cf089f73913d
user: Praveen Tiwari
date: Mon Nov 11 18:56:06 2013 +0530
description:
asm code for blockcopy_ps_2x4
Subject: [x265] asm code for blockcopy_ps_2x8
details: http://hg.videolan.org/x265/rev/c047d5898b59
branches:
changeset: 5017:c047d5898b59
user: Praveen Tiwari
date: Mon Nov 11 19:20:41 2013 +0530
description:
asm code for blockcopy_ps_2x8
Subject: [x265] asm code for blockcopy_ps_6x8
details: http://hg.videolan.org/x265/rev/b208adfaaba6
branches:
changeset: 5018:b208adfaaba6
user: Praveen Tiwari
date: Mon Nov 11 20:24:33 2013 +0530
description:
asm code for blockcopy_ps_6x8
Subject: [x265] added asm code blockcopy_ps_4x16 and invoked function pointer initialization with macro
details: http://hg.videolan.org/x265/rev/67fb80ee548a
branches:
changeset: 5019:67fb80ee548a
user: Praveen Tiwari
date: Mon Nov 11 20:35:55 2013 +0530
description:
added asm code blockcopy_ps_4x16 and invoked function pointer initialization with macro
Subject: [x265] added asm function for luma blockcopy_ps_16x64
details: http://hg.videolan.org/x265/rev/8e20f3c1dbb4
branches:
changeset: 5020:8e20f3c1dbb4
user: Praveen Tiwari
date: Mon Nov 11 20:50:50 2013 +0530
description:
added asm function for luma blockcopy_ps_16x64
Subject: [x265] asm code for luma blockcopy_ps_32x64
details: http://hg.videolan.org/x265/rev/15b705145e15
branches:
changeset: 5021:15b705145e15
user: Praveen Tiwari
date: Mon Nov 11 20:55:03 2013 +0530
description:
asm code for luma blockcopy_ps_32x64
Subject: [x265] asm code for luma blockcopy_ps_48x64
details: http://hg.videolan.org/x265/rev/c19168acd391
branches:
changeset: 5022:c19168acd391
user: Praveen Tiwari
date: Mon Nov 11 21:06:11 2013 +0530
description:
asm code for luma blockcopy_ps_48x64
Subject: [x265] asm code for blockcopy_ps_64xN
details: http://hg.videolan.org/x265/rev/ed32ed5a0785
branches:
changeset: 5023:ed32ed5a0785
user: Praveen Tiwari
date: Mon Nov 11 21:22:38 2013 +0530
description:
asm code for blockcopy_ps_64xN
Subject: [x265] added macro call for luma partition blockcopy_ps function
details: http://hg.videolan.org/x265/rev/18dd57c38254
branches:
changeset: 5024:18dd57c38254
user: Praveen Tiwari
date: Mon Nov 11 21:36:21 2013 +0530
description:
added macro call for luma partition blockcopy_ps function
Subject: [x265] asm: pixel_avg[32x16]
details: http://hg.videolan.org/x265/rev/79a452bec247
branches:
changeset: 5025:79a452bec247
user: Min Chen <chenm003 at 163.com>
date: Mon Nov 11 20:51:58 2013 +0800
description:
asm: pixel_avg[32x16]
Subject: [x265] use fixed stride/size on m_qtTempTComYuv, to reduce number of calcRecon() parameters
details: http://hg.videolan.org/x265/rev/0f9c6391fa19
branches:
changeset: 5026:0f9c6391fa19
user: Min Chen <chenm003 at 163.com>
date: Mon Nov 11 21:59:22 2013 +0800
description:
use fixed stride/size on m_qtTempTComYuv, to reduce number of calcRecon() parameters
Subject: [x265] asm: enabled pixel_avg_16x(64,32,12,4) assembly functions
details: http://hg.videolan.org/x265/rev/1990e66030d1
branches:
changeset: 5027:1990e66030d1
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Mon Nov 11 16:50:59 2013 +0530
description:
asm: enabled pixel_avg_16x(64,32,12,4) assembly functions
Subject: [x265] asm: assembly code for x265_pixel_satd_32x8
details: http://hg.videolan.org/x265/rev/da13148e7c6e
branches:
changeset: 5028:da13148e7c6e
user: Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date: Mon Nov 11 17:01:26 2013 +0530
description:
asm: assembly code for x265_pixel_satd_32x8
Subject: [x265] asm: assembly code for x265_pixel_satd_32x16
details: http://hg.videolan.org/x265/rev/27b97bc50331
branches:
changeset: 5029:27b97bc50331
user: Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date: Mon Nov 11 20:06:04 2013 +0530
description:
asm: assembly code for x265_pixel_satd_32x16
Subject: [x265] asm: routines for luma vsp filter functions for all block sizes.
details: http://hg.videolan.org/x265/rev/1eae34eb5995
branches:
changeset: 5030:1eae34eb5995
user: Nabajit Deka
date: Mon Nov 11 15:01:29 2013 +0530
description:
asm: routines for luma vsp filter functions for all block sizes.
Subject: [x265] Adding asm function declarations for luma vsp filter functions.
details: http://hg.videolan.org/x265/rev/937ac0c1bac4
branches:
changeset: 5031:937ac0c1bac4
user: Nabajit Deka
date: Mon Nov 11 15:14:31 2013 +0530
description:
Adding asm function declarations for luma vsp filter functions.
Subject: [x265] Adding function pointer initializations for luma vsp functions.
details: http://hg.videolan.org/x265/rev/d11de5be8e25
branches:
changeset: 5032:d11de5be8e25
user: Nabajit Deka
date: Mon Nov 11 15:15:46 2013 +0530
description:
Adding function pointer initializations for luma vsp functions.
Subject: [x265] asm: hookup luma_vsp primitive, drop asm and intrinsic non-block versions
details: http://hg.videolan.org/x265/rev/904b788b09e2
branches:
changeset: 5033:904b788b09e2
user: Steve Borho <steve at borho.org>
date: Mon Nov 11 19:15:32 2013 -0600
description:
asm: hookup luma_vsp primitive, drop asm and intrinsic non-block versions
Subject: [x265] asm: use new block copy primitives where feasible
details: http://hg.videolan.org/x265/rev/1c95568c7143
branches:
changeset: 5034:1c95568c7143
user: Steve Borho <steve at borho.org>
date: Mon Nov 11 19:35:16 2013 -0600
description:
asm: use new block copy primitives where feasible
Subject: [x265] TComYuv: de-hungarian nits
details: http://hg.videolan.org/x265/rev/d1d716083aa7
branches:
changeset: 5035:d1d716083aa7
user: Steve Borho <steve at borho.org>
date: Mon Nov 11 19:43:33 2013 -0600
description:
TComYuv: de-hungarian nits
Subject: [x265] no-rdo: cleanups. Remove unnecessary memsets, rearrange computations.
details: http://hg.videolan.org/x265/rev/1ca01c82609f
branches:
changeset: 5036:1ca01c82609f
user: Deepthi Devaki <deepthidevaki at multicorewareinc.com>
date: Mon Nov 11 15:46:00 2013 +0530
description:
no-rdo: cleanups. Remove unnecessary memsets, rearrange computations.
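Several of the changesets above add a per-partition blockcopy_ps primitive (pixel source, short destination) and initialize its function pointers with a macro. The following is only a minimal sketch of that pattern, not the code from these commits: the Partition enum, the Primitives struct, the SETUP_COPY_PS macro, the argument order, and the 8-bit pixel type are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

typedef uint8_t pixel; // assumption: 8-bit build

// Hypothetical partition enum; the real encoder defines many more sizes.
enum Partition { PART_4x4, PART_8x8, PART_16x16, NUM_PARTITIONS };

// Assumed signature: destination and its stride first, then source and its stride.
typedef void (*blockcopy_ps_fn)(int16_t* dst, intptr_t dstStride,
                                const pixel* src, intptr_t srcStride);

// C reference: widen each pixel to a 16-bit short. Width and height are
// template parameters, so every partition gets its own specialised function.
template<int W, int H>
void blockcopy_ps_ref(int16_t* dst, intptr_t dstStride,
                      const pixel* src, intptr_t srcStride)
{
    for (int y = 0; y < H; y++)
    {
        for (int x = 0; x < W; x++)
        {
            dst[x] = static_cast<int16_t>(src[x]);
        }
        dst += dstStride;
        src += srcStride;
    }
}

// Hypothetical primitive table, indexed by partition.
struct Primitives
{
    blockcopy_ps_fn copy_ps[NUM_PARTITIONS];
};

// Token pasting builds the enum name (PART_8x8) from the two sizes.
#define SETUP_COPY_PS(W, H) \
    p.copy_ps[PART_ ## W ## x ## H] = blockcopy_ps_ref<W, H>;

void setupPrimitives(Primitives& p)
{
    SETUP_COPY_PS(4, 4)
    SETUP_COPY_PS(8, 8)
    SETUP_COPY_PS(16, 16)
}
```

A caller would look up the partition once and then invoke p.copy_ps[part](dst, dstStride, src, srcStride) without passing width and height. An optimized routine can later overwrite the same table slot, leaving call sites unchanged.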
diffstat:
source/Lib/TLibCommon/TComPrediction.cpp | 2 +-
source/Lib/TLibCommon/TComYuv.h | 6 +-
source/Lib/TLibEncoder/TEncSearch.cpp | 70 +--
source/common/ipfilter.cpp | 45 +-
source/common/pixel.cpp | 22 +-
source/common/primitives.h | 6 +-
source/common/vec/dct-sse3.cpp | 6 +-
source/common/vec/dct-sse41.cpp | 2 +
source/common/vec/ipfilter-sse41.cpp | 1 -
source/common/x86/asm-primitives.cpp | 103 +++-
source/common/x86/blockcopy8.asm | 704 +++++++++++++++++++++++++++++++
source/common/x86/blockcopy8.h | 49 ++
source/common/x86/ipfilter8.asm | 285 +++++++-----
source/common/x86/ipfilter8.h | 34 +-
source/common/x86/mc-a.asm | 35 +-
source/common/x86/pixel-a.asm | 102 ++++
source/common/x86/pixel.h | 5 +
source/encoder/compress.cpp | 156 +++---
source/encoder/motion.cpp | 8 +-
source/encoder/ratecontrol.cpp | 2 +-
source/test/ipfilterharness.cpp | 46 ++
source/test/ipfilterharness.h | 1 +
source/test/pixelharness.cpp | 60 ++-
source/test/pixelharness.h | 1 +
24 files changed, 1445 insertions(+), 306 deletions(-)
diffs (truncated from 2413 to 300 lines):
diff -r 9d74638c3640 -r 1ca01c82609f source/Lib/TLibCommon/TComPrediction.cpp
--- a/source/Lib/TLibCommon/TComPrediction.cpp Sat Nov 09 20:14:24 2013 -0600
+++ b/source/Lib/TLibCommon/TComPrediction.cpp Mon Nov 11 15:46:00 2013 +0530
@@ -500,7 +500,7 @@ void TComPrediction::xPredInterLumaBlk(T
int filterSize = NTAPS_LUMA;
int halfFilterSize = (filterSize >> 1);
primitives.ipfilter_ps[FILTER_H_P_S_8](src - (halfFilterSize - 1) * srcStride, srcStride, m_immedVals, tmpStride, width, height + filterSize - 1, g_lumaFilter[xFrac]);
- primitives.ipfilter_sp[FILTER_V_S_P_8](m_immedVals + (halfFilterSize - 1) * tmpStride, tmpStride, dst, dstStride, width, height, yFrac);
+ primitives.luma_vsp[partEnum](m_immedVals + (halfFilterSize - 1) * tmpStride, tmpStride, dst, dstStride, yFrac);
}
}
diff -r 9d74638c3640 -r 1ca01c82609f source/Lib/TLibCommon/TComYuv.h
--- a/source/Lib/TLibCommon/TComYuv.h Sat Nov 09 20:14:24 2013 -0600
+++ b/source/Lib/TLibCommon/TComYuv.h Mon Nov 11 15:46:00 2013 +0530
@@ -129,9 +129,9 @@ public:
void copyToPartChroma(TComYuv* dstPicYuv, uint32_t uiDstPartIdx);
// Copy the part of Big YUV buffer to other Small YUV buffer
- void copyPartToYuv(TComYuv* dstPicYuv, uint32_t uiSrcPartIdx);
- void copyPartToLuma(TComYuv* dstPicYuv, uint32_t uiSrcPartIdx);
- void copyPartToChroma(TComYuv* dstPicYuv, uint32_t uiSrcPartIdx);
+ void copyPartToYuv(TComYuv* dstPicYuv, uint32_t srcPartIdx);
+ void copyPartToLuma(TComYuv* dstPicYuv, uint32_t srcPartIdx);
+ void copyPartToChroma(TComYuv* dstPicYuv, uint32_t srcPartIdx);
// Copy YUV partition buffer to other YUV partition buffer
void copyPartToPartYuv(TComYuv* dstPicYuv, uint32_t partIdx, uint32_t width, uint32_t height, bool bLuma = true, bool bChroma = true);
diff -r 9d74638c3640 -r 1ca01c82609f source/Lib/TLibEncoder/TEncSearch.cpp
--- a/source/Lib/TLibEncoder/TEncSearch.cpp Sat Nov 09 20:14:24 2013 -0600
+++ b/source/Lib/TLibEncoder/TEncSearch.cpp Mon Nov 11 15:46:00 2013 +0530
@@ -176,7 +176,7 @@ void TEncSearch::init(TEncCfg* cfg, TCom
m_qtTempCoeffCb[i] = new TCoeff[(g_maxCUWidth >> m_hChromaShift) * (g_maxCUHeight >> m_vChromaShift)];
m_qtTempCoeffCr[i] = new TCoeff[(g_maxCUWidth >> m_hChromaShift) * (g_maxCUHeight >> m_vChromaShift)];
- m_qtTempTComYuv[i].create(g_maxCUWidth, g_maxCUHeight, cfg->getColorFormat());
+ m_qtTempTComYuv[i].create(MAX_CU_SIZE, MAX_CU_SIZE, cfg->getColorFormat());
}
m_sharedPredTransformSkip[0] = new Pel[MAX_TS_WIDTH * MAX_TS_HEIGHT];
@@ -428,6 +428,7 @@ void TEncSearch::xIntraCodingLumaBlk(TCo
Pel* pred = predYuv->getLumaAddr(absPartIdx);
int16_t* residual = resiYuv->getLumaAddr(absPartIdx);
Pel* recon = predYuv->getLumaAddr(absPartIdx);
+ int part = partitionFromSizes(width, height);
uint32_t trSizeLog2 = g_convertToBit[cu->getSlice()->getSPS()->getMaxCUWidth() >> fullDepth] + 2;
uint32_t qtLayer = cu->getSlice()->getSPS()->getQuadtreeTULog2MaxSize() - trSizeLog2;
@@ -453,12 +454,12 @@ void TEncSearch::xIntraCodingLumaBlk(TCo
// save prediction
if (default0Save1Load2 == 1)
{
- primitives.blockcpy_pp(width, height, m_sharedPredTransformSkip[0], width, pred, stride);
+ primitives.luma_copy_pp[part](m_sharedPredTransformSkip[0], width, pred, stride);
}
}
else
{
- primitives.blockcpy_pp(width, height, pred, stride, m_sharedPredTransformSkip[0], width);
+ primitives.luma_copy_pp[part](pred, stride, m_sharedPredTransformSkip[0], width);
}
//===== get residual signal =====
@@ -504,7 +505,6 @@ void TEncSearch::xIntraCodingLumaBlk(TCo
primitives.calcrecon[size](pred, residual, recon, reconQt, reconIPred, stride, reconQtStride, reconIPredStride);
//===== update distortion =====
- int part = partitionFromSizes(width, height);
outDist += primitives.sse_pp[part](fenc, stride, recon, stride);
}
@@ -554,6 +554,7 @@ void TEncSearch::xIntraCodingChromaBlk(T
Pel* reconIPred = (chromaId > 0 ? cu->getPic()->getPicYuvRec()->getCrAddr(cu->getAddr(), zorder) : cu->getPic()->getPicYuvRec()->getCbAddr(cu->getAddr(), zorder));
uint32_t reconIPredStride = cu->getPic()->getPicYuvRec()->getCStride();
bool useTransformSkipChroma = cu->getTransformSkip(absPartIdx, ttype);
+ int part = partitionFromSizes(width, height);
//===== update chroma mode =====
if (chromaPredMode == DM_CHROMA_IDX)
@@ -576,14 +577,14 @@ void TEncSearch::xIntraCodingChromaBlk(T
if (default0Save1Load2 == 1)
{
Pel* predbuf = m_sharedPredTransformSkip[1 + chromaId];
- primitives.blockcpy_pp(width, height, predbuf, width, pred, stride);
+ primitives.luma_copy_pp[part](predbuf, width, pred, stride);
}
}
else
{
// load prediction
Pel* predbuf = m_sharedPredTransformSkip[1 + chromaId];
- primitives.blockcpy_pp(width, height, pred, stride, predbuf, width);
+ primitives.luma_copy_pp[part](pred, stride, predbuf, width);
}
//===== get residual signal =====
@@ -638,7 +639,6 @@ void TEncSearch::xIntraCodingChromaBlk(T
primitives.calcrecon[size](pred, residual, recon, reconQt, reconIPred, stride, reconQtStride, reconIPredStride);
//===== update distortion =====
- int part = partitionFromSizes(width, height);
uint32_t dist = primitives.sse_pp[part](fenc, stride, recon, stride);
if (ttype == TEXT_CHROMA_U)
{
@@ -1610,7 +1610,7 @@ void TEncSearch::estIntraPredQT(TComData
// Filtered and Unfiltered refAbove and refLeft pointing to above and left.
above = aboveScale;
left = leftScale;
- aboveFiltered = aboveScale;
+ aboveFiltered = aboveScale;
leftFiltered = leftScale;
}
@@ -1796,28 +1796,24 @@ void TEncSearch::estIntraPredQT(TComData
uint32_t compWidth = cu->getWidth(0) >> initTrDepth;
uint32_t compHeight = cu->getHeight(0) >> initTrDepth;
uint32_t zorder = cu->getZorderIdxInCU() + partOffset;
+ int part = partitionFromSizes(compWidth, compHeight);
Pel* dst = cu->getPic()->getPicYuvRec()->getLumaAddr(cu->getAddr(), zorder);
uint32_t dststride = cu->getPic()->getPicYuvRec()->getStride();
Pel* src = reconYuv->getLumaAddr(partOffset);
uint32_t srcstride = reconYuv->getStride();
- primitives.blockcpy_pp(compWidth, compHeight, dst, dststride, src, srcstride);
+ primitives.luma_copy_pp[part](dst, dststride, src, srcstride);
if (!bLumaOnly && !bSkipChroma)
{
- if (!bChromaSame)
- {
- compWidth >>= 1;
- compHeight >>= 1;
- }
dst = cu->getPic()->getPicYuvRec()->getCbAddr(cu->getAddr(), zorder);
dststride = cu->getPic()->getPicYuvRec()->getCStride();
src = reconYuv->getCbAddr(partOffset);
srcstride = reconYuv->getCStride();
- primitives.blockcpy_pp(compWidth, compHeight, dst, dststride, src, srcstride);
+ primitives.chroma_copy_pp[part](dst, dststride, src, srcstride);
dst = cu->getPic()->getPicYuvRec()->getCrAddr(cu->getAddr(), zorder);
src = reconYuv->getCrAddr(partOffset);
- primitives.blockcpy_pp(compWidth, compHeight, dst, dststride, src, srcstride);
+ primitives.chroma_copy_pp[part](dst, dststride, src, srcstride);
}
}
@@ -1851,7 +1847,7 @@ void TEncSearch::estIntraPredQT(TComData
m_rdGoOnSbacCoder->load(m_rdSbacCoders[depth][CI_CURR_BEST]);
//===== set distortion (rate and r-d costs are determined later) =====
- outDistC = overallDistC;
+ outDistC = overallDistC;
cu->m_totalDistortion = overallDistY + overallDistC;
}
@@ -2940,34 +2936,29 @@ void TEncSearch::estimateRDInterCU(TComD
if (zerocost < cost)
{
const uint32_t qpartnum = cu->getPic()->getNumPartInCU() >> (cu->getDepth(0) << 1);
- ::memset(cu->getTransformIdx(), 0, qpartnum * sizeof(UChar));
::memset(cu->getCbf(TEXT_LUMA), 0, qpartnum * sizeof(UChar));
::memset(cu->getCbf(TEXT_CHROMA_U), 0, qpartnum * sizeof(UChar));
::memset(cu->getCbf(TEXT_CHROMA_V), 0, qpartnum * sizeof(UChar));
- ::memset(cu->getCoeffY(), 0, width * height * sizeof(TCoeff));
- ::memset(cu->getCoeffCb(), 0, width * height * sizeof(TCoeff) >> 2);
- ::memset(cu->getCoeffCr(), 0, width * height * sizeof(TCoeff) >> 2);
- cu->setTransformSkipSubParts(0, 0, 0, 0, cu->getDepth(0));
if (cu->getMergeFlag(0) && cu->getPartitionSize(0) == SIZE_2Nx2N)
{
- cu->setSkipFlagSubParts(true, 0, cu->getDepth(0));
+ cu->getSkipFlag()[0] = true;
}
bits = zerobits;
- outBestResiYuv->clear();
generateRecon(cu, predYuv, outBestResiYuv, outReconYuv, true);
+ distortion = zerodistortion;
}
else
{
xSetResidualQTData(cu, 0, 0, outBestResiYuv, cu->getDepth(0), true);
generateRecon(cu, predYuv, outBestResiYuv, outReconYuv, false);
+
+ int part = partitionFromSizes(width, height);
+ distortion = primitives.sse_pp[part](fencYuv->getLumaAddr(), fencYuv->getStride(), outReconYuv->getLumaAddr(), outReconYuv->getStride());
+ part = partitionFromSizes(width >> 1, height >> 1);
+ distortion += m_rdCost->scaleChromaDistCb(primitives.sse_pp[part](fencYuv->getCbAddr(), fencYuv->getCStride(), outReconYuv->getCbAddr(), outReconYuv->getCStride()));
+ distortion += m_rdCost->scaleChromaDistCr(primitives.sse_pp[part](fencYuv->getCrAddr(), fencYuv->getCStride(), outReconYuv->getCrAddr(), outReconYuv->getCStride()));
}
- int part = partitionFromSizes(width, height);
- distortion = primitives.sse_pp[part](fencYuv->getLumaAddr(), fencYuv->getStride(), outReconYuv->getLumaAddr(), outReconYuv->getStride());
- part = partitionFromSizes(width >> 1, height >> 1);
- distortion += m_rdCost->scaleChromaDistCb(primitives.sse_pp[part](fencYuv->getCbAddr(), fencYuv->getCStride(), outReconYuv->getCbAddr(), outReconYuv->getCStride()));
- distortion += m_rdCost->scaleChromaDistCr(primitives.sse_pp[part](fencYuv->getCrAddr(), fencYuv->getCStride(), outReconYuv->getCrAddr(), outReconYuv->getCStride()));
-
cu->m_totalBits = bits;
cu->m_totalDistortion = distortion;
cu->m_totalCost = m_rdCost->calcRdCost(distortion, bits);
@@ -2975,25 +2966,13 @@ void TEncSearch::estimateRDInterCU(TComD
uint32_t TEncSearch::estimateZerobits(TComDataCU* cu)
{
- if (cu->isIntra(0))
- {
- return 0;
- }
-
uint32_t zeroResiBits = 0;
- uint32_t width = cu->getWidth(0);
- uint32_t height = cu->getHeight(0);
-
const uint32_t qpartnum = cu->getPic()->getNumPartInCU() >> (cu->getDepth(0) << 1);
- ::memset(cu->getTransformIdx(), 0, qpartnum * sizeof(UChar));
+
::memset(cu->getCbf(TEXT_LUMA), 0, qpartnum * sizeof(UChar));
::memset(cu->getCbf(TEXT_CHROMA_U), 0, qpartnum * sizeof(UChar));
::memset(cu->getCbf(TEXT_CHROMA_V), 0, qpartnum * sizeof(UChar));
- ::memset(cu->getCoeffY(), 0, width * height * sizeof(TCoeff));
- ::memset(cu->getCoeffCb(), 0, width * height * sizeof(TCoeff) >> 2);
- ::memset(cu->getCoeffCr(), 0, width * height * sizeof(TCoeff) >> 2);
- cu->setTransformSkipSubParts(0, 0, 0, 0, cu->getDepth(0));
m_rdGoOnSbacCoder->load(m_rdSbacCoders[cu->getDepth(0)][CI_CURR_BEST]);
zeroResiBits = xSymbolBitsInter(cu);
@@ -3035,11 +3014,6 @@ void TEncSearch::generateRecon(TComDataC
void TEncSearch::estimateBitsDist(TComDataCU* cu, TShortYUV* resiYuv, uint32_t& bits, uint32_t& distortion, bool curUseRDOQ)
{
- if (cu->isIntra(0))
- {
- return;
- }
-
bits = 0;
distortion = 0;
uint64_t cost = 0;
diff -r 9d74638c3640 -r 1ca01c82609f source/common/ipfilter.cpp
--- a/source/common/ipfilter.cpp Sat Nov 09 20:14:24 2013 -0600
+++ b/source/common/ipfilter.cpp Mon Nov 11 15:46:00 2013 +0530
@@ -425,6 +425,49 @@ void interp_vert_ps_c(pixel *src, intptr
}
}
+template<int N, int width, int height>
+void interp_vert_sp_c(int16_t *src, intptr_t srcStride, pixel *dst, intptr_t dstStride, int coeffIdx)
+{
+ int headRoom = IF_INTERNAL_PREC - X265_DEPTH;
+ int shift = IF_FILTER_PREC + headRoom;
+ int offset = (1 << (shift - 1)) + (IF_INTERNAL_OFFS << IF_FILTER_PREC);
+ uint16_t maxVal = (1 << X265_DEPTH) - 1;
+ const int16_t *coeff = (N == 8 ? g_lumaFilter[coeffIdx] : g_chromaFilter[coeffIdx]);
+
+ src -= (N / 2 - 1) * srcStride;
+
+ int row, col;
+ for (row = 0; row < height; row++)
+ {
+ for (col = 0; col < width; col++)
+ {
+ int sum;
+
+ sum = src[col + 0 * srcStride] * coeff[0];
+ sum += src[col + 1 * srcStride] * coeff[1];
+ sum += src[col + 2 * srcStride] * coeff[2];
+ sum += src[col + 3 * srcStride] * coeff[3];
+ if (N == 8)
+ {
+ sum += src[col + 4 * srcStride] * coeff[4];
+ sum += src[col + 5 * srcStride] * coeff[5];
+ sum += src[col + 6 * srcStride] * coeff[6];
+ sum += src[col + 7 * srcStride] * coeff[7];
+ }
+
+ int16_t val = (int16_t)((sum + offset) >> shift);
+
+ val = (val < 0) ? 0 : val;
+ val = (val > maxVal) ? maxVal : val;
+
+ dst[col] = (pixel)val;
+ }
+
+ src += srcStride;
+ dst += dstStride;
+ }
+}
+
typedef void (*ipfilter_ps_t)(pixel *src, intptr_t srcStride, short *dst, intptr_t dstStride, int width, int height, const short *coeff);
typedef void (*ipfilter_sp_t)(short *src, intptr_t srcStride, pixel *dst, intptr_t dstStride, int width, int height, const short *coeff);
@@ -450,6 +493,7 @@ namespace x265 {
p.luma_hps[LUMA_ ## W ## x ## H] = interp_horiz_ps_c<8, W, H>;\
p.luma_vpp[LUMA_ ## W ## x ## H] = interp_vert_pp_c<8, W, H>; \
p.luma_vps[LUMA_ ## W ## x ## H] = interp_vert_ps_c<8, W, H>; \
+ p.luma_vsp[LUMA_ ## W ## x ## H] = interp_vert_sp_c<8, W, H>; \
p.luma_hvpp[LUMA_ ## W ## x ## H] = interp_hv_pp_c<8, W, H>;
void Setup_C_IPFilterPrimitives(EncoderPrimitives& p)
@@ -506,7 +550,6 @@ void Setup_C_IPFilterPrimitives(EncoderP
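
The interp_vert_sp_c primitive added in the ipfilter.cpp hunk above takes 16-bit intermediate samples and produces clipped pixels. Below is a stand-alone sketch of the 8-tap luma case, not the committed code. It assumes an 8-bit build (X265_DEPTH = 8) and the usual HEVC constants (IF_INTERNAL_PREC = 14, IF_FILTER_PREC = 6, IF_INTERNAL_OFFS = 1 << 13); the names kDepth, kLumaFilter and vertSP8 are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

typedef uint8_t pixel; // assumption: 8-bit build

static const int kDepth = 8;                               // assumed X265_DEPTH
static const int kInternalPrec = 14;                       // assumed IF_INTERNAL_PREC
static const int kFilterPrec = 6;                          // assumed IF_FILTER_PREC
static const int kInternalOffs = 1 << (kInternalPrec - 1); // assumed IF_INTERNAL_OFFS

// HEVC 8-tap luma interpolation coefficients for the four fractional positions.
static const int16_t kLumaFilter[4][8] =
{
    {  0, 0,   0, 64,  0,   0, 0,  0 },
    { -1, 4, -10, 58, 17,  -5, 1,  0 },
    { -1, 4, -11, 40, 40, -11, 4, -1 },
    {  0, 1,  -5, 17, 58, -10, 4, -1 }
};

// Vertical 8-tap filter, short in, pixel out. src points at the first output
// row; the filter also reads 3 rows above it and 4 rows below the last row.
template<int width, int height>
void vertSP8(const int16_t* src, intptr_t srcStride,
             pixel* dst, intptr_t dstStride, int coeffIdx)
{
    const int shift = kFilterPrec + (kInternalPrec - kDepth);
    // Rounding term plus the bias that the first filter stage subtracted.
    const int offset = (1 << (shift - 1)) + (kInternalOffs << kFilterPrec);
    const int maxVal = (1 << kDepth) - 1;
    const int16_t* coeff = kLumaFilter[coeffIdx];

    src -= 3 * srcStride; // centre the 8 taps on the output row

    for (int row = 0; row < height; row++)
    {
        for (int col = 0; col < width; col++)
        {
            int sum = 0;
            for (int t = 0; t < 8; t++)
            {
                sum += src[col + t * srcStride] * coeff[t];
            }
            int val = (sum + offset) >> shift;
            if (val < 0) val = 0;
            if (val > maxVal) val = maxVal;
            dst[col] = static_cast<pixel>(val);
        }
        src += srcStride;
        dst += dstStride;
    }
}
```

Under these assumptions, a flat intermediate field of value 100 * 64 - 8192 (a pixel value of 100 after the first stage's scaling and bias) filters back to 100 for any coefficient index, because each coefficient row sums to 64.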