[x265-commits] [x265] dct: modified block copy used in dct8 with convert16to32 ...
Yuvaraj Venkatesh
yuvaraj at multicorewareinc.com
Sat Oct 12 06:25:52 CEST 2013
details: http://hg.videolan.org/x265/rev/855757691efc
branches:
changeset: 4385:855757691efc
user: Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date: Fri Oct 11 12:42:16 2013 +0530
description:
dct: modified block copy used in dct8 with convert16to32 inline function
Subject: [x265] dct: manually inline convert16to32, for 10% improvement
details: http://hg.videolan.org/x265/rev/ab9f6ad97d30
branches:
changeset: 4386:ab9f6ad97d30
user: Steve Borho <steve at borho.org>
date: Fri Oct 11 02:31:33 2013 -0500
description:
dct: manually inline convert16to32, for 10% improvement
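convert16to32 widens 16-bit residual samples into 32-bit integers, which gives the transform's multiply-accumulate steps headroom. A minimal scalar sketch follows. The name convert16to32_ref and its signature are hypothetical, not x265's declaration. In dct-sse41.cpp the same widening maps onto the SSE4.1 intrinsic _mm_cvtepi16_epi32. Inlining it plausibly lets the compiler keep the widened values in registers instead of going through a temporary buffer, although the message does not say where the 10% comes from.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar reference for a 16-to-32-bit widening copy.
// Each signed 16-bit sample is sign-extended into a 32-bit int so that
// later multiply-accumulate steps cannot overflow the 16-bit range.
static inline void convert16to32_ref(const int16_t *src, int32_t *dst, int num)
{
    for (int i = 0; i < num; i++)
        dst[i] = src[i]; // implicit sign extension
}
```

For example, 32767 multiplied by the DCT coefficient 64 no longer fits in 16 bits, so the widening has to happen before the butterflies.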
Subject: [x265] intra-sse3.cpp: Created common macros PRED_INTRA_ANGLE_4_START, PRED_INTRA_ANGLE_4_END for PredIntraAng4_[ANGLE] function.
details: http://hg.videolan.org/x265/rev/ee4f9ae07523
branches:
changeset: 4387:ee4f9ae07523
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Fri Oct 11 12:41:47 2013 +0530
description:
intra-sse3.cpp: Created common macros PRED_INTRA_ANGLE_4_START, PRED_INTRA_ANGLE_4_END for PredIntraAng4_[ANGLE] function.
Subject: [x265] intra-sse3.cpp: Replace PredIntraAng4_26 vector class function with intrinsic using intrinsic macros PRED_INTRA_ANGLE_4_START and PRED_INTRA_ANGLE_4_END.
details: http://hg.videolan.org/x265/rev/295973cbc020
branches:
changeset: 4388:295973cbc020
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Fri Oct 11 12:59:03 2013 +0530
description:
intra-sse3.cpp: Replace PredIntraAng4_26 vector class function with intrinsic using intrinsic macros PRED_INTRA_ANGLE_4_START and PRED_INTRA_ANGLE_4_END.
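The numeric suffix in PredIntraAng4_26, PredIntraAng4_m_2 and the others in this series appears to be the HEVC intraPredAngle value, with m_ marking a negative angle. The START/END macros presumably factor out the shared setup and store code. Below is a minimal scalar sketch of what one such 4x4 predictor computes for a vertical-class mode with a non-negative angle. The function name is hypothetical. Negative angles also need the reference-extension step, and horizontal-class modes use the left neighbours and a transpose. Neither case is handled here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar sketch of HEVC 4x4 angular prediction for a
// vertical-class mode with intraPredAngle in 0..32.
// refMain[0] is the top-left neighbour; refMain[1..8] are the above
// and above-right neighbours.
void predIntraAng4_ref(const uint8_t *refMain, uint8_t dst[4][4], int angle)
{
    for (int y = 0; y < 4; y++)
    {
        int pos = (y + 1) * angle; // projected displacement, 1/32-sample units
        int idx = pos >> 5;        // whole-sample part
        int frac = pos & 31;       // fractional part
        for (int x = 0; x < 4; x++)
        {
            int a = refMain[x + idx + 1];
            if (frac == 0)
                dst[y][x] = (uint8_t)a; // lands exactly on a reference sample
            else
            {
                int b = refMain[x + idx + 2];
                // Two-tap linear interpolation with rounding.
                dst[y][x] = (uint8_t)(((32 - frac) * a + frac * b + 16) >> 5);
            }
        }
    }
}
```

With angle 0 every row copies refMain[1..4], which is the pure vertical mode. With angle 32 each row shifts one sample further along the reference.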
Subject: [x265] asm: fix bug in filterHorizontal_p_p_4 with width less than 8 (seed 0x52578C72)
details: http://hg.videolan.org/x265/rev/953a4e9f3d57
branches:
changeset: 4389:953a4e9f3d57
user: Min Chen <chenm003 at 163.com>
date: Fri Oct 11 13:51:46 2013 +0800
description:
asm: fix bug in filterHorizontal_p_p_4 with width less than 8 (seed 0x52578C72)
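Blocks narrower than 8 are where a SIMD loop stepping 8 pixels at a time needs special handling. A scalar reference that accepts any width is the natural oracle for checking such a fix. The sketch below assumes that filterHorizontal_p_p_4 is the 4-tap (chroma) horizontal filter with 8-bit pixels in and out. The function name and signature are hypothetical. The tap table is the HEVC chroma interpolation filter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// HEVC chroma interpolation taps, indexed by 1/8-sample fraction.
static const int16_t kChromaTaps[8][4] = {
    {  0, 64,  0,  0 }, { -2, 58, 10, -2 }, { -4, 54, 16, -2 }, { -6, 46, 28, -4 },
    { -4, 36, 36, -4 }, { -4, 28, 46, -6 }, { -2, 16, 54, -4 }, { -2, 10, 58, -2 }
};

// Hypothetical scalar reference for a pixel-to-pixel 4-tap horizontal
// filter. Any width works, including widths below 8.
void filterHorizontal4_ref(const uint8_t *src, intptr_t srcStride,
                           uint8_t *dst, intptr_t dstStride,
                           int width, int height, int coeffIdx)
{
    const int16_t *c = kChromaTaps[coeffIdx];
    src -= 1; // taps cover src[x-1] .. src[x+2]
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            int sum = src[x] * c[0] + src[x + 1] * c[1] + src[x + 2] * c[2] + src[x + 3] * c[3];
            int v = (sum + 32) >> 6; // taps sum to 64, so normalise with rounding
            dst[x] = (uint8_t)std::min(255, std::max(0, v));
        }
        src += srcStride;
        dst += dstStride;
    }
}
```

The caller must provide one pixel of margin on the left and two on the right of each row.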
Subject: [x265] asm: improve filterHorizontal_p_p_4 by reordering intermediate data
details: http://hg.videolan.org/x265/rev/080a9fdada2c
branches:
changeset: 4390:080a9fdada2c
user: Min Chen <chenm003 at 163.com>
date: Fri Oct 11 14:51:58 2013 +0800
description:
asm: improve filterHorizontal_p_p_4 by reordering intermediate data
1. replace phaddw with paddw
2. use an extra load to break the data dependency and reduce the table size
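The difference between the two instructions explains why reordering the data matters. PHADDW (SSSE3) adds adjacent lane pairs within each source. PADDW adds matching lanes of two registers. On many x86 cores PHADDW decodes to several micro-ops while PADDW is a single simple one. If the intermediate values are arranged so that each pair to be summed sits in the same lane of two registers, the cheaper PADDW can do the work. The scalar models below show only the lane semantics. This message does not show which operands x265's filter actually pairs up.

```cpp
#include <cassert>
#include <cstdint>

// Scalar models of two SSE instructions on eight 16-bit lanes.
// Both wrap on overflow (no saturation).

// paddw: lane-wise add, r[i] = a[i] + b[i].
void paddw_model(const int16_t a[8], const int16_t b[8], int16_t r[8])
{
    for (int i = 0; i < 8; i++)
        r[i] = (int16_t)(uint16_t)((uint16_t)a[i] + (uint16_t)b[i]);
}

// phaddw: horizontal add of adjacent pairs; the low four results
// come from a, the high four from b.
void phaddw_model(const int16_t a[8], const int16_t b[8], int16_t r[8])
{
    for (int i = 0; i < 4; i++)
    {
        r[i]     = (int16_t)(uint16_t)((uint16_t)a[2 * i] + (uint16_t)a[2 * i + 1]);
        r[i + 4] = (int16_t)(uint16_t)((uint16_t)b[2 * i] + (uint16_t)b[2 * i + 1]);
    }
}
```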
Subject: [x265] intra-sse3.cpp: Replace PredIntraAng4_21 vector class function with intrinsic.
details: http://hg.videolan.org/x265/rev/e9b401f5c655
branches:
changeset: 4391:e9b401f5c655
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Fri Oct 11 13:16:57 2013 +0530
description:
intra-sse3.cpp: Replace PredIntraAng4_21 vector class function with intrinsic.
Subject: [x265] dct: Replaced partialButterfly16 vector class function with intrinsics
details: http://hg.videolan.org/x265/rev/f760de7f5596
branches:
changeset: 4392:f760de7f5596
user: Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date: Fri Oct 11 14:09:28 2013 +0530
description:
dct: Replaced partialButterfly16 vector class function with intrinsics
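The removed vector class version of partialButterfly16 (at the end of the diff below) sets up coefficient rows (64, 64), (83, 36), (64, -64) and (36, -83). These form the 4-point HEVC DCT kernel that the 16-point transform reuses for its innermost even part. The sketch below shows that 4-point even/odd butterfly on its own. The name and signature are hypothetical. The 16- and 32-point versions apply the same split recursively, with larger odd-part coefficient sets.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar sketch of a 4-point forward partial butterfly
// with HEVC coefficients. Transforms `line` rows of 4 samples and
// writes the output transposed, so a second call can do the column pass.
// shift must be >= 1.
void partialButterfly4_ref(const int16_t *src, int16_t *dst, int shift, int line)
{
    const int add = 1 << (shift - 1);
    for (int j = 0; j < line; j++)
    {
        // Even/odd split: sums feed the even outputs, differences the odd ones.
        int E0 = src[0] + src[3], O0 = src[0] - src[3];
        int E1 = src[1] + src[2], O1 = src[1] - src[2];

        dst[0]        = (int16_t)((64 * E0 + 64 * E1 + add) >> shift);
        dst[2 * line] = (int16_t)((64 * E0 - 64 * E1 + add) >> shift);
        dst[line]     = (int16_t)((83 * O0 + 36 * O1 + add) >> shift);
        dst[3 * line] = (int16_t)((36 * O0 - 83 * O1 + add) >> shift);

        src += 4;
        dst++;
    }
}
```

A flat input row puts all of its energy into the DC output and leaves the other three coefficients at zero.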
Subject: [x265] dct: move dct8 to dct-sse41.cpp, inline convert16to32
details: http://hg.videolan.org/x265/rev/f0eebdf90a58
branches:
changeset: 4393:f0eebdf90a58
user: Steve Borho <steve at borho.org>
date: Fri Oct 11 13:57:03 2013 -0500
description:
dct: move dct8 to dct-sse41.cpp, inline convert16to32
Subject: [x265] intra-sse3.cpp: Replace PredIntraAng4_17 vector class function with intrinsic.
details: http://hg.videolan.org/x265/rev/17c772394df3
branches:
changeset: 4394:17c772394df3
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Fri Oct 11 14:15:43 2013 +0530
description:
intra-sse3.cpp: Replace PredIntraAng4_17 vector class function with intrinsic.
Subject: [x265] intra-sse3.cpp: Replace PredIntraAng4_13 vector class function with intrinsic.
details: http://hg.videolan.org/x265/rev/f3d0ced4a4f1
branches:
changeset: 4395:f3d0ced4a4f1
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Fri Oct 11 14:17:41 2013 +0530
description:
intra-sse3.cpp: Replace PredIntraAng4_13 vector class function with intrinsic.
Subject: [x265] intra-sse3.cpp: Replace PredIntraAng4_9 vector class function with intrinsic.
details: http://hg.videolan.org/x265/rev/e65e3714bbb9
branches:
changeset: 4396:e65e3714bbb9
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Fri Oct 11 14:20:11 2013 +0530
description:
intra-sse3.cpp: Replace PredIntraAng4_9 vector class function with intrinsic.
Subject: [x265] intra-sse3.cpp: Replace PredIntraAng4_5 vector class function with intrinsic.
details: http://hg.videolan.org/x265/rev/2b9f94e11cc5
branches:
changeset: 4397:2b9f94e11cc5
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Fri Oct 11 14:22:21 2013 +0530
description:
intra-sse3.cpp: Replace PredIntraAng4_5 vector class function with intrinsic.
Subject: [x265] intra-sse3.cpp: Replace PredIntraAng4_2 vector class function with intrinsic.
details: http://hg.videolan.org/x265/rev/bd335e21744d
branches:
changeset: 4398:bd335e21744d
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Fri Oct 11 14:24:31 2013 +0530
description:
intra-sse3.cpp: Replace PredIntraAng4_2 vector class function with intrinsic.
Subject: [x265] intra-sse3.cpp: Replace PredIntraAng4_m_2 vector class function with intrinsic.
details: http://hg.videolan.org/x265/rev/e4efd408f394
branches:
changeset: 4399:e4efd408f394
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Fri Oct 11 15:38:38 2013 +0530
description:
intra-sse3.cpp: Replace PredIntraAng4_m_2 vector class function with intrinsic.
Subject: [x265] intra-sse3.cpp: Replace PredIntraAng4_m_5 vector class function with intrinsic.
details: http://hg.videolan.org/x265/rev/87a56e0ff6a9
branches:
changeset: 4400:87a56e0ff6a9
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Fri Oct 11 15:42:44 2013 +0530
description:
intra-sse3.cpp: Replace PredIntraAng4_m_5 vector class function with intrinsic.
Subject: [x265] intra-sse3.cpp: Replace PredIntraAng4_m_9 vector class function with intrinsic.
details: http://hg.videolan.org/x265/rev/5c6f7106c918
branches:
changeset: 4401:5c6f7106c918
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Fri Oct 11 15:46:32 2013 +0530
description:
intra-sse3.cpp: Replace PredIntraAng4_m_9 vector class function with intrinsic.
Subject: [x265] intra-sse3.cpp: Replace PredIntraAng4_m_13 vector class function with intrinsic.
details: http://hg.videolan.org/x265/rev/f1013117efab
branches:
changeset: 4402:f1013117efab
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Fri Oct 11 15:49:33 2013 +0530
description:
intra-sse3.cpp: Replace PredIntraAng4_m_13 vector class function with intrinsic.
Subject: [x265] intra-sse3.cpp: Replace PredIntraAng4_m_17 vector class function with intrinsic.
details: http://hg.videolan.org/x265/rev/263acbde8ec1
branches:
changeset: 4403:263acbde8ec1
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Fri Oct 11 15:52:48 2013 +0530
description:
intra-sse3.cpp: Replace PredIntraAng4_m_17 vector class function with intrinsic.
Subject: [x265] intra-sse3.cpp: Replace PredIntraAng4_m_21 vector class function with intrinsic.
details: http://hg.videolan.org/x265/rev/90b34ae5e8de
branches:
changeset: 4404:90b34ae5e8de
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Fri Oct 11 16:06:09 2013 +0530
description:
intra-sse3.cpp: Replace PredIntraAng4_m_21 vector class function with intrinsic.
Subject: [x265] intra-sse3.cpp: Replace PredIntraAng4_m_26 vector class function with intrinsic.
details: http://hg.videolan.org/x265/rev/267fa83cd7b9
branches:
changeset: 4405:267fa83cd7b9
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Fri Oct 11 16:08:52 2013 +0530
description:
intra-sse3.cpp: Replace PredIntraAng4_m_26 vector class function with intrinsic.
Subject: [x265] intra-sse3.cpp: Replace PredIntraAng4_m_32 vector class function with intrinsic.
details: http://hg.videolan.org/x265/rev/4824f15116e6
branches:
changeset: 4406:4824f15116e6
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Fri Oct 11 16:13:21 2013 +0530
description:
intra-sse3.cpp: Replace PredIntraAng4_m_32 vector class function with intrinsic.
Subject: [x265] dct: Replaced partialButterfly32 vector class function with intrinsics
details: http://hg.videolan.org/x265/rev/ca00db64f5bb
branches:
changeset: 4407:ca00db64f5bb
user: Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date: Fri Oct 11 16:52:58 2013 +0530
description:
dct: Replaced partialButterfly32 vector class function with intrinsics
Subject: [x265] dct: move dct32 to dct-sse41.cpp, inline convert16to32
details: http://hg.videolan.org/x265/rev/def1551c14f0
branches:
changeset: 4408:def1551c14f0
user: Steve Borho <steve at borho.org>
date: Fri Oct 11 14:23:14 2013 -0500
description:
dct: move dct32 to dct-sse41.cpp, inline convert16to32
Subject: [x265] pixel-sse3.cpp: Replace convert32to16_shr vector class function with intrinsic.
details: http://hg.videolan.org/x265/rev/efb230642757
branches:
changeset: 4409:efb230642757
user: Dnyaneshwar Gorade <dnyaneshwar at multicorewareinc.com>
date: Fri Oct 11 17:18:44 2013 +0530
description:
pixel-sse3.cpp: Replace convert32to16_shr vector class function with intrinsic.
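convert32to16_shr narrows 32-bit intermediates to 16 bits with a rounding right shift. Below is a minimal scalar sketch with a hypothetical name and signature. The saturating clamp mirrors what _mm_packs_epi32 does when an SSE version packs the results. Whether x265's C primitive also saturates is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical scalar reference: round, shift right, then saturate
// to the int16 range. shift must be >= 1.
void convert32to16_shr_ref(const int32_t *src, int16_t *dst, int num, int shift)
{
    const int32_t round = 1 << (shift - 1);
    for (int i = 0; i < num; i++)
    {
        int32_t v = (src[i] + round) >> shift; // arithmetic shift on negatives
        dst[i] = (int16_t)std::min<int32_t>(32767, std::max<int32_t>(-32768, v));
    }
}
```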
Subject: [x265] pixel-sse3: move convert32to16_shr to top of file, remove vector class includes
details: http://hg.videolan.org/x265/rev/9f37e3d7818c
branches:
changeset: 4410:9f37e3d7818c
user: Steve Borho <steve at borho.org>
date: Fri Oct 11 14:27:11 2013 -0500
description:
pixel-sse3: move convert32to16_shr to top of file, remove vector class includes
Subject: [x265] dct: Replaced inversedst vector class function with intrinsics
details: http://hg.videolan.org/x265/rev/df024b91ffd6
branches:
changeset: 4411:df024b91ffd6
user: Yuvaraj Venkatesh <yuvaraj at multicorewareinc.com>
date: Fri Oct 11 18:30:11 2013 +0530
description:
dct: Replaced inversedst vector class function with intrinsics
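inversedst is the inverse of the 4x4 DST that HEVC uses for intra luma 4x4 residuals. The basis rows are (29, 55, 74, 84), (74, 74, 0, -74), (84, -29, -74, 55) and (55, -84, 74, -29). Below is a scalar sketch of one pass, modelled on the factored form used by the HM reference software. The function name is hypothetical. The shared partial sums c0..c3 replace most of the 16 multiplies of a direct matrix product, and an intrinsic version can vectorise across the four columns.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

static inline int16_t clip16(int v)
{
    return (int16_t)std::min(32767, std::max(-32768, v));
}

// Scalar sketch of one inverse-DST pass. tmp holds 4 rows of 4
// coefficients; column i is transformed and written as row i of block.
void inverseDst4_ref(const int16_t *tmp, int16_t *block, int shift)
{
    const int rnd = 1 << (shift - 1);
    for (int i = 0; i < 4; i++)
    {
        // Shared partial sums standing in for the full matrix multiply.
        int c0 = tmp[i] + tmp[8 + i];
        int c1 = tmp[8 + i] + tmp[12 + i];
        int c2 = tmp[i] - tmp[12 + i];
        int c3 = 74 * tmp[4 + i];

        block[4 * i + 0] = clip16((29 * c0 + 55 * c1 + c3 + rnd) >> shift);
        block[4 * i + 1] = clip16((55 * c2 - 29 * c1 + c3 + rnd) >> shift);
        block[4 * i + 2] = clip16((74 * (tmp[i] - tmp[8 + i] + tmp[12 + i]) + rnd) >> shift);
        block[4 * i + 3] = clip16((55 * c0 + 29 * c2 - c3 + rnd) >> shift);
    }
}
```

A single nonzero first coefficient reproduces the first basis vector (29, 55, 74, 84), scaled and rounded.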
Subject: [x265] dct-sse3: remove idst4; it uses SSE4.1 but dct-sse41.cpp already has idst4
details: http://hg.videolan.org/x265/rev/839a9ba551e4
branches:
changeset: 4412:839a9ba551e4
user: Steve Borho <steve at borho.org>
date: Fri Oct 11 14:38:41 2013 -0500
description:
dct-sse3: remove idst4; it uses SSE4.1 but dct-sse41.cpp already has idst4
Subject: [x265] dct-sse41: reorder functions for clarity - no code change
details: http://hg.videolan.org/x265/rev/d6dc4ebb5cbe
branches:
changeset: 4413:d6dc4ebb5cbe
user: Steve Borho <steve at borho.org>
date: Fri Oct 11 14:41:28 2013 -0500
description:
dct-sse41: reorder functions for clarity - no code change
Subject: [x265] dct-sse3: don't compile dct4 for 16bpp builds when it is not used
details: http://hg.videolan.org/x265/rev/2267068cc7e1
branches:
changeset: 4414:2267068cc7e1
user: Steve Borho <steve at borho.org>
date: Fri Oct 11 14:43:29 2013 -0500
description:
dct-sse3: don't compile dct4 for 16bpp builds when it is not used
Subject: [x265] dct-ssse3: remove vector class includes; dct files are now clean
details: http://hg.videolan.org/x265/rev/1cd3bc5e6881
branches:
changeset: 4415:1cd3bc5e6881
user: Steve Borho <steve at borho.org>
date: Fri Oct 11 14:45:31 2013 -0500
description:
dct-ssse3: remove vector class includes; dct files are now clean
Subject: [x265] Some fixes in applyWeight() function
details: http://hg.videolan.org/x265/rev/b70432f7b275
branches:
changeset: 4416:b70432f7b275
user: Shazeb Nawaz Khan <shazeb at multicorewareinc.com>
date: Fri Oct 11 18:28:54 2013 +0530
description:
Some fixes in applyWeight() function
These won't fix the PSNR drop, but they are necessary.
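One of the reference.cpp changes in the diff below replaces "shift = shift + shiftNum;" with locals. The diff does not show where shift is declared. If it is state that outlives a single applyWeight() call, such as a member, the old code grew the shift every time. Judging by m_numWeightedRows, applyWeight() is called incrementally, so it would run more than once. The sketch below assumes exactly that. WeightState, the function names, and lifting source pixels by extraShift (the role of shiftNum) are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical weighting state that persists across calls.
struct WeightState
{
    int weight; // multiplicative weight
    int offset; // additive offset, in pixel units
    int shift;  // log2 of the weight denominator
};

static inline uint8_t clipPixel(int v)
{
    return (uint8_t)std::min(255, std::max(0, v));
}

// Pattern removed by the fix: the persistent shift is overwritten,
// so every call adds extraShift again.
void weightRowMutating(WeightState &ws, int extraShift, const uint8_t *src, uint8_t *dst, int width)
{
    ws.shift = ws.shift + extraShift;
    int round = ws.shift ? (1 << (ws.shift - 1)) : 0;
    for (int x = 0; x < width; x++)
        dst[x] = clipPixel((((src[x] << extraShift) * ws.weight + round) >> ws.shift) + ws.offset);
}

// Pattern introduced by the fix: derived values live in locals,
// so repeated calls see the same parameters.
void weightRowLocal(const WeightState &ws, int extraShift, const uint8_t *src, uint8_t *dst, int width)
{
    int localShift = ws.shift + extraShift;
    int localRound = localShift ? (1 << (localShift - 1)) : 0;
    for (int x = 0; x < width; x++)
        dst[x] = clipPixel((((src[x] << extraShift) * ws.weight + localRound) >> localShift) + ws.offset);
}
```

With weight 64, shift 6 and offset 0, the weighting is an identity. The local version keeps it that way on every call. The mutating version returns near-zero pixels from the second call on.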
Subject: [x265] rc: added TEncCfg instance to RateControl to reuse all the rc params directly.
details: http://hg.videolan.org/x265/rev/ce889cef37be
branches:
changeset: 4417:ce889cef37be
user: Aarthi Thirumalai
date: Fri Oct 11 16:31:21 2013 +0530
description:
rc: added TEncCfg instance to RateControl to reuse all the rc params directly.
Subject: [x265] param: added rc states for setting Aq mode and Aq strength
details: http://hg.videolan.org/x265/rev/73d085da8533
branches:
changeset: 4418:73d085da8533
user: Aarthi Thirumalai
date: Fri Oct 11 16:37:59 2013 +0530
description:
param: added rc states for setting Aq mode and Aq strength
Subject: [x265] primitives: add C primitives for the following:
details: http://hg.videolan.org/x265/rev/725ac176cd13
branches:
changeset: 4419:725ac176cd13
user: Aarthi Thirumalai
date: Fri Oct 11 16:10:11 2013 +0530
description:
primitives: add C primitives for the following:
compute AC energy for each block
copy pixels of chroma plane
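The AC-energy primitive, added in the diff below as pixel_var, packs a block's pixel sum into the low 32 bits and its sum of squares into the high 32 bits. The AC energy, which equals N times the variance, follows from those two numbers. Adaptive quantization in encoders such as x264 uses a per-block energy of this kind to modulate QP. The sketch below re-creates the packing for 8-bit pixels and adds a hypothetical ac_energy helper. Neither name comes from x265.

```cpp
#include <cassert>
#include <cstdint>

// Same packing as pixel_var<w, h> in the diff: low 32 bits hold the
// sum, high 32 bits hold the sum of squares.
template<int w, int h>
uint64_t pixel_var_ref(const uint8_t *pix, intptr_t stride)
{
    uint32_t sum = 0, sqr = 0;
    for (int y = 0; y < h; y++, pix += stride)
        for (int x = 0; x < w; x++)
        {
            sum += pix[x];
            sqr += pix[x] * pix[x];
        }
    return sum + ((uint64_t)sqr << 32);
}

// Hypothetical helper: AC energy = sqr - sum^2 / N, with N = 1 << log2N.
uint32_t ac_energy(uint64_t packed, int log2N)
{
    uint32_t sum = (uint32_t)packed;
    uint32_t sqr = (uint32_t)(packed >> 32);
    return sqr - (uint32_t)(((uint64_t)sum * sum) >> log2N);
}
```

A flat block has zero AC energy regardless of its brightness. Texture raises the value.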
Subject: [x265] intra: prevent variable shadow warnings from GCC
details: http://hg.videolan.org/x265/rev/d97cf152f620
branches:
changeset: 4420:d97cf152f620
user: Steve Borho <steve at borho.org>
date: Fri Oct 11 22:45:22 2013 -0500
description:
intra: prevent variable shadow warnings from GCC
Subject: [x265] blockcopy-sse3: consistent naming convention
details: http://hg.videolan.org/x265/rev/0be273b5f082
branches:
changeset: 4421:0be273b5f082
user: Steve Borho <steve at borho.org>
date: Fri Oct 11 22:55:01 2013 -0500
description:
blockcopy-sse3: consistent naming convention
Subject: [x265] blockcopy-sse3: remove vector class use from last 16bpp intrinsic
details: http://hg.videolan.org/x265/rev/41b7ceea1e32
branches:
changeset: 4422:41b7ceea1e32
user: Steve Borho <steve at borho.org>
date: Fri Oct 11 22:58:00 2013 -0500
description:
blockcopy-sse3: remove vector class use from last 16bpp intrinsic
blockcopy files are now vector class clean
Subject: [x265] blockcopy-sse3: consistent naming convention
details: http://hg.videolan.org/x265/rev/8518e39a2b74
branches:
changeset: 4423:8518e39a2b74
user: Steve Borho <steve at borho.org>
date: Fri Oct 11 22:59:53 2013 -0500
description:
blockcopy-sse3: consistent naming convention
Subject: [x265] intra: remove vector class header include from intra-sse41.cpp
details: http://hg.videolan.org/x265/rev/f77efd501767
branches:
changeset: 4424:f77efd501767
user: Steve Borho <steve at borho.org>
date: Fri Oct 11 23:13:03 2013 -0500
description:
intra: remove vector class header include from intra-sse41.cpp
intra-sse3.cpp is the last file with 8bpp (non-AVX2) vector class primitives
diffstat:
source/common/common.cpp | 2 +
source/common/pixel.cpp | 31 +
source/common/primitives.h | 4 +
source/common/reference.cpp | 12 +-
source/common/vec/blockcopy-sse3.cpp | 90 ++--
source/common/vec/dct-sse3.cpp | 554 +---------------------------
source/common/vec/dct-sse41.cpp | 702 ++++++++++++++++++++++++++++++----
source/common/vec/dct-ssse3.cpp | 6 +-
source/common/vec/intra-sse3.cpp | 682 +++++++++++++++++++--------------
source/common/vec/intra-sse41.cpp | 5 +-
source/common/vec/pixel-sse3.cpp | 45 +-
source/common/x86/ipfilter8.asm | 50 +-
source/encoder/encoder.cpp | 2 +-
source/encoder/motion.cpp | 10 +-
source/encoder/ratecontrol.cpp | 27 +-
source/encoder/ratecontrol.h | 5 +-
source/x265.h | 2 +
17 files changed, 1162 insertions(+), 1067 deletions(-)
diffs (truncated from 2748 to 300 lines):
diff -r c6d89dc62e19 -r f77efd501767 source/common/common.cpp
--- a/source/common/common.cpp Fri Oct 11 01:47:53 2013 -0500
+++ b/source/common/common.cpp Fri Oct 11 23:13:03 2013 -0500
@@ -169,6 +169,8 @@ void x265_param_default(x265_param_t *pa
param->rc.qpStep = 4;
param->rc.rateControlMode = X265_RC_CQP;
param->rc.qp = 32;
+ param->rc.aqMode = 0;
+ param->rc.aqStrength = 1.0;
/* Quality Measurement Metrics */
param->bEnablePsnr = 1;
diff -r c6d89dc62e19 -r f77efd501767 source/common/pixel.cpp
--- a/source/common/pixel.cpp Fri Oct 11 01:47:53 2013 -0500
+++ b/source/common/pixel.cpp Fri Oct 11 23:13:03 2013 -0500
@@ -688,6 +688,33 @@ float ssim_end_4(ssim_t sum0[5][4], ssim
}
return ssim;
}
+
+template<int w, int h>
+uint64_t pixel_var(pixel *pix, intptr_t i_stride)
+{
+ uint32_t sum = 0, sqr = 0;
+ for (int y = 0; y < h; y++)
+ {
+ for (int x = 0; x < w; x++)
+ {
+ sum += pix[x];
+ sqr += pix[x] * pix[x];
+ }
+ pix += i_stride;
+ }
+ return sum + ((uint64_t)sqr << 32);
+}
+
+void plane_copy_deinterleave_chroma(pixel *dstu, intptr_t dstuStride, pixel *dstv, intptr_t dstvStride,
+ pixel *src, intptr_t srcStride, int w, int h)
+{
+ for (int y = 0; y < h; y++, dstu += dstuStride, dstv += dstvStride, src += srcStride)
+ for (int x = 0; x < w; x++)
+ {
+ dstu[x] = src[2 * x];
+ dstv[x] = src[2 * x + 1];
+ }
+}
} // end anonymous namespace
namespace x265 {
@@ -905,5 +932,9 @@ void Setup_C_PixelPrimitives(EncoderPrim
p.frame_init_lowres_core = frame_init_lowres_core;
p.ssim_4x4x2_core = ssim_4x4x2_core;
p.ssim_end_4 = ssim_end_4;
+
+ p.var[PARTITION_16x16] = pixel_var<16,16>;
+ p.var[PARTITION_8x8] = pixel_var<8,8>;
+ p.plane_copy_deinterleave_c = plane_copy_deinterleave_chroma;
}
}
diff -r c6d89dc62e19 -r f77efd501767 source/common/primitives.h
--- a/source/common/primitives.h Fri Oct 11 01:47:53 2013 -0500
+++ b/source/common/primitives.h Fri Oct 11 23:13:03 2013 -0500
@@ -202,6 +202,8 @@ typedef void (*downscale_t)(pixel *src0,
typedef void (*extendCURowBorder_t)(pixel* txt, intptr_t stride, int width, int height, int marginX);
typedef void (*ssim_4x4x2_core_t)(const pixel *pix1, intptr_t stride1, const pixel *pix2, intptr_t stride2, ssim_t sums[2][4]);
typedef float (*ssim_end4_t)(ssim_t sum0[5][4], ssim_t sum1[5][4], int width);
+typedef uint64_t (*var_t)(pixel *pix, intptr_t stride);
+typedef void (*plane_copy_deinterleave_t)(pixel *dstu, intptr_t dstuStride, pixel *dstv, intptr_t dstvStride, pixel *src, intptr_t srcStride, int w, int h);
/* Define a structure containing function pointers to optimized encoder
* primitives. Each pointer can reference either an assembly routine,
@@ -261,6 +263,8 @@ struct EncoderPrimitives
downscale_t frame_init_lowres_core;
ssim_4x4x2_core_t ssim_4x4x2_core;
ssim_end4_t ssim_end_4;
+ var_t var[NUM_PARTITIONS];
+ plane_copy_deinterleave_t plane_copy_deinterleave_c;
};
/* This copy of the table is what gets used by the encoder.
diff -r c6d89dc62e19 -r f77efd501767 source/common/reference.cpp
--- a/source/common/reference.cpp Fri Oct 11 01:47:53 2013 -0500
+++ b/source/common/reference.cpp Fri Oct 11 23:13:03 2013 -0500
@@ -92,7 +92,7 @@ MotionReference::~MotionReference()
void MotionReference::applyWeight(int rows, int numRows)
{
- rows = X265_MIN(rows, numRows-1);
+ rows = X265_MIN(rows, numRows);
if (m_numWeightedRows >= rows)
return;
int marginX = m_reconPic->m_lumaMarginX;
@@ -101,15 +101,15 @@ void MotionReference::applyWeight(int ro
pixel* dst = fpelPlane + ((m_numWeightedRows * (int)g_maxCUHeight) * lumaStride);
int width = m_reconPic->getWidth();
int height = ((rows - m_numWeightedRows) * g_maxCUHeight);
- if (rows == numRows - 1)
+ if (rows == numRows)
height = ((m_reconPic->getHeight() % g_maxCUHeight) ? (m_reconPic->getHeight() % g_maxCUHeight) : g_maxCUHeight);
size_t dstStride = lumaStride;
// Computing weighted CU rows
int shiftNum = IF_INTERNAL_PREC - X265_DEPTH;
- shift = shift + shiftNum;
- round = shift ? (1 << (shift - 1)) : 0;
- primitives.weightpUniPixel(src, dst, lumaStride, dstStride, width, height, weight, round, shift, offset);
+ int local_shift = shift + shiftNum;
+ int local_round = local_shift ? (1 << (local_shift - 1)) : 0;
+ primitives.weightpUniPixel(src, dst, lumaStride, dstStride, width, height, weight, local_round, local_shift, offset);
// Extending Left & Right
primitives.extendRowBorder(dst, dstStride, width, height, marginX);
@@ -125,7 +125,7 @@ void MotionReference::applyWeight(int ro
}
// Extending Bottom
- if (rows == (numRows - 1))
+ if (rows == numRows)
{
pixel *pixY = fpelPlane - marginX + (m_reconPic->getHeight() - 1) * dstStride;
for (int y = 0; y < marginY; y++)
diff -r c6d89dc62e19 -r f77efd501767 source/common/vec/blockcopy-sse3.cpp
--- a/source/common/vec/blockcopy-sse3.cpp Fri Oct 11 01:47:53 2013 -0500
+++ b/source/common/vec/blockcopy-sse3.cpp Fri Oct 11 23:13:03 2013 -0500
@@ -28,8 +28,37 @@
#include <cstring>
namespace {
-#if !HIGH_BIT_DEPTH
-void blockcopy_p_p(int bx, int by, pixel *dst, intptr_t dstride, pixel *src, intptr_t sstride)
+#if HIGH_BIT_DEPTH
+void blockcopy_pp(int bx, int by, pixel *dst, intptr_t dstride, pixel *src, intptr_t sstride)
+{
+ if ((bx & 7) || (((size_t)dst | (size_t)src | sstride | dstride) & 15))
+ {
+ // slow path, irregular memory alignments or sizes
+ for (int y = 0; y < by; y++)
+ {
+ memcpy(dst, src, bx * sizeof(pixel));
+ src += sstride;
+ dst += dstride;
+ }
+ }
+ else
+ {
+ // fast path, multiples of 8 pixel wide blocks
+ for (int y = 0; y < by; y++)
+ {
+ for (int x = 0; x < bx; x += 8)
+ {
+ __m128i word = _mm_load_si128((__m128i const*)(src + x));
+ _mm_store_si128((__m128i*)&dst[x], word);
+ }
+
+ src += sstride;
+ dst += dstride;
+ }
+ }
+}
+#else
+void blockcopy_pp(int bx, int by, pixel *dst, intptr_t dstride, pixel *src, intptr_t sstride)
{
size_t aligncheck = (size_t)dst | (size_t)src | bx | sstride | dstride;
@@ -60,7 +89,7 @@ void blockcopy_p_p(int bx, int by, pixel
}
}
-void blockcopy_p_s(int bx, int by, pixel *dst, intptr_t dstride, short *src, intptr_t sstride)
+void blockcopy_ps(int bx, int by, pixel *dst, intptr_t dstride, short *src, intptr_t sstride)
{
size_t aligncheck = (size_t)dst | (size_t)src | bx | sstride | dstride;
if (!(aligncheck & 15))
@@ -173,7 +202,7 @@ void pixeladd_pp(int bx, int by, pixel *
}
#endif /* if HIGH_BIT_DEPTH */
-void blockcopy_s_p(int bx, int by, short *dst, intptr_t dstride, uint8_t *src, intptr_t sstride)
+void blockcopy_sp(int bx, int by, short *dst, intptr_t dstride, uint8_t *src, intptr_t sstride)
{
size_t aligncheck = (size_t)dst | (size_t)src | bx | sstride | dstride;
if (!(aligncheck & 15))
@@ -339,64 +368,27 @@ void pixeladd_ss(int bx, int by, short *
}
}
-#define INSTRSET 3
-#include "vectorclass.h"
-
-namespace {
-#if HIGH_BIT_DEPTH
-void blockcopy_p_p(int bx, int by, pixel *dst, intptr_t dstride, pixel *src, intptr_t sstride)
-{
- if ((bx & 7) || (((size_t)dst | (size_t)src | sstride | dstride) & 15))
- {
- // slow path, irregular memory alignments or sizes
- for (int y = 0; y < by; y++)
- {
- memcpy(dst, src, bx * sizeof(pixel));
- src += sstride;
- dst += dstride;
- }
- }
- else
- {
- // fast path, multiples of 8 pixel wide blocks
- for (int y = 0; y < by; y++)
- {
- for (int x = 0; x < bx; x += 8)
- {
- Vec8s word;
- word.load_a(src + x);
- word.store_a(dst + x);
- }
-
- src += sstride;
- dst += dstride;
- }
- }
-}
-#endif
-}
-
namespace x265 {
void Setup_Vec_BlockCopyPrimitives_sse3(EncoderPrimitives &p)
{
#if HIGH_BIT_DEPTH
- p.blockcpy_pp = blockcopy_p_p;
- p.blockcpy_ps = (blockcpy_ps_t)blockcopy_p_p;
- p.blockcpy_sp = (blockcpy_sp_t)blockcopy_p_p;
+ p.blockcpy_pp = blockcopy_pp;
+ p.blockcpy_ps = (blockcpy_ps_t)blockcopy_pp;
+ p.blockcpy_sp = (blockcpy_sp_t)blockcopy_pp;
#else
p.pixeladd_pp = pixeladd_pp;
#endif
#if HIGH_BIT_DEPTH
// At high bit depth, a pixel is a short
- p.blockcpy_sc = (blockcpy_sc_t)blockcopy_s_p;
+ p.blockcpy_sc = (blockcpy_sc_t)blockcopy_sp;
p.pixeladd_pp = (pixeladd_pp_t)pixeladd_ss;
p.pixeladd_ss = pixeladd_ss;
#else
- p.blockcpy_pp = blockcopy_p_p;
- p.blockcpy_ps = blockcopy_p_s;
- p.blockcpy_sp = blockcopy_s_p;
- p.blockcpy_sc = blockcopy_s_p;
+ p.blockcpy_pp = blockcopy_pp;
+ p.blockcpy_ps = blockcopy_ps;
+ p.blockcpy_sp = blockcopy_sp;
+ p.blockcpy_sc = blockcopy_sp;
p.pixelsub_sp = pixelsub_sp;
p.pixeladd_ss = pixeladd_ss;
#endif
diff -r c6d89dc62e19 -r f77efd501767 source/common/vec/dct-sse3.cpp
--- a/source/common/vec/dct-sse3.cpp Fri Oct 11 01:47:53 2013 -0500
+++ b/source/common/vec/dct-sse3.cpp Fri Oct 11 23:13:03 2013 -0500
@@ -40,6 +40,7 @@
using namespace x265;
namespace {
+#if !HIGH_BIT_DEPTH
ALIGN_VAR_32(static const short, tab_dct_4[][8]) =
{
{ 64, 64, 64, 64, 64, 64, 64, 64 },
@@ -120,6 +121,7 @@ void dct4(short *src, int *dst, intptr_t
_mm_storeu_si128((__m128i*)&dst[2 * 4], T72);
_mm_storeu_si128((__m128i*)&dst[3 * 4], T73);
}
+#endif
ALIGN_VAR_32(static const short, tab_idct_4x4[4][8]) =
{
@@ -1730,565 +1732,13 @@ void idct32(int *src, short *dst, intptr
}
}
-
-/* Vector class primitives */
-#define INSTRSET 3
-#include "vectorclass.h"
-namespace {
-inline void partialButterfly16(short *src, short *dst, int shift, int line)
-{
- int j;
- int add = 1 << (shift - 1);
-
- Vec4i zero_row(64, 64, 0, 0);
- Vec4i four_row(83, 36, 0, 0);
- Vec4i eight_row(64, -64, 0, 0);
- Vec4i twelve_row(36, -83, 0, 0);
-
- Vec4i two_row(89, 75, 50, 18);
- Vec4i six_row(75, -18, -89, -50);
- Vec4i ten_row(50, -89, 18, 75);
- Vec4i fourteen_row(18, -50, 75, -89);
-
- Vec4i one_row_first_half(90, 87, 80, 70);
- Vec4i one_row_second_half(57, 43, 25, 9);