[x265-commits] [x265] doc: fix formatting of code sample
Steve Borho
steve at borho.org
Wed May 20 18:52:29 CEST 2015
details: http://hg.videolan.org/x265/rev/9b31a8a7bd57
branches:
changeset: 10487:9b31a8a7bd57
user: Steve Borho <steve at borho.org>
date: Tue May 19 19:51:56 2015 -0500
description:
doc: fix formatting of code sample
Subject: [x265] asm: avx2 code for sad_x4[48x64] (33937 -> 15279) for 10 bpp
details: http://hg.videolan.org/x265/rev/384d01eb7142
branches:
changeset: 10488:384d01eb7142
user: Sumalatha Polureddy
date: Wed May 20 11:05:15 2015 +0530
description:
asm: avx2 code for sad_x4[48x64] (33937 -> 15279) for 10 bpp
sse2
sad_x4[48x64] 2.55x 33937.88 86421.41
avx2
sad_x4[48x64] 5.67x 15279.31 86572.20
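For readers unfamiliar with the primitive being optimized above: sad_x4 computes the sum of absolute differences between one source block and four candidate reference blocks in a single call. The scalar sketch below only illustrates that arithmetic. The name sad_x4_ref, the explicit width, height and stride parameters, and the use of uint16_t for 10 bpp samples are assumptions made for this example; this is not the x265 C primitive or its signature.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference, not the x265 implementation.
 * Computes the SAD of one source block against four candidate blocks.
 * Samples are held in uint16_t, as would be needed for 10 bpp content. */
static void sad_x4_ref(const uint16_t *src, intptr_t src_stride,
                       const uint16_t *const ref[4], intptr_t ref_stride,
                       int width, int height, int32_t out[4])
{
    for (int k = 0; k < 4; k++)
    {
        int32_t sum = 0;
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                int a = src[y * src_stride + x];
                int b = ref[k][y * ref_stride + x];
                sum += abs(a - b);
            }
        }
        out[k] = sum; /* one cost per candidate */
    }
}
```

For a 48x64 block this would be called with width 48 and height 64. Sharing the source block across four candidates is what makes the operation a natural fit for wide SIMD registers.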
Subject: [x265] param: tune grain disables rdoq-level. This provides better visual quality results
details: http://hg.videolan.org/x265/rev/6fd44bfcb696
branches:
changeset: 10489:6fd44bfcb696
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Wed May 20 12:08:05 2015 +0530
description:
param: tune grain disables rdoq-level. This provides better visual quality results
Subject: [x265] asm: removed some duplicate constants in intrapred16.asm 16bpp
details: http://hg.videolan.org/x265/rev/98279c718374
branches:
changeset: 10490:98279c718374
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Wed May 20 12:49:59 2015 +0530
description:
asm: removed some duplicate constants in intrapred16.asm 16bpp
also, renamed pw_planar4_1, pw_planar8_1 & pw_planar32_1 to pw_3, pw_7 & pd_31 respectively, and moved them into the common const-a.asm file
Subject: [x265] asm: removed duplicate constants in intrapred8.asm 8bpp, these constants are already defined into const-a.asm
details: http://hg.videolan.org/x265/rev/27f6dd7d3aca
branches:
changeset: 10491:27f6dd7d3aca
user: Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
date: Wed May 20 12:52:40 2015 +0530
description:
asm: removed duplicate constants in intrapred8.asm 8bpp, these constants are already defined into const-a.asm
Subject: [x265] asm: avx2 10bit code for luma_hpp[4xN]
details: http://hg.videolan.org/x265/rev/f1493e1c6edf
branches:
changeset: 10492:f1493e1c6edf
user: Rajesh Paulraj<rajesh at multicorewareinc.com>
date: Wed May 20 14:06:53 2015 +0530
description:
asm: avx2 10bit code for luma_hpp[4xN]
avx2:
luma_hpp[ 4x4] 4.59x 423.90 1944.66
luma_hpp[ 4x8] 4.74x 803.53 3806.63
luma_hpp[ 4x16] 4.73x 1574.01 7442.57
sse4:
luma_hpp[ 4x4] 3.69x 527.97 1946.47
luma_hpp[ 4x8] 3.93x 961.48 3780.20
luma_hpp[ 4x16] 4.06x 1833.63 7445.62
Subject: [x265] asm: filter_vpp, filter_vps for 64xN in avx2
details: http://hg.videolan.org/x265/rev/cf3396fa2220
branches:
changeset: 10493:cf3396fa2220
user: Divya Manivannan <divya at multicorewareinc.com>
date: Thu May 14 11:36:52 2015 +0530
description:
asm: filter_vpp, filter_vps for 64xN in avx2
filter_vpp[64x64, 64x48, 64x32, 64x16]: 15007c->7349c, 20465c->5519c, 7448c->3752c, 3705c->1917c
filter_vps[64x64, 64x48, 64x32, 64x16]: 15449c->9899c, 11674c->7483c, 7568c->4892c, 3892c->2483c
Subject: [x265] analysis: re-order RD 0/4 analysis to do splits before ME or intra
details: http://hg.videolan.org/x265/rev/61a6bc52debf
branches:
changeset: 10494:61a6bc52debf
user: Ashok Kumar Mishra<ashok at multicorewareinc.com>
date: Mon May 18 12:46:18 2015 +0530
description:
analysis: re-order RD 0/4 analysis to do splits before ME or intra
Subject: [x265] analysis: at RD 0/4 avoid motion references if not used by split blocks
details: http://hg.videolan.org/x265/rev/b3ddacfe1e35
branches:
changeset: 10495:b3ddacfe1e35
user: Ashok Kumar Mishra<ashok at multicorewareinc.com>
date: Mon May 18 13:18:54 2015 +0530
description:
analysis: at RD 0/4 avoid motion references if not used by split blocks
Subject: [x265] stats: profile effectiveness of reference limit masks
details: http://hg.videolan.org/x265/rev/937b2a26dc1f
branches:
changeset: 10496:937b2a26dc1f
user: Ashok Kumar Mishra<ashok at multicorewareinc.com>
date: Mon May 18 13:20:54 2015 +0530
description:
stats: profile effectiveness of reference limit masks
Subject: [x265] analysis: skip intra in RD 0/4 if split was analyzed and no split CUs used intra
details: http://hg.videolan.org/x265/rev/ab01c9c7c6fd
branches:
changeset: 10497:ab01c9c7c6fd
user: Ashok Kumar Mishra<ashok at multicorewareinc.com>
date: Mon May 18 13:22:19 2015 +0530
description:
analysis: skip intra in RD 0/4 if split was analyzed and no split CUs used intra
Subject: [x265] stats: RD 0/4 profile effectiveness of avoiding intra if split CUs did not select it
details: http://hg.videolan.org/x265/rev/04fee4b299f6
branches:
changeset: 10498:04fee4b299f6
user: Ashok Kumar Mishra<ashok at multicorewareinc.com>
date: Mon May 18 13:24:42 2015 +0530
description:
stats: RD 0/4 profile effectiveness of avoiding intra if split CUs did not select it
Subject: [x265] analysis: respect X265_REF_LIMIT_DEPTH with RD 0/4
details: http://hg.videolan.org/x265/rev/7a00289539c0
branches:
changeset: 10499:7a00289539c0
user: Ashok Kumar Mishra<ashok at multicorewareinc.com>
date: Mon May 18 13:31:59 2015 +0530
description:
analysis: respect X265_REF_LIMIT_DEPTH with RD 0/4
When this flag is not set, we do not restrict references used by parent CUs
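One way to picture the depth limit described above is as a bitmask of reference indices: when the flag is set, the parent CU may only search references that at least one of its four split CUs used; when it is not set, every reference stays available. The sketch below is an illustration under those assumptions only. The function name parent_ref_mask, the one-bit-per-reference layout and the local REF_LIMIT_DEPTH define are hypothetical and are not taken from analysis.cpp.

```c
#include <stdint.h>

/* Value 1 for the depth flag follows the cli.rst text in this changeset;
 * redefined locally so the sketch compiles without x265.h. */
#define REF_LIMIT_DEPTH 1

/* Hypothetical helper: bit i of a mask means "reference i may be searched".
 * child_masks holds the references used by the four split CUs. */
static uint32_t parent_ref_mask(const uint32_t child_masks[4],
                                int limit_refs, int num_refs)
{
    uint32_t all = (num_refs >= 32) ? 0xFFFFFFFFu : ((1u << num_refs) - 1u);

    if (!(limit_refs & REF_LIMIT_DEPTH))
        return all; /* flag not set: parent CU is unrestricted */

    uint32_t used = 0;
    for (int i = 0; i < 4; i++)
        used |= child_masks[i]; /* union of references used by the splits */

    return used & all;
}
```

An empty result means none of the split CUs used an inter reference. How the encoder treats that case is not stated in this commit, so the sketch leaves it to the caller.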
Subject: [x265] cli: connect --limit-refs to param.limitReferences
details: http://hg.videolan.org/x265/rev/3567484c8607
branches:
changeset: 10500:3567484c8607
user: Ashok Kumar Mishra<ashok at multicorewareinc.com>
date: Mon May 18 13:34:33 2015 +0530
description:
cli: connect --limit-refs to param.limitReferences
Subject: [x265] stats: with the CU reference limit, even 8x8 can have skipped motion searches
details: http://hg.videolan.org/x265/rev/79293e0515e0
branches:
changeset: 10501:79293e0515e0
user: Ashok Kumar Mishra<ashok at multicorewareinc.com>
date: Mon May 18 13:36:26 2015 +0530
description:
stats: with the CU reference limit, even 8x8 can have skipped motion searches
Subject: [x265] analysis: model the effectiveness of --limit-ref with RD 0/4
details: http://hg.videolan.org/x265/rev/899c9d889e79
branches:
changeset: 10502:899c9d889e79
user: Ashok Kumar Mishra<ashok at multicorewareinc.com>
date: Mon May 18 13:26:33 2015 +0530
description:
analysis: model the effectiveness of --limit-ref with RD 0/4
Subject: [x265] analysis: re-order cost calculation for early-outs
details: http://hg.videolan.org/x265/rev/aba0ec72510c
branches:
changeset: 10503:aba0ec72510c
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Wed May 20 16:10:42 2015 +0530
description:
analysis: re-order cost calculation for early-outs
Subject: [x265] docs: document --limit-refs
details: http://hg.videolan.org/x265/rev/a28531c13d95
branches:
changeset: 10504:a28531c13d95
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Mon Mar 16 20:19:33 2015 -0500
description:
docs: document --limit-refs
This option is currently available only for rdLevels 0-4. It will be extended to
rdLevels 5 and 6 shortly.
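The --limit-refs value is a two-bit mask: 1 selects the depth limit (X265_REF_LIMIT_DEPTH), 2 selects the CU limit (X265_REF_LIMIT_CU), and 3 enables both. A minimal sketch of decoding such a value follows. The struct, the function name and the rejection of negative values are assumptions made for this example (the parameter check in this changeset only rejects values above 3), and the flag values are redefined locally so the snippet builds without x265.h.

```c
#include <stdbool.h>

/* Values as given in the cli.rst text of this changeset. */
#define REF_LIMIT_DEPTH 1
#define REF_LIMIT_CU    2

/* Hypothetical container for the decoded flags. */
typedef struct
{
    bool depth; /* limit references by what the split CUs used */
    bool cu;    /* limit rect/amp searches to the 2Nx2N choice */
} RefLimitFlags;

/* Returns 0 on success, -1 if the value is outside 0..3. */
static int decode_limit_refs(int value, RefLimitFlags *out)
{
    if (value < 0 || value > 3)
        return -1;

    out->depth = (value & REF_LIMIT_DEPTH) != 0;
    out->cu    = (value & REF_LIMIT_CU) != 0;
    return 0;
}
```

This mirrors the way the log line added to x265_print_params reports the two flags separately, using !!(param->limitReferences & X265_REF_LIMIT_CU) and the equivalent test for X265_REF_LIMIT_DEPTH.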
Subject: [x265] cli: delay calling showHelp until a param is allocated and defaulted
details: http://hg.videolan.org/x265/rev/b30f39f374f1
branches:
changeset: 10505:b30f39f374f1
user: Steve Borho <steve at borho.org>
date: Wed May 20 10:29:09 2015 -0500
description:
cli: delay calling showHelp until a param is allocated and defaulted
Removes an old 'help' variable that was initialized to zero but never set
Subject: [x265] asm: interp_4tap_vert_pX_4xN sse2
details: http://hg.videolan.org/x265/rev/35dd4bea0bc7
branches:
changeset: 10506:35dd4bea0bc7
user: David T Yuen <dtyx265 at gmail.com>
date: Tue May 19 18:29:06 2015 -0700
description:
asm: interp_4tap_vert_pX_4xN sse2
Improved register usage for addressing of output. This improves 64-bit performance by 0.7% to 2.5%.
Also added interp_4tap_vert_ps_4x32 in primitives setup.
diffstat:
doc/reST/api.rst | 2 +-
doc/reST/cli.rst | 24 ++
source/common/param.cpp | 11 +-
source/common/x86/asm-primitives.cpp | 13 +
source/common/x86/const-a.asm | 3 +
source/common/x86/intrapred16.asm | 131 +++++++--------
source/common/x86/intrapred8.asm | 101 ++++-------
source/common/x86/ipfilter16.asm | 78 +++++++++
source/common/x86/ipfilter8.asm | 167 +++++++++++++++++++-
source/common/x86/sad16-a.asm | 1 +
source/encoder/analysis.cpp | 281 ++++++++++++++++++++++++----------
source/encoder/analysis.h | 4 +-
source/encoder/encoder.cpp | 12 +
source/encoder/entropy.cpp | 2 -
source/encoder/entropy.h | 2 +
source/encoder/search.cpp | 14 +-
source/encoder/search.h | 11 +-
source/x265.cpp | 11 +-
source/x265cli.h | 6 +-
19 files changed, 631 insertions(+), 243 deletions(-)
diffs (truncated from 1819 to 300 lines):
diff -r 58309953273e -r 35dd4bea0bc7 doc/reST/api.rst
--- a/doc/reST/api.rst Tue May 19 17:04:04 2015 +0530
+++ b/doc/reST/api.rst Tue May 19 18:29:06 2015 -0700
@@ -455,7 +455,7 @@ it was compiled against.
A number of validations must be performed on the returned API structure
in order to determine if it is safe for use by your application. If you
-do not perform these checks, your application is liable to crash.
+do not perform these checks, your application is liable to crash::
if (api->api_major_version != X265_MAJOR_VERSION) /* do not use */
if (api->sizeof_param != sizeof(x265_param)) /* do not use */
diff -r 58309953273e -r 35dd4bea0bc7 doc/reST/cli.rst
--- a/doc/reST/cli.rst Tue May 19 17:04:04 2015 +0530
+++ b/doc/reST/cli.rst Tue May 19 18:29:06 2015 -0700
@@ -581,6 +581,30 @@ the prediction quad-tree.
be consistent for all of them since the encoder configures several
key global data structures based on this range.
+.. option:: --limit-refs <0|1|2|3>
+
+ When set to X265_REF_LIMIT_DEPTH (1) x265 will limit the references
+ analyzed at the current depth based on the references used to code
+ the 4 sub-blocks at the next depth. For example, a 16x16 CU will
+ only use the references used to code its four 8x8 CUs.
+
+ When set to X265_REF_LIMIT_CU (2), the rectangular and asymmetrical
+ partitions will only use references selected by the 2Nx2N motion
+ search (including at the lowest depth which is otherwise unaffected
+ by the depth limit).
+
+ When set to 3 (X265_REF_LIMIT_DEPTH && X265_REF_LIMIT_CU), the 2Nx2N
+ motion search at each depth will only use references from the split
+ CUs and the rect/amp motion searches at that depth will only use the
+ reference(s) selected by 2Nx2N.
+
+ You can often increase the number of references you are using
+ (within your decoder level limits) if you enable one or
+ both of these flags.
+
+ This feature is EXPERIMENTAL and currently only functional at RD
+ levels 0 through 4
+
.. option:: --rect, --no-rect
Enable analysis of rectangular motion partitions Nx2N and 2NxN
diff -r 58309953273e -r 35dd4bea0bc7 source/common/param.cpp
--- a/source/common/param.cpp Tue May 19 17:04:04 2015 +0530
+++ b/source/common/param.cpp Tue May 19 18:29:06 2015 -0700
@@ -151,6 +151,7 @@ void x265_param_default(x265_param* para
param->subpelRefine = 2;
param->searchRange = 57;
param->maxNumMergeCand = 2;
+ param->limitReferences = 0;
param->bEnableWeightedPred = 1;
param->bEnableWeightedBiPred = 0;
param->bEnableEarlySkip = 0;
@@ -430,8 +431,8 @@ int x265_param_default_preset(x265_param
param->deblockingFilterBetaOffset = -2;
param->deblockingFilterTCOffset = -2;
param->bIntraInBFrames = 0;
- param->rdoqLevel = 1;
- param->psyRdoq = 30;
+ param->rdoqLevel = 0;
+ param->psyRdoq = 0;
param->psyRd = 0.5;
param->rc.ipFactor = 1.1;
param->rc.pbFactor = 1.1;
@@ -641,6 +642,7 @@ int x265_param_parse(x265_param* p, cons
}
}
OPT("ref") p->maxNumReferences = atoi(value);
+ OPT("limit-refs") p->limitReferences = atoi(value);
OPT("weightp") p->bEnableWeightedPred = atobool(value);
OPT("weightb") p->bEnableWeightedBiPred = atobool(value);
OPT("cbqpoffs") p->cbQpOffset = atoi(value);
@@ -1026,6 +1028,8 @@ int x265_check_params(x265_param* param)
"subme must be less than or equal to X265_MAX_SUBPEL_LEVEL (7)");
CHECK(param->subpelRefine < 0,
"subme must be greater than or equal to 0");
+ CHECK(param->limitReferences > 3,
+ "limitReferences must be 0, 1, 2 or 3");
CHECK(param->frameNumThreads < 0 || param->frameNumThreads > X265_MAX_FRAME_THREADS,
"frameNumThreads (--frame-threads) must be [0 .. X265_MAX_FRAME_THREADS)");
CHECK(param->cbQpOffset < -12, "Min. Chroma Cb QP Offset is -12");
@@ -1277,6 +1281,8 @@ void x265_print_params(x265_param* param
if (param->rc.aqMode)
x265_log(param, X265_LOG_INFO, "AQ: mode / str / qg-size / cu-tree : %d / %0.1f / %d / %d\n", param->rc.aqMode,
param->rc.aqStrength, param->rc.qgSize, param->rc.cuTree);
+ x265_log(param, X265_LOG_INFO, "References / ref-limit cu / depth : %d / %d / %d\n",
+ param->maxNumReferences, !!(param->limitReferences & X265_REF_LIMIT_CU), !!(param->limitReferences & X265_REF_LIMIT_DEPTH));
if (param->bLossless)
x265_log(param, X265_LOG_INFO, "Rate Control : Lossless\n");
@@ -1420,6 +1426,7 @@ char *x265_param2string(x265_param* p)
s += sprintf(s, " bframe-bias=%d", p->bFrameBias);
s += sprintf(s, " b-adapt=%d", p->bFrameAdaptive);
s += sprintf(s, " ref=%d", p->maxNumReferences);
+ s += sprintf(s, " limit-refs=%d", p->limitReferences);
BOOL(p->bEnableWeightedPred, "weightp");
BOOL(p->bEnableWeightedBiPred, "weightb");
s += sprintf(s, " aq-mode=%d", p->rc.aqMode);
diff -r 58309953273e -r 35dd4bea0bc7 source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp Tue May 19 17:04:04 2015 +0530
+++ b/source/common/x86/asm-primitives.cpp Tue May 19 18:29:06 2015 -0700
@@ -1359,6 +1359,7 @@ void setupAssemblyPrimitives(EncoderPrim
p.pu[LUMA_32x24].sad_x4 = x265_pixel_sad_x4_32x24_avx2;
p.pu[LUMA_32x32].sad_x4 = x265_pixel_sad_x4_32x32_avx2;
p.pu[LUMA_32x64].sad_x4 = x265_pixel_sad_x4_32x64_avx2;
+ p.pu[LUMA_48x64].sad_x4 = x265_pixel_sad_x4_48x64_avx2;
p.pu[LUMA_64x16].sad_x4 = x265_pixel_sad_x4_64x16_avx2;
p.pu[LUMA_64x32].sad_x4 = x265_pixel_sad_x4_64x32_avx2;
p.pu[LUMA_64x48].sad_x4 = x265_pixel_sad_x4_64x48_avx2;
@@ -1407,6 +1408,9 @@ void setupAssemblyPrimitives(EncoderPrim
p.pu[LUMA_4x8].luma_hps = x265_interp_8tap_horiz_ps_4x8_avx2;
p.pu[LUMA_4x16].luma_hps = x265_interp_8tap_horiz_ps_4x16_avx2;
+ p.pu[LUMA_4x4].luma_hpp = x265_interp_8tap_horiz_pp_4x4_avx2;
+ p.pu[LUMA_4x8].luma_hpp = x265_interp_8tap_horiz_pp_4x8_avx2;
+ p.pu[LUMA_4x16].luma_hpp = x265_interp_8tap_horiz_pp_4x16_avx2;
p.pu[LUMA_8x4].luma_hpp = x265_interp_8tap_horiz_pp_8x4_avx2;
p.pu[LUMA_8x8].luma_hpp = x265_interp_8tap_horiz_pp_8x8_avx2;
p.pu[LUMA_8x16].luma_hpp = x265_interp_8tap_horiz_pp_8x16_avx2;
@@ -1524,6 +1528,7 @@ void setupAssemblyPrimitives(EncoderPrim
p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].filter_vps = x265_interp_4tap_vert_ps_4x4_sse2;
p.chroma[X265_CSP_I422].pu[CHROMA_422_4x8].filter_vps = x265_interp_4tap_vert_ps_4x8_sse2;
p.chroma[X265_CSP_I422].pu[CHROMA_422_4x16].filter_vps = x265_interp_4tap_vert_ps_4x16_sse2;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_4x32].filter_vps = x265_interp_4tap_vert_ps_4x32_sse2;
p.chroma[X265_CSP_I444].pu[LUMA_4x4].filter_vps = x265_interp_4tap_vert_ps_4x4_sse2;
p.chroma[X265_CSP_I444].pu[LUMA_4x8].filter_vps = x265_interp_4tap_vert_ps_4x8_sse2;
p.chroma[X265_CSP_I444].pu[LUMA_4x16].filter_vps = x265_interp_4tap_vert_ps_4x16_sse2;
@@ -2771,6 +2776,10 @@ void setupAssemblyPrimitives(EncoderPrim
p.chroma[X265_CSP_I444].pu[LUMA_16x64].filter_vps = x265_interp_4tap_vert_ps_16x64_avx2;
p.chroma[X265_CSP_I444].pu[LUMA_32x64].filter_vps = x265_interp_4tap_vert_ps_32x64_avx2;
p.chroma[X265_CSP_I444].pu[LUMA_48x64].filter_vps = x265_interp_4tap_vert_ps_48x64_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_64x64].filter_vps = x265_interp_4tap_vert_ps_64x64_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_64x48].filter_vps = x265_interp_4tap_vert_ps_64x48_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_64x32].filter_vps = x265_interp_4tap_vert_ps_64x32_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_64x16].filter_vps = x265_interp_4tap_vert_ps_64x16_avx2;
//i422 for chroma_vpp
p.chroma[X265_CSP_I422].pu[CHROMA_422_4x8].filter_vpp = x265_interp_4tap_vert_pp_4x8_avx2;
@@ -2820,6 +2829,10 @@ void setupAssemblyPrimitives(EncoderPrim
p.chroma[X265_CSP_I444].pu[LUMA_16x64].filter_vpp = x265_interp_4tap_vert_pp_16x64_avx2;
p.chroma[X265_CSP_I444].pu[LUMA_32x64].filter_vpp = x265_interp_4tap_vert_pp_32x64_avx2;
p.chroma[X265_CSP_I444].pu[LUMA_48x64].filter_vpp = x265_interp_4tap_vert_pp_48x64_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_64x64].filter_vpp = x265_interp_4tap_vert_pp_64x64_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_64x48].filter_vpp = x265_interp_4tap_vert_pp_64x48_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_64x32].filter_vpp = x265_interp_4tap_vert_pp_64x32_avx2;
+ p.chroma[X265_CSP_I444].pu[LUMA_64x16].filter_vpp = x265_interp_4tap_vert_pp_64x16_avx2;
if (cpuMask & X265_CPU_BMI2)
p.scanPosLast = x265_scanPosLast_avx2_bmi2;
diff -r 58309953273e -r 35dd4bea0bc7 source/common/x86/const-a.asm
--- a/source/common/x86/const-a.asm Tue May 19 17:04:04 2015 +0530
+++ b/source/common/x86/const-a.asm Tue May 19 18:29:06 2015 -0700
@@ -63,6 +63,8 @@ const pb_000000000000000F, db
const pw_1, times 16 dw 1
const pw_2, times 16 dw 2
+const pw_3, times 16 dw 3
+const pw_7, times 16 dw 7
const pw_m2, times 8 dw -2
const pw_4, times 8 dw 4
const pw_8, times 8 dw 8
@@ -112,6 +114,7 @@ const pd_2, times 8 dd
const pd_4, times 4 dd 4
const pd_8, times 4 dd 8
const pd_16, times 4 dd 16
+const pd_31, times 4 dd 31
const pd_32, times 8 dd 32
const pd_64, times 4 dd 64
const pd_128, times 4 dd 128
diff -r 58309953273e -r 35dd4bea0bc7 source/common/x86/intrapred16.asm
--- a/source/common/x86/intrapred16.asm Tue May 19 17:04:04 2015 +0530
+++ b/source/common/x86/intrapred16.asm Tue May 19 18:29:06 2015 -0700
@@ -44,7 +44,6 @@ const shuf_mode32_18, db 14, 15, 1
const pw_punpcklwd, db 0, 1, 2, 3, 2, 3, 4, 5, 4, 5, 6, 7, 6, 7, 8, 9
const c_mode32_10_0, db 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1
-const pw_unpackwdq, times 8 db 0,1
const pw_ang8_12, db 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 13, 0, 1
const pw_ang8_13, db 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 14, 15, 8, 9, 0, 1
const pw_ang8_14, db 0, 0, 0, 0, 0, 0, 0, 0, 14, 15, 10, 11, 4, 5, 0, 1
@@ -58,16 +57,6 @@ const pw_ang16_16, db 0, 0, 0, 0, 0, 0
;; (blkSize - 1 - x)
pw_planar4_0: dw 3, 2, 1, 0, 3, 2, 1, 0
-pw_planar4_1: dw 3, 3, 3, 3, 3, 3, 3, 3
-pw_planar8_0: dw 7, 6, 5, 4, 3, 2, 1, 0
-pw_planar8_1: dw 7, 7, 7, 7, 7, 7, 7, 7
-pw_planar16_0: dw 15, 14, 13, 12, 11, 10, 9, 8
-pw_planar16_1: dw 15, 15, 15, 15, 15, 15, 15, 15
-pd_planar32_1: dd 31, 31, 31, 31
-
-pw_planar32_1: dw 31, 31, 31, 31, 31, 31, 31, 31
-pw_planar32_L: dw 31, 30, 29, 28, 27, 26, 25, 24
-pw_planar32_H: dw 23, 22, 21, 20, 19, 18, 17, 16
const planar32_table
%assign x 31
@@ -85,8 +74,11 @@ const planar32_table1
SECTION .text
+cextern pb_01
cextern pw_1
cextern pw_2
+cextern pw_3
+cextern pw_7
cextern pw_4
cextern pw_8
cextern pw_15
@@ -95,6 +87,7 @@ cextern pw_31
cextern pw_32
cextern pw_1023
cextern pd_16
+cextern pd_31
cextern pd_32
cextern pw_4096
cextern multiL
@@ -681,7 +674,7 @@ cglobal intra_pred_planar8, 3,3,5
pshufd m4, m4, 0 ; v_bottomLeft
pmullw m3, [multiL] ; (x + 1) * topRight
- pmullw m0, m1, [pw_planar8_1] ; (blkSize - 1 - y) * above[x]
+ pmullw m0, m1, [pw_7] ; (blkSize - 1 - y) * above[x]
paddw m3, [pw_8]
paddw m3, m4
paddw m3, m0
@@ -695,7 +688,7 @@ cglobal intra_pred_planar8, 3,3,5
pshufhw m1, m2, 0x55 * (%1 - 4)
pshufd m1, m1, 0xAA
%endif
- pmullw m1, [pw_planar8_0]
+ pmullw m1, [pw_planar16_mul + mmsize]
paddw m1, m3
psraw m1, 4
movu [r0], m1
@@ -733,8 +726,8 @@ cglobal intra_pred_planar16, 3,3,8
pmullw m4, m3, [multiH] ; (x + 1) * topRight
pmullw m3, [multiL] ; (x + 1) * topRight
- pmullw m1, m2, [pw_planar16_1] ; (blkSize - 1 - y) * above[x]
- pmullw m5, m7, [pw_planar16_1] ; (blkSize - 1 - y) * above[x]
+ pmullw m1, m2, [pw_15] ; (blkSize - 1 - y) * above[x]
+ pmullw m5, m7, [pw_15] ; (blkSize - 1 - y) * above[x]
paddw m4, [pw_16]
paddw m3, [pw_16]
paddw m4, m6
@@ -770,8 +763,8 @@ cglobal intra_pred_planar16, 3,3,8
paddw m4, m1
lea r0, [r0 + r1 * 2]
%endif
- pmullw m0, m5, [pw_planar8_0]
- pmullw m5, [pw_planar16_0]
+ pmullw m0, m5, [pw_planar16_mul + mmsize]
+ pmullw m5, [pw_planar16_mul]
paddw m0, m4
paddw m5, m3
psraw m5, 5
@@ -827,7 +820,7 @@ cglobal intra_pred_planar32, 3,3,16
mova m9, m6
mova m10, m6
- mova m12, [pw_planar32_1]
+ mova m12, [pw_31]
movu m4, [r2 + 2]
psubw m8, m4
pmullw m4, m12
@@ -848,10 +841,10 @@ cglobal intra_pred_planar32, 3,3,16
pmullw m5, m12
paddw m3, m5
- mova m12, [pw_planar32_L]
- mova m13, [pw_planar32_H]
- mova m14, [pw_planar16_0]
- mova m15, [pw_planar8_0]
+ mova m12, [pw_planar32_mul]
+ mova m13, [pw_planar32_mul + mmsize]
+ mova m14, [pw_planar16_mul]
+ mova m15, [pw_planar16_mul + mmsize]
add r1, r1
%macro PROCESS 1
@@ -1596,7 +1589,7 @@ cglobal intra_pred_planar4, 3,3,5
pshufd m4, m4, 0xAA
pmullw m3, [multi_2Row] ; (x + 1) * topRight
- pmullw m0, m1, [pw_planar4_1] ; (blkSize - 1 - y) * above[x]
+ pmullw m0, m1, [pw_3] ; (blkSize - 1 - y) * above[x]
paddw m3, [pw_4]
paddw m3, m4
@@ -1934,7 +1927,7 @@ cglobal intra_pred_planar4, 3,3,5
pshufd m4, m4, 0xAA
pmullw m3, [multi_2Row] ; (x + 1) * topRight
- pmullw m0, m1, [pw_planar4_1] ; (blkSize - 1 - y) * above[x]
+ pmullw m0, m1, [pw_3] ; (blkSize - 1 - y) * above[x]