[x265-commits] [x265] analysis: allow intra mode in RD-0/4
Ashok Kumar Mishra
ashok at multicorewareinc.com
Mon Jun 22 16:00:51 CEST 2015
details: http://hg.videolan.org/x265/rev/8aa2bedda740
branches:
changeset: 10675:8aa2bedda740
user: Ashok Kumar Mishra<ashok at multicorewareinc.com>
date: Thu Jun 04 16:40:19 2015 +0530
description:
analysis: allow intra mode in RD-0/4
Output will change for the --limit-refs 0 command line
Subject: [x265] doc: update limit-refs behaviour for intra modes
details: http://hg.videolan.org/x265/rev/44b6b2df7016
branches:
changeset: 10676:44b6b2df7016
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Fri Jun 19 16:43:29 2015 +0530
description:
doc: update limit-refs behaviour for intra modes
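The rule that this documentation change adds to cli.rst can be sketched as a small predicate. This is only an illustration: shouldEvaluateIntra and its parameters are hypothetical names, not code from analysis.cpp, and the assumption that --limit-refs 0 places no restriction on intra is inferred from the wording "for all non-zero values".

```cpp
#include <array>
#include <cassert>

// Hypothetical helper, not part of x265. In an inter slice, decide
// whether the current depth should evaluate intra mode.
// limitRefs           : value of --limit-refs
// subBlockBestIsIntra : for each of the 4 sub-blocks, whether intra
//                       was chosen as its best mode
bool shouldEvaluateIntra(int limitRefs,
                         const std::array<bool, 4>& subBlockBestIsIntra)
{
    // Assumption: with limit-refs disabled the restriction does not apply.
    if (limitRefs == 0)
        return true;

    // Non-zero limit-refs: evaluate intra only if at least one
    // sub-block picked intra as its best mode.
    for (bool isIntra : subBlockBestIsIntra)
        if (isIntra)
            return true;
    return false;
}
```

The real decision in the encoder likely depends on more state (slice type, RD level, early-skip); the sketch isolates only the sub-block condition described in the documentation.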
Subject: [x265] param: move x265_atof into namespace "X265_NS"
details: http://hg.videolan.org/x265/rev/10f8683f725d
branches:
changeset: 10677:10f8683f725d
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Fri Jun 19 18:58:01 2015 +0530
description:
param: move x265_atof into namespace "X265_NS"
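For readers unfamiliar with the helper being moved, the following stand-alone sketch mirrors the x265_atof body shown in the param.cpp diff of this push. The name atofStrict is a stand-in chosen for illustration; only the strtod-based logic is taken from the diff.

```cpp
#include <cassert>
#include <cstdlib>

// Stand-in with the same logic as the x265_atof body in the diff:
// parse a double and flag an error unless the whole string was consumed.
double atofStrict(const char* str, bool& bError)
{
    char* end;
    double v = std::strtod(str, &end);

    // end == str  : no characters were converted (empty or non-numeric)
    // *end != '\0': trailing characters follow the number
    if (end == str || *end != '\0')
        bError = true;
    return v;
}
```

Note that the function only ever sets bError and never clears it, so a caller is expected to initialise the flag to false before parsing.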
Subject: [x265] testbench: costCoeffNxN and enable asm code (based on Sumalatha's patch)
details: http://hg.videolan.org/x265/rev/43ae2f789af1
branches:
changeset: 10678:43ae2f789af1
user: Min Chen <chenm003 at 163.com>
date: Fri Jun 19 17:44:56 2015 -0700
description:
testbench: costCoeffNxN and enable asm code (based on Sumalatha's patch)
Subject: [x265] asm: intrapred_angX_4x4 sse2 performance tweaks
details: http://hg.videolan.org/x265/rev/3f004e9a1159
branches:
changeset: 10679:3f004e9a1159
user: David T Yuen <dtyx265 at gmail.com>
date: Sun Jun 21 18:33:58 2015 -0700
description:
asm: intrapred_angX_4x4 sse2 performance tweaks
Created individual primitives for angles 19-25 and 27-33 to allow
individual tweaking of each angle for about 20% performance improvement
primitive, speedup over C, asm timing, C timing:
intra_ang_4x4[ 3] 3.66x 542.46 1986.43
intra_ang_4x4[ 4] 4.21x 507.58 2135.09
intra_ang_4x4[ 5] 4.16x 510.05 2119.99
intra_ang_4x4[ 6] 4.43x 482.52 2135.18
intra_ang_4x4[ 7] 4.09x 477.58 1955.19
intra_ang_4x4[ 8] 4.53x 460.03 2085.06
intra_ang_4x4[ 9] 4.51x 462.54 2084.99
intra_ang_4x4[11] 4.53x 480.05 2176.00
intra_ang_4x4[12] 4.66x 480.00 2235.34
intra_ang_4x4[13] 4.24x 550.06 2330.84
intra_ang_4x4[14] 4.13x 567.51 2345.12
intra_ang_4x4[15] 4.08x 567.53 2315.21
intra_ang_4x4[16] 4.17x 567.52 2365.42
intra_ang_4x4[17] 3.98x 610.05 2425.51
intra_ang_4x4[19] 3.54x 514.99 1825.34
intra_ang_4x4[20] 3.88x 452.49 1755.41
intra_ang_4x4[21] 3.72x 452.66 1684.99
intra_ang_4x4[22] 3.79x 460.04 1745.36
intra_ang_4x4[23] 3.65x 470.09 1715.27
intra_ang_4x4[24] 4.60x 362.51 1666.24
intra_ang_4x4[25] 4.32x 362.62 1565.41
intra_ang_4x4[27] 4.24x 352.69 1496.58
intra_ang_4x4[28] 4.24x 352.60 1495.93
intra_ang_4x4[29] 3.66x 365.34 1336.02
intra_ang_4x4[30] 3.96x 377.61 1495.37
intra_ang_4x4[31] 3.68x 420.17 1545.37
intra_ang_4x4[32] 3.86x 400.19 1545.37
intra_ang_4x4[33] 3.12x 427.53 1335.37
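As background for the change, a scalar sketch of the 4x4 angular interpolation may help. In HEVC, modes 3 and 33 share the same displacement of 26/32 sample per row. Before this patch, modes 19-33 were mapped onto the primitives for modes 17-3, and the shared routine picked the reference offset and skipped the transpose at run time. The code below is an assumption-laden illustration, not the x265 implementation: angFraction, angOffset and predAng4x4 are hypothetical names, and ref is taken to point at the first neighbouring sample (above for vertical modes, left for horizontal ones).

```cpp
#include <cassert>
#include <cstdint>

// Fractional weight (0..31) for a 0-based row, given the per-row
// displacement `angle` in 1/32 sample units.
inline int angFraction(int angle, int row) { return ((row + 1) * angle) & 31; }

// Whole-sample offset into the reference array for that row.
inline int angOffset(int angle, int row) { return ((row + 1) * angle) >> 5; }

// Scalar 4x4 angular prediction for a positive angle.
// ref       : at least 8 neighbouring samples
// transpose : true for the horizontal mode of a mirrored pair
//             (e.g. mode 3), false for the vertical one (e.g. mode 33)
inline void predAng4x4(const uint16_t* ref, int angle, bool transpose,
                       uint16_t dst[4][4])
{
    for (int row = 0; row < 4; row++)
    {
        int frac = angFraction(angle, row);
        int off  = angOffset(angle, row);
        for (int col = 0; col < 4; col++)
        {
            int a = ref[off + col];
            int b = ref[off + col + 1];
            // Two-tap linear interpolation with rounding, then >> 5.
            uint16_t v = (uint16_t)(((32 - frac) * a + frac * b + 16) >> 5);
            if (transpose)
                dst[col][row] = v;
            else
                dst[row][col] = v;
        }
    }
}
```

For angle 26 the four row fractions come out as 26, 20, 14 and 8, which matches the arguments passed to CALC_4x4 for intra_pred_ang4_3 and intra_pred_ang4_33 in the diff below. Giving each mode its own entry point lets the reference offset and the transpose decision be fixed at assembly time.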
Subject: [x265] asm: intrapred_angX_4x4 sse2 performance tweaks 10-bit
details: http://hg.videolan.org/x265/rev/fd899b282f19
branches:
changeset: 10680:fd899b282f19
user: David T Yuen <dtyx265 at gmail.com>
date: Sun Jun 21 20:52:16 2015 -0700
description:
asm: intrapred_angX_4x4 sse2 performance tweaks 10-bit
Created individual primitives for angles 19-25 and 27-33 to allow
individual tweaking of each angle for about 5% performance improvement
primitive, speedup over C, asm timing, C timing:
intra_ang_4x4[ 3] 3.90x 487.44 1900.97
intra_ang_4x4[ 4] 4.51x 454.99 2050.33
intra_ang_4x4[ 5] 4.51x 455.00 2049.97
intra_ang_4x4[ 6] 4.82x 425.00 2049.97
intra_ang_4x4[ 7] 4.44x 427.50 1899.97
intra_ang_4x4[ 8] 4.71x 425.00 1999.97
intra_ang_4x4[ 9] 4.71x 425.00 1999.97
intra_ang_4x4[11] 4.76x 410.00 1951.26
intra_ang_4x4[12] 5.00x 410.00 2050.27
intra_ang_4x4[13] 4.48x 482.50 2160.44
intra_ang_4x4[14] 4.70x 462.50 2172.89
intra_ang_4x4[15] 4.57x 460.00 2100.26
intra_ang_4x4[16] 4.83x 455.00 2199.91
intra_ang_4x4[17] 3.96x 562.50 2230.17
intra_ang_4x4[19] 3.67x 475.00 1742.82
intra_ang_4x4[20] 4.32x 397.49 1715.35
intra_ang_4x4[21] 3.88x 402.49 1562.49
intra_ang_4x4[22] 4.08x 410.00 1672.74
intra_ang_4x4[23] 3.91x 415.00 1622.59
intra_ang_4x4[24] 4.09x 370.00 1513.66
intra_ang_4x4[25] 3.79x 372.50 1412.90
intra_ang_4x4[27] 4.00x 365.01 1460.97
intra_ang_4x4[28] 3.85x 380.01 1462.66
intra_ang_4x4[29] 3.73x 365.00 1359.97
intra_ang_4x4[30] 4.11x 367.50 1509.97
intra_ang_4x4[31] 4.00x 377.50 1509.97
intra_ang_4x4[32] 4.00x 377.50 1509.97
intra_ang_4x4[33] 3.44x 395.00 1359.97
Subject: [x265] winxp: partial fix for Issue #146, rename x265 to X265_NS
details: http://hg.videolan.org/x265/rev/e8dc042008fa
branches:
changeset: 10681:e8dc042008fa
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Mon Jun 22 11:08:01 2015 +0530
description:
winxp: partial fix for Issue #146, rename x265 to X265_NS
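A minimal sketch of why the macros in winxp.h need to go through X265_NS rather than a literal x265:: prefix. It assumes the build can make X265_NS expand to a name other than x265; the value x265_demo and the simplified ConditionVariable struct below are hypothetical stand-ins, not the real definitions.

```cpp
#include <cassert>

// Assumption: the build system defines X265_NS; this value is made up.
#define X265_NS x265_demo

namespace X265_NS {
// Simplified stand-in for the emulated condition variable.
struct ConditionVariable { int waiters; };

inline void cond_init(ConditionVariable* cond) { cond->waiters = 0; }
} // namespace X265_NS

// Mapping through X265_NS resolves whatever the namespace is called.
// A hard-coded x265::cond_init would not compile here, because no
// namespace named x265 exists in this translation unit.
#define CONDITION_VARIABLE          X265_NS::ConditionVariable
#define InitializeConditionVariable X265_NS::cond_init
```

With these definitions, code written against the Windows API names (CONDITION_VARIABLE cv; InitializeConditionVariable(&cv);) resolves to the emulation regardless of the namespace name chosen at build time.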
Subject: [x265] winxp: fix typo
details: http://hg.videolan.org/x265/rev/83a7d8244424
branches:
changeset: 10682:83a7d8244424
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Mon Jun 22 15:15:33 2015 +0530
description:
winxp: fix typo
diffstat:
doc/reST/cli.rst | 4 +
source/common/param.cpp | 20 +-
source/common/param.h | 2 +
source/common/winxp.h | 12 +-
source/common/x86/asm-primitives.cpp | 58 +-
source/common/x86/intrapred16.asm | 987 +++++++++++++++----------
source/common/x86/intrapred8.asm | 1293 +++++++++++++++++++++++----------
source/encoder/analysis.cpp | 4 +-
source/test/pixelharness.cpp | 164 ++++
source/test/pixelharness.h | 2 +
10 files changed, 1686 insertions(+), 860 deletions(-)
diffs (truncated from 2808 to 300 lines):
diff -r 1c6de5ac3883 -r 83a7d8244424 doc/reST/cli.rst
--- a/doc/reST/cli.rst Thu Jun 18 15:29:11 2015 -0500
+++ b/doc/reST/cli.rst Mon Jun 22 15:15:33 2015 +0530
@@ -620,6 +620,10 @@ the prediction quad-tree.
CUs and the rect/amp motion searches at that depth will only use the
reference(s) selected by 2Nx2N.
+ For all non-zero values of limit-refs, the current depth will evaluate
+ intra mode (in inter slices), only if intra mode was chosen as the best
+ mode for atleast one of the 4 sub-blocks.
+
You can often increase the number of references you are using
(within your decoder level limits) if you enable one or
both of these flags.
diff -r 1c6de5ac3883 -r 83a7d8244424 source/common/param.cpp
--- a/source/common/param.cpp Thu Jun 18 15:29:11 2015 -0500
+++ b/source/common/param.cpp Mon Jun 22 15:15:33 2015 +0530
@@ -471,16 +471,6 @@ static int x265_atobool(const char* str,
return 0;
}
-static double x265_atof(const char* str, bool& bError)
-{
- char *end;
- double v = strtod(str, &end);
-
- if (end == str || *end != '\0')
- bError = true;
- return v;
-}
-
static int parseName(const char* arg, const char* const* names, bool& bError)
{
for (int i = 0; names[i]; i++)
@@ -890,6 +880,16 @@ int x265_atoi(const char* str, bool& bEr
return v;
}
+double x265_atof(const char* str, bool& bError)
+{
+ char *end;
+ double v = strtod(str, &end);
+
+ if (end == str || *end != '\0')
+ bError = true;
+ return v;
+}
+
/* cpu name can be:
* auto || true - x265::cpu_detect()
* false || no - disabled
diff -r 1c6de5ac3883 -r 83a7d8244424 source/common/param.h
--- a/source/common/param.h Thu Jun 18 15:29:11 2015 -0500
+++ b/source/common/param.h Mon Jun 22 15:15:33 2015 +0530
@@ -2,6 +2,7 @@
* Copyright (C) 2013 x265 project
*
* Authors: Deepthi Nandakumar <deepthi at multicorewareinc.com>
+ * Praveen Kumar Tiwari <praveen at multicorewareinc.com>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -33,6 +34,7 @@ void x265_print_reconfigured_params(x26
void x265_param_apply_fastfirstpass(x265_param *p);
char* x265_param2string(x265_param *param);
int x265_atoi(const char *str, bool& bError);
+double x265_atof(const char *str, bool& bError);
int parseCpuName(const char *value, bool& bError);
void setParamAspectRatio(x265_param *p, int width, int height);
void getParamAspectRatio(x265_param *p, int& width, int& height);
diff -r 1c6de5ac3883 -r 83a7d8244424 source/common/winxp.h
--- a/source/common/winxp.h Thu Jun 18 15:29:11 2015 -0500
+++ b/source/common/winxp.h Mon Jun 22 15:15:33 2015 +0530
@@ -49,12 +49,12 @@ BOOL WINAPI cond_wait(ConditionVariable
void cond_destroy(ConditionVariable *cond);
/* map missing API symbols to our structure and functions */
-#define CONDITION_VARIABLE x265::ConditionVariable
-#define InitializeConditionVariable x265::cond_init
-#define SleepConditionVariableCS x265::cond_wait
-#define WakeConditionVariable x265::cond_signal
-#define WakeAllConditionVariable x265::cond_broadcast
-#define XP_CONDITION_VAR_FREE x265::cond_destroy
+#define CONDITION_VARIABLE X265_NS::ConditionVariable
+#define InitializeConditionVariable X265_NS::cond_init
+#define SleepConditionVariableCS X265_NS::cond_wait
+#define WakeConditionVariable X265_NS::cond_signal
+#define WakeAllConditionVariable X265_NS::cond_broadcast
+#define XP_CONDITION_VAR_FREE X265_NS::cond_destroy
} // namespace X265_NS
diff -r 1c6de5ac3883 -r 83a7d8244424 source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp Thu Jun 18 15:29:11 2015 -0500
+++ b/source/common/x86/asm-primitives.cpp Mon Jun 22 15:15:33 2015 +0530
@@ -977,21 +977,21 @@ void setupAssemblyPrimitives(EncoderPrim
p.cu[BLOCK_4x4].intra_pred[16] = PFX(intra_pred_ang4_16_sse2);
p.cu[BLOCK_4x4].intra_pred[17] = PFX(intra_pred_ang4_17_sse2);
p.cu[BLOCK_4x4].intra_pred[18] = PFX(intra_pred_ang4_18_sse2);
- p.cu[BLOCK_4x4].intra_pred[19] = PFX(intra_pred_ang4_17_sse2);
- p.cu[BLOCK_4x4].intra_pred[20] = PFX(intra_pred_ang4_16_sse2);
- p.cu[BLOCK_4x4].intra_pred[21] = PFX(intra_pred_ang4_15_sse2);
- p.cu[BLOCK_4x4].intra_pred[22] = PFX(intra_pred_ang4_14_sse2);
- p.cu[BLOCK_4x4].intra_pred[23] = PFX(intra_pred_ang4_13_sse2);
- p.cu[BLOCK_4x4].intra_pred[24] = PFX(intra_pred_ang4_12_sse2);
- p.cu[BLOCK_4x4].intra_pred[25] = PFX(intra_pred_ang4_11_sse2);
+ p.cu[BLOCK_4x4].intra_pred[19] = PFX(intra_pred_ang4_19_sse2);
+ p.cu[BLOCK_4x4].intra_pred[20] = PFX(intra_pred_ang4_20_sse2);
+ p.cu[BLOCK_4x4].intra_pred[21] = PFX(intra_pred_ang4_21_sse2);
+ p.cu[BLOCK_4x4].intra_pred[22] = PFX(intra_pred_ang4_22_sse2);
+ p.cu[BLOCK_4x4].intra_pred[23] = PFX(intra_pred_ang4_23_sse2);
+ p.cu[BLOCK_4x4].intra_pred[24] = PFX(intra_pred_ang4_24_sse2);
+ p.cu[BLOCK_4x4].intra_pred[25] = PFX(intra_pred_ang4_25_sse2);
p.cu[BLOCK_4x4].intra_pred[26] = PFX(intra_pred_ang4_26_sse2);
- p.cu[BLOCK_4x4].intra_pred[27] = PFX(intra_pred_ang4_9_sse2);
- p.cu[BLOCK_4x4].intra_pred[28] = PFX(intra_pred_ang4_8_sse2);
- p.cu[BLOCK_4x4].intra_pred[29] = PFX(intra_pred_ang4_7_sse2);
- p.cu[BLOCK_4x4].intra_pred[30] = PFX(intra_pred_ang4_6_sse2);
- p.cu[BLOCK_4x4].intra_pred[31] = PFX(intra_pred_ang4_5_sse2);
- p.cu[BLOCK_4x4].intra_pred[32] = PFX(intra_pred_ang4_4_sse2);
- p.cu[BLOCK_4x4].intra_pred[33] = PFX(intra_pred_ang4_3_sse2);
+ p.cu[BLOCK_4x4].intra_pred[27] = PFX(intra_pred_ang4_27_sse2);
+ p.cu[BLOCK_4x4].intra_pred[28] = PFX(intra_pred_ang4_28_sse2);
+ p.cu[BLOCK_4x4].intra_pred[29] = PFX(intra_pred_ang4_29_sse2);
+ p.cu[BLOCK_4x4].intra_pred[30] = PFX(intra_pred_ang4_30_sse2);
+ p.cu[BLOCK_4x4].intra_pred[31] = PFX(intra_pred_ang4_31_sse2);
+ p.cu[BLOCK_4x4].intra_pred[32] = PFX(intra_pred_ang4_32_sse2);
+ p.cu[BLOCK_4x4].intra_pred[33] = PFX(intra_pred_ang4_33_sse2);
p.cu[BLOCK_4x4].sse_ss = PFX(pixel_ssd_ss_4x4_mmx2);
ALL_LUMA_CU(sse_ss, pixel_ssd_ss, sse2);
@@ -2208,21 +2208,21 @@ void setupAssemblyPrimitives(EncoderPrim
p.cu[BLOCK_4x4].intra_pred[16] = PFX(intra_pred_ang4_16_sse2);
p.cu[BLOCK_4x4].intra_pred[17] = PFX(intra_pred_ang4_17_sse2);
p.cu[BLOCK_4x4].intra_pred[18] = PFX(intra_pred_ang4_18_sse2);
- p.cu[BLOCK_4x4].intra_pred[19] = PFX(intra_pred_ang4_17_sse2);
- p.cu[BLOCK_4x4].intra_pred[20] = PFX(intra_pred_ang4_16_sse2);
- p.cu[BLOCK_4x4].intra_pred[21] = PFX(intra_pred_ang4_15_sse2);
- p.cu[BLOCK_4x4].intra_pred[22] = PFX(intra_pred_ang4_14_sse2);
- p.cu[BLOCK_4x4].intra_pred[23] = PFX(intra_pred_ang4_13_sse2);
- p.cu[BLOCK_4x4].intra_pred[24] = PFX(intra_pred_ang4_12_sse2);
- p.cu[BLOCK_4x4].intra_pred[25] = PFX(intra_pred_ang4_11_sse2);
+ p.cu[BLOCK_4x4].intra_pred[19] = PFX(intra_pred_ang4_19_sse2);
+ p.cu[BLOCK_4x4].intra_pred[20] = PFX(intra_pred_ang4_20_sse2);
+ p.cu[BLOCK_4x4].intra_pred[21] = PFX(intra_pred_ang4_21_sse2);
+ p.cu[BLOCK_4x4].intra_pred[22] = PFX(intra_pred_ang4_22_sse2);
+ p.cu[BLOCK_4x4].intra_pred[23] = PFX(intra_pred_ang4_23_sse2);
+ p.cu[BLOCK_4x4].intra_pred[24] = PFX(intra_pred_ang4_24_sse2);
+ p.cu[BLOCK_4x4].intra_pred[25] = PFX(intra_pred_ang4_25_sse2);
p.cu[BLOCK_4x4].intra_pred[26] = PFX(intra_pred_ang4_26_sse2);
- p.cu[BLOCK_4x4].intra_pred[27] = PFX(intra_pred_ang4_9_sse2);
- p.cu[BLOCK_4x4].intra_pred[28] = PFX(intra_pred_ang4_8_sse2);
- p.cu[BLOCK_4x4].intra_pred[29] = PFX(intra_pred_ang4_7_sse2);
- p.cu[BLOCK_4x4].intra_pred[30] = PFX(intra_pred_ang4_6_sse2);
- p.cu[BLOCK_4x4].intra_pred[31] = PFX(intra_pred_ang4_5_sse2);
- p.cu[BLOCK_4x4].intra_pred[32] = PFX(intra_pred_ang4_4_sse2);
- p.cu[BLOCK_4x4].intra_pred[33] = PFX(intra_pred_ang4_3_sse2);
+ p.cu[BLOCK_4x4].intra_pred[27] = PFX(intra_pred_ang4_27_sse2);
+ p.cu[BLOCK_4x4].intra_pred[28] = PFX(intra_pred_ang4_28_sse2);
+ p.cu[BLOCK_4x4].intra_pred[29] = PFX(intra_pred_ang4_29_sse2);
+ p.cu[BLOCK_4x4].intra_pred[30] = PFX(intra_pred_ang4_30_sse2);
+ p.cu[BLOCK_4x4].intra_pred[31] = PFX(intra_pred_ang4_31_sse2);
+ p.cu[BLOCK_4x4].intra_pred[32] = PFX(intra_pred_ang4_32_sse2);
+ p.cu[BLOCK_4x4].intra_pred[33] = PFX(intra_pred_ang4_33_sse2);
p.cu[BLOCK_4x4].intra_pred_allangs = PFX(all_angs_pred_4x4_sse2);
@@ -2451,7 +2451,7 @@ void setupAssemblyPrimitives(EncoderPrim
ALL_LUMA_CU(psy_cost_ss, psyCost_ss, sse4);
// TODO: it is passed smoke test, but we need testbench, so temporary disable
- //p.costCoeffNxN = PFX(costCoeffNxN_sse4);
+ p.costCoeffNxN = PFX(costCoeffNxN_sse4);
#endif
// TODO: it is passed smoke test, but we need testbench to active it, so temporary disable
//p.costCoeffRemain = x265_costCoeffRemain_sse4;
diff -r 1c6de5ac3883 -r 83a7d8244424 source/common/x86/intrapred16.asm
--- a/source/common/x86/intrapred16.asm Thu Jun 18 15:29:11 2015 -0500
+++ b/source/common/x86/intrapred16.asm Mon Jun 22 15:15:33 2015 +0530
@@ -1030,6 +1030,43 @@ cglobal intra_pred_planar16, 3,3,4
%undef INTRA_PRED_PLANAR16_AVX2
RET
+%macro TRANSPOSE_4x4 0
+ punpckhwd m0, m1, m3
+ punpcklwd m1, m3
+ punpckhwd m3, m1, m0
+ punpcklwd m1, m0
+%endmacro
+
+%macro STORE_4x4 0
+ add r1, r1
+ movh [r0], m1
+ movhps [r0 + r1], m1
+ movh [r0 + r1 * 2], m3
+ lea r1, [r1 * 3]
+ movhps [r0 + r1], m3
+%endmacro
+
+%macro CALC_4x4 4
+ mova m0, [pd_16]
+ pmaddwd m1, [ang_table + %1 * 16]
+ paddd m1, m0
+ psrld m1, 5
+
+ pmaddwd m2, [ang_table + %2 * 16]
+ paddd m2, m0
+ psrld m2, 5
+ packssdw m1, m2
+
+ pmaddwd m3, [ang_table + %3 * 16]
+ paddd m3, m0
+ psrld m3, 5
+
+ pmaddwd m4, [ang_table + %4 * 16]
+ paddd m4, m0
+ psrld m4, 5
+ packssdw m3, m4
+%endmacro
+
;-----------------------------------------------------------------------------------------
; void intraPredAng4(pixel* dst, intptr_t dstStride, pixel* src, int dirMode, int bFilter)
;-----------------------------------------------------------------------------------------
@@ -1052,216 +1089,140 @@ cglobal intra_pred_ang4_2, 3,5,4
movh [r0 + r1], m0
RET
-cglobal intra_pred_ang4_3, 3,5,8
- mov r4d, 2
- cmp r3m, byte 33
- mov r3d, 18
- cmove r3d, r4d
-
- movu m0, [r2 + r3] ; [8 7 6 5 4 3 2 1]
-
+cglobal intra_pred_ang4_3, 3,3,5
+ movu m0, [r2 + 18] ;[8 7 6 5 4 3 2 1]
+ mova m1, m0
+ psrldq m0, 2
+ punpcklwd m1, m0 ;[5 4 4 3 3 2 2 1]
mova m2, m0
psrldq m0, 2
- punpcklwd m2, m0 ; [5 4 4 3 3 2 2 1]
+ punpcklwd m2, m0 ;[6 5 5 4 4 3 3 2]
mova m3, m0
psrldq m0, 2
- punpcklwd m3, m0 ; [6 5 5 4 4 3 3 2]
+ punpcklwd m3, m0 ;[7 6 6 5 5 4 4 3]
mova m4, m0
psrldq m0, 2
- punpcklwd m4, m0 ; [7 6 6 5 5 4 4 3]
- mova m5, m0
+ punpcklwd m4, m0 ;[8 7 7 6 6 5 5 4]
+
+ CALC_4x4 26, 20, 14, 8
+
+ TRANSPOSE_4x4
+
+ STORE_4x4
+ RET
+
+cglobal intra_pred_ang4_33, 3,3,5
+ movu m0, [r2 + 2] ;[8 7 6 5 4 3 2 1]
+ mova m1, m0
psrldq m0, 2
- punpcklwd m5, m0 ; [8 7 7 6 6 5 5 4]
-
-
- lea r3, [ang_table + 20 * 16]
- mova m0, [r3 + 6 * 16] ; [26]
- mova m1, [r3] ; [20]
- mova m6, [r3 - 6 * 16] ; [14]
- mova m7, [r3 - 12 * 16] ; [ 8]
- jmp .do_filter4x4
-
-
-ALIGN 16
-.do_filter4x4:
- lea r4, [pd_16]
- pmaddwd m2, m0
- paddd m2, [r4]
- psrld m2, 5
-
- pmaddwd m3, m1
- paddd m3, [r4]
- psrld m3, 5
- packssdw m2, m3
-
- pmaddwd m4, m6
- paddd m4, [r4]
- psrld m4, 5
-
- pmaddwd m5, m7
- paddd m5, [r4]
- psrld m5, 5
- packssdw m4, m5
-
- jz .store
-
- ; transpose 4x4