[x265-commits] [x265] encoder: combine create() and init() functions
Steve Borho
steve at borho.org
Thu Dec 18 22:54:20 CET 2014
details: http://hg.videolan.org/x265/rev/6ba7be7b1697
branches:
changeset: 8987:6ba7be7b1697
user: Steve Borho <steve at borho.org>
date: Sat Dec 13 01:03:19 2014 -0600
description:
encoder: combine create() and init() functions
They were always called back-to-back and their functionality was non-distinct.
It also now checks for abort errors at startup and returns NULL from the
encoder open function (early aborts are usually malloc failures).
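
To illustrate the pattern this description refers to, here is a minimal sketch of a single create() step whose failure makes the open function return NULL. Every name in it (DemoEncoder, demo_encoder_open, the aborted flag) is hypothetical and is not the actual x265 API; a zero-sized request is treated as a failure only so that the error path can be exercised without exhausting memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical stand-in for an encoder object; NOT the real x265 Encoder class.
struct DemoEncoder
{
    uint8_t* buffer = nullptr;
    bool     aborted = false;

    // One combined setup step: allocate and initialize, flag any failure.
    void create(size_t bytes)
    {
        assert(buffer == nullptr); // create() is meant to run exactly once
        buffer = bytes ? new (std::nothrow) uint8_t[bytes] : nullptr;
        if (!buffer)
            aborted = true;        // typically an allocation failure
    }

    ~DemoEncoder() { delete[] buffer; }
};

// Hypothetical open function: returns NULL when setup aborted.
DemoEncoder* demo_encoder_open(size_t bytes)
{
    DemoEncoder* enc = new (std::nothrow) DemoEncoder;
    if (!enc)
        return nullptr;

    enc->create(bytes);
    if (enc->aborted)
    {
        delete enc;                // release whatever was allocated
        return nullptr;
    }
    return enc;
}
```

A caller only has to check the returned pointer; there is no separate init() call whose result could be forgotten.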
Subject: [x265] asm: chroma_hpp[32x32] for colorspace i420 in avx2 improve 6189c->3537c
details: http://hg.videolan.org/x265/rev/619c0e654f5b
branches:
changeset: 8988:619c0e654f5b
user: Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date: Tue Dec 16 09:31:26 2014 +0530
description:
asm: chroma_hpp[32x32] for colorspace i420 in avx2 improve 6189c->3537c
Subject: [x265] asm: chroma_hpp[16x16] for colorspace i420 in avx2 improve 1540c->969c
details: http://hg.videolan.org/x265/rev/775ebb4694ad
branches:
changeset: 8989:775ebb4694ad
user: Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date: Tue Dec 16 09:40:00 2014 +0530
description:
asm: chroma_hpp[16x16] for colorspace i420 in avx2 improve 1540c->969c
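
For readers who want a reference point for what the two chroma_hpp kernels compute, the following is a scalar sketch of a 4-tap horizontal pixel-to-pixel chroma filter for 8-bit input. The function name interp4HorizPP and its signature are hypothetical; the coefficient table holds the standard HEVC chroma filter taps. The rounding step (sum + 32) >> 6 appears to be what the assembly obtains with pmulhrsw against pw_512, but treat that correspondence as an assumption rather than a statement about the committed code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// HEVC 4-tap chroma interpolation taps, indexed by fractional position 0..7.
static const int8_t kChromaCoeff[8][4] = {
    {  0, 64,  0,  0 }, { -2, 58, 10, -2 }, { -4, 54, 16, -2 }, { -6, 46, 28, -4 },
    { -4, 36, 36, -4 }, { -4, 28, 46, -6 }, { -2, 16, 54, -4 }, { -2, 10, 58, -2 }
};

// Hypothetical scalar reference. Each source row must have one readable
// pixel to the left of src[0] and two to the right of src[width - 1].
void interp4HorizPP(const uint8_t* src, intptr_t srcStride,
                    uint8_t* dst, intptr_t dstStride,
                    int coeffIdx, int width, int height)
{
    assert(coeffIdx >= 0 && coeffIdx < 8);
    const int8_t* c = kChromaCoeff[coeffIdx];

    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            // Taps cover src[x - 1] .. src[x + 2].
            int sum = c[0] * src[x - 1] + c[1] * src[x] +
                      c[2] * src[x + 1] + c[3] * src[x + 2];
            int val = (sum + 32) >> 6;                 // round, then drop 6 fractional bits
            dst[x] = (uint8_t)std::min(255, std::max(0, val)); // clip to 8 bits
        }
        src += srcStride;
        dst += dstStride;
    }
}
```

Because every tap set sums to 64, a flat input row comes out unchanged, which makes a convenient sanity check when comparing an optimized kernel against a reference like this one.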
Subject: [x265] fix: output wrong WppEntryOffset when emulating start code at end of WPP row
details: http://hg.videolan.org/x265/rev/295d033cb091
branches:
changeset: 8990:295d033cb091
user: Min Chen <chenm003 at 163.com>
date: Tue Dec 16 15:53:14 2014 -0800
description:
fix: output wrong WppEntryOffset when emulating start code at end of WPP row
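
The fix itself is not visible in the truncated diff, so the following is only background: a sketch of the general HEVC emulation-prevention rule, under which a 0x03 byte is inserted whenever two zero bytes would otherwise be followed by a byte in the range 0x00 to 0x03. The function names are hypothetical. The point of the sketch is that escaping can lengthen a substream, so any byte offset recorded for a WPP row has to be measured on the escaped data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: apply emulation prevention to a raw payload.
std::vector<uint8_t> escapePayload(const std::vector<uint8_t>& in)
{
    std::vector<uint8_t> out;
    out.reserve(in.size() + in.size() / 2);

    int zeros = 0; // consecutive 0x00 bytes already written
    for (uint8_t b : in)
    {
        if (zeros >= 2 && b <= 0x03)
        {
            out.push_back(0x03); // break up the would-be start code prefix
            zeros = 0;
        }
        out.push_back(b);
        zeros = (b == 0x00) ? zeros + 1 : 0;
    }

    assert(out.size() >= in.size()); // escaping never shrinks the data
    return out;
}

// Hypothetical helper: size of a substream as it appears in the bitstream.
size_t escapedSize(const std::vector<uint8_t>& in)
{
    return escapePayload(in).size();
}
```

In this sketch a row that ends in two zero bytes grows by one byte only if the next byte written is 0x03 or smaller, which is the kind of row-boundary case the subject line describes.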
Subject: [x265] doc: improve documentation for --stats and multi-pass in general
details: http://hg.videolan.org/x265/rev/42fb030a4c43
branches:
changeset: 8991:42fb030a4c43
user: Steve Borho <steve at borho.org>
date: Wed Dec 17 13:16:48 2014 -0600
description:
doc: improve documentation for --stats and multi-pass in general
Subject: [x265] ppa: minimize code foot-print of profiling events
details: http://hg.videolan.org/x265/rev/3315d6c0ced1
branches:
changeset: 8992:3315d6c0ced1
user: Steve Borho <steve at borho.org>
date: Wed Dec 17 13:28:38 2014 -0600
description:
ppa: minimize code foot-print of profiling events
This will allow us to add support for more profiling systems without littering
the code
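
A minimal sketch of the idea, assuming a build-time switch and a scope object that reports to some backend: call sites use one macro, which expands to a scoped object when profiling is enabled and to nothing otherwise. All names here (DEMO_ENABLE_PROFILING, DemoScopeEvent, demo::Scope, the counters) are hypothetical and only mirror the shape of the ProfileScopeEvent macro in the diff; they are not the PPA API.

```cpp
#include <cassert>

namespace demo {

// Hypothetical backend: just counts start/stop notifications.
struct Counters { int started = 0; int stopped = 0; };
inline Counters& counters() { static Counters c; return c; }

// Scope object: reports a start on construction and a stop on destruction.
struct Scope
{
    int id;
    explicit Scope(int e) : id(e) { counters().started++; }
    ~Scope() { counters().stopped++; }
};

} // namespace demo

#ifndef DEMO_ENABLE_PROFILING
#define DEMO_ENABLE_PROFILING 1
#endif

#if DEMO_ENABLE_PROFILING
#define DemoScopeEvent(e) demo::Scope demoScope_(e)
#else
#define DemoScopeEvent(e) /* compiles away entirely */
#endif

enum DemoEvent { demoEncodeCTU, demoFilterRow };

// Example call site: a single line of instrumentation, no #ifdef in the body.
int encodeOneCtu(int x)
{
    DemoScopeEvent(demoEncodeCTU);
    assert(x >= 0);
    return x * 2;
}
```

Supporting another profiler would then mean changing what the macro expands to in one header, while call sites such as encodeOneCtu() stay untouched.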
Subject: [x265] ppa: simplify interfaces, enforce coding style
details: http://hg.videolan.org/x265/rev/952a2a361fcb
branches:
changeset: 8993:952a2a361fcb
user: Steve Borho <steve at borho.org>
date: Wed Dec 17 13:42:35 2014 -0600
description:
ppa: simplify interfaces, enforce coding style
Subject: [x265] ppa: refine event names
details: http://hg.videolan.org/x265/rev/6cbd7d26b2a1
branches:
changeset: 8994:6cbd7d26b2a1
user: Steve Borho <steve at borho.org>
date: Wed Dec 17 13:54:42 2014 -0600
description:
ppa: refine event names
Drop the unused names, remove uninteresting events. Try to cover the main
thread pool tasks and the frame encoder times.
Subject: [x265] ppa: emit one event per CTU for more clarity, disable frame threads events
details: http://hg.videolan.org/x265/rev/78ae7996a1ce
branches:
changeset: 8995:78ae7996a1ce
user: Steve Borho <steve at borho.org>
date: Wed Dec 17 14:31:50 2014 -0600
description:
ppa: emit one event per CTU for more clarity, disable frame threads events
The frame threads are generally uninteresting when WPP is in use
diffstat:
doc/reST/cli.rst | 11 ++-
source/PPA/ppa.cpp | 4 +-
source/PPA/ppa.h | 38 +---------
source/PPA/ppaApi.h | 11 +++
source/PPA/ppaCPUEvents.h | 31 +-------
source/common/common.h | 9 ++
source/common/x86/asm-primitives.cpp | 2 +
source/common/x86/ipfilter8.asm | 120 +++++++++++++++++++++++++++++++++++
source/encoder/analysis.cpp | 2 -
source/encoder/api.cpp | 6 +-
source/encoder/encoder.cpp | 78 +++++++++++-----------
source/encoder/encoder.h | 2 -
source/encoder/frameencoder.cpp | 9 +-
source/encoder/framefilter.cpp | 3 +-
source/encoder/nal.cpp | 6 +-
source/encoder/slicetype.cpp | 4 +
source/x265.cpp | 5 +-
source/x265.h | 5 +-
18 files changed, 223 insertions(+), 123 deletions(-)
diffs (truncated from 638 to 300 lines):
diff -r ee36b6311aaf -r 78ae7996a1ce doc/reST/cli.rst
--- a/doc/reST/cli.rst Sat Dec 13 00:24:11 2014 -0600
+++ b/doc/reST/cli.rst Wed Dec 17 14:31:50 2014 -0600
@@ -956,7 +956,7 @@ Quality, rate control and rate distortio
.. option:: --pass <integer>
- Enable multipass rate control mode. Input is encoded multiple times,
+ Enable multi-pass rate control mode. Input is encoded multiple times,
storing the encoded information of each pass in a stats file from which
the consecutive pass tunes the qp of each frame to improve the quality
of the output. Default disabled
@@ -967,12 +967,17 @@ Quality, rate control and rate distortio
**Range of values:** 1 to 3
+.. option:: --stats <filename>
+
+ Specify file name of of the multi-pass stats file. If unspecified
+ the encoder will use x265_2pass.log
+
.. option:: --slow-firstpass, --no-slow-firstpass
- Enable a slow and more detailed first pass encode in Multipass rate
+ Enable a slow and more detailed first pass encode in multi-pass rate
control mode. Speed of the first pass encode is slightly lesser and
quality midly improved when compared to the default settings in a
- multipass encode. Default disabled (turbo mode enabled)
+ multi-pass encode. Default disabled (turbo mode enabled)
When **turbo** first pass is not disabled, these options are
set on the first pass to improve performance:
diff -r ee36b6311aaf -r 78ae7996a1ce source/PPA/ppa.cpp
--- a/source/PPA/ppa.cpp Sat Dec 13 00:24:11 2014 -0600
+++ b/source/PPA/ppa.cpp Wed Dec 17 14:31:50 2014 -0600
@@ -41,8 +41,10 @@ typedef ppa::Base *(FUNC_PPALibInit)(con
typedef void (FUNC_PPALibRelease)(ppa::Base* &);
}
+using namespace ppa;
+
static FUNC_PPALibRelease *_pfuncPpaRelease;
-ppa::Base *ppabase;
+ppa::Base *ppa::ppabase;
static void _ppaReleaseAtExit()
{
diff -r ee36b6311aaf -r 78ae7996a1ce source/PPA/ppa.h
--- a/source/PPA/ppa.h Sat Dec 13 00:24:11 2014 -0600
+++ b/source/PPA/ppa.h Wed Dec 17 14:31:50 2014 -0600
@@ -21,17 +21,8 @@
* For more information, contact us at license @ x265.com.
*****************************************************************************/
-#ifndef _PPA_H_
-#define _PPA_H_
-
-#if !defined(ENABLE_PPA)
-
-#define PPA_INIT()
-#define PPAStartCpuEventFunc(e)
-#define PPAStopCpuEventFunc(e)
-#define PPAScopeEvent(e)
-
-#else
+#ifndef PPA_H
+#define PPA_H
/* declare enum list of users CPU events */
#define PPA_REGISTER_CPU_EVENT(x) x,
@@ -40,32 +31,13 @@ enum PPACpuEventEnum
#include "ppaCPUEvents.h"
PPACpuGroupNums
};
-
#undef PPA_REGISTER_CPU_EVENT
-#define PPA_INIT() initializePPA()
-#define PPAStartCpuEventFunc(e) if (ppabase) ppabase->triggerStartEvent(ppabase->getEventId(e))
-#define PPAStopCpuEventFunc(e) if (ppabase) ppabase->triggerEndEvent(ppabase->getEventId(e))
-#define PPAScopeEvent(e) _PPAScope __scope_(e)
-
#include "ppaApi.h"
void initializePPA();
-extern ppa::Base *ppabase;
-class _PPAScope
-{
-protected:
+#define PPA_INIT() initializePPA()
+#define PPAScopeEvent(e) ppa::ProfileScope ppaScope_(e)
- ppa::EventID m_id;
-
-public:
-
- _PPAScope(int e) { if (ppabase) { m_id = ppabase->getEventId(e); ppabase->triggerStartEvent(m_id); } else m_id = 0; }
-
- ~_PPAScope() { if (ppabase) ppabase->triggerEndEvent(m_id); }
-};
-
-#endif // if !defined(ENABLE_PPA)
-
-#endif /* _PPA_H_ */
+#endif /* PPA_H */
diff -r ee36b6311aaf -r 78ae7996a1ce source/PPA/ppaApi.h
--- a/source/PPA/ppaApi.h Sat Dec 13 00:24:11 2014 -0600
+++ b/source/PPA/ppaApi.h Wed Dec 17 14:31:50 2014 -0600
@@ -54,6 +54,17 @@ protected:
virtual void init(const char **pNames, int eventCount) = 0;
};
+
+extern ppa::Base *ppabase;
+
+struct ProfileScope
+{
+ ppa::EventID id;
+
+ ProfileScope(int e) { if (ppabase) { id = ppabase->getEventId(e); ppabase->triggerStartEvent(id); } else id = 0; }
+ ~ProfileScope() { if (ppabase) ppabase->triggerEndEvent(id); }
+};
+
}
#endif //_PPA_API_H_
diff -r ee36b6311aaf -r 78ae7996a1ce source/PPA/ppaCPUEvents.h
--- a/source/PPA/ppaCPUEvents.h Sat Dec 13 00:24:11 2014 -0600
+++ b/source/PPA/ppaCPUEvents.h Wed Dec 17 14:31:50 2014 -0600
@@ -1,25 +1,6 @@
-PPA_REGISTER_CPU_EVENT(encode_block)
-PPA_REGISTER_CPU_EVENT(bitstream_write)
-PPA_REGISTER_CPU_EVENT(DPB_prepareEncode)
-PPA_REGISTER_CPU_EVENT(FrameEncoder_compressFrame)
-PPA_REGISTER_CPU_EVENT(FrameEncoder_compressRows)
-PPA_REGISTER_CPU_EVENT(CompressCU)
-PPA_REGISTER_CPU_EVENT(CompressCU_Depth1)
-PPA_REGISTER_CPU_EVENT(CompressCU_Depth2)
-PPA_REGISTER_CPU_EVENT(CompressCU_Depth3)
-PPA_REGISTER_CPU_EVENT(CompressCU_Depth4)
-PPA_REGISTER_CPU_EVENT(CompressIntraCU)
-PPA_REGISTER_CPU_EVENT(CompressIntraCU_Depth1)
-PPA_REGISTER_CPU_EVENT(CompressIntraCU_Depth2)
-PPA_REGISTER_CPU_EVENT(CompressIntraCU_Depth3)
-PPA_REGISTER_CPU_EVENT(CompressIntraCU_Depth4)
-PPA_REGISTER_CPU_EVENT(CheckRDCostIntra)
-PPA_REGISTER_CPU_EVENT(CheckRDCostIntra_Depth1)
-PPA_REGISTER_CPU_EVENT(CheckRDCostIntra_Depth2)
-PPA_REGISTER_CPU_EVENT(CheckRDCostIntra_Depth3)
-PPA_REGISTER_CPU_EVENT(CheckRDCostIntra_Depth4)
-PPA_REGISTER_CPU_EVENT(CalcRDCostIntra)
-PPA_REGISTER_CPU_EVENT(Thread_ProcessRow)
-PPA_REGISTER_CPU_EVENT(Thread_compressCU)
-PPA_REGISTER_CPU_EVENT(Thread_encodeCU)
-PPA_REGISTER_CPU_EVENT(Thread_filterCU)
+PPA_REGISTER_CPU_EVENT(bitstreamWrite)
+PPA_REGISTER_CPU_EVENT(frameThread)
+PPA_REGISTER_CPU_EVENT(encodeCTU)
+PPA_REGISTER_CPU_EVENT(filterCTURow)
+PPA_REGISTER_CPU_EVENT(slicetypeDecideEV)
+PPA_REGISTER_CPU_EVENT(costEstimateRow)
diff -r ee36b6311aaf -r 78ae7996a1ce source/common/common.h
--- a/source/common/common.h Sat Dec 13 00:24:11 2014 -0600
+++ b/source/common/common.h Wed Dec 17 14:31:50 2014 -0600
@@ -41,6 +41,15 @@
#include "x265.h"
+#if ENABLE_PPA
+#include "PPA/ppa.h"
+#define ProfileScopeEvent(x) PPAScopeEvent(x)
+#define PROFILE_INIT() PPA_INIT()
+#else
+#define ProfileScopeEvent(x)
+#define PROFILE_INIT()
+#endif
+
#define FENC_STRIDE 64
#define NUM_INTRA_MODE 35
diff -r ee36b6311aaf -r 78ae7996a1ce source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp Sat Dec 13 00:24:11 2014 -0600
+++ b/source/common/x86/asm-primitives.cpp Wed Dec 17 14:31:50 2014 -0600
@@ -1875,6 +1875,8 @@ void Setup_Assembly_Primitives(EncoderPr
p.chroma[X265_CSP_I420].filter_hpp[CHROMA_8x8] = x265_interp_4tap_horiz_pp_8x8_avx2;
p.chroma[X265_CSP_I420].filter_hpp[CHROMA_4x4] = x265_interp_4tap_horiz_pp_4x4_avx2;
+ p.chroma[X265_CSP_I420].filter_hpp[CHROMA_32x32] = x265_interp_4tap_horiz_pp_32x32_avx2;
+ p.chroma[X265_CSP_I420].filter_hpp[CHROMA_16x16] = x265_interp_4tap_horiz_pp_16x16_avx2;
p.luma_vpp[LUMA_4x4] = x265_interp_8tap_vert_pp_4x4_avx2;
diff -r ee36b6311aaf -r 78ae7996a1ce source/common/x86/ipfilter8.asm
--- a/source/common/x86/ipfilter8.asm Sat Dec 13 00:24:11 2014 -0600
+++ b/source/common/x86/ipfilter8.asm Wed Dec 17 14:31:50 2014 -0600
@@ -179,6 +179,10 @@ tab_c_64_n64: times 8 db 64, -64
const interp4_shuf, times 2 db 0, 1, 8, 9, 4, 5, 12, 13, 2, 3, 10, 11, 6, 7, 14, 15
+ALIGN 32
+interp4_horiz_shuf1: db 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6
+ db 8, 9, 10, 11, 9, 10, 11, 12, 10, 11, 12, 13, 11, 12, 13, 14
+
SECTION .text
cextern pb_128
@@ -1451,6 +1455,122 @@ cglobal interp_4tap_horiz_pp_4x4, 4,6,6
pextrd [r2+r0], xm3, 3
RET
+INIT_YMM avx2
+cglobal interp_4tap_horiz_pp_32x32, 4,6,7
+ mov r4d, r4m
+
+%ifdef PIC
+ lea r5, [tab_ChromaCoeff]
+ vpbroadcastd m0, [r5 + r4 * 4]
+%else
+ vpbroadcastd m0, [tab_ChromaCoeff + r4 * 4]
+%endif
+
+ mova m1, [interp4_horiz_shuf1]
+ vpbroadcastd m2, [pw_1]
+ mova m6, [pw_512]
+ ; register map
+ ; m0 - interpolate coeff
+ ; m1 - shuffle order table
+ ; m2 - constant word 1
+
+ dec r0
+ mov r4d, 32
+
+.loop:
+ ; Row 0
+ vbroadcasti128 m3, [r0] ; [x x x x x A 9 8 7 6 5 4 3 2 1 0]
+ pshufb m3, m1
+ pmaddubsw m3, m0
+ pmaddwd m3, m2
+ vbroadcasti128 m4, [r0 + 4]
+ pshufb m4, m1
+ pmaddubsw m4, m0
+ pmaddwd m4, m2
+ packssdw m3, m4
+ pmulhrsw m3, m6
+
+ vbroadcasti128 m4, [r0 + 16]
+ pshufb m4, m1
+ pmaddubsw m4, m0
+ pmaddwd m4, m2
+ vbroadcasti128 m5, [r0 + 20]
+ pshufb m5, m1
+ pmaddubsw m5, m0
+ pmaddwd m5, m2
+ packssdw m4, m5
+ pmulhrsw m4, m6
+
+ packuswb m3, m4
+ vpermq m3, m3, 11011000b
+
+ movu [r2], m3
+ lea r2, [r2 + r3]
+ lea r0, [r0 + r1]
+ dec r4d
+ jnz .loop
+ RET
+
+
+INIT_YMM avx2
+cglobal interp_4tap_horiz_pp_16x16, 4, 6, 7
+ mov r4d, r4m
+
+%ifdef PIC
+ lea r5, [tab_ChromaCoeff]
+ vpbroadcastd m0, [r5 + r4 * 4]
+%else
+ vpbroadcastd m0, [tab_ChromaCoeff + r4 * 4]
+%endif
+
+ mova m6, [pw_512]
+ mova m1, [interp4_horiz_shuf1]
+ vpbroadcastd m2, [pw_1]
+
+ ; register map
+ ; m0 - interpolate coeff
+ ; m1 - shuffle order table
+ ; m2 - constant word 1
+
+ dec r0
+ mov r4d, 8
+
+.loop:
+ ; Row 0
+ vbroadcasti128 m3, [r0] ; [x x x x x A 9 8 7 6 5 4 3 2 1 0]
+ pshufb m3, m1
+ pmaddubsw m3, m0
+ pmaddwd m3, m2
+ vbroadcasti128 m4, [r0 + 4] ; [x x x x x A 9 8 7 6 5 4 3 2 1 0]
+ pshufb m4, m1
+ pmaddubsw m4, m0