[x265-commits] [x265] encoder: combine create() and init() functions

Steve Borho steve at borho.org
Thu Dec 18 22:54:20 CET 2014


details:   http://hg.videolan.org/x265/rev/6ba7be7b1697
branches:  
changeset: 8987:6ba7be7b1697
user:      Steve Borho <steve at borho.org>
date:      Sat Dec 13 01:03:19 2014 -0600
description:
encoder: combine create() and init() functions

They were always called back-to-back and their functionality was not distinct.
The encoder open function now also checks for abort errors at startup and
returns NULL if one occurred (early aborts are usually malloc failures)
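
The pattern described above can be sketched as follows. This is only an
illustration, not the x265 implementation: SketchEncoder, sketchEncoderOpen,
sketchEncoderClose and the 1 MiB cap are hypothetical names and values chosen
so the failure path is deterministic. It shows a single create() step that
records an abort flag, and an open function that returns a null pointer when
that flag is set.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical stand-in for an encoder object; not the real x265 Encoder class.
struct SketchEncoder
{
    std::unique_ptr<unsigned char[]> buffer;
    bool aborted = false;

    // Single setup step (what used to be create() followed by init()).
    // Any failure is recorded in 'aborted' instead of being reported later.
    void create(std::size_t bytes)
    {
        // Assumption for the sketch: treat requests above 1 MiB as failed
        // allocations so the error path can be exercised deterministically.
        const std::size_t cap = 1u << 20;
        if (bytes == 0 || bytes > cap)
        {
            aborted = true;
            return;
        }
        buffer.reset(new (std::nothrow) unsigned char[bytes]);
        if (!buffer)
            aborted = true;
    }
};

// Open function: returns nullptr if setup aborted, so callers only
// ever receive a fully constructed encoder.
SketchEncoder* sketchEncoderOpen(std::size_t bytes)
{
    SketchEncoder* enc = new (std::nothrow) SketchEncoder;
    if (!enc)
        return nullptr;

    enc->create(bytes);
    assert(enc->aborted || enc->buffer); // either it failed or it is usable

    if (enc->aborted)
    {
        delete enc;
        return nullptr;
    }
    return enc;
}

void sketchEncoderClose(SketchEncoder* enc)
{
    delete enc;
}
```

A caller would test the returned pointer before use and treat a null result
as a startup failure, rather than discovering the problem on the first encode
call.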
Subject: [x265] asm: chroma_hpp[32x32] for colorspace i420 in avx2 improve 6189c->3537c

details:   http://hg.videolan.org/x265/rev/619c0e654f5b
branches:  
changeset: 8988:619c0e654f5b
user:      Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date:      Tue Dec 16 09:31:26 2014 +0530
description:
asm: chroma_hpp[32x32] for colorspace i420 in avx2 improve 6189c->3537c
Subject: [x265] asm: chroma_hpp[16x16] for colorspace i420 in avx2 improve 1540c->969c

details:   http://hg.videolan.org/x265/rev/775ebb4694ad
branches:  
changeset: 8989:775ebb4694ad
user:      Aasaipriya Chandran <aasaipriya at multicorewareinc.com>
date:      Tue Dec 16 09:40:00 2014 +0530
description:
asm: chroma_hpp[16x16] for colorspace i420 in avx2 improve 1540c->969c
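
For readers comparing against the AVX2 routines, a scalar reference of a
4-tap horizontal pixel-to-pixel chroma filter is sketched below. It is not
code from this changeset: the function name interp4TapHorizRef and the table
name kChromaCoeff are hypothetical, and the coefficients are the standard
HEVC chroma interpolation taps. The rounding (add 32, shift right by 6)
corresponds to what multiplying by 512 with pmulhrsw computes in the assembly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// HEVC 4-tap chroma interpolation coefficients, indexed by fractional
// position in eighths of a sample. Each row sums to 64.
static const int8_t kChromaCoeff[8][4] =
{
    {  0, 64,  0,  0 },
    { -2, 58, 10, -2 },
    { -4, 54, 16, -2 },
    { -6, 46, 28, -4 },
    { -4, 36, 36, -4 },
    { -4, 28, 46, -6 },
    { -2, 16, 54, -4 },
    { -2, 10, 58, -2 }
};

// Scalar reference: filters 'width' x 'height' 8-bit samples horizontally.
// The caller must provide one valid sample to the left of each row and two
// to the right, since the taps cover src[x - 1] .. src[x + 2].
void interp4TapHorizRef(const uint8_t* src, std::ptrdiff_t srcStride,
                        uint8_t* dst, std::ptrdiff_t dstStride,
                        int coeffIdx, int width, int height)
{
    assert(coeffIdx >= 0 && coeffIdx < 8);
    const int8_t* c = kChromaCoeff[coeffIdx];

    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            int sum = c[0] * src[x - 1] + c[1] * src[x]
                    + c[2] * src[x + 1] + c[3] * src[x + 2];

            // Round and scale back to 8 bits: (sum + 32) >> 6.
            int val = (sum + 32) >> 6;

            // Clamp to the 8-bit range, as packuswb does in the assembly.
            if (val < 0)   val = 0;
            if (val > 255) val = 255;
            dst[x] = static_cast<uint8_t>(val);
        }
        src += srcStride;
        dst += dstStride;
    }
}
```

With coefficient index 0 the filter is the identity, and on a constant row
any index returns the same constant, which makes both cases convenient sanity
checks for an optimized version.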
Subject: [x265] fix: output wrong WppEntryOffset when emulating start code at end of WPP row

details:   http://hg.videolan.org/x265/rev/295d033cb091
branches:  
changeset: 8990:295d033cb091
user:      Min Chen <chenm003 at 163.com>
date:      Tue Dec 16 15:53:14 2014 -0800
description:
fix: output wrong WppEntryOffset when emulating start code at end of WPP row
Subject: [x265] doc: improve documentation for --stats and multi-pass in general

details:   http://hg.videolan.org/x265/rev/42fb030a4c43
branches:  
changeset: 8991:42fb030a4c43
user:      Steve Borho <steve at borho.org>
date:      Wed Dec 17 13:16:48 2014 -0600
description:
doc: improve documentation for --stats and multi-pass in general
Subject: [x265] ppa: minimize code foot-print of profiling events

details:   http://hg.videolan.org/x265/rev/3315d6c0ced1
branches:  
changeset: 8992:3315d6c0ced1
user:      Steve Borho <steve at borho.org>
date:      Wed Dec 17 13:28:38 2014 -0600
description:
ppa: minimize code foot-print of profiling events

This will allow us to add support for more profiling systems without littering
the code
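
A minimal sketch of the idea, assuming a generic profiling backend: a scope
object starts an event in its constructor and ends it in its destructor, and
a single macro either declares that object or expands to nothing. All names
here (SketchProfiler, SketchScope, SketchScopeEvent, SKETCH_ENABLE_PROFILING)
are hypothetical; only the event names are borrowed from the patch.

```cpp
#include <cassert>

// Event names borrowed from the new event list; the enum is illustrative.
enum SketchEvent { bitstreamWrite, encodeCTU, filterCTURow, sketchEventCount };

// Hypothetical profiling backend that just counts start/stop calls.
struct SketchProfiler
{
    int started[sketchEventCount] = {};
    int ended[sketchEventCount] = {};

    void start(int id) { assert(id >= 0 && id < sketchEventCount); ++started[id]; }
    void stop(int id)  { assert(id >= 0 && id < sketchEventCount); ++ended[id]; }
};

// Null when no profiler is attached; scopes then do nothing.
SketchProfiler* g_sketchProfiler = nullptr;

// RAII scope: the event spans the lifetime of the object.
struct SketchScope
{
    int id;

    explicit SketchScope(int e) : id(e)
    {
        if (g_sketchProfiler)
            g_sketchProfiler->start(id);
    }

    ~SketchScope()
    {
        if (g_sketchProfiler)
            g_sketchProfiler->stop(id);
    }
};

// One macro at each call site; it compiles away when profiling is off,
// so supporting another backend only means changing this block.
#ifndef SKETCH_ENABLE_PROFILING
#define SKETCH_ENABLE_PROFILING 1
#endif

#if SKETCH_ENABLE_PROFILING
#define SketchScopeEvent(x) SketchScope sketchScope_(x)
#else
#define SketchScopeEvent(x)
#endif

// Example of an instrumented function: one line of profiling code.
int encodeOneCTU(int value)
{
    SketchScopeEvent(encodeCTU);
    return value * 2;
}
```

The design choice is that instrumented code never names the profiling system
directly, so the call sites stay unchanged whichever backend is compiled in.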
Subject: [x265] ppa: simplify interfaces, enforce coding style

details:   http://hg.videolan.org/x265/rev/952a2a361fcb
branches:  
changeset: 8993:952a2a361fcb
user:      Steve Borho <steve at borho.org>
date:      Wed Dec 17 13:42:35 2014 -0600
description:
ppa: simplify interfaces, enforce coding style
Subject: [x265] ppa: refine event names

details:   http://hg.videolan.org/x265/rev/6cbd7d26b2a1
branches:  
changeset: 8994:6cbd7d26b2a1
user:      Steve Borho <steve at borho.org>
date:      Wed Dec 17 13:54:42 2014 -0600
description:
ppa: refine event names

Drop the unused names, remove uninteresting events.  Try to cover the main
thread pool tasks and the frame encoder times.
Subject: [x265] ppa: emit one event per CTU for more clarity, disable frame threads events

details:   http://hg.videolan.org/x265/rev/78ae7996a1ce
branches:  
changeset: 8995:78ae7996a1ce
user:      Steve Borho <steve at borho.org>
date:      Wed Dec 17 14:31:50 2014 -0600
description:
ppa: emit one event per CTU for more clarity, disable frame threads events

The frame threads are generally uninteresting when WPP is in use

diffstat:

 doc/reST/cli.rst                     |   11 ++-
 source/PPA/ppa.cpp                   |    4 +-
 source/PPA/ppa.h                     |   38 +---------
 source/PPA/ppaApi.h                  |   11 +++
 source/PPA/ppaCPUEvents.h            |   31 +-------
 source/common/common.h               |    9 ++
 source/common/x86/asm-primitives.cpp |    2 +
 source/common/x86/ipfilter8.asm      |  120 +++++++++++++++++++++++++++++++++++
 source/encoder/analysis.cpp          |    2 -
 source/encoder/api.cpp               |    6 +-
 source/encoder/encoder.cpp           |   78 +++++++++++-----------
 source/encoder/encoder.h             |    2 -
 source/encoder/frameencoder.cpp      |    9 +-
 source/encoder/framefilter.cpp       |    3 +-
 source/encoder/nal.cpp               |    6 +-
 source/encoder/slicetype.cpp         |    4 +
 source/x265.cpp                      |    5 +-
 source/x265.h                        |    5 +-
 18 files changed, 223 insertions(+), 123 deletions(-)

diffs (truncated from 638 to 300 lines):

diff -r ee36b6311aaf -r 78ae7996a1ce doc/reST/cli.rst
--- a/doc/reST/cli.rst	Sat Dec 13 00:24:11 2014 -0600
+++ b/doc/reST/cli.rst	Wed Dec 17 14:31:50 2014 -0600
@@ -956,7 +956,7 @@ Quality, rate control and rate distortio
 
 .. option:: --pass <integer>
 
-	Enable multipass rate control mode. Input is encoded multiple times,
+	Enable multi-pass rate control mode. Input is encoded multiple times,
 	storing the encoded information of each pass in a stats file from which
 	the consecutive pass tunes the qp of each frame to improve the quality
 	of the output. Default disabled
@@ -967,12 +967,17 @@ Quality, rate control and rate distortio
 
 	**Range of values:** 1 to 3
 
+.. option:: --stats <filename>
+
+	Specify file name of of the multi-pass stats file. If unspecified
+	the encoder will use x265_2pass.log
+
 .. option:: --slow-firstpass, --no-slow-firstpass
 
-	Enable a slow and more detailed first pass encode in Multipass rate
+	Enable a slow and more detailed first pass encode in multi-pass rate
 	control mode.  Speed of the first pass encode is slightly lesser and
 	quality midly improved when compared to the default settings in a
-	multipass encode. Default disabled (turbo mode enabled)
+	multi-pass encode. Default disabled (turbo mode enabled)
 
 	When **turbo** first pass is not disabled, these options are
 	set on the first pass to improve performance:
diff -r ee36b6311aaf -r 78ae7996a1ce source/PPA/ppa.cpp
--- a/source/PPA/ppa.cpp	Sat Dec 13 00:24:11 2014 -0600
+++ b/source/PPA/ppa.cpp	Wed Dec 17 14:31:50 2014 -0600
@@ -41,8 +41,10 @@ typedef ppa::Base *(FUNC_PPALibInit)(con
 typedef void (FUNC_PPALibRelease)(ppa::Base* &);
 }
 
+using namespace ppa;
+
 static FUNC_PPALibRelease *_pfuncPpaRelease;
-ppa::Base *ppabase;
+ppa::Base *ppa::ppabase;
 
 static void _ppaReleaseAtExit()
 {
diff -r ee36b6311aaf -r 78ae7996a1ce source/PPA/ppa.h
--- a/source/PPA/ppa.h	Sat Dec 13 00:24:11 2014 -0600
+++ b/source/PPA/ppa.h	Wed Dec 17 14:31:50 2014 -0600
@@ -21,17 +21,8 @@
  * For more information, contact us at license @ x265.com.
  *****************************************************************************/
 
-#ifndef _PPA_H_
-#define _PPA_H_
-
-#if !defined(ENABLE_PPA)
-
-#define PPA_INIT()
-#define PPAStartCpuEventFunc(e)
-#define PPAStopCpuEventFunc(e)
-#define PPAScopeEvent(e)
-
-#else
+#ifndef PPA_H
+#define PPA_H
 
 /* declare enum list of users CPU events */
 #define PPA_REGISTER_CPU_EVENT(x) x,
@@ -40,32 +31,13 @@ enum PPACpuEventEnum
 #include "ppaCPUEvents.h"
     PPACpuGroupNums
 };
-
 #undef PPA_REGISTER_CPU_EVENT
 
-#define PPA_INIT()               initializePPA()
-#define PPAStartCpuEventFunc(e)  if (ppabase) ppabase->triggerStartEvent(ppabase->getEventId(e))
-#define PPAStopCpuEventFunc(e)   if (ppabase) ppabase->triggerEndEvent(ppabase->getEventId(e))
-#define PPAScopeEvent(e)         _PPAScope __scope_(e)
-
 #include "ppaApi.h"
 
 void initializePPA();
-extern ppa::Base *ppabase;
 
-class _PPAScope
-{
-protected:
+#define PPA_INIT()               initializePPA()
+#define PPAScopeEvent(e)         ppa::ProfileScope ppaScope_(e)
 
-    ppa::EventID m_id;
-
-public:
-
-    _PPAScope(int e) { if (ppabase) { m_id = ppabase->getEventId(e); ppabase->triggerStartEvent(m_id); } else m_id = 0; }
-
-    ~_PPAScope()     { if (ppabase) ppabase->triggerEndEvent(m_id); }
-};
-
-#endif // if !defined(ENABLE_PPA)
-
-#endif /* _PPA_H_ */
+#endif /* PPA_H */
diff -r ee36b6311aaf -r 78ae7996a1ce source/PPA/ppaApi.h
--- a/source/PPA/ppaApi.h	Sat Dec 13 00:24:11 2014 -0600
+++ b/source/PPA/ppaApi.h	Wed Dec 17 14:31:50 2014 -0600
@@ -54,6 +54,17 @@ protected:
 
     virtual void init(const char **pNames, int eventCount) = 0;
 };
+
+extern ppa::Base *ppabase;
+
+struct ProfileScope
+{
+    ppa::EventID id;
+
+    ProfileScope(int e) { if (ppabase) { id = ppabase->getEventId(e); ppabase->triggerStartEvent(id); } else id = 0; }
+    ~ProfileScope()     { if (ppabase) ppabase->triggerEndEvent(id); }
+};
+
 }
 
 #endif //_PPA_API_H_
diff -r ee36b6311aaf -r 78ae7996a1ce source/PPA/ppaCPUEvents.h
--- a/source/PPA/ppaCPUEvents.h	Sat Dec 13 00:24:11 2014 -0600
+++ b/source/PPA/ppaCPUEvents.h	Wed Dec 17 14:31:50 2014 -0600
@@ -1,25 +1,6 @@
-PPA_REGISTER_CPU_EVENT(encode_block)
-PPA_REGISTER_CPU_EVENT(bitstream_write)
-PPA_REGISTER_CPU_EVENT(DPB_prepareEncode)
-PPA_REGISTER_CPU_EVENT(FrameEncoder_compressFrame)
-PPA_REGISTER_CPU_EVENT(FrameEncoder_compressRows)
-PPA_REGISTER_CPU_EVENT(CompressCU)
-PPA_REGISTER_CPU_EVENT(CompressCU_Depth1)
-PPA_REGISTER_CPU_EVENT(CompressCU_Depth2)
-PPA_REGISTER_CPU_EVENT(CompressCU_Depth3)
-PPA_REGISTER_CPU_EVENT(CompressCU_Depth4)
-PPA_REGISTER_CPU_EVENT(CompressIntraCU)
-PPA_REGISTER_CPU_EVENT(CompressIntraCU_Depth1)
-PPA_REGISTER_CPU_EVENT(CompressIntraCU_Depth2)
-PPA_REGISTER_CPU_EVENT(CompressIntraCU_Depth3)
-PPA_REGISTER_CPU_EVENT(CompressIntraCU_Depth4)
-PPA_REGISTER_CPU_EVENT(CheckRDCostIntra)
-PPA_REGISTER_CPU_EVENT(CheckRDCostIntra_Depth1)
-PPA_REGISTER_CPU_EVENT(CheckRDCostIntra_Depth2)
-PPA_REGISTER_CPU_EVENT(CheckRDCostIntra_Depth3)
-PPA_REGISTER_CPU_EVENT(CheckRDCostIntra_Depth4)
-PPA_REGISTER_CPU_EVENT(CalcRDCostIntra)
-PPA_REGISTER_CPU_EVENT(Thread_ProcessRow)
-PPA_REGISTER_CPU_EVENT(Thread_compressCU)
-PPA_REGISTER_CPU_EVENT(Thread_encodeCU)
-PPA_REGISTER_CPU_EVENT(Thread_filterCU)
+PPA_REGISTER_CPU_EVENT(bitstreamWrite)
+PPA_REGISTER_CPU_EVENT(frameThread)
+PPA_REGISTER_CPU_EVENT(encodeCTU)
+PPA_REGISTER_CPU_EVENT(filterCTURow)
+PPA_REGISTER_CPU_EVENT(slicetypeDecideEV)
+PPA_REGISTER_CPU_EVENT(costEstimateRow)
diff -r ee36b6311aaf -r 78ae7996a1ce source/common/common.h
--- a/source/common/common.h	Sat Dec 13 00:24:11 2014 -0600
+++ b/source/common/common.h	Wed Dec 17 14:31:50 2014 -0600
@@ -41,6 +41,15 @@
 
 #include "x265.h"
 
+#if ENABLE_PPA
+#include "PPA/ppa.h"
+#define ProfileScopeEvent(x) PPAScopeEvent(x)
+#define PROFILE_INIT()       PPA_INIT()
+#else
+#define ProfileScopeEvent(x)
+#define PROFILE_INIT()
+#endif
+
 #define FENC_STRIDE 64
 #define NUM_INTRA_MODE 35
 
diff -r ee36b6311aaf -r 78ae7996a1ce source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp	Sat Dec 13 00:24:11 2014 -0600
+++ b/source/common/x86/asm-primitives.cpp	Wed Dec 17 14:31:50 2014 -0600
@@ -1875,6 +1875,8 @@ void Setup_Assembly_Primitives(EncoderPr
 
         p.chroma[X265_CSP_I420].filter_hpp[CHROMA_8x8] = x265_interp_4tap_horiz_pp_8x8_avx2;
         p.chroma[X265_CSP_I420].filter_hpp[CHROMA_4x4] = x265_interp_4tap_horiz_pp_4x4_avx2;
+        p.chroma[X265_CSP_I420].filter_hpp[CHROMA_32x32] = x265_interp_4tap_horiz_pp_32x32_avx2;
+        p.chroma[X265_CSP_I420].filter_hpp[CHROMA_16x16] = x265_interp_4tap_horiz_pp_16x16_avx2;
 
         p.luma_vpp[LUMA_4x4] = x265_interp_8tap_vert_pp_4x4_avx2;
 
diff -r ee36b6311aaf -r 78ae7996a1ce source/common/x86/ipfilter8.asm
--- a/source/common/x86/ipfilter8.asm	Sat Dec 13 00:24:11 2014 -0600
+++ b/source/common/x86/ipfilter8.asm	Wed Dec 17 14:31:50 2014 -0600
@@ -179,6 +179,10 @@ tab_c_64_n64:   times 8 db 64, -64
 
 const interp4_shuf, times 2 db 0, 1, 8, 9, 4, 5, 12, 13, 2, 3, 10, 11, 6, 7, 14, 15
 
+ALIGN 32
+interp4_horiz_shuf1:    db 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6
+                        db 8, 9, 10, 11, 9, 10, 11, 12, 10, 11, 12, 13, 11, 12, 13, 14
+
 SECTION .text
 
 cextern pb_128
@@ -1451,6 +1455,122 @@ cglobal interp_4tap_horiz_pp_4x4, 4,6,6
     pextrd            [r2+r0],      xm3,     3
     RET
 
+INIT_YMM avx2
+cglobal interp_4tap_horiz_pp_32x32, 4,6,7
+    mov             r4d, r4m
+
+%ifdef PIC
+    lea               r5,           [tab_ChromaCoeff]
+    vpbroadcastd      m0,           [r5 + r4 * 4]
+%else
+    vpbroadcastd      m0,           [tab_ChromaCoeff + r4 * 4]
+%endif
+
+    mova              m1,           [interp4_horiz_shuf1]
+    vpbroadcastd      m2,           [pw_1]
+    mova              m6,           [pw_512]
+    ; register map
+    ; m0 - interpolate coeff
+    ; m1 - shuffle order table
+    ; m2 - constant word 1
+
+    dec               r0
+    mov               r4d,          32
+
+.loop:
+    ; Row 0
+    vbroadcasti128    m3,           [r0]                        ; [x x x x x A 9 8 7 6 5 4 3 2 1 0]
+    pshufb            m3,           m1
+    pmaddubsw         m3,           m0
+    pmaddwd           m3,           m2
+    vbroadcasti128    m4,           [r0 + 4]
+    pshufb            m4,           m1
+    pmaddubsw         m4,           m0
+    pmaddwd           m4,           m2
+    packssdw          m3,           m4
+    pmulhrsw          m3,           m6
+
+    vbroadcasti128    m4,           [r0 + 16]
+    pshufb            m4,           m1
+    pmaddubsw         m4,           m0
+    pmaddwd           m4,           m2
+    vbroadcasti128    m5,           [r0 + 20]
+    pshufb            m5,           m1
+    pmaddubsw         m5,           m0
+    pmaddwd           m5,           m2
+    packssdw          m4,           m5
+    pmulhrsw          m4,           m6
+
+    packuswb          m3,           m4
+    vpermq            m3,           m3,      11011000b
+
+    movu              [r2],         m3
+    lea               r2,           [r2 + r3]
+    lea               r0,           [r0 + r1]
+    dec               r4d
+    jnz               .loop
+    RET
+
+
+INIT_YMM avx2
+cglobal interp_4tap_horiz_pp_16x16, 4, 6, 7
+    mov               r4d,          r4m
+
+%ifdef PIC
+    lea               r5,           [tab_ChromaCoeff]
+    vpbroadcastd      m0,           [r5 + r4 * 4]
+%else
+    vpbroadcastd      m0,           [tab_ChromaCoeff + r4 * 4]
+%endif
+
+    mova              m6,           [pw_512]
+    mova              m1,           [interp4_horiz_shuf1]
+    vpbroadcastd      m2,           [pw_1]
+
+    ; register map
+    ; m0 - interpolate coeff
+    ; m1 - shuffle order table
+    ; m2 - constant word 1
+
+    dec               r0
+    mov               r4d,          8
+
+.loop:
+    ; Row 0
+    vbroadcasti128    m3,           [r0]                        ; [x x x x x A 9 8 7 6 5 4 3 2 1 0]
+    pshufb            m3,           m1
+    pmaddubsw         m3,           m0
+    pmaddwd           m3,           m2
+    vbroadcasti128    m4,           [r0 + 4]                    ; [x x x x x A 9 8 7 6 5 4 3 2 1 0]
+    pshufb            m4,           m1
+    pmaddubsw         m4,           m0

