[x265-commits] [x265] doc: update doc about level/tier

Deepthi Nandakumar deepthi at multicorewareinc.com
Wed Feb 4 21:17:47 CET 2015


details:   http://hg.videolan.org/x265/rev/3aaa8b6242da
branches:  
changeset: 9281:3aaa8b6242da
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Wed Feb 04 14:22:45 2015 +0530
description:
doc: update doc about level/tier
Subject: [x265] level: fix VPS uninitialized issue

details:   http://hg.videolan.org/x265/rev/19ce367e1cd1
branches:  
changeset: 9282:19ce367e1cd1
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Wed Feb 04 14:58:15 2015 +0530
description:
level: fix VPS uninitialized issue
Subject: [x265] c_model: correct weight_sp round parameters check

details:   http://hg.videolan.org/x265/rev/e9a324ddc5f0
branches:  
changeset: 9283:e9a324ddc5f0
user:      Min Chen <chenm003 at 163.com>
date:      Wed Feb 04 17:38:53 2015 +0800
description:
c_model: correct weight_sp round parameters check
Subject: [x265] blockcopy_pp_8x6: optimize register uses

details:   http://hg.videolan.org/x265/rev/30b8714cd7cd
branches:  
changeset: 9284:30b8714cd7cd
user:      Praveen Tiwari
date:      Tue Feb 03 17:14:55 2015 +0530
description:
blockcopy_pp_8x6: optimize register uses
Subject: [x265] blockcopy_pp_6x8 sse2 asm code optimization

details:   http://hg.videolan.org/x265/rev/f1c1be09c980
branches:  
changeset: 9285:f1c1be09c980
user:      Praveen Tiwari
date:      Tue Feb 03 14:30:42 2015 +0530
description:
blockcopy_pp_6x8 sse2 asm code optimization

improved, 248.67c -> 212.56c
Subject: [x265] blockcopy_pp_8x12: sse2 asm code optimization

details:   http://hg.videolan.org/x265/rev/914d08d8bf3f
branches:  
changeset: 9286:914d08d8bf3f
user:      Praveen Tiwari
date:      Tue Feb 03 17:55:38 2015 +0530
description:
blockcopy_pp_8x12: sse2 asm code optimization

improved, 235.05c -> 158.79c
Subject: [x265] blockcopy_pp_8x8: sse2 asm code optimization

details:   http://hg.videolan.org/x265/rev/136154409d8e
branches:  
changeset: 9287:136154409d8e
user:      Praveen Tiwari
date:      Tue Feb 03 18:09:35 2015 +0530
description:
blockcopy_pp_8x8: sse2 asm code optimization

improved, 127.71c -> 110.09c
Subject: [x265] blockcopy_pp_8x16: sse2 asm code optimization

details:   http://hg.videolan.org/x265/rev/712ffb4d0ee9
branches:  
changeset: 9288:712ffb4d0ee9
user:      Praveen Tiwari
date:      Tue Feb 03 18:16:26 2015 +0530
description:
blockcopy_pp_8x16: sse2 asm code optimization

improved, 210.34c -> 199.33c
Subject: [x265] blockcopy_pp_8x32: sse2 asm code optimization

details:   http://hg.videolan.org/x265/rev/1f0696400114
branches:  
changeset: 9289:1f0696400114
user:      Praveen Tiwari
date:      Tue Feb 03 18:26:22 2015 +0530
description:
blockcopy_pp_8x32: sse2 asm code optimization

improved, 394.92c -> 368.48c
Subject: [x265] blockcopy_pp_8x64: sse2 asm code optimization

details:   http://hg.videolan.org/x265/rev/31fc9b70fc46
branches:  
changeset: 9290:31fc9b70fc46
user:      Praveen Tiwari
date:      Tue Feb 03 18:37:14 2015 +0530
description:
blockcopy_pp_8x64: sse2 asm code optimization

improved, 800.38c -> 752.18c
Subject: [x265] encoder: Add support for temporal layering of the encoded bitstream.

details:   http://hg.videolan.org/x265/rev/272781048200
branches:  
changeset: 9291:272781048200
user:      Aarthi Thirumalai
date:      Tue Feb 03 16:21:21 2015 +0530
description:
encoder: Add support for temporal layering of the encoded bitstream.

Implements temporal sublayers, signaling a temporal id in NAL units. Output
bitstreams can be extracted either at the base temporal layer (layer 0) with
roughly half the frame rate (if bframes=3) or at a higher temporal layer (layer
1) that decodes all the frames in the sequence. The base layer contains all
referenced frames, the sublayer contains unreferenced frames.

This commit also makes vps.maxLatencyIncrease be consistent with
sps.maxLatencyIncrease [CHANGES OUTPUTS]
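
Editorial sketch of the layer split described above (not code from this changeset). It assumes a display-order cadence string in which uppercase I/P/B mark referenced frames and lowercase b marks unreferenced B frames. The helper names assignTemporalIds and countBaseLayer are hypothetical and do not exist in x265.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, not x265 code: map each frame type in a cadence
// string to a temporal id. Referenced frames (I, P, B) go to the base
// layer (id 0); unreferenced B frames ('b') go to the sublayer (id 1).
std::vector<int> assignTemporalIds(const std::string& cadence)
{
    std::vector<int> ids;
    ids.reserve(cadence.size());
    for (char type : cadence)
    {
        assert(type == 'I' || type == 'P' || type == 'B' || type == 'b');
        ids.push_back(type == 'b' ? 1 : 0);
    }
    return ids;
}

// Number of frames a decoder would keep if it dropped the sublayer.
int countBaseLayer(const std::vector<int>& ids)
{
    return static_cast<int>(std::count(ids.begin(), ids.end(), 0));
}
```

For the cadence PbBbP this keeps P, B and P in the base layer, which is where the "roughly half the frame rate" figure for bframes=3 comes from.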
Subject: [x265] cli: add --[no-]temporal-layers

details:   http://hg.videolan.org/x265/rev/0384794799bc
branches:  
changeset: 9292:0384794799bc
user:      Aarthi Thirumalai
date:      Tue Feb 03 16:22:42 2015 +0530
description:
cli: add --[no-]temporal-layers
Subject: [x265] Make FrameEncoder partially virtual so it can be overloaded

details:   http://hg.videolan.org/x265/rev/c39d428cf7b6
branches:  
changeset: 9293:c39d428cf7b6
user:      Nicolas Morey-Chaisemartin <nmorey at kalray.eu>
date:      Tue Oct 28 10:50:26 2014 +0100
description:
Make FrameEncoder partially virtual so it can be overloaded

FrameEncoder is a logical place to use a hardware accelerator or an alternative core encoder.
By making a few functions virtual, it can be easily replaced by an inheriting class, transparently for the Encoder.
This way all the RC/interaction code can stay within the FrameEncoder class and reduce conflicts in later updates.
---
 source/encoder/encoder.cpp    | 32 ++++++++++++++++++--------------
 source/encoder/encoder.h      |  2 +-
 source/encoder/frameencoder.h |  8 ++++----
 3 files changed, 23 insertions(+), 19 deletions(-)
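
Editorial sketch of the pattern this description relies on (not code from this changeset). All class and method names here are hypothetical stand-ins; the real FrameEncoder interface is not shown in this message. The idea is that shared bookkeeping stays in a non-virtual base-class method while a virtual hook is replaced by the subclass.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a frame encoder base class, not the x265 class.
class FrameEncoderSketch
{
public:
    virtual ~FrameEncoderSketch() {}

    // Non-virtual driver: bookkeeping shared by every implementation
    // stays here, so subclasses do not have to duplicate it.
    std::string encodeFrame(int poc)
    {
        assert(poc >= 0);
        m_framesStarted++;
        return compressFrame(poc);
    }

    int framesStarted() const { return m_framesStarted; }

protected:
    // Virtual hook: the part an alternative core encoder may replace.
    virtual std::string compressFrame(int poc)
    {
        return "software:" + std::to_string(poc);
    }

private:
    int m_framesStarted = 0;
};

// Hypothetical subclass standing in for a hardware-accelerated encoder.
class AcceleratedFrameEncoderSketch : public FrameEncoderSketch
{
protected:
    std::string compressFrame(int poc) override
    {
        return "accelerator:" + std::to_string(poc);
    }
};
```

A caller holding a FrameEncoderSketch reference reaches the accelerated path without knowing the concrete type, which is the "transparently for the Encoder" property the description mentions.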
Subject: [x265] improve rdoQuant by split path on different probability

details:   http://hg.videolan.org/x265/rev/ba1e85e58035
branches:  
changeset: 9294:ba1e85e58035
user:      Min Chen <chenm003 at 163.com>
date:      Wed Feb 04 17:38:48 2015 +0800
description:
improve rdoQuant by split path on different probability
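
Editorial sketch of the middle (most common non-trivial) path that the quant.cpp hunk in this message adds as getICRateLessVlc, restated as a standalone function. The name golombRiceFastPathRate is hypothetical, std::min replaces the project's fastMin, and reading the << 15 as a fixed-point scale of one bin is an assumption based on the diff alone.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical standalone restatement, not the x265 function itself.
// Estimates the rate of a Golomb-Rice coded symbol that is known to be
// within the VLC range, so no Exp-Golomb escape handling is needed.
uint32_t golombRiceFastPathRate(uint32_t symbol, uint32_t riceParam)
{
    assert(riceParam <= 4);

    // Unary prefix length plus the riceParam suffix bits, capped at 8 bins
    // as in the diff.
    const uint32_t prefixLen = (symbol >> riceParam) + 1;
    const uint32_t numBins = std::min<uint32_t>(prefixLen + riceParam, 8u);

    // Assumed fixed-point scale: each bin contributes 1 << 15.
    return numBins << 15;
}
```

The benefit of splitting this path out is that it needs no table lookups or branches, only a shift, an add and a min.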

diffstat:

 doc/reST/cli.rst                 |   17 +-
 source/CMakeLists.txt            |    2 +-
 source/common/param.cpp          |    2 +
 source/common/pixel.cpp          |    2 +-
 source/common/quant.cpp          |   62 ++++++-
 source/common/slice.h            |    5 +-
 source/common/x86/blockcopy8.asm |  348 ++++++++++++++++++++++++--------------
 source/encoder/dpb.cpp           |   44 +++-
 source/encoder/dpb.h             |    4 +-
 source/encoder/encoder.cpp       |   56 +++---
 source/encoder/encoder.h         |    2 +-
 source/encoder/entropy.cpp       |   74 +++++---
 source/encoder/entropy.h         |    2 +-
 source/encoder/frameencoder.h    |    8 +-
 source/encoder/level.cpp         |    3 +-
 source/encoder/nal.cpp           |    2 +-
 source/x265.h                    |    6 +
 source/x265cli.h                 |    7 +-
 18 files changed, 425 insertions(+), 221 deletions(-)

diffs (truncated from 1073 to 300 lines):

diff -r e11dd720557a -r ba1e85e58035 doc/reST/cli.rst
--- a/doc/reST/cli.rst	Tue Feb 03 21:15:08 2015 -0600
+++ b/doc/reST/cli.rst	Wed Feb 04 17:38:48 2015 +0800
@@ -411,7 +411,10 @@ Profile, Level, Tier
 	If :option:`--level-idc` has been specified, the option adds the
 	intention to support the High tier of that level. If your specified
 	level does not support a High tier, a warning is issued and this
-	modifier flag is ignored.
+	modifier flag is ignored. If :option:`--level-idc` has been specified,
+	but not --high-tier, then the encoder will attempt to encode at the 
+	specified level, main tier first, turning on high tier only if 
+	necessary and available at that level.
 
 .. note::
 	:option:`--profile`, :option:`--level-idc`, and
@@ -1359,6 +1362,18 @@ Bitstream options
 	Picture Timing SEI messages providing timing information to the
 	decoder. Default disabled
 
+.. option:: --temporal-layers,--no-temporal-layers
+
+	Enable a temporal sub layer. All referenced I/P/B frames are in the
+	base layer and all unreferenced B frames are placed in a temporal
+	sublayer. A decoder may chose to drop the sublayer and only decode
+	and display the base layer slices.
+	
+	If used with a fixed GOP (:option:`b-adapt` 0) and :option:`bframes`
+	3 then the two layers evenly split the frame rate, with a cadence of
+	PbBbP. You probably also want :option:`--no-scenecut` and a keyframe
+	interval that is a multiple of 4.
+
 .. option:: --aud, --no-aud
 
 	Emit an access unit delimiter NAL at the start of each slice access
diff -r e11dd720557a -r ba1e85e58035 source/CMakeLists.txt
--- a/source/CMakeLists.txt	Tue Feb 03 21:15:08 2015 -0600
+++ b/source/CMakeLists.txt	Wed Feb 04 17:38:48 2015 +0800
@@ -21,7 +21,7 @@ include(CheckSymbolExists)
 include(CheckCXXCompilerFlag)
 
 # X265_BUILD must be incremented each time the public API is changed
-set(X265_BUILD 43)
+set(X265_BUILD 44)
 configure_file("${PROJECT_SOURCE_DIR}/x265.def.in"
                "${PROJECT_BINARY_DIR}/x265.def")
 configure_file("${PROJECT_SOURCE_DIR}/x265_config.h.in"
diff -r e11dd720557a -r ba1e85e58035 source/common/param.cpp
--- a/source/common/param.cpp	Tue Feb 03 21:15:08 2015 -0600
+++ b/source/common/param.cpp	Wed Feb 04 17:38:48 2015 +0800
@@ -181,6 +181,7 @@ void x265_param_default(x265_param *para
     param->bIntraInBFrames = 0;
     param->bLossless = 0;
     param->bCULossless = 0;
+    param->bEnableTemporalSubLayers = 0;
 
     /* Rate control options */
     param->rc.vbvMaxBitrate = 0;
@@ -605,6 +606,7 @@ int x265_param_parse(x265_param *p, cons
             p->scenecutThreshold = atoi(value);
         }
     }
+    OPT("temporal-layers") p->bEnableTemporalSubLayers = atobool(value);
     OPT("keyint") p->keyframeMax = atoi(value);
     OPT("min-keyint") p->keyframeMin = atoi(value);
     OPT("rc-lookahead") p->lookaheadDepth = atoi(value);
diff -r e11dd720557a -r ba1e85e58035 source/common/pixel.cpp
--- a/source/common/pixel.cpp	Tue Feb 03 21:15:08 2015 -0600
+++ b/source/common/pixel.cpp	Wed Feb 04 17:38:48 2015 +0800
@@ -527,7 +527,7 @@ void weight_sp_c(const int16_t* src, pix
     X265_CHECK(!((w0 << 6) > 32767), "w0 using more than 16 bits, asm output will mismatch\n");
     X265_CHECK(!(round > 32767), "round using more than 16 bits, asm output will mismatch\n");
     X265_CHECK((shift >= correction), "shift must be include factor correction, please update ASM ABI\n");
-    X265_CHECK(!(round & ((1 << correction) - 1)), "round must be include factor correction, please update ASM ABI\n");
+    X265_CHECK(!(round & ((1 << (correction - 1)) - 1)), "round must be include factor correction, please update ASM ABI\n");
 
     for (y = 0; y <= height - 1; y++)
     {
diff -r e11dd720557a -r ba1e85e58035 source/common/quant.cpp
--- a/source/common/quant.cpp	Tue Feb 03 21:15:08 2015 -0600
+++ b/source/common/quant.cpp	Wed Feb 04 17:38:48 2015 +0800
@@ -50,7 +50,7 @@ inline int fastMin(int x, int y)
     return y + ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1))); // min(x, y)
 }
 
-inline int getICRate(uint32_t absLevel, int32_t diffLevel, const int* greaterOneBits, const int* levelAbsBits, uint32_t absGoRice, uint32_t c1c2Idx)
+inline int getICRate(uint32_t absLevel, int32_t diffLevel, const int* greaterOneBits, const int* levelAbsBits, const uint32_t absGoRice, const uint32_t maxVlc, uint32_t c1c2Idx)
 {
     X265_CHECK(c1c2Idx <= 3, "c1c2Idx check failure\n");
     X265_CHECK(absGoRice <= 4, "absGoRice check failure\n");
@@ -72,7 +72,6 @@ inline int getICRate(uint32_t absLevel, 
     else
     {
         uint32_t symbol = diffLevel;
-        const uint32_t maxVlc = g_goRiceRange[absGoRice];
         bool expGolomb = (symbol > maxVlc);
 
         if (expGolomb)
@@ -105,6 +104,25 @@ inline int getICRate(uint32_t absLevel, 
     return rate;
 }
 
+inline int getICRateLessVlc(uint32_t absLevel, int32_t diffLevel, const uint32_t absGoRice)
+{
+    X265_CHECK(absGoRice <= 4, "absGoRice check failure\n");
+    if (!absLevel)
+    {
+        X265_CHECK(diffLevel < 0, "diffLevel check failure\n");
+        return 0;
+    }
+    int rate;
+
+    uint32_t symbol = diffLevel;
+    uint32_t prefLen = (symbol >> absGoRice) + 1;
+    uint32_t numBins = fastMin(prefLen + absGoRice, 8 /* g_goRicePrefixLen[absGoRice] + absGoRice */);
+
+    rate = numBins << 15;
+
+    return rate;
+}
+
 /* Calculates the cost for specific absolute transform level */
 inline uint32_t getICRateCost(uint32_t absLevel, int32_t diffLevel, const int* greaterOneBits, const int* levelAbsBits, uint32_t absGoRice, uint32_t c1c2Idx)
 {
@@ -674,9 +692,43 @@ uint32_t Quant::rdoQuant(const CUData& c
                 /* record costs for sign-hiding performed at the end */
                 if (level)
                 {
-                    int rateNow = getICRate(level, level - baseLevel, greaterOneBits, levelAbsBits, goRiceParam, c1c2Idx);
-                    rateIncUp[blkPos] = getICRate(level + 1, level + 1 - baseLevel, greaterOneBits, levelAbsBits, goRiceParam, c1c2Idx) - rateNow;
-                    rateIncDown[blkPos] = getICRate(level - 1, level - 1 - baseLevel, greaterOneBits, levelAbsBits, goRiceParam, c1c2Idx) - rateNow;
+                    const int32_t diff0 = level - 1 - baseLevel;
+                    const int32_t diff2 = level + 1 - baseLevel;
+                    const int32_t maxVlc = g_goRiceRange[goRiceParam];
+                    int rate0, rate1, rate2;
+
+                    if (diff0 < -2)  // prob (92.9, 86.5, 74.5)%
+                    {
+                        // NOTE: Min: L - 1 - {1,2,1,3} < -2 ==> L < {0,1,0,2}
+                        //            additional L > 0, so I got (L > 0 && L < 2) ==> L = 1
+                        X265_CHECK(level == 1, "absLevel check failure\n");
+
+                        const int rateEqual2 = greaterOneBits[1] + levelAbsBits[0];;
+                        const int rateNotEqual2 = greaterOneBits[0];
+
+                        rate0 = 0;
+                        rate2 = rateEqual2;
+                        rate1 = rateNotEqual2;
+
+                        X265_CHECK(rate1 == getICRateNegDiff(level + 0, greaterOneBits, levelAbsBits), "rate1 check failure!\n");
+                        X265_CHECK(rate2 == getICRateNegDiff(level + 1, greaterOneBits, levelAbsBits), "rate1 check failure!\n");
+                        X265_CHECK(rate0 == getICRateNegDiff(level - 1, greaterOneBits, levelAbsBits), "rate1 check failure!\n");
+                    }
+                    else if (diff0 >= 0 && diff2 <= maxVlc)     // prob except from above path (98.6, 97.9, 96.9)%
+                    {
+                        // NOTE: no c1c2 correct rate since all of rate include this factor
+                        rate1 = getICRateLessVlc(level + 0, diff0 + 1, goRiceParam);
+                        rate2 = getICRateLessVlc(level + 1, diff0 + 2, goRiceParam);
+                        rate0 = getICRateLessVlc(level - 1, diff0 + 0, goRiceParam);
+                    }
+                    else
+                    {
+                        rate1 = getICRate(level + 0, diff0 + 1, greaterOneBits, levelAbsBits, goRiceParam, maxVlc, c1c2Idx);
+                        rate2 = getICRate(level + 1, diff0 + 2, greaterOneBits, levelAbsBits, goRiceParam, maxVlc, c1c2Idx);
+                        rate0 = getICRate(level - 1, diff0 + 0, greaterOneBits, levelAbsBits, goRiceParam, maxVlc, c1c2Idx);
+                    }
+                    rateIncUp[blkPos] = rate2 - rate1;
+                    rateIncDown[blkPos] = rate0 - rate1;
                 }
                 else
                 {
diff -r e11dd720557a -r ba1e85e58035 source/common/slice.h
--- a/source/common/slice.h	Tue Feb 03 21:15:08 2015 -0600
+++ b/source/common/slice.h	Wed Feb 04 17:38:48 2015 +0800
@@ -149,8 +149,10 @@ struct TimingInfo
 
 struct VPS
 {
+    uint32_t         maxTempSubLayers;
     uint32_t         numReorderPics;
     uint32_t         maxDecPicBuffering;
+    uint32_t         maxLatencyIncrease;
     HRDInfo          hrdParameters;
     ProfileTierLevel ptl;
 };
@@ -228,9 +230,10 @@ struct SPS
     bool     bUseAMP; // use param
     uint32_t maxAMPDepth;
 
+    uint32_t maxTempSubLayers;   // max number of Temporal Sub layers
     uint32_t maxDecPicBuffering; // these are dups of VPS values
+    uint32_t maxLatencyIncrease;
     int      numReorderPics;
-    int      maxLatencyIncrease;
 
     bool     bUseStrongIntraSmoothing; // use param
     bool     bTemporalMVPEnabled;
diff -r e11dd720557a -r ba1e85e58035 source/common/x86/blockcopy8.asm
--- a/source/common/x86/blockcopy8.asm	Tue Feb 03 21:15:08 2015 -0600
+++ b/source/common/x86/blockcopy8.asm	Wed Feb 04 17:38:48 2015 +0800
@@ -224,65 +224,51 @@ BLOCKCOPY_PP_W4_H8 4, 32
 ; void blockcopy_pp_6x8(pixel* dst, intptr_t dstStride, const pixel* src, intptr_t srcStride)
 ;-----------------------------------------------------------------------------
 INIT_XMM sse2
-cglobal blockcopy_pp_6x8, 4, 7, 8
-
-    movd     m0,     [r2]
-    movd     m1,     [r2 + r3]
-    movd     m2,     [r2 + 2 * r3]
-    lea      r5,     [r2 + 2 * r3]
-    movd     m3,     [r5 + r3]
-
-    movd     m4,     [r5 + 2 * r3]
-    lea      r5,     [r5 + 2 * r3]
-    movd     m5,     [r5 + r3]
-    movd     m6,     [r5 + 2 * r3]
-    lea      r5,     [r5 + 2 * r3]
-    movd     m7,     [r5 + r3]
-
-    movd     [r0],                m0
-    movd     [r0 + r1],           m1
-    movd     [r0 + 2 * r1],       m2
-    lea      r6,                  [r0 + 2 * r1]
-    movd     [r6 + r1],           m3
-
-    movd     [r6 + 2 * r1],        m4
-    lea      r6,                   [r6 + 2 * r1]
-    movd     [r6 + r1],            m5
-    movd     [r6 + 2 * r1],        m6
-    lea      r6,                   [r6 + 2 * r1]
-    movd     [r6 + r1],            m7
-
-    mov     r4w,     [r2 + 4]
-    mov     r5w,     [r2 + r3 + 4]
-    mov     r6w,     [r2 + 2 * r3 + 4]
-
-    mov     [r0 + 4],            r4w
-    mov     [r0 + r1 + 4],       r5w
-    mov     [r0 + 2 * r1 + 4],   r6w
-
-    lea     r0,              [r0 + 2 * r1]
-    lea     r2,              [r2 + 2 * r3]
-
-    mov     r4w,             [r2 + r3 + 4]
-    mov     r5w,             [r2 + 2 * r3 + 4]
-
-    mov     [r0 + r1 + 4],       r4w
-    mov     [r0 + 2 * r1 + 4],   r5w
-
-    lea     r0,              [r0 + 2 * r1]
-    lea     r2,              [r2 + 2 * r3]
-
-    mov     r4w,             [r2 + r3 + 4]
-    mov     r5w,             [r2 + 2 * r3 + 4]
-
-    mov     [r0 + r1 + 4],       r4w
-    mov     [r0 + 2 * r1 + 4],   r5w
-
-    lea     r0,              [r0 + 2 * r1]
-    lea     r2,              [r2 + 2 * r3]
-
-    mov     r4w,             [r2 + r3 + 4]
-    mov     [r0 + r1 + 4],       r4w
+cglobal blockcopy_pp_6x8, 4, 7, 3
+
+    movd     m0,  [r2]
+    mov      r4w, [r2 + 4]
+    movd     m1,  [r2 + r3]
+    mov      r5w, [r2 + r3 + 4]
+    movd     m2,  [r2 + 2 * r3]
+    mov      r6w, [r2 + 2 * r3 + 4]
+
+    movd     [r0],              m0
+    mov      [r0 + 4],          r4w
+    movd     [r0 + r1],         m1
+    mov      [r0 + r1 + 4],     r5w
+    movd     [r0 + 2 * r1],     m2
+    mov      [r0 + 2 * r1 + 4], r6w
+
+    lea      r2,  [r2 + 2 * r3]
+    movd     m0,  [r2 + r3]
+    mov      r4w, [r2 + r3 + 4]
+    movd     m1,  [r2 + 2 * r3]
+    mov      r5w, [r2 + 2 * r3 + 4]
+    lea      r2,  [r2 + 2 * r3]
+    movd     m2,  [r2 + r3]
+    mov      r6w, [r2 + r3 + 4]
+
+    lea      r0,                [r0 + 2 * r1]
+    movd     [r0 + r1],         m0
+    mov      [r0 + r1 + 4],     r4w
+    movd     [r0 + 2 * r1],     m1
+    mov      [r0 + 2 * r1 + 4], r5w
+    lea      r0,                [r0 + 2 * r1]
+    movd     [r0 + r1],         m2
+    mov      [r0 + r1 + 4],     r6w
+
+    lea      r2,                [r2 + 2 * r3]
+    movd     m0,                [r2]
+    mov      r4w,               [r2 + 4]
+    movd     m1,                [r2 + r3]

