[x265-commits] [x265] slicetype: fix the BRef cost estimates in vbv lookahead.
Aarthi at videolan.org
Wed Feb 18 23:55:26 CET 2015
details: http://hg.videolan.org/x265/rev/359daecfbb47
branches: stable
changeset: 9366:359daecfbb47
user: Aarthi Thirumalai
date: Mon Feb 16 10:33:58 2015 +0530
description:
slicetype: fix the BRef cost estimates in vbv lookahead.
Subject: [x265] Merge with stable
details: http://hg.videolan.org/x265/rev/9a6849146225
branches:
changeset: 9367:9a6849146225
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Wed Feb 18 14:43:48 2015 +0530
description:
Merge with stable
Subject: [x265] rename variable g_maxFullDepth to g_unitSizeDepth, NUM_CU_PARTITIONS to NUM_4x4_PARTITIONS
details: http://hg.videolan.org/x265/rev/15ab013c56dd
branches:
changeset: 9368:15ab013c56dd
user: Santhoshini Sekar<santhoshini at multicorewareinc.com>
date: Mon Feb 16 14:28:19 2015 +0530
description:
rename variable g_maxFullDepth to g_unitSizeDepth, NUM_CU_PARTITIONS to NUM_4x4_PARTITIONS
for better clarity
Subject: [x265] asm-see: intra_pred_ang4_2, fix xmm register count
details: http://hg.videolan.org/x265/rev/c5e50d780f06
branches:
changeset: 9369:c5e50d780f06
user: Praveen Tiwari <praveen at multicorewareinc.com>
date: Wed Feb 18 15:35:58 2015 +0530
description:
asm-see: intra_pred_ang4_2, fix xmm register count
diffstat:
doc/reST/cli.rst | 190 +++++----
doc/reST/threading.rst | 11 +-
readme.rst | 14 +
source/CMakeLists.txt | 11 +-
source/common/bitstream.cpp | 2 +-
source/common/common.cpp | 4 +
source/common/common.h | 7 +-
source/common/constants.cpp | 2 +-
source/common/constants.h | 2 +-
source/common/cudata.cpp | 32 +-
source/common/cudata.h | 4 +-
source/common/ipfilter.cpp | 45 +-
source/common/param.cpp | 29 +-
source/common/picyuv.cpp | 6 +-
source/common/pixel.cpp | 2 +-
source/common/primitives.cpp | 1 +
source/common/primitives.h | 10 +-
source/common/quant.cpp | 78 ++++-
source/common/scalinglist.cpp | 2 +-
source/common/shortyuv.cpp | 6 +-
source/common/slice.cpp | 12 +-
source/common/slice.h | 11 +-
source/common/threading.h | 19 +-
source/common/x86/blockcopy8.asm | 693 +++++++++++++++++++++++---------------
source/common/x86/intrapred8.asm | 2 +-
source/encoder/analysis.cpp | 279 ++++++++------
source/encoder/analysis.h | 3 +-
source/encoder/api.cpp | 2 +-
source/encoder/dpb.cpp | 44 +-
source/encoder/dpb.h | 4 +-
source/encoder/encoder.cpp | 218 ++++++++++--
source/encoder/encoder.h | 2 +-
source/encoder/entropy.cpp | 166 ++++----
source/encoder/entropy.h | 6 +-
source/encoder/frameencoder.cpp | 19 +-
source/encoder/frameencoder.h | 11 +-
source/encoder/framefilter.cpp | 5 +
source/encoder/level.cpp | 25 +-
source/encoder/nal.cpp | 2 +-
source/encoder/search.cpp | 304 +++++++++-------
source/encoder/search.h | 93 +++++-
source/encoder/slicetype.cpp | 44 +-
source/encoder/slicetype.h | 17 +-
source/input/y4m.cpp | 58 +--
source/output/y4m.cpp | 8 -
source/output/yuv.cpp | 4 -
source/test/ipfilterharness.cpp | 73 ++++-
source/test/ipfilterharness.h | 4 +-
source/x265.h | 478 +++++++++++++-------------
source/x265cli.h | 12 +-
50 files changed, 1901 insertions(+), 1175 deletions(-)
diffs (truncated from 5631 to 300 lines):
diff -r 3ed2a4215e08 -r c5e50d780f06 doc/reST/cli.rst
--- a/doc/reST/cli.rst Mon Feb 16 18:26:29 2015 +0530
+++ b/doc/reST/cli.rst Wed Feb 18 15:35:58 2015 +0530
@@ -171,6 +171,8 @@ Performance Options
Over-allocation of frame threads will not improve performance, it
will generally just increase memory use.
+ **Values:** any value between 8 and 16. Default is 0, auto-detect
+
.. option:: --threads <integer>
Number of threads to allocate for the worker thread pool This pool
@@ -409,7 +411,17 @@ Profile, Level, Tier
If :option:`--level-idc` has been specified, the option adds the
intention to support the High tier of that level. If your specified
level does not support a High tier, a warning is issued and this
- modifier flag is ignored.
+ modifier flag is ignored. If :option:`--level-idc` has been specified,
+ but not --high-tier, then the encoder will attempt to encode at the
+ specified level, main tier first, turning on high tier only if
+ necessary and available at that level.
+
+.. option:: --ref <1..16>
+
+ Max number of L0 references to be allowed. This number has a linear
+ multiplier effect on the amount of work performed in motion search,
+ but will generally have a beneficial affect on compression and
+ distortion. Default 3
.. note::
:option:`--profile`, :option:`--level-idc`, and
@@ -494,14 +506,6 @@ the prediction quad-tree.
Measure full CU size (2Nx2N) merge candidates first; if no residual
is found the analysis is short circuited. Default disabled
-.. option:: --fast-cbf, --no-fast-cbf
-
- Short circuit analysis if a prediction is found that does not set
- the coded block flag (aka: no residual was encoded). It prevents
- the encoder from perhaps finding other predictions that also have no
- residual but require less signaling bits or have less distortion.
- Only applicable for RD levels 5 and 6. Default disabled
-
.. option:: --fast-intra, --no-fast-intra
Perform an initial scan of every fifth intra angular mode, then
@@ -526,14 +530,6 @@ the prediction quad-tree.
Only effective at RD levels 3 and above, which perform RDO mode
decisions.
-.. option:: --tskip, --no-tskip
-
- Enable evaluation of transform skip (bypass DCT but still use
- quantization) coding for 4x4 TU coded blocks.
-
- Only effective at RD levels 3 and above, which perform RDO mode
- decisions. Default disabled
-
.. option:: --tskip-fast, --no-tskip-fast
Only evaluate transform skip for NxN intra predictions (4x4 blocks).
@@ -593,9 +589,76 @@ as the residual quad-tree (RQT).
partitions, in which case a TU split is implied and thus the
residual quad-tree begins one layer below the CU quad-tree.
+.. option:: --nr-intra <integer>, --nr-inter <integer>
+
+ Noise reduction - an adaptive deadzone applied after DCT
+ (subtracting from DCT coefficients), before quantization. It does
+ no pixel-level filtering, doesn't cross DCT block boundaries, has no
+ overlap, The higher the strength value parameter, the more
+ aggressively it will reduce noise.
+
+ Enabling noise reduction will make outputs diverge between different
+ numbers of frame threads. Outputs will be deterministic but the
+ outputs of -F2 will no longer match the outputs of -F3, etc.
+
+ **Values:** any value in range of 0 to 2000. Default 0 (disabled).
+
+.. option:: --tskip, --no-tskip
+
+ Enable evaluation of transform skip (bypass DCT but still use
+ quantization) coding for 4x4 TU coded blocks.
+
+ Only effective at RD levels 3 and above, which perform RDO mode
+ decisions. Default disabled
+
+.. option:: --rdpenalty <0..2>
+
+ When set to 1, transform units of size 32x32 are given a 4x bit cost
+ penalty compared to smaller transform units, in intra coded CUs in P
+ or B slices.
+
+ When set to 2, transform units of size 32x32 are not even attempted,
+ unless otherwise required by the maximum recursion depth. For this
+ option to be effective with 32x32 intra CUs,
+ :option:`--tu-intra-depth` must be at least 2. For it to be
+ effective with 64x64 intra CUs, :option:`--tu-intra-depth` must be
+ at least 3.
+
+ Note that in HEVC an intra transform unit (a block of the residual
+ quad-tree) is also a prediction unit, meaning that the intra
+ prediction signal is generated for each TU block, the residual
+ subtracted and then coded. The coding unit simply provides the
+ prediction modes that will be used when predicting all of the
+ transform units within the CU. This means that when you prevent
+ 32x32 intra transform units, you are preventing 32x32 intra
+ predictions.
+
+ Default 0, disabled.
+
+ **Values:** 0:disabled 1:4x cost penalty 2:force splits
+
+.. option:: --max-tu-size <32|16|8|4>
+
+ Maximum TU size (width and height). The residual can be more
+ efficiently compressed by the DCT transform when the max TU size
+ is larger, but at the expense of more computation. Transform unit
+ quad-tree begins at the same depth of the coded tree unit, but if the
+ maximum TU size is smaller than the CU size then transform QT begins
+ at the depth of the max-tu-size. Default: 32.
+
Temporal / motion search options
================================
+.. option:: --max-merge <1..5>
+
+ Maximum number of neighbor (spatial and temporal) candidate blocks
+ that the encoder may consider for merging motion predictions. If a
+ merge candidate results in no residual, it is immediately selected
+ as a "skip". Otherwise the merge candidates are tested as part of
+ motion estimation when searching for the least cost inter option.
+ The max candidate number is encoded in the SPS and determines the
+ bit cost of signaling merge CUs. Default 2
+
.. option:: --me <integer|string>
Motion search method. Generally, the higher the number the harder
@@ -658,16 +721,6 @@ Temporal / motion search options
**Range of values:** an integer from 0 to 32768
-.. option:: --max-merge <1..5>
-
- Maximum number of neighbor (spatial and temporal) candidate blocks
- that the encoder may consider for merging motion predictions. If a
- merge candidate results in no residual, it is immediately selected
- as a "skip". Otherwise the merge candidates are tested as part of
- motion estimation when searching for the least cost inter option.
- The max candidate number is encoded in the SPS and determines the
- bit cost of signaling merge CUs. Default 2
-
.. option:: --temporal-mvp, --no-temporal-mvp
Enable temporal motion vector predictors in P and B slices.
@@ -704,32 +757,6 @@ Spatial/intra options
propagation of reference errors that may have resulted from lossy
signals. Default disabled
-.. option:: --rdpenalty <0..2>
-
- When set to 1, transform units of size 32x32 are given a 4x bit cost
- penalty compared to smaller transform units, in intra coded CUs in P
- or B slices.
-
- When set to 2, transform units of size 32x32 are not even attempted,
- unless otherwise required by the maximum recursion depth. For this
- option to be effective with 32x32 intra CUs,
- :option:`--tu-intra-depth` must be at least 2. For it to be
- effective with 64x64 intra CUs, :option:`--tu-intra-depth` must be
- at least 3.
-
- Note that in HEVC an intra transform unit (a block of the residual
- quad-tree) is also a prediction unit, meaning that the intra
- prediction signal is generated for each TU block, the residual
- subtracted and then coded. The coding unit simply provides the
- prediction modes that will be used when predicting all of the
- transform units within the CU. This means that when you prevent
- 32x32 intra transform units, you are preventing 32x32 intra
- predictions.
-
- Default 0, disabled.
-
- **Values:** 0:disabled 1:4x cost penalty 2:force splits
-
Psycho-visual options
=====================
@@ -874,13 +901,6 @@ Slice decision options
Use B-frames as references, when possible. Default enabled
-.. option:: --ref <1..16>
-
- Max number of L0 references to be allowed. This number has a linear
- multiplier effect on the amount of work performed in motion search,
- but will generally have a beneficial affect on compression and
- distortion. Default 3
-
Quality, rate control and rate distortion options
=================================================
@@ -990,20 +1010,6 @@ Quality, rate control and rate distortio
less bits. This tends to improve detail in the backgrounds of video
with less detail in areas of high motion. Default enabled
-.. option:: --nr-intra <integer>, --nr-inter <integer>
-
- Noise reduction - an adaptive deadzone applied after DCT
- (subtracting from DCT coefficients), before quantization. It does
- no pixel-level filtering, doesn't cross DCT block boundaries, has no
- overlap, The higher the strength value parameter, the more
- aggressively it will reduce noise.
-
- Enabling noise reduction will make outputs diverge between different
- numbers of frame threads. Outputs will be deterministic but the
- outputs of -F2 will no longer match the outputs of -F3, etc.
-
- **Values:** any value in range of 0 to 2000. Default 0 (disabled).
-
.. option:: --pass <integer>
Enable multi-pass rate control mode. Input is encoded multiple times,
@@ -1342,13 +1348,13 @@ Bitstream options
to keep the stream headers for you and you want keyframes to be
random access points. Default disabled
-.. option:: --info, --no-info
+.. option:: --aud, --no-aud
- Emit an informational SEI with the stream headers which describes
- the encoder version, build info, and encode parameters. This is very
- helpful for debugging purposes but encoding version numbers and
- build info could make your bitstreams diverge and interfere with
- regression testing. Default enabled
+ Emit an access unit delimiter NAL at the start of each slice access
+ unit. If :option:`--repeat-headers` is not enabled (indicating the
+ user will be writing headers manually at the start of the stream)
+ the very first AUD will be skipped since it cannot be placed at the
+ start of the access unit, where it belongs. Default disabled
.. option:: --hrd, --no-hrd
@@ -1357,13 +1363,13 @@ Bitstream options
Picture Timing SEI messages providing timing information to the
decoder. Default disabled
-.. option:: --aud, --no-aud
+.. option:: --info, --no-info
- Emit an access unit delimiter NAL at the start of each slice access
- unit. If :option:`--repeat-headers` is not enabled (indicating the
- user will be writing headers manually at the start of the stream)
- the very first AUD will be skipped since it cannot be placed at the
- start of the access unit, where it belongs. Default disabled
+ Emit an informational SEI with the stream headers which describes
+ the encoder version, build info, and encode parameters. This is very
+ helpful for debugging purposes but encoding version numbers and
+ build info could make your bitstreams diverge and interfere with
+ regression testing. Default enabled
.. option:: --hash <integer>
@@ -1375,6 +1381,18 @@ Bitstream options
2. CRC
3. Checksum
+.. option:: --temporal-layers,--no-temporal-layers
+
+ Enable a temporal sub layer. All referenced I/P/B frames are in the
+ base layer and all unreferenced B frames are placed in a temporal
+ sublayer. A decoder may chose to drop the sublayer and only decode
+ and display the base layer slices.
+
+ If used with a fixed GOP (:option:`b-adapt` 0) and :option:`bframes`
+ 3 then the two layers evenly split the frame rate, with a cadence of
+ PbBbP. You probably also want :option:`--no-scenecut` and a keyframe
+ interval that is a multiple of 4.
+
Debugging options
=================
diff -r 3ed2a4215e08 -r c5e50d780f06 doc/reST/threading.rst
--- a/doc/reST/threading.rst Mon Feb 16 18:26:29 2015 +0530
+++ b/doc/reST/threading.rst Wed Feb 18 15:35:58 2015 +0530
@@ -125,9 +125,14 @@ The second extenuating circumstance is t
for motion reference must be processed by the loop filters and the loop
filters cannot run until a full row has been encoded, and it must run a
full row behind the encode process so that the pixels below the row
-being filtered are available. When you add up all the row lags each
-frame ends up being 3 CTU rows behind its reference frames (the
-equivalent of 12 macroblock rows for x264)
+being filtered are available. On top of this, HEVC has two loop filters:
+deblocking and SAO, which must be run in series with a row lag between
+them. When you add up all the row lags each frame ends up being 3 CTU
+rows behind its reference frames (the equivalent of 12 macroblock rows
+for x264). And keep in mind the wave-front progression pattern; by the
+time the reference frame finishes the third row of CTUs, nearly half of
+the CTUs in the frame may be compressed (depending on the display aspect
+ratio).
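
Note on the threading.rst hunk above: the row-lag arithmetic can be checked with a small sketch. This is not x265 code. It assumes 64x64 CTUs, 16x16 x264 macroblocks, an idealised wavefront in which every CTU takes one time step, each row starts two CTUs behind the row above it, and a worker is free for every row. The function names are hypothetical. Under those assumptions the second function gives an upper bound on how many CTUs can be done when the third CTU row completes; real encodes will be lower.

```python
def macroblock_rows(ctu_rows, ctu_size=64, mb_size=16):
    """Convert a lag in CTU rows to the equivalent number of macroblock rows."""
    return ctu_rows * (ctu_size // mb_size)


def wavefront_progress(width, height, finished_rows=3, ctu_size=64, row_lag=2):
    """Idealised count of CTUs done when `finished_rows` rows are complete.

    Assumes one time step per CTU, a fixed lag of `row_lag` CTUs between
    consecutive rows, and no limit on worker threads (all hypothetical
    simplifications, not measured encoder behaviour).
    Returns (ctus_done, ctus_total).
    """
    cols = -(-width // ctu_size)   # ceiling division
    rows = -(-height // ctu_size)
    # Time step at which the last of the first `finished_rows` rows completes.
    t = (finished_rows - 1) * row_lag + cols
    done = sum(max(0, min(cols, t - row_lag * r)) for r in range(rows))
    return done, cols * rows


if __name__ == "__main__":
    print(macroblock_rows(3))  # 3 CTU rows -> 12 macroblock rows
    done, total = wavefront_progress(1920, 1080)
    print(f"{done}/{total} CTUs ({done / total:.0%}) in the idealised model")
```

Wider frames have more columns per row, so later rows get further before the third row finishes; that is the aspect-ratio dependence the paragraph mentions.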