[x265] [PATCH] perf: Enabling lookahead-slices for all presets except veryslow & placebo
Pradeep Ramachandran
pradeep at multicorewareinc.com
Thu Nov 5 05:41:46 CET 2015
# HG changeset patch
# User Pradeep Ramachandran <pradeep at multicorewareinc.com>
# Date 1446695802 -19800
# Thu Nov 05 09:26:42 2015 +0530
# Node ID 37fb5c10bfce78c7af302f644032f0d453c3158c
# Parent 3103afbd31fa9b26533f06202516a511ee221439
perf: Enabling lookahead-slices for all presets except veryslow & placebo
Seeing a ~10% performance improvement on the faster presets on Skylake, and a
~2x improvement on Xeon systems with the ultrafast preset. The gain comes from
a considerable increase in utilization across the board. Slicing is disabled
for videos with a resolution below 720p to limit the impact on quality.
This commit changes encoder outputs, but the reduction in quality (measured by
PSNR or SSIM) is <0.01% across a wide variety of presets and runs.
diff -r 3103afbd31fa -r 37fb5c10bfce doc/reST/cli.rst
--- a/doc/reST/cli.rst Thu Nov 05 06:13:51 2015 +0530
+++ b/doc/reST/cli.rst Thu Nov 05 09:26:42 2015 +0530
@@ -1133,21 +1133,31 @@
.. option:: --lookahead-slices <0..16>
- Use multiple worker threads to measure the estimated cost of each
- frame within the lookahead. When :option:`--b-adapt` is 2, most
- frame cost estimates will be performed in batch mode, many cost
- estimates at the same time, and lookahead-slices is ignored for
- batched estimates. The effect on performance can be quite small.
- The higher this parameter, the less accurate the frame costs will be
- (since context is lost across slice boundaries) which will result in
- less accurate B-frame and scene-cut decisions.
+ Use multiple worker threads to measure the estimated cost of each frame
+ within the lookahead. The frame is divided into the specified number of
+ slices, and one thread is launched per slice. When :option:`--b-adapt` is
+ 2, most frame cost estimates will be performed in batch mode (many cost
+ estimates at the same time) and lookahead-slices is ignored for batched
+ estimates; it may still be used for single cost estimations. The higher this
+ parameter, the less accurate the frame costs will be (since context is lost
+ across slice boundaries) which will result in less accurate B-frame and
+ scene-cut decisions. The effect on performance can be significant, especially
+ on systems with many threads.
- The encoder may internally lower the number of slices to ensure
- each slice codes at least 10 16x16 rows of lowres blocks. If slices
- are used in lookahead, they are logged in the list of tools as
- *lslices*.
-
- **Values:** 0 - disabled (default). 1 is the same as 0. Max 16
+ The encoder may internally lower the number of slices or disable
+ slicing to ensure each slice codes at least 10 16x16 rows of lowres
+ blocks to minimize the impact on quality. For example, for 720p and
+ 1080p videos, the number of slices is capped to 4 and 6, respectively.
+ For resolutions below 720p, slicing is automatically disabled.
+
+ If slices are used in lookahead, they are logged in the list of tools
+ as *lslices*.
+
+ **Values:** 0 - disabled. 1 is the same as 0. Max 16.
+ Default: 8 for ultrafast, superfast, veryfast, faster, fast, medium
+ 4 for slow, slower
+ disabled for veryslow, placebo
+
.. option:: --b-adapt <integer>
diff -r 3103afbd31fa -r 37fb5c10bfce doc/reST/presets.rst
--- a/doc/reST/presets.rst Thu Nov 05 06:13:51 2015 +0530
+++ b/doc/reST/presets.rst Thu Nov 05 09:26:42 2015 +0530
@@ -19,61 +19,63 @@
The presets adjust encoder parameters to affect these trade-offs.
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| | ultrafast | superfast | veryfast | faster | fast | medium | slow | slower | veryslow | placebo |
-+==============+===========+===========+==========+========+======+========+======+========+==========+=========+
-| ctu | 32 | 32 | 32 | 64 | 64 | 64 | 64 | 64 | 64 | 64 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| min-cu-size | 16 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| bframes | 3 | 3 | 4 | 4 | 4 | 4 | 4 | 8 | 8 | 8 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| b-adapt | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 2 | 2 | 2 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| rc-lookahead | 5 | 10 | 15 | 15 | 15 | 20 | 25 | 30 | 40 | 60 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| scenecut | 0 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| ref | 1 | 1 | 1 | 1 | 2 | 3 | 3 | 3 | 5 | 5 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| me | dia | hex | hex | hex | hex | hex | star | star | star | star |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| merange | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 92 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| subme | 0 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 4 | 5 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| rect | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| amp | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| max-merge | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 4 | 5 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| early-skip | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| fast-intra | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| b-intra | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| sao | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| signhide | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| weightp | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| weightb | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| aq-mode | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| cuTree | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| rdLevel | 2 | 2 | 2 | 2 | 2 | 3 | 4 | 6 | 6 | 6 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| rdoq-level | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 2 | 2 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| tu-intra | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 3 | 4 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| tu-inter | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 3 | 4 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| | ultrafast | superfast | veryfast | faster | fast | medium | slow | slower | veryslow | placebo |
++=====================+===========+===========+==========+========+======+========+======+========+==========+=========+
+| ctu | 32 | 32 | 32 | 64 | 64 | 64 | 64 | 64 | 64 | 64 |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| min-cu-size | 16 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| bframes | 3 | 3 | 4 | 4 | 4 | 4 | 4 | 8 | 8 | 8 |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| b-adapt | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 2 | 2 | 2 |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| rc-lookahead | 5 | 10 | 15 | 15 | 15 | 20 | 25 | 30 | 40 | 60 |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| scenecut | 0 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| ref | 1 | 1 | 1 | 1 | 2 | 3 | 3 | 3 | 5 | 5 |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| me | dia | hex | hex | hex | hex | hex | star | star | star | star |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| merange | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 92 |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| subme | 0 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 4 | 5 |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| rect | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| amp | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| max-merge | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 4 | 5 |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| early-skip | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| fast-intra | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| b-intra | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| sao | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| signhide | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| weightp | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| weightb | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| aq-mode | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| cuTree | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| rdLevel | 2 | 2 | 2 | 2 | 2 | 3 | 4 | 6 | 6 | 6 |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| rdoq-level | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 2 | 2 |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| tu-intra | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 3 | 4 |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| tu-inter | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 3 | 4 |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| lookahead-slices | 8 | 8 | 8 | 8 | 8 | 8 | 4 | 4 | 0 | 0 |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
Placebo mode enables transform-skip prediction evaluation.
diff -r 3103afbd31fa -r 37fb5c10bfce source/common/param.cpp
--- a/source/common/param.cpp Thu Nov 05 06:13:51 2015 +0530
+++ b/source/common/param.cpp Thu Nov 05 09:26:42 2015 +0530
@@ -147,7 +147,7 @@
param->bFrameAdaptive = X265_B_ADAPT_TRELLIS;
param->bBPyramid = 1;
param->scenecutThreshold = 40; /* Magic number pulled in from x264 */
- param->lookaheadSlices = 0;
+ param->lookaheadSlices = 8;
/* Intra Coding Tools */
param->bEnableConstrainedIntra = 0;
@@ -348,6 +348,7 @@
param->subpelRefine = 3;
param->maxNumMergeCand = 3;
param->searchMethod = X265_STAR_SEARCH;
+ param->lookaheadSlices = 4; // limit parallelism as already enough work exists
}
else if (!strcmp(preset, "slower"))
{
@@ -365,6 +366,7 @@
param->maxNumMergeCand = 3;
param->searchMethod = X265_STAR_SEARCH;
param->bIntraInBFrames = 1;
+ param->lookaheadSlices = 4; // limit parallelism as already enough work exists
}
else if (!strcmp(preset, "veryslow"))
{
@@ -383,6 +385,7 @@
param->searchMethod = X265_STAR_SEARCH;
param->maxNumReferences = 5;
param->bIntraInBFrames = 1;
+ param->lookaheadSlices = 0; // disabled for best quality
}
else if (!strcmp(preset, "placebo"))
{
@@ -404,6 +407,7 @@
param->maxNumReferences = 5;
param->rc.bEnableSlowFirstPass = 1;
param->bIntraInBFrames = 1;
+ param->lookaheadSlices = 0; // disabled for best quality
// TODO: optimized esa
}
else
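
Reviewer note: the param.cpp hunks above amount to a small preset-to-default mapping. A minimal sketch of that mapping follows. The function name presetLookaheadSlices is hypothetical and not part of x265, and it assumes that presets not touched by the visible hunks keep the new default of 8 from the first hunk.

```cpp
#include <cstring>

// Hypothetical helper, not part of x265: returns the lookahead-slices default
// that the param.cpp hunks in this patch assign for a given preset name.
int presetLookaheadSlices(const char* preset)
{
    // slow and slower already have enough parallel work, so they are limited to 4.
    if (!std::strcmp(preset, "slow") || !std::strcmp(preset, "slower"))
        return 4;

    // veryslow and placebo disable lookahead slicing for best quality.
    if (!std::strcmp(preset, "veryslow") || !std::strcmp(preset, "placebo"))
        return 0;

    // Assumption: every other preset keeps the new default set in the first hunk.
    return 8;
}
```

Calling presetLookaheadSlices("medium") gives 8 and presetLookaheadSlices("placebo") gives 0, which is the behaviour the cli.rst and presets.rst text should describe.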
diff -r 3103afbd31fa -r 37fb5c10bfce source/encoder/slicetype.cpp
--- a/source/encoder/slicetype.cpp Thu Nov 05 06:13:51 2015 +0530
+++ b/source/encoder/slicetype.cpp Thu Nov 05 09:26:42 2015 +0530
@@ -516,7 +516,16 @@
m_bBatchFrameCosts = m_bBatchMotionSearch;
if (m_param->lookaheadSlices && !m_pool)
+ {
+ x265_log(param, X265_LOG_WARNING, "No pools found; disabling lookahead-slices\n");
m_param->lookaheadSlices = 0;
+ }
+
+ if (m_param->lookaheadSlices && (m_param->sourceHeight < 720))
+ {
+ x265_log(param, X265_LOG_WARNING, "Source height < 720p; disabling lookahead-slices\n");
+ m_param->lookaheadSlices = 0;
+ }
if (m_param->lookaheadSlices > 1)
{
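
Reviewer note: to make the documented caps easier to check, here is a rough sketch of how the requested slice count could be reduced to an effective one. It is not the code in slicetype.cpp. The helper name effectiveLookaheadSlices and its parameters are hypothetical, and the lowres row arithmetic (half-height picture, 8x8 lowres blocks, rounded up) is an assumption chosen to be consistent with the "at least 10 rows per slice" rule and the caps of 4 and 6 stated in cli.rst.

```cpp
#include <algorithm>

// Hypothetical helper, not the x265 implementation: estimates how many
// lookahead slices would actually be used. Returns 0 when slicing is disabled.
int effectiveLookaheadSlices(int sourceHeight, int requested, bool hasThreadPool)
{
    // 0 and 1 both mean "disabled"; slicing also needs a thread pool and,
    // with this patch, a source height of at least 720.
    if (requested <= 1 || !hasThreadPool || sourceHeight < 720)
        return 0;

    // Assumption: the lookahead works on a half-height picture split into
    // 8x8 blocks (16x16 at full resolution), with the row count rounded up.
    const int lowresRows = ((sourceHeight / 2) + 7) / 8;

    // Each slice must cover at least 10 block rows, but never more than the picture.
    int rowsPerSlice = lowresRows / requested;
    rowsPerSlice = std::max(rowsPerSlice, 10);
    rowsPerSlice = std::min(rowsPerSlice, lowresRows);

    return lowresRows / rowsPerSlice;
}
```

Under these assumptions, a request of 8 slices yields 4 for a 720-line source and 6 for a 1080-line source, matching the caps described in the documentation hunk, while any source shorter than 720 lines yields 0.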