[x265] [PATCH] perf: Enabling lookahead-slices for all presets except veryslow & placebo
Pradeep Ramachandran
pradeep at multicorewareinc.com
Tue Nov 3 17:02:34 CET 2015
# HG changeset patch
# User Pradeep Ramachandran <pradeep at multicorewareinc.com>
# Date 1446566445 -19800
# Tue Nov 03 21:30:45 2015 +0530
# Node ID cdc2b132a66f97bab510313f614909df3089c2c7
# Parent 61396ea8096a9f75667ac01ae8b4bf02169d3b64
perf: Enabling lookahead-slices for all presets except veryslow & placebo
Seeing ~10% performance improvement on the faster presets on Skylake, and ~2X
performance on Xeon systems in the ultrafast setting. The improvement comes
from a considerable increase in utilization across the board. Slicing is
disabled for videos of resolution < 720p to limit the impact on quality.
Commit will change outputs, but reduction in quality (measured by PSNR or SSIM)
is <0.01% across a wide variety of presets and runs.
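The slice-limiting rule described in the cli.rst text of this patch can be sketched roughly as follows. This is only an illustration, not the encoder's actual code: the function name effectiveLookaheadSlices and the constants are hypothetical, and it assumes rows are counted as source height / 16, which is the reading that reproduces the documented caps of 4 slices at 720p and 6 at 1080p.

```cpp
#include <algorithm>

// All names below are hypothetical; the values come from the documentation
// text in this patch, not from the encoder source.
const int kBlockSize       = 16;   // rows are counted in units of 16 pixels
const int kMinRowsPerSlice = 10;   // each slice should cover at least 10 rows
const int kMinSlicedHeight = 720;  // slicing is auto-disabled below 720p
const int kMaxSlices       = 16;   // documented maximum for --lookahead-slices

// Returns the number of lookahead slices that would actually be used for a
// requested slice count and a source height in pixels (0 means disabled).
int effectiveLookaheadSlices(int requested, int height)
{
    // 0 and 1 both mean "disabled"; small pictures are never sliced.
    if (requested <= 1 || height < kMinSlicedHeight)
        return 0;

    const int rows = height / kBlockSize;      // assumption: full-res rows
    const int cap  = rows / kMinRowsPerSlice;  // at least 10 rows per slice
    const int slices = std::min(std::min(requested, kMaxSlices), cap);

    // A single slice is equivalent to no slicing at all.
    return slices > 1 ? slices : 0;
}
```

Under these assumptions, a request of 8 slices yields 4 at 720p and 6 at 1080p, while any request at 480p yields 0.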
diff -r 61396ea8096a -r cdc2b132a66f doc/reST/cli.rst
--- a/doc/reST/cli.rst Mon Oct 12 10:23:37 2015 +0800
+++ b/doc/reST/cli.rst Tue Nov 03 21:30:45 2015 +0530
@@ -1124,21 +1124,31 @@
.. option:: --lookahead-slices <0..16>
- Use multiple worker threads to measure the estimated cost of each
- frame within the lookahead. When :option:`--b-adapt` is 2, most
- frame cost estimates will be performed in batch mode, many cost
- estimates at the same time, and lookahead-slices is ignored for
- batched estimates. The effect on performance can be quite small.
- The higher this parameter, the less accurate the frame costs will be
- (since context is lost across slice boundaries) which will result in
- less accurate B-frame and scene-cut decisions.
+ Use multiple worker threads to measure the estimated cost of each frame
+ within the lookahead. The frame is divided into the specified number of
+ slices, and one thread is launched per slice. When :option:`--b-adapt` is
+ 2, most frame cost estimates will be performed in batch mode (many cost
+ estimates at the same time) and lookahead-slices is ignored for batched
+ estimates; it may still be used for single cost estimations. The higher this
+ parameter, the less accurate the frame costs will be (since context is lost
+ across slice boundaries) which will result in less accurate B-frame and
+ scene-cut decisions. The effect on performance can be significant, especially
+ on systems with many threads.
- The encoder may internally lower the number of slices to ensure
- each slice codes at least 10 16x16 rows of lowres blocks. If slices
- are used in lookahead, they are logged in the list of tools as
- *lslices*.
-
- **Values:** 0 - disabled (default). 1 is the same as 0. Max 16
+ The encoder may internally lower the number of slices or disable
+ slicing to ensure each slice codes at least 10 16x16 rows of lowres
+ blocks to minimize the impact on quality. For example, for 720p and
+ 1080p videos, the number of slices is capped to 4 and 6, respectively.
+ For resolutions lower than 720p, slicing is auto-disabled.
+
+ If slices are used in lookahead, they are logged in the list of tools
+ as *lslices*.
+
+ **Values:** 0 - disabled. 1 is the same as 0. Max 16.
+ Default: 8 for ultrafast, superfast, veryfast, faster, fast, medium
+ 4 for slow, slower
+ disabled for veryslow, placebo
+
.. option:: --b-adapt <integer>
diff -r 61396ea8096a -r cdc2b132a66f doc/reST/presets.rst
--- a/doc/reST/presets.rst Mon Oct 12 10:23:37 2015 +0800
+++ b/doc/reST/presets.rst Tue Nov 03 21:30:45 2015 +0530
@@ -19,61 +19,63 @@
The presets adjust encoder parameters to affect these trade-offs.
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| | ultrafast | superfast | veryfast | faster | fast | medium | slow | slower | veryslow | placebo |
-+==============+===========+===========+==========+========+======+========+======+========+==========+=========+
-| ctu | 32 | 32 | 32 | 64 | 64 | 64 | 64 | 64 | 64 | 64 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| min-cu-size | 16 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| bframes | 3 | 3 | 4 | 4 | 4 | 4 | 4 | 8 | 8 | 8 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| b-adapt | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 2 | 2 | 2 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| rc-lookahead | 5 | 10 | 15 | 15 | 15 | 20 | 25 | 30 | 40 | 60 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| scenecut | 0 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| refs | 1 | 1 | 1 | 1 | 2 | 3 | 3 | 3 | 5 | 5 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| me | dia | hex | hex | hex | hex | hex | star | star | star | star |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| merange | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 92 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| subme | 0 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 4 | 5 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| rect | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| amp | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| max-merge | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 4 | 5 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| early-skip | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| fast-intra | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| b-intra | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| sao | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| signhide | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| weightp | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| weightb | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| aq-mode | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| cuTree | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| rdLevel | 2 | 2 | 2 | 2 | 2 | 3 | 4 | 6 | 6 | 6 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| rdoq-level | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 2 | 2 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| tu-intra | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 3 | 4 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| tu-inter | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 3 | 4 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| | ultrafast | superfast | veryfast | faster | fast | medium | slow | slower | veryslow | placebo |
++==================+===========+===========+==========+========+======+========+======+========+==========+=========+
+| ctu | 32 | 32 | 32 | 64 | 64 | 64 | 64 | 64 | 64 | 64 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| min-cu-size | 16 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| bframes | 3 | 3 | 4 | 4 | 4 | 4 | 4 | 8 | 8 | 8 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| b-adapt | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 2 | 2 | 2 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| rc-lookahead | 5 | 10 | 15 | 15 | 15 | 20 | 25 | 30 | 40 | 60 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| scenecut | 0 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| refs | 1 | 1 | 1 | 1 | 2 | 3 | 3 | 3 | 5 | 5 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| me | dia | hex | hex | hex | hex | hex | star | star | star | star |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| merange | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 92 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| subme | 0 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 4 | 5 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| rect | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| amp | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| max-merge | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 4 | 5 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| early-skip | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| fast-intra | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| b-intra | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| sao | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| signhide | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| weightp | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| weightb | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| aq-mode | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| cuTree | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| rdLevel | 2 | 2 | 2 | 2 | 2 | 3 | 4 | 6 | 6 | 6 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| rdoq-level | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 2 | 2 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| tu-intra | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 3 | 4 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| tu-inter | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 3 | 4 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| lookahead-slices | 8 | 8 | 8 | 8 | 8 | 8 | 4 | 4 | 0 | 0 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
Placebo mode enables transform-skip prediction evaluation.
diff -r 61396ea8096a -r cdc2b132a66f source/common/param.cpp
--- a/source/common/param.cpp Mon Oct 12 10:23:37 2015 +0800
+++ b/source/common/param.cpp Tue Nov 03 21:30:45 2015 +0530
@@ -147,7 +147,7 @@
param->bFrameAdaptive = X265_B_ADAPT_TRELLIS;
param->bBPyramid = 1;
param->scenecutThreshold = 40; /* Magic number pulled in from x264 */
- param->lookaheadSlices = 0;
+ param->lookaheadSlices = 8;
/* Intra Coding Tools */
param->bEnableConstrainedIntra = 0;
@@ -347,6 +347,7 @@
param->subpelRefine = 3;
param->maxNumMergeCand = 3;
param->searchMethod = X265_STAR_SEARCH;
+ param->lookaheadSlices = 4; // limit parallelism as already enough work exists
}
else if (!strcmp(preset, "slower"))
{
@@ -364,6 +365,7 @@
param->maxNumMergeCand = 3;
param->searchMethod = X265_STAR_SEARCH;
param->bIntraInBFrames = 1;
+ param->lookaheadSlices = 4; // limit parallelism as already enough work exists
}
else if (!strcmp(preset, "veryslow"))
{
@@ -382,6 +384,7 @@
param->searchMethod = X265_STAR_SEARCH;
param->maxNumReferences = 5;
param->bIntraInBFrames = 1;
+ param->lookaheadSlices = 0; // disabled for best quality
}
else if (!strcmp(preset, "placebo"))
{
@@ -403,6 +406,7 @@
param->maxNumReferences = 5;
param->rc.bEnableSlowFirstPass = 1;
param->bIntraInBFrames = 1;
+ param->lookaheadSlices = 0; // disabled for best quality
// TODO: optimized esa
}
else
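
For quick reference, the per-preset defaults set by the param.cpp hunks above (and listed in the new lookahead-slices row of presets.rst) can be summarized in a small lookup. This is a sketch only: defaultLookaheadSlices is a hypothetical helper, not a function in x265, and returning -1 for an unknown preset name is an assumption made for this example.

```cpp
#include <string>

// Hypothetical helper: maps a preset name to the default lookahead-slices
// value introduced by this patch. Returns -1 for an unrecognized name.
int defaultLookaheadSlices(const std::string& preset)
{
    // Slowest presets: slicing disabled for best quality.
    if (preset == "veryslow" || preset == "placebo")
        return 0;

    // Slow presets: parallelism is limited because enough work already exists.
    if (preset == "slow" || preset == "slower")
        return 4;

    // Remaining presets keep the new global default of 8.
    if (preset == "ultrafast" || preset == "superfast" ||
        preset == "veryfast"  || preset == "faster"    ||
        preset == "fast"      || preset == "medium")
        return 8;

    return -1;  // assumption: signal an unknown preset to the caller
}
```

Note that these are the requested defaults; according to the cli.rst text in this patch, the encoder may still lower the slice count or disable slicing depending on the resolution.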