[x265] [PATCH] perf: Enabling lookahead-slices for all presets except veryslow & placebo
Pradeep Ramachandran
pradeep at multicorewareinc.com
Wed Oct 28 04:48:20 CET 2015
# HG changeset patch
# User Pradeep Ramachandran <pradeep at multicorewareinc.com>
# Date 1446004071 -19800
# Wed Oct 28 09:17:51 2015 +0530
# Node ID 9220269d972168381461e609dfc7d46498bc197c
# Parent 6563218ce342c30bfd4f9bc172a1dab510e6e55b
perf: Enabling lookahead-slices for all presets except veryslow & placebo
Seeing ~10% higher performance on the faster presets on Skylake, and ~2x
performance on Xeon systems with the ultrafast preset. The improvement comes
from a considerable increase in utilization across the board.
This commit changes encoder output, but the reduction in quality (measured by
PSNR or SSIM) is <0.01% across a wide variety of presets and runs.
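To make the slicing rule in the option documentation concrete, here is a minimal sketch of how a requested slice count could be lowered so that every slice still covers at least 10 block rows. The helper name, the constant, and the exact rounding are assumptions made for illustration; this is not the encoder's actual code.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper, NOT the x265 implementation. The minimum of 10 rows
// per slice is taken from the --lookahead-slices documentation; everything
// else (names, rounding) is an assumption for illustration.
constexpr int kMinRowsPerSlice = 10;

// Returns the number of lookahead slices that would be used for a frame
// with `lowresRows` block rows when `requestedSlices` is asked for.
// A result of 0 means slicing is disabled.
int effectiveLookaheadSlices(int lowresRows, int requestedSlices)
{
    assert(lowresRows > 0);
    if (requestedSlices <= 1)
        return 0; // 0 and 1 both mean "no slicing"

    // Rows each slice would get, raised to the documented minimum.
    int rowsPerSlice = std::max(lowresRows / requestedSlices, kMinRowsPerSlice);
    rowsPerSlice = std::min(rowsPerSlice, lowresRows); // never exceed the frame

    // Integer division can overshoot the request, so clamp to it.
    int slices = std::min(lowresRows / rowsPerSlice, requestedSlices);
    return slices > 1 ? slices : 0;
}
```

Under these assumptions, a frame with 68 block rows and 8 requested slices ends up with 6 slices, while 4 requested slices are kept as 4. A frame with fewer than 20 rows cannot be split at all.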
diff -r 6563218ce342 -r 9220269d9721 doc/reST/cli.rst
--- a/doc/reST/cli.rst Mon Oct 26 12:13:53 2015 +0530
+++ b/doc/reST/cli.rst Wed Oct 28 09:17:51 2015 +0530
@@ -1124,21 +1124,23 @@
.. option:: --lookahead-slices <0..16>
- Use multiple worker threads to measure the estimated cost of each
- frame within the lookahead. When :option:`--b-adapt` is 2, most
- frame cost estimates will be performed in batch mode, many cost
- estimates at the same time, and lookahead-slices is ignored for
- batched estimates. The effect on performance can be quite small.
- The higher this parameter, the less accurate the frame costs will be
- (since context is lost across slice boundaries) which will result in
- less accurate B-frame and scene-cut decisions.
+ Use multiple worker threads to measure the estimated cost of each frame
+ within the lookahead. The frame is divided into the specified number of
+ slices, and one thread is launched per slice. When :option:`--b-adapt` is
+ 2, most frame cost estimates will be performed in batch mode (many cost
+ estimates at the same time) and lookahead-slices is ignored for batched
+ estimates; it may still be used for single cost estimations. The higher this
+ parameter, the less accurate the frame costs will be (since context is lost
+ across slice boundaries) which will result in less accurate B-frame and
+ scene-cut decisions. The effect on performance can be significant, especially
+ on systems with many threads.
The encoder may internally lower the number of slices to ensure
each slice codes at least 10 16x16 rows of lowres blocks. If slices
are used in lookahead, they are logged in the list of tools as
*lslices*.
- **Values:** 0 - disabled (default). 1 is the same as 0. Max 16
+ **Values:** 0 - disabled. 1 is the same as 0. Max 16. Default: 8
.. option:: --b-adapt <integer>
diff -r 6563218ce342 -r 9220269d9721 doc/reST/presets.rst
--- a/doc/reST/presets.rst Mon Oct 26 12:13:53 2015 +0530
+++ b/doc/reST/presets.rst Wed Oct 28 09:17:51 2015 +0530
@@ -19,61 +19,63 @@
The presets adjust encoder parameters to affect these trade-offs.
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| | ultrafast | superfast | veryfast | faster | fast | medium | slow | slower | veryslow | placebo |
-+==============+===========+===========+==========+========+======+========+======+========+==========+=========+
-| ctu | 32 | 32 | 32 | 64 | 64 | 64 | 64 | 64 | 64 | 64 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| min-cu-size | 16 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| bframes | 3 | 3 | 4 | 4 | 4 | 4 | 4 | 8 | 8 | 8 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| b-adapt | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 2 | 2 | 2 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| rc-lookahead | 5 | 10 | 15 | 15 | 15 | 20 | 25 | 30 | 40 | 60 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| scenecut | 0 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| refs | 1 | 1 | 1 | 1 | 2 | 3 | 3 | 3 | 5 | 5 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| me | dia | hex | hex | hex | hex | hex | star | star | star | star |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| merange | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 92 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| subme | 0 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 4 | 5 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| rect | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| amp | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| max-merge | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 4 | 5 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| early-skip | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| fast-intra | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| b-intra | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| sao | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| signhide | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| weightp | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| weightb | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| aq-mode | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| cuTree | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| rdLevel | 2 | 2 | 2 | 2 | 2 | 3 | 4 | 6 | 6 | 6 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| rdoq-level | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 2 | 2 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| tu-intra | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 3 | 4 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| tu-inter | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 3 | 4 |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| | ultrafast | superfast | veryfast | faster | fast | medium | slow | slower | veryslow | placebo |
++==================+===========+===========+==========+========+======+========+======+========+==========+=========+
+| ctu | 32 | 32 | 32 | 64 | 64 | 64 | 64 | 64 | 64 | 64 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| min-cu-size | 16 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| bframes | 3 | 3 | 4 | 4 | 4 | 4 | 4 | 8 | 8 | 8 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| b-adapt | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 2 | 2 | 2 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| rc-lookahead | 5 | 10 | 15 | 15 | 15 | 20 | 25 | 30 | 40 | 60 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| scenecut | 0 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| refs | 1 | 1 | 1 | 1 | 2 | 3 | 3 | 3 | 5 | 5 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| me | dia | hex | hex | hex | hex | hex | star | star | star | star |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| merange | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 92 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| subme | 0 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 4 | 5 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| rect | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| amp | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| max-merge | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 4 | 5 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| early-skip | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| fast-intra | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| b-intra | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| sao | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| signhide | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| weightp | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| weightb | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| aq-mode | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| cuTree | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| rdLevel | 2 | 2 | 2 | 2 | 2 | 3 | 4 | 6 | 6 | 6 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| rdoq-level | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 2 | 2 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| tu-intra | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 3 | 4 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| tu-inter | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 3 | 4 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| lookahead-slices | 8 | 8 | 8 | 8 | 8 | 8 | 4 | 4 | 0 | 0 |
++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
Placebo mode enables transform-skip prediction evaluation.
diff -r 6563218ce342 -r 9220269d9721 source/common/param.cpp
--- a/source/common/param.cpp Mon Oct 26 12:13:53 2015 +0530
+++ b/source/common/param.cpp Wed Oct 28 09:17:51 2015 +0530
@@ -147,7 +147,7 @@
param->bFrameAdaptive = X265_B_ADAPT_TRELLIS;
param->bBPyramid = 1;
param->scenecutThreshold = 40; /* Magic number pulled in from x264 */
- param->lookaheadSlices = 0;
+ param->lookaheadSlices = 8;
/* Intra Coding Tools */
param->bEnableConstrainedIntra = 0;
@@ -347,6 +347,7 @@
param->subpelRefine = 3;
param->maxNumMergeCand = 3;
param->searchMethod = X265_STAR_SEARCH;
+ param->lookaheadSlices = 4; // limit parallelism as already enough work exists
}
else if (!strcmp(preset, "slower"))
{
@@ -364,6 +365,7 @@
param->maxNumMergeCand = 3;
param->searchMethod = X265_STAR_SEARCH;
param->bIntraInBFrames = 1;
+ param->lookaheadSlices = 4; // limit parallelism as already enough work exists
}
else if (!strcmp(preset, "veryslow"))
{
@@ -382,6 +384,7 @@
param->searchMethod = X265_STAR_SEARCH;
param->maxNumReferences = 5;
param->bIntraInBFrames = 1;
+ param->lookaheadSlices = 0; // disabled for best quality
}
else if (!strcmp(preset, "placebo"))
{
@@ -403,6 +406,7 @@
param->maxNumReferences = 5;
param->rc.bEnableSlowFirstPass = 1;
param->bIntraInBFrames = 1;
+ param->lookaheadSlices = 0; // disabled for best quality
// TODO: optimized esa
}
else
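Taken together, the new default in param.cpp, the per-preset overrides, and the lookahead-slices row added to presets.rst amount to the mapping sketched below. The struct, table, and function names are hypothetical and exist only for this illustration; the values come from the presets table in this patch.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical lookup table mirroring the lookahead-slices row of the
// presets table in this patch. Not part of the x265 sources.
struct PresetSlices
{
    const char* name;
    int slices;
};

static const PresetSlices kPresetSlices[] = {
    {"ultrafast", 8}, {"superfast", 8}, {"veryfast", 8}, {"faster", 8},
    {"fast", 8},      {"medium", 8},    // default of 8 is left unchanged
    {"slow", 4},      {"slower", 4},    // limited: enough parallel work exists
    {"veryslow", 0},  {"placebo", 0},   // disabled for best quality
};

// Returns the lookahead-slices value for a preset name,
// or -1 if the name is not one of the ten presets above.
int lookaheadSlicesForPreset(const char* preset)
{
    assert(preset != nullptr);
    for (const PresetSlices& p : kPresetSlices)
    {
        if (std::strcmp(preset, p.name) == 0)
            return p.slices;
    }
    return -1;
}
```

A table keeps the illustration compact. The patch itself sets the value inside the existing per-preset branches, so only slow, slower, veryslow and placebo need an explicit assignment.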