[x265] [PATCH] perf: Enabling lookahead-slices for all presets except veryslow & placebo
Pradeep Ramachandran
pradeep at multicorewareinc.com
Wed Nov 4 07:42:54 CET 2015
This patch no longer applies at the tip as the doc was recently changed.
Will send an update - please ignore until then.
Pradeep.
On Tue, Nov 3, 2015 at 9:32 PM, Pradeep Ramachandran <
pradeep at multicorewareinc.com> wrote:
> # HG changeset patch
> # User Pradeep Ramachandran <pradeep at multicorewareinc.com>
> # Date 1446566445 -19800
> # Tue Nov 03 21:30:45 2015 +0530
> # Node ID cdc2b132a66f97bab510313f614909df3089c2c7
> # Parent 61396ea8096a9f75667ac01ae8b4bf02169d3b64
> perf: Enabling lookahead-slices for all presets except veryslow & placebo
>
> Seeing a ~10% performance improvement on the faster presets on Skylake, and
> ~2X performance on Xeon systems in the ultrafast setting. The improvement
> comes from a considerable increase in utilization across the board.
> Slicing is disabled for videos of resolution < 720p to limit the impact on
> quality.
>
> Commit will change outputs, but the reduction in quality (measured by PSNR
> or SSIM) is <0.01% across a wide variety of presets and runs.
>
> diff -r 61396ea8096a -r cdc2b132a66f doc/reST/cli.rst
> --- a/doc/reST/cli.rst Mon Oct 12 10:23:37 2015 +0800
> +++ b/doc/reST/cli.rst Tue Nov 03 21:30:45 2015 +0530
> @@ -1124,21 +1124,31 @@
>
> .. option:: --lookahead-slices <0..16>
>
> - Use multiple worker threads to measure the estimated cost of each
> - frame within the lookahead. When :option:`--b-adapt` is 2, most
> - frame cost estimates will be performed in batch mode, many cost
> - estimates at the same time, and lookahead-slices is ignored for
> - batched estimates. The effect on performance can be quite small.
> - The higher this parameter, the less accurate the frame costs will be
> - (since context is lost across slice boundaries) which will result in
> - less accurate B-frame and scene-cut decisions.
> + Use multiple worker threads to measure the estimated cost of each frame
> + within the lookahead. The frame is divided into the specified number of
> + slices, and one thread is launched per slice. When :option:`--b-adapt` is
> + 2, most frame cost estimates will be performed in batch mode (many cost
> + estimates at the same time) and lookahead-slices is ignored for batched
> + estimates; it may still be used for single cost estimations. The higher this
> + parameter, the less accurate the frame costs will be (since context is lost
> + across slice boundaries) which will result in less accurate B-frame and
> + scene-cut decisions. The effect on performance can be significant, especially
> + on systems with many threads.
>
> - The encoder may internally lower the number of slices to ensure
> - each slice codes at least 10 16x16 rows of lowres blocks. If slices
> - are used in lookahead, they are logged in the list of tools as
> - *lslices*.
> -
> - **Values:** 0 - disabled (default). 1 is the same as 0. Max 16
> + The encoder may internally lower the number of slices or disable
> + slicing to ensure each slice codes at least 10 16x16 rows of lowres
> + blocks to minimize the impact on quality. For example, for 720p and
> + 1080p videos, the number of slices is capped to 4 and 6, respectively.
> + For resolutions lower than 720p, slicing is auto-disabled.
> +
> + If slices are used in lookahead, they are logged in the list of tools
> + as *lslices*.
> +
> + **Values:** 0 - disabled. 1 is the same as 0. Max 16.
> + Default: 8 for ultrafast, superfast, veryfast, faster, fast, medium
> + 4 for slow, slower
> + disabled for veryslow, placebo
> +
>
> .. option:: --b-adapt <integer>
>
> diff -r 61396ea8096a -r cdc2b132a66f doc/reST/presets.rst
> --- a/doc/reST/presets.rst Mon Oct 12 10:23:37 2015 +0800
> +++ b/doc/reST/presets.rst Tue Nov 03 21:30:45 2015 +0530
> @@ -19,61 +19,63 @@
>
> The presets adjust encoder parameters to affect these trade-offs.
>
>
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -|              | ultrafast | superfast | veryfast | faster | fast | medium | slow | slower | veryslow | placebo |
> -+==============+===========+===========+==========+========+======+========+======+========+==========+=========+
> -| ctu          | 32        | 32        | 32       | 64     | 64   | 64     | 64   | 64     | 64       | 64      |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| min-cu-size  | 16        | 8         | 8        | 8      | 8    | 8      | 8    | 8      | 8        | 8       |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| bframes      | 3         | 3         | 4        | 4      | 4    | 4      | 4    | 8      | 8        | 8       |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| b-adapt      | 0         | 0         | 0        | 0      | 0    | 2      | 2    | 2      | 2        | 2       |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| rc-lookahead | 5         | 10        | 15       | 15     | 15   | 20     | 25   | 30     | 40       | 60      |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| scenecut     | 0         | 40        | 40       | 40     | 40   | 40     | 40   | 40     | 40       | 40      |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| refs         | 1         | 1         | 1        | 1      | 2    | 3      | 3    | 3      | 5        | 5       |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| me           | dia       | hex       | hex      | hex    | hex  | hex    | star | star   | star     | star    |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| merange      | 57        | 57        | 57       | 57     | 57   | 57     | 57   | 57     | 57       | 92      |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| subme        | 0         | 1         | 1        | 2      | 2    | 2      | 3    | 3      | 4        | 5       |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| rect         | 0         | 0         | 0        | 0      | 0    | 0      | 1    | 1      | 1        | 1       |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| amp          | 0         | 0         | 0        | 0      | 0    | 0      | 0    | 1      | 1        | 1       |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| max-merge    | 2         | 2         | 2        | 2      | 2    | 2      | 3    | 3      | 4        | 5       |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| early-skip   | 1         | 1         | 1        | 1      | 0    | 0      | 0    | 0      | 0        | 0       |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| fast-intra   | 1         | 1         | 1        | 1      | 1    | 0      | 0    | 0      | 0        | 0       |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| b-intra      | 0         | 0         | 0        | 0      | 0    | 0      | 0    | 1      | 1        | 1       |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| sao          | 0         | 0         | 1        | 1      | 1    | 1      | 1    | 1      | 1        | 1       |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| signhide     | 0         | 1         | 1        | 1      | 1    | 1      | 1    | 1      | 1        | 1       |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| weightp      | 0         | 0         | 1        | 1      | 1    | 1      | 1    | 1      | 1        | 1       |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| weightb      | 0         | 0         | 0        | 0      | 0    | 0      | 0    | 1      | 1        | 1       |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| aq-mode      | 0         | 0         | 1        | 1      | 1    | 1      | 1    | 1      | 1        | 1       |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| cuTree       | 0         | 0         | 0        | 0      | 1    | 1      | 1    | 1      | 1        | 1       |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| rdLevel      | 2         | 2         | 2        | 2      | 2    | 3      | 4    | 6      | 6        | 6       |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| rdoq-level   | 0         | 0         | 0        | 0      | 0    | 0      | 2    | 2      | 2        | 2       |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| tu-intra     | 1         | 1         | 1        | 1      | 1    | 1      | 1    | 2      | 3        | 4       |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> -| tu-inter     | 1         | 1         | 1        | 1      | 1    | 1      | 1    | 2      | 3        | 4       |
> -+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
>
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +|                  | ultrafast | superfast | veryfast | faster | fast | medium | slow | slower | veryslow | placebo |
> ++==================+===========+===========+==========+========+======+========+======+========+==========+=========+
> +| ctu              | 32        | 32        | 32       | 64     | 64   | 64     | 64   | 64     | 64       | 64      |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| min-cu-size      | 16        | 8         | 8        | 8      | 8    | 8      | 8    | 8      | 8        | 8       |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| bframes          | 3         | 3         | 4        | 4      | 4    | 4      | 4    | 8      | 8        | 8       |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| b-adapt          | 0         | 0         | 0        | 0      | 0    | 2      | 2    | 2      | 2        | 2       |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| rc-lookahead     | 5         | 10        | 15       | 15     | 15   | 20     | 25   | 30     | 40       | 60      |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| scenecut         | 0         | 40        | 40       | 40     | 40   | 40     | 40   | 40     | 40       | 40      |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| refs             | 1         | 1         | 1        | 1      | 2    | 3      | 3    | 3      | 5        | 5       |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| me               | dia       | hex       | hex      | hex    | hex  | hex    | star | star   | star     | star    |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| merange          | 57        | 57        | 57       | 57     | 57   | 57     | 57   | 57     | 57       | 92      |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| subme            | 0         | 1         | 1        | 2      | 2    | 2      | 3    | 3      | 4        | 5       |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| rect             | 0         | 0         | 0        | 0      | 0    | 0      | 1    | 1      | 1        | 1       |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| amp              | 0         | 0         | 0        | 0      | 0    | 0      | 0    | 1      | 1        | 1       |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| max-merge        | 2         | 2         | 2        | 2      | 2    | 2      | 3    | 3      | 4        | 5       |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| early-skip       | 1         | 1         | 1        | 1      | 0    | 0      | 0    | 0      | 0        | 0       |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| fast-intra       | 1         | 1         | 1        | 1      | 1    | 0      | 0    | 0      | 0        | 0       |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| b-intra          | 0         | 0         | 0        | 0      | 0    | 0      | 0    | 1      | 1        | 1       |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| sao              | 0         | 0         | 1        | 1      | 1    | 1      | 1    | 1      | 1        | 1       |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| signhide         | 0         | 1         | 1        | 1      | 1    | 1      | 1    | 1      | 1        | 1       |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| weightp          | 0         | 0         | 1        | 1      | 1    | 1      | 1    | 1      | 1        | 1       |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| weightb          | 0         | 0         | 0        | 0      | 0    | 0      | 0    | 1      | 1        | 1       |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| aq-mode          | 0         | 0         | 1        | 1      | 1    | 1      | 1    | 1      | 1        | 1       |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| cuTree           | 0         | 0         | 0        | 0      | 1    | 1      | 1    | 1      | 1        | 1       |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| rdLevel          | 2         | 2         | 2        | 2      | 2    | 3      | 4    | 6      | 6        | 6       |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| rdoq-level       | 0         | 0         | 0        | 0      | 0    | 0      | 2    | 2      | 2        | 2       |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| tu-intra         | 1         | 1         | 1        | 1      | 1    | 1      | 1    | 2      | 3        | 4       |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| tu-inter         | 1         | 1         | 1        | 1      | 1    | 1      | 1    | 2      | 3        | 4       |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
> +| lookahead-slices | 8         | 8         | 8        | 8      | 8    | 8      | 4    | 4      | 0        | 0       |
> ++------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
>
> Placebo mode enables transform-skip prediction evaluation.
>
> diff -r 61396ea8096a -r cdc2b132a66f source/common/param.cpp
> --- a/source/common/param.cpp Mon Oct 12 10:23:37 2015 +0800
> +++ b/source/common/param.cpp Tue Nov 03 21:30:45 2015 +0530
> @@ -147,7 +147,7 @@
> param->bFrameAdaptive = X265_B_ADAPT_TRELLIS;
> param->bBPyramid = 1;
> param->scenecutThreshold = 40; /* Magic number pulled in from x264 */
> - param->lookaheadSlices = 0;
> + param->lookaheadSlices = 8;
>
> /* Intra Coding Tools */
> param->bEnableConstrainedIntra = 0;
> @@ -347,6 +347,7 @@
> param->subpelRefine = 3;
> param->maxNumMergeCand = 3;
> param->searchMethod = X265_STAR_SEARCH;
> + param->lookaheadSlices = 4; // limit parallelism as already enough work exists
> }
> else if (!strcmp(preset, "slower"))
> {
> @@ -364,6 +365,7 @@
> param->maxNumMergeCand = 3;
> param->searchMethod = X265_STAR_SEARCH;
> param->bIntraInBFrames = 1;
> + param->lookaheadSlices = 4; // limit parallelism as already enough work exists
> }
> else if (!strcmp(preset, "veryslow"))
> {
> @@ -382,6 +384,7 @@
> param->searchMethod = X265_STAR_SEARCH;
> param->maxNumReferences = 5;
> param->bIntraInBFrames = 1;
> + param->lookaheadSlices = 0; // disabled for best quality
> }
> else if (!strcmp(preset, "placebo"))
> {
> @@ -403,6 +406,7 @@
> param->maxNumReferences = 5;
> param->rc.bEnableSlowFirstPass = 1;
> param->bIntraInBFrames = 1;
> + param->lookaheadSlices = 0; // disabled for best quality
> // TODO: optimized esa
> }
> else
>
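
For reference, the per-preset defaults and the resolution cap described in
the cli.rst text above can be sketched roughly as follows. This is only an
illustration, not the code in the patch: both helper names are hypothetical,
and the row arithmetic assumes that "16x16 rows" are counted against the
source frame height with a minimum of 10 rows per slice, which happens to
reproduce the documented caps of 4 slices at 720p and 6 at 1080p. The actual
lookahead implementation may compute this differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>

// Hypothetical helper (not an x265 function): default lookahead-slices per
// preset, as listed in the presets table of this patch.
static int defaultLookaheadSlices(const char* preset)
{
    if (!std::strcmp(preset, "veryslow") || !std::strcmp(preset, "placebo"))
        return 0; // disabled for best quality
    if (!std::strcmp(preset, "slow") || !std::strcmp(preset, "slower"))
        return 4; // enough parallel work already exists
    return 8;     // ultrafast through medium
}

// Hypothetical helper (not an x265 function): slice count actually used.
// Assumptions: rows are 16 source pixels tall, each slice needs at least
// 10 rows, and slicing is switched off below 720 lines.
static int effectiveLookaheadSlices(int requested, int frameHeight)
{
    assert(frameHeight > 0);
    if (requested <= 1 || frameHeight < 720)
        return 0;                       // 0 and 1 both mean "no slicing"
    int rows = frameHeight / 16;        // 720 -> 45, 1080 -> 67
    int maxSlices = rows / 10;          // 720 -> 4,  1080 -> 6
    int slices = std::min(requested, maxSlices);
    return slices > 1 ? slices : 0;
}
```

Under these assumptions a medium-preset 1080p encode would request 8 slices
and end up with 6, while any 480p encode would run the lookahead unsliced.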