[x265] [PATCH] perf: Enabling lookahead-slices for all presets except veryslow & placebo

Pradeep Ramachandran pradeep at multicorewareinc.com
Thu Nov 5 05:41:46 CET 2015


# HG changeset patch
# User Pradeep Ramachandran <pradeep at multicorewareinc.com>
# Date 1446695802 -19800
#      Thu Nov 05 09:26:42 2015 +0530
# Node ID 37fb5c10bfce78c7af302f644032f0d453c3158c
# Parent  3103afbd31fa9b26533f06202516a511ee221439
perf: Enabling lookahead-slices for all presets except veryslow & placebo

Seeing ~10% performance improvement on the faster presets on Skylake, and ~2x
performance on Xeon systems with the ultrafast preset. The improvement comes
from a considerable increase in utilization across the board. Slicing is
disabled for videos of resolution < 720p to limit the impact on quality.

Commit will change outputs, but reduction in quality (measured by PSNR or SSIM)
is <0.01% across a wide variety of presets and runs.

diff -r 3103afbd31fa -r 37fb5c10bfce doc/reST/cli.rst
--- a/doc/reST/cli.rst	Thu Nov 05 06:13:51 2015 +0530
+++ b/doc/reST/cli.rst	Thu Nov 05 09:26:42 2015 +0530
@@ -1133,21 +1133,31 @@
 
 .. option:: --lookahead-slices <0..16>
 
-	Use multiple worker threads to measure the estimated cost of each
-	frame within the lookahead. When :option:`--b-adapt` is 2, most
-	frame cost estimates will be performed in batch mode, many cost
-	estimates at the same time, and lookahead-slices is ignored for
-	batched estimates. The effect on performance can be quite small.
-	The higher this parameter, the less accurate the frame costs will be
-	(since context is lost across slice boundaries) which will result in
-	less accurate B-frame and scene-cut decisions.
+	Use multiple worker threads to measure the estimated cost of each frame
+	within the lookahead. The frame is divided into the specified number of
+	slices, and one thread is launched per slice. When :option:`--b-adapt` is
+	2, most frame cost estimates will be performed in batch mode (many cost
+	estimates at the same time) and lookahead-slices is ignored for batched
+	estimates; it may still be used for single cost estimations. The higher this
+	parameter, the less accurate the frame costs will be (since context is lost
+	across slice boundaries) which will result in less accurate B-frame and
+	scene-cut decisions. The effect on performance can be significant, especially
+	on systems with many threads.
 
-	The encoder may internally lower the number of slices to ensure
-	each slice codes at least 10 16x16 rows of lowres blocks. If slices
-	are used in lookahead, they are logged in the list of tools as
-	*lslices*.
-	
-	**Values:** 0 - disabled (default). 1 is the same as 0. Max 16
+	The encoder may internally lower the number of slices, or disable
+	slicing, to ensure each slice codes at least 10 16x16 rows of lowres
+	blocks and so minimize the impact on quality. For example, for 720p and
+	1080p videos, the number of slices is capped to 4 and 6, respectively.
+	For resolutions lower than 720p, slicing is auto-disabled.
+
+	If slices are used in lookahead, they are logged in the list of tools
+	as *lslices*.
+
+	**Values:** 0 - disabled. 1 is the same as 0. Max 16.
+	Default: 8 for ultrafast, superfast, veryfast, faster, fast, medium;
+	4 for slow, slower;
+	disabled for veryslow, placebo.
+
 
 .. option:: --b-adapt <integer>
 
diff -r 3103afbd31fa -r 37fb5c10bfce doc/reST/presets.rst
--- a/doc/reST/presets.rst	Thu Nov 05 06:13:51 2015 +0530
+++ b/doc/reST/presets.rst	Thu Nov 05 09:26:42 2015 +0530
@@ -19,61 +19,63 @@
 
 The presets adjust encoder parameters to affect these trade-offs.
 
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-|              | ultrafast | superfast | veryfast | faster | fast | medium | slow | slower | veryslow | placebo |
-+==============+===========+===========+==========+========+======+========+======+========+==========+=========+
-| ctu          |   32      |    32     |   32     |  64    |  64  |   64   |  64  |  64    |   64     |   64    |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| min-cu-size  |   16      |     8     |    8     |   8    |   8  |    8   |   8  |   8    |    8     |    8    |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| bframes      |    3      |     3     |    4     |   4    |  4   |    4   |  4   |   8    |    8     |    8    |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| b-adapt      |    0      |     0     |    0     |   0    |  0   |    2   |  2   |   2    |    2     |    2    |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| rc-lookahead |    5      |    10     |   15     |  15    |  15  |   20   |  25  |   30   |   40     |   60    |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| scenecut     |    0      |    40     |   40     |  40    |  40  |   40   |  40  |   40   |   40     |   40    |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| ref          |    1      |     1     |    1     |   1    |  2   |    3   |  3   |   3    |    5     |    5    |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| me           |   dia     |   hex     |   hex    |  hex   | hex  |   hex  | star |  star  |   star   |   star  |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| merange      |   57      |    57     |   57     |  57    |  57  |   57   | 57   |  57    |   57     |   92    |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| subme        |    0      |     1     |    1     |   2    |  2   |    2   |  3   |   3    |    4     |    5    |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| rect         |    0      |     0     |    0     |   0    |  0   |    0   |  1   |   1    |    1     |    1    |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| amp          |    0      |     0     |    0     |   0    |  0   |    0   |  0   |   1    |    1     |    1    |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| max-merge    |    2      |     2     |    2     |   2    |  2   |    2   |  3   |   3    |    4     |    5    |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| early-skip   |    1      |     1     |    1     |   1    |  0   |    0   |  0   |   0    |    0     |    0    |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| fast-intra   |    1      |     1     |    1     |   1    |  1   |    0   |  0   |   0    |    0     |    0    |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| b-intra      |    0      |     0     |    0     |   0    |  0   |    0   |  0   |   1    |    1     |    1    |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| sao          |    0      |     0     |    1     |   1    |  1   |    1   |  1   |   1    |    1     |    1    |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| signhide     |    0      |     1     |    1     |   1    |  1   |    1   |  1   |   1    |    1     |    1    |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| weightp      |    0      |     0     |    1     |   1    |  1   |    1   |  1   |   1    |    1     |    1    |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| weightb      |    0      |     0     |    0     |   0    |  0   |    0   |  0   |   1    |    1     |    1    |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| aq-mode      |    0      |     0     |    1     |   1    |  1   |    1   |  1   |   1    |    1     |    1    |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| cuTree       |    0      |     0     |    0     |   0    |  1   |    1   |  1   |   1    |    1     |    1    |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| rdLevel      |    2      |     2     |    2     |   2    |  2   |    3   |  4   |   6    |    6     |    6    |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| rdoq-level   |    0      |     0     |    0     |   0    |  0   |    0   |  2   |   2    |    2     |    2    |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| tu-intra     |    1      |     1     |    1     |   1    |  1   |    1   |  1   |   2    |    3     |    4    |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
-| tu-inter     |    1      |     1     |    1     |   1    |  1   |    1   |  1   |   2    |    3     |    4    |
-+--------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+|                     | ultrafast | superfast | veryfast | faster | fast | medium | slow | slower | veryslow | placebo |
++=====================+===========+===========+==========+========+======+========+======+========+==========+=========+
+| ctu                 |   32      |    32     |   32     |  64    |  64  |   64   |  64  |  64    |   64     |   64    |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| min-cu-size         |   16      |     8     |    8     |   8    |   8  |    8   |   8  |   8    |    8     |    8    |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| bframes             |    3      |     3     |    4     |   4    |  4   |    4   |  4   |   8    |    8     |    8    |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| b-adapt             |    0      |     0     |    0     |   0    |  0   |    2   |  2   |   2    |    2     |    2    |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| rc-lookahead        |    5      |    10     |   15     |  15    |  15  |   20   |  25  |   30   |   40     |   60    |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| scenecut            |    0      |    40     |   40     |  40    |  40  |   40   |  40  |   40   |   40     |   40    |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| ref                 |    1      |     1     |    1     |   1    |  2   |    3   |  3   |   3    |    5     |    5    |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| me                  |   dia     |   hex     |   hex    |  hex   | hex  |   hex  | star |  star  |   star   |   star  |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| merange             |   57      |    57     |   57     |  57    |  57  |   57   | 57   |  57    |   57     |   92    |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| subme               |    0      |     1     |    1     |   2    |  2   |    2   |  3   |   3    |    4     |    5    |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| rect                |    0      |     0     |    0     |   0    |  0   |    0   |  1   |   1    |    1     |    1    |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| amp                 |    0      |     0     |    0     |   0    |  0   |    0   |  0   |   1    |    1     |    1    |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| max-merge           |    2      |     2     |    2     |   2    |  2   |    2   |  3   |   3    |    4     |    5    |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| early-skip          |    1      |     1     |    1     |   1    |  0   |    0   |  0   |   0    |    0     |    0    |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| fast-intra          |    1      |     1     |    1     |   1    |  1   |    0   |  0   |   0    |    0     |    0    |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| b-intra             |    0      |     0     |    0     |   0    |  0   |    0   |  0   |   1    |    1     |    1    |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| sao                 |    0      |     0     |    1     |   1    |  1   |    1   |  1   |   1    |    1     |    1    |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| signhide            |    0      |     1     |    1     |   1    |  1   |    1   |  1   |   1    |    1     |    1    |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| weightp             |    0      |     0     |    1     |   1    |  1   |    1   |  1   |   1    |    1     |    1    |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| weightb             |    0      |     0     |    0     |   0    |  0   |    0   |  0   |   1    |    1     |    1    |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| aq-mode             |    0      |     0     |    1     |   1    |  1   |    1   |  1   |   1    |    1     |    1    |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| cuTree              |    0      |     0     |    0     |   0    |  1   |    1   |  1   |   1    |    1     |    1    |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| rdLevel             |    2      |     2     |    2     |   2    |  2   |    3   |  4   |   6    |    6     |    6    |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| rdoq-level          |    0      |     0     |    0     |   0    |  0   |    0   |  2   |   2    |    2     |    2    |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| tu-intra            |    1      |     1     |    1     |   1    |  1   |    1   |  1   |   2    |    3     |    4    |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| tu-inter            |    1      |     1     |    1     |   1    |  1   |    1   |  1   |   2    |    3     |    4    |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
+| lookahead-slices    |    8      |     8     |    8     |   8    |  8   |    8   |  4   |   4    |    0     |    0    |
++---------------------+-----------+-----------+----------+--------+------+--------+------+--------+----------+---------+
 
 Placebo mode enables transform-skip prediction evaluation.
 
diff -r 3103afbd31fa -r 37fb5c10bfce source/common/param.cpp
--- a/source/common/param.cpp	Thu Nov 05 06:13:51 2015 +0530
+++ b/source/common/param.cpp	Thu Nov 05 09:26:42 2015 +0530
@@ -147,7 +147,7 @@
     param->bFrameAdaptive = X265_B_ADAPT_TRELLIS;
     param->bBPyramid = 1;
     param->scenecutThreshold = 40; /* Magic number pulled in from x264 */
-    param->lookaheadSlices = 0;
+    param->lookaheadSlices = 8;
 
     /* Intra Coding Tools */
     param->bEnableConstrainedIntra = 0;
@@ -348,6 +348,7 @@
             param->subpelRefine = 3;
             param->maxNumMergeCand = 3;
             param->searchMethod = X265_STAR_SEARCH;
+            param->lookaheadSlices = 4; // limit parallelism as already enough work exists
         }
         else if (!strcmp(preset, "slower"))
         {
@@ -365,6 +366,7 @@
             param->maxNumMergeCand = 3;
             param->searchMethod = X265_STAR_SEARCH;
             param->bIntraInBFrames = 1;
+            param->lookaheadSlices = 4; // limit parallelism as already enough work exists
         }
         else if (!strcmp(preset, "veryslow"))
         {
@@ -383,6 +385,7 @@
             param->searchMethod = X265_STAR_SEARCH;
             param->maxNumReferences = 5;
             param->bIntraInBFrames = 1;
+            param->lookaheadSlices = 0; // disabled for best quality
         }
         else if (!strcmp(preset, "placebo"))
         {
@@ -404,6 +407,7 @@
             param->maxNumReferences = 5;
             param->rc.bEnableSlowFirstPass = 1;
             param->bIntraInBFrames = 1;
+            param->lookaheadSlices = 0; // disabled for best quality
             // TODO: optimized esa
         }
         else
diff -r 3103afbd31fa -r 37fb5c10bfce source/encoder/slicetype.cpp
--- a/source/encoder/slicetype.cpp	Thu Nov 05 06:13:51 2015 +0530
+++ b/source/encoder/slicetype.cpp	Thu Nov 05 09:26:42 2015 +0530
@@ -516,7 +516,16 @@
     m_bBatchFrameCosts = m_bBatchMotionSearch;
 
     if (m_param->lookaheadSlices && !m_pool)
+    {
+        x265_log(param, X265_LOG_WARNING, "No pools found; disabling lookahead-slices\n");
         m_param->lookaheadSlices = 0;
+    }
+
+    if (m_param->lookaheadSlices && (m_param->sourceHeight < 720))
+    {
+        x265_log(param, X265_LOG_WARNING, "Source height < 720p; disabling lookahead-slices\n");
+        m_param->lookaheadSlices = 0;
+    }
 
     if (m_param->lookaheadSlices > 1)
     {
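
Note on the slice limits described in the cli.rst hunk: the following is a
minimal sketch of how the effective slice count could be derived, not the code
in slicetype.cpp. The function name effectiveLookaheadSlices and its parameters
are hypothetical. It assumes the lowres plane is half the source height and is
split into 8x8 blocks (one lowres row per 16 source rows), which is consistent
with the documented caps of 4 slices at 720p and 6 at 1080p.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper, not part of the patch: estimates how many lookahead
// slices remain after the limits described in cli.rst are applied.
// Assumption: the lowres plane is half the source height and is divided
// into 8x8 blocks, i.e. one lowres row per 16 source luma rows.
int effectiveLookaheadSlices(int requestedSlices, int sourceHeight, bool hasThreadPool)
{
    // 0 and 1 both mean "no slicing"; slicing also needs a thread pool
    // and, after this patch, a source height of at least 720.
    if (requestedSlices <= 1 || !hasThreadPool || sourceHeight < 720)
        return 0;

    const int lowresRows = ((sourceHeight / 2) + 7) >> 3;

    // Give each slice at least 10 rows, but never more than the whole picture.
    int rowsPerSlice = lowresRows / requestedSlices;
    rowsPerSlice = std::max(rowsPerSlice, 10);
    rowsPerSlice = std::min(rowsPerSlice, lowresRows);

    return lowresRows / rowsPerSlice;
}
```

Under these assumptions, a request for 8 slices is reduced to 4 at 720p and to
6 at 1080p, while a request for 4 slices at 1080p is left unchanged.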

