[x265-commits] [x265] api: nits

Mon Feb 2 18:51:38 CET 2015

details:   http://hg.videolan.org/x265/rev/5e5dc3763f63
branches:  
changeset: 9237:5e5dc3763f63
user:      Steve Borho <steve at borho.org>
date:      Thu Jan 29 10:37:54 2015 -0600
description:
api: nits
Subject: [x265] rdcost: auto down-scale psy-rd at higher QPs

details:   http://hg.videolan.org/x265/rev/28a40d526fcf
branches:  
changeset: 9238:28a40d526fcf
user:      Steve Borho <steve at borho.org>
date:      Thu Jan 29 10:34:27 2015 -0600
description:
rdcost: auto down-scale psy-rd at higher QPs

When QP gets above 42, turn down psy-rd by half. When it gets to 50 disable it
outright. Note that we're not mucking with psy-rdoq at this time.
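
The step function described above can be sketched as follows. This is only an illustration of the rule as worded in the commit message; the function name and the fixed-point weight parameter are hypothetical, and the code in rdcost.h may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not the x265 implementation: applies the step
// function from the commit message to a fixed-point psy-rd weight.
uint32_t scalePsyRdStep(uint32_t psyRdBase, int qp)
{
    assert(qp <= 51);            // 51 is the largest QP accepted by the parameter check
    if (qp >= 50)
        return 0;                // disabled outright at QP 50 and above
    if (qp > 42)
        return psyRdBase >> 1;   // halved for QP 43..49
    return psyRdBase;            // unchanged up to and including QP 42
}
```

With a base weight of 256, QP 42 keeps 256, QP 43 through 49 give 128, and QP 50 and 51 give 0. psy-rdoq is left alone, as the message notes.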
Subject: [x265] param: enable psy-rd and psy-rdoq by default

details:   http://hg.videolan.org/x265/rev/46c767f4eb46
branches:  
changeset: 9239:46c767f4eb46
user:      Steve Borho <steve at borho.org>
date:      Thu Jan 29 10:47:02 2015 -0600
description:
param: enable psy-rd and psy-rdoq by default

The psycho-visual cost functions are assembly optimized now, so there isn't a
large cost penalty to having them enabled.
Subject: [x265] encoder: no longer warn when disabling psy-rdo[q] for rdlevel reasons

details:   http://hg.videolan.org/x265/rev/a9c161585663
branches:  
changeset: 9240:a9c161585663
user:      Steve Borho <steve at borho.org>
date:      Thu Jan 29 13:27:54 2015 -0600
description:
encoder: no longer warn when disabling psy-rdo[q] for rdlevel reasons
Subject: [x265] encoder: allow 8 frame threads with 4k and many core servers

details:   http://hg.videolan.org/x265/rev/7a021c3ef8fa
branches:  
changeset: 9241:7a021c3ef8fa
user:      Steve Borho <steve at borho.org>
date:      Thu Jan 29 19:55:35 2015 -0600
description:
encoder: allow 8 frame threads with 4k and many core servers
Subject: [x265] quant: add m_tqBypass

details:   http://hg.videolan.org/x265/rev/dadc7a234fa1
branches:  
changeset: 9242:dadc7a234fa1
user:      Satoshi Nakagawa <nakagawa424 at oki.com>
date:      Fri Jan 30 22:18:28 2015 +0900
description:
quant: add m_tqBypass
Subject: [x265] improve codeCoeffNxN by calculate context in scanLast loop

details:   http://hg.videolan.org/x265/rev/d75156e5313d
branches:  
changeset: 9243:d75156e5313d
user:      Min Chen <chenm003 at 163.com>
date:      Fri Jan 30 20:19:12 2015 +0800
description:
improve codeCoeffNxN by calculate context in scanLast loop
Subject: [x265] pixelHarness: add testharness code for estimateCUPropagateCost

details:   http://hg.videolan.org/x265/rev/dc5e97f8be93
branches:  
changeset: 9244:dc5e97f8be93
user:      Santhoshini Sekar <santhoshini at multicorewareinc.com>
date:      Wed Jan 28 15:58:37 2015 +0530
description:
pixelHarness: add testharness code for estimateCUPropagateCost
Subject: [x265] nit: replace hard-coded 51 with QP_MAX_SPEC

details:   http://hg.videolan.org/x265/rev/6c5156500d6d
branches:  
changeset: 9245:6c5156500d6d
user:      Steve Borho <steve at borho.org>
date:      Fri Jan 30 11:54:22 2015 -0600
description:
nit: replace hard-coded 51 with QP_MAX_SPEC
Subject: [x265] merge default into stable, prep for 1.5 tag

details:   http://hg.videolan.org/x265/rev/3e6c23a65f6e
branches:  stable
changeset: 9246:3e6c23a65f6e
user:      Steve Borho <steve at borho.org>
date:      Fri Jan 30 11:56:09 2015 -0600
description:
merge default into stable, prep for 1.5 tag
Subject: [x265] rc: fix comment text that was pasted from the HEVC spec

details:   http://hg.videolan.org/x265/rev/e2c958ff874e
branches:  stable
changeset: 9247:e2c958ff874e
user:      Steve Borho <steve at borho.org>
date:      Sat Jan 31 13:48:34 2015 -0600
description:
rc: fix comment text that was pasted from the HEVC spec

The AyCpbRemovalTime typo is from the spec itself. The ? was an error copying
the doc text to ascii encoding. The spec uses a unicode division symbol.
Subject: [x265] encoder: whitespace nits and document fixes

details:   http://hg.videolan.org/x265/rev/954aab61cae7
branches:  stable
changeset: 9248:954aab61cae7
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Fri Jan 30 11:27:55 2015 +0530
description:
encoder: whitespace nits and document fixes
Subject: [x265] threading: use InterlockedExchangeAdd for ATOMIC_ADD

details:   http://hg.videolan.org/x265/rev/bc0fbae84481
branches:  stable
changeset: 9249:bc0fbae84481
user:      Gopu Govindaswamy <gopu at multicorewareinc.com>
date:      Mon Feb 02 14:34:16 2015 +0530
description:
threading: use InterlockedExchangeAdd for ATOMIC_ADD

This patch fixes a build error with 32-bit VC compilers, which do not support
InterlockedAdd. InterlockedExchangeAdd requires ptr to be aligned to a 32-bit
boundary.
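
One difference worth keeping in mind when swapping the two intrinsics: InterlockedAdd returns the sum, while InterlockedExchangeAdd returns the value held before the addition. A minimal sketch of a fetch-and-add wrapper, assuming a hypothetical function name (this is not the ATOMIC_ADD macro from threading.h) and using the GCC/Clang builtin on other platforms:

```cpp
#include <cstdint>
#if defined(_WIN32)
#include <windows.h>
#endif

// Hypothetical wrapper, not the x265 macro: adds val to *ptr atomically and
// returns the value *ptr held before the addition.
inline int32_t atomicFetchAdd(volatile int32_t* ptr, int32_t val)
{
#if defined(_WIN32)
    // LONG is 32 bits wide on Windows; ptr must be 32-bit aligned.
    return InterlockedExchangeAdd(reinterpret_cast<volatile LONG*>(ptr), val);
#else
    // GCC/Clang builtin with the same fetch-then-add semantics.
    return __sync_fetch_and_add(ptr, val);
#endif
}
```

Callers that only need the side effect can ignore the return value, in which case the two intrinsics are interchangeable.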
Subject: [x265] Merge with stable

details:   http://hg.videolan.org/x265/rev/3f613af5070a
branches:  
changeset: 9250:3f613af5070a
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Mon Feb 02 17:00:41 2015 +0530
description:
Merge with stable
Subject: [x265] blockfill_s_16x16 sse2 asm code optimization

details:   http://hg.videolan.org/x265/rev/6628052c5020
branches:  
changeset: 9251:6628052c5020
user:      Praveen Tiwari
date:      Mon Feb 02 15:19:19 2015 +0530
description:
blockfill_s_16x16 sse2 asm code optimization

eliminated branch instructions and optimized LEA instruction
Subject: [x265] blockfill_s_32x32 sse2 asm code optimization

details:   http://hg.videolan.org/x265/rev/1760823cdd46
branches:  
changeset: 9252:1760823cdd46
user:      Praveen Tiwari
date:      Mon Feb 02 16:27:07 2015 +0530
description:
blockfill_s_32x32 sse2 asm code optimization

optimized LEA instruction
Subject: [x265] threading: create a utility class for measuring elapsed time in functions

details:   http://hg.videolan.org/x265/rev/605ec2527345
branches:  
changeset: 9253:605ec2527345
user:      Steve Borho <steve at borho.org>
date:      Sat Jan 31 14:50:14 2015 -0600
description:
threading: create a utility class for measuring elapsed time in functions

The general idea is to make it trivial to add per-function profiling with just
a single line of code that can also be compiled out via a build macro.
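
The idea can be sketched as an RAII timer plus a macro that expands to nothing when profiling is off. The class name, the macro names, and the use of std::chrono are assumptions made for this illustration; the class added to threading.h may look different.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical RAII timer: adds the lifetime of the object, in microseconds,
// to an accumulator owned by the caller.
class ScopedElapsedTime
{
public:
    explicit ScopedElapsedTime(int64_t& accum)
        : m_accum(accum), m_start(std::chrono::steady_clock::now()) {}

    ~ScopedElapsedTime()
    {
        auto elapsed = std::chrono::steady_clock::now() - m_start;
        m_accum += std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
    }

private:
    int64_t& m_accum;
    std::chrono::steady_clock::time_point m_start;
};

// ENABLE_PROFILING is an assumed build macro; without it the line vanishes.
#ifdef ENABLE_PROFILING
#define PROFILE_SCOPE(accum) ScopedElapsedTime scopedTimer_(accum)
#else
#define PROFILE_SCOPE(accum)
#endif
```

A function would then start with a single line such as `PROFILE_SCOPE(stats.elapsedUs);`, where `stats.elapsedUs` stands for whatever accumulator the caller owns, and the destructor records the time on every return path.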
Subject: [x265] api: set a limit on the number of frame encoders

details:   http://hg.videolan.org/x265/rev/a498b1375244
branches:  
changeset: 9254:a498b1375244
user:      Steve Borho <steve at borho.org>
date:      Sat Jan 31 15:15:48 2015 -0600
description:
api: set a limit on the number of frame encoders

The practical limit is currently around 10 frame encoders. Any more than that
and you start degrading performance.  A hard-coded limit of 16 allows room for
future improvements to frame parallelism.
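
As a rough sketch of the resulting range check, assuming the limit of 16 from the message above and that 0 means auto-detect (the constant and function names here are hypothetical):

```cpp
// Assumed value, taken from the commit message; the real constant lives in x265.h.
const int kMaxFrameThreads = 16;

// Hypothetical check: 0 requests auto-detection, 1..16 are explicit counts.
bool isValidFrameThreadCount(int frameNumThreads)
{
    return frameNumThreads >= 0 && frameNumThreads <= kMaxFrameThreads;
}
```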
Subject: [x265] doc: update frame threading docs

details:   http://hg.videolan.org/x265/rev/4eecc5181f72
branches:  
changeset: 9255:4eecc5181f72
user:      Steve Borho <steve at borho.org>
date:      Sat Jan 31 15:24:33 2015 -0600
description:
doc: update frame threading docs
Subject: [x265] rdcost: use a more gradual fall-off function for psy-rd at high QP

details:   http://hg.videolan.org/x265/rev/6ab5d656aea1
branches:  stable
changeset: 9256:6ab5d656aea1
user:      Steve Borho <steve at borho.org>
date:      Sat Jan 31 14:27:17 2015 -0600
description:
rdcost: use a more gradual fall-off function for psy-rd at high QP

This algorithm results in:

qp 39 psyRd 256
qp 40 psyRd 253
qp 41 psyRd 227
qp 42 psyRd 183
qp 43 psyRd 131
qp 44 psyRd 82
qp 45 psyRd 44
qp 46 psyRd 19
qp 47 psyRd 6
qp 48 psyRd 1
qp 49 psyRd 0
qp 50 psyRd 0
qp 51 psyRd 0
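
The commit message lists the results but not the formula, so the sketch below simply tabulates the values above rather than guessing at the algorithm. The function name is hypothetical, and treating QP 39 and below as unscaled (256) is an assumption based on the first row.

```cpp
#include <cstdint>

// Values copied from the commit message for QP 40..48.
static const uint32_t kPsyRdByQp[] = { 253, 227, 183, 131, 82, 44, 19, 6, 1 };

// Hypothetical lookup, not the x265 formula.
uint32_t psyRdAtQp(int qp)
{
    if (qp <= 39)
        return 256;              // assumption: unscaled at and below QP 39
    if (qp >= 49)
        return 0;                // the table lists 0 for QP 49..51
    return kPsyRdByQp[qp - 40];  // QP 40..48 map to table indices 0..8
}
```

Compared with the earlier step function, the weight passes through half strength around QP 43 (131) and is effectively gone by QP 48.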
Subject: [x265] api: give type name to x265_cli_csp

details:   http://hg.videolan.org/x265/rev/2fb7f322c6d4
branches:  stable
changeset: 9257:2fb7f322c6d4
user:      Steve Borho <steve at borho.org>
date:      Sat Jan 31 14:57:06 2015 -0600
description:
api: give type name to x265_cli_csp
Subject: [x265] frameencoder: simplify noise reduction update logic flow

details:   http://hg.videolan.org/x265/rev/cf9b6df23d0d
branches:  
changeset: 9258:cf9b6df23d0d
user:      Steve Borho <steve at borho.org>
date:      Sat Jan 31 16:03:57 2015 -0600
description:
frameencoder: simplify noise reduction update logic flow
Subject: [x265] search: add compile-time optional detailed CU stats

details:   http://hg.videolan.org/x265/rev/95ad2b84c5bb
branches:  
changeset: 9259:95ad2b84c5bb
user:      Steve Borho <steve at borho.org>
date:      Sat Jan 31 15:57:00 2015 -0600
description:
search: add compile-time optional detailed CU stats

By keeping accumulators per worker thread per frame encoder, we can avoid the
use of atomics and simply accumulate the results at the end of each frame. The
calls to x265_mdate() and accumulation work is still measurable and so all of
this work is only performed if the cmake option is enabled.

The logging output looks like:

x265 [info]: CU: Worker threads compressed 9856 64X64 CTUs in 16.559 worker seconds, 595.211 CTUs per second
x265 [info]: CU: %23.29 time spent in motion estimation, averaging 9.726 CU inter modes per CTU
x265 [info]: CU: 0.996 PME masters per inter CU, each blocked an average of 0.340 ns
x265 [info]: CU:       0.017 slaves per PME master, each took an average of 0.041 ms
x265 [info]: CU: %03.97 time spent in intra analysis, averaging 9.515 Intra PUs per CTU
x265 [info]: CU: %16.70 time spent in inter RDO, measuring 20.980 inter/merge predictions per CTU
x265 [info]: CU: %14.03 time spent in intra RDO, measuring 27.478 intra predictions per CTU
x265 [info]: CU: 9.726 PMODE masters per CTU, each blocked an average of 0.293 ns
x265 [info]: CU:       1.622 slaves per PMODE master, each took average of 0.038 ms
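
The accumulator layout described above can be sketched like this. The struct names and the two counters are hypothetical and far simpler than the real statistics; the point is only that each worker writes to its own slot and the slots are summed once per frame.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical counters; the real structure tracks many more fields.
struct CUStats
{
    int64_t  motionEstimationUs = 0; // microseconds spent in motion estimation
    uint64_t ctuCount = 0;           // CTUs compressed

    void accumulate(const CUStats& other)
    {
        motionEstimationUs += other.motionEstimationUs;
        ctuCount += other.ctuCount;
    }
    void clear() { *this = CUStats(); }
};

// One slot per worker thread. Each worker writes only to its own slot, so no
// atomics are needed while a frame is being encoded.
struct FrameStats
{
    std::vector<CUStats> perWorker;

    explicit FrameStats(std::size_t numWorkers) : perWorker(numWorkers) {}

    // Called by worker `id` after it compresses one CTU.
    void recordCTU(std::size_t id, int64_t meUs)
    {
        perWorker[id].motionEstimationUs += meUs;
        perWorker[id].ctuCount++;
    }

    // Called once at the end of the frame, after all workers have finished.
    CUStats collect()
    {
        CUStats total;
        for (CUStats& s : perWorker)
        {
            total.accumulate(s);
            s.clear();
        }
        return total;
    }
};
```

The merge must run only after the workers for that frame are done, which is why the sketch resets each slot inside collect() rather than leaving that to the workers.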
Subject: [x265] search: remove unused zeroPixel

details:   http://hg.videolan.org/x265/rev/f025285c2128
branches:  
changeset: 9260:f025285c2128
user:      Steve Borho <steve at borho.org>
date:      Sat Jan 31 15:52:34 2015 -0600
description:
search: remove unused zeroPixel
Subject: [x265] search: separate intra analysis from RDO in estIntraPredQT(), improve var names

details:   http://hg.videolan.org/x265/rev/60e07fced4d2
branches:  
changeset: 9261:60e07fced4d2
user:      Steve Borho <steve at borho.org>
date:      Sun Feb 01 14:10:10 2015 -0600
description:
search: separate intra analysis from RDO in estIntraPredQT(), improve var names

This clarifies the statistics in I slices and in RD levels 4 and 5. This commit
adds a brace { } scope to perform the profiling but does not change indentation;
the indentation is fixed in the next commit.
Subject: [x265] search: fix indentation of new brace scope, no logic change

details:   http://hg.videolan.org/x265/rev/2357bcb2a9a6
branches:  
changeset: 9262:2357bcb2a9a6
user:      Steve Borho <steve at borho.org>
date:      Sun Feb 01 14:11:58 2015 -0600
description:
search: fix indentation of new brace scope, no logic change
Subject: [x265] pmode: do not call findJob() from task master, avoid double counting of time

details:   http://hg.videolan.org/x265/rev/4aaa6c38201a
branches:  
changeset: 9263:4aaa6c38201a
user:      Steve Borho <steve at borho.org>
date:      Sun Feb 01 16:07:30 2015 -0600
description:
pmode: do not call findJob() from task master, avoid double counting of time

This should make the pmode stats more accurate.
Subject: [x265] Merge with stable

details:   http://hg.videolan.org/x265/rev/269cc414f218
branches:  
changeset: 9264:269cc414f218
user:      Steve Borho <steve at borho.org>
date:      Mon Feb 02 11:51:03 2015 -0600
description:
Merge with stable

diffstat:

 doc/reST/cli.rst                 |   10 +-
 doc/reST/threading.rst           |   13 +-
 source/CMakeLists.txt            |    5 +
 source/common/common.h           |    1 +
 source/common/deblock.cpp        |   41 ++--
 source/common/param.cpp          |   10 +-
 source/common/quant.cpp          |   22 +-
 source/common/quant.h            |    5 +-
 source/common/threading.h        |   19 ++-
 source/common/x86/blockcopy8.asm |  127 ++++++++------
 source/encoder/analysis.cpp      |   41 +++-
 source/encoder/encoder.cpp       |   76 +++++++-
 source/encoder/entropy.cpp       |  120 +++++++++----
 source/encoder/frameencoder.cpp  |   17 +-
 source/encoder/frameencoder.h    |    3 +
 source/encoder/ratecontrol.cpp   |    2 +-
 source/encoder/rdcost.h          |   10 +-
 source/encoder/search.cpp        |  331 ++++++++++++++++++++------------------
 source/encoder/search.h          |   81 +++++++++-
 source/test/pixelharness.cpp     |   43 +++++
 source/test/pixelharness.h       |    2 +
 source/x265.h                    |   33 ++-
 22 files changed, 670 insertions(+), 342 deletions(-)

diffs (truncated from 1904 to 300 lines):

diff -r bf257ba100c5 -r 269cc414f218 doc/reST/cli.rst

--- a/doc/reST/cli.rst	Thu Jan 29 10:10:30 2015 -0600
+++ b/doc/reST/cli.rst	Mon Feb 02 11:51:03 2015 -0600
@@ -171,6 +171,8 @@ Performance Options
 	Over-allocation of frame threads will not improve performance, it
 	will generally just increase memory use.
 
+	**Values:** any value between 8 and 16. Default is 0, auto-detect
+
 .. option:: --threads <integer>
 
 	Number of threads to allocate for the worker thread pool  This pool
@@ -792,7 +794,7 @@ areas of high motion.
 	energy of the source image in the encoded image at the expense of
 	compression efficiency. It only has effect on presets which use
 	RDO-based mode decisions (:option:`--rd` 3 and above).  1.0 is a
-	typical value. Default disabled.  Experimental
+	typical value. Default 1.0
 
 	**Range of values:** 0 .. 2.0
 
@@ -802,9 +804,9 @@ areas of high motion.
 	energy in the reconstructed image. This generally improves perceived
 	visual quality at the cost of lower quality metric scores.  It only
 	has effect on slower presets which use RDO Quantization
-	(:option:`--rd` 4, 5 and 6). 1.0 is a typical value. Default
-	disabled. High values can be beneficial in preserving high-frequency
-	detail like film grain. Experimental
+	(:option:`--rd` 4, 5 and 6). 1.0 is a typical value. High values can 
+	be beneficial in preserving high-frequency detail like film grain. 
+	Default: 1.0
 
 	**Range of values:** 0 .. 50.0
 
diff -r bf257ba100c5 -r 269cc414f218 doc/reST/threading.rst
--- a/doc/reST/threading.rst	Thu Jan 29 10:10:30 2015 -0600
+++ b/doc/reST/threading.rst	Mon Feb 02 11:51:03 2015 -0600
@@ -125,9 +125,14 @@ The second extenuating circumstance is t
 for motion reference must be processed by the loop filters and the loop
 filters cannot run until a full row has been encoded, and it must run a
 full row behind the encode process so that the pixels below the row
-being filtered are available. When you add up all the row lags each
-frame ends up being 3 CTU rows behind its reference frames (the
-equivalent of 12 macroblock rows for x264)
+being filtered are available. On top of this, HEVC has two loop filters:
+deblocking and SAO, which must be run in series with a row lag between
+them. When you add up all the row lags each frame ends up being 3 CTU
+rows behind its reference frames (the equivalent of 12 macroblock rows
+for x264). And keep in mind the wave-front progression pattern; by the
+time the reference frame finishes the third row of CTUs, nearly half of
+the CTUs in the frame may be compressed (depending on the display aspect
+ratio).
 
 The third extenuating circumstance is that when a frame being encoded
 becomes blocked by a reference frame row being available, that frame's
@@ -172,7 +177,7 @@ count, but may be manually specified via
 	+-------+--------+
 	| Cores | Frames |
 	+=======+========+
-	|  > 32 |   6    |
+	|  > 32 |  6..8  |
 	+-------+--------+
 	| >= 16 |   5    |
 	+-------+--------+
diff -r bf257ba100c5 -r 269cc414f218 source/CMakeLists.txt
--- a/source/CMakeLists.txt	Thu Jan 29 10:10:30 2015 -0600
+++ b/source/CMakeLists.txt	Mon Feb 02 11:51:03 2015 -0600
@@ -240,6 +240,11 @@ if(ENABLE_VTUNE)
     add_subdirectory(profile/vtune)
 endif(ENABLE_VTUNE)
 
+option(DETAILED_CU_STATS "Enable internal profiling of encoder work" OFF)
+if(DETAILED_CU_STATS)
+    add_definitions(-DDETAILED_CU_STATS)
+endif(DETAILED_CU_STATS)
+
 add_subdirectory(encoder)
 add_subdirectory(common)
 
diff -r bf257ba100c5 -r 269cc414f218 source/common/common.h
--- a/source/common/common.h	Thu Jan 29 10:10:30 2015 -0600
+++ b/source/common/common.h	Mon Feb 02 11:51:03 2015 -0600
@@ -281,6 +281,7 @@ typedef int16_t  coeff_t;      // transf
 
 #define MLS_GRP_NUM                 64 // Max number of coefficient groups, max(16, 64)
 #define MLS_CG_SIZE                 4  // Coefficient group size of 4x4
+#define MLS_CG_BLK_SIZE             (MLS_CG_SIZE * MLS_CG_SIZE)
 #define MLS_CG_LOG2_SIZE            2
 
 #define QUANT_IQUANT_SHIFT          20 // Q(QP%6) * IQ(QP%6) = 2^20
diff -r bf257ba100c5 -r 269cc414f218 source/common/deblock.cpp
--- a/source/common/deblock.cpp	Thu Jan 29 10:10:30 2015 -0600
+++ b/source/common/deblock.cpp	Mon Feb 02 11:51:03 2015 -0600
@@ -401,14 +401,22 @@ void Deblock::edgeFilterLuma(const CUDat
         if (!bs)
             continue;
 
-        int32_t qpQ = cuQ->m_qp[partQ];
-
         // Derive neighboring PU index
         uint32_t partP;
         const CUData* cuP = (dir == EDGE_VER ? cuQ->getPULeft(partP, partQ) : cuQ->getPUAbove(partP, partQ));
 
+        if (bCheckNoFilter)
+        {
+            // check if each of PUs is lossless coded
+            maskP = cuP->m_tqBypass[partP] - 1;
+            maskQ = cuQ->m_tqBypass[partQ] - 1;
+            if (!(maskP | maskQ))
+                continue;
+        }
+
+        int32_t qpQ = cuQ->m_qp[partQ];
         int32_t qpP = cuP->m_qp[partP];
-        int32_t qp = (qpP + qpQ + 1) >> 1;
+        int32_t qp  = (qpP + qpQ + 1) >> 1;
 
         int32_t indexB = x265_clip3(0, QP_MAX_SPEC, qp + betaOffset);
 
@@ -428,13 +436,6 @@ void Deblock::edgeFilterLuma(const CUDat
         if (d >= beta)
             continue;
 
-        if (bCheckNoFilter)
-        {
-            // check if each of PUs is lossless coded
-            maskP = (cuP->m_tqBypass[partP] ? 0 : -1);
-            maskQ = (cuQ->m_tqBypass[partQ] ? 0 : -1);
-        }
-
         int32_t indexTC = x265_clip3(0, QP_MAX_SPEC + DEFAULT_INTRA_TC_OFFSET, int32_t(qp + DEFAULT_INTRA_TC_OFFSET * (bs - 1) + tcOffset));
         int32_t tc = s_tcTable[indexTC] << bitdepthShift;
 
@@ -506,33 +507,29 @@ void Deblock::edgeFilterChroma(const CUD
         if (bs <= 1)
             continue;
 
-        int32_t qpQ = cuQ->m_qp[partQ];
-
         // Derive neighboring PU index
         uint32_t partP;
         const CUData* cuP = (dir == EDGE_VER ? cuQ->getPULeft(partP, partQ) : cuQ->getPUAbove(partP, partQ));
 
-        int32_t qpP = cuP->m_qp[partP];
-
         if (bCheckNoFilter)
         {
             // check if each of PUs is lossless coded
             maskP = (cuP->m_tqBypass[partP] ? 0 : -1);
             maskQ = (cuQ->m_tqBypass[partQ] ? 0 : -1);
+            if (!(maskP | maskQ))
+                continue;
         }
 
+        int32_t qpQ = cuQ->m_qp[partQ];
+        int32_t qpP = cuP->m_qp[partP];
+        int32_t qpA = (qpP + qpQ + 1) >> 1;
+
         intptr_t unitOffset = idx * srcStep << LOG2_UNIT_SIZE;
         for (uint32_t chromaIdx = 0; chromaIdx < 2; chromaIdx++)
         {
-            int32_t chromaQPOffset  = pps->chromaQpOffset[chromaIdx];
-            int32_t qp = ((qpP + qpQ + 1) >> 1) + chromaQPOffset;
+            int32_t qp = qpA + pps->chromaQpOffset[chromaIdx];
             if (qp >= 30)
-            {
-                if (chFmt == X265_CSP_I420)
-                    qp = g_chromaScale[qp];
-                else
-                    qp = X265_MIN(qp, 51);
-            }
+                qp = chFmt == X265_CSP_I420 ? g_chromaScale[qp] : X265_MIN(qp, QP_MAX_SPEC);
 
             int32_t indexTC = x265_clip3(0, QP_MAX_SPEC + DEFAULT_INTRA_TC_OFFSET, int32_t(qp + DEFAULT_INTRA_TC_OFFSET + tcOffset));
             const int32_t bitdepthShift = X265_DEPTH - 8;
diff -r bf257ba100c5 -r 269cc414f218 source/common/param.cpp
--- a/source/common/param.cpp	Thu Jan 29 10:10:30 2015 -0600
+++ b/source/common/param.cpp	Mon Feb 02 11:51:03 2015 -0600
@@ -174,8 +174,8 @@ void x265_param_default(x265_param *para
     param->cbQpOffset = 0;
     param->crQpOffset = 0;
     param->rdPenalty = 0;
-    param->psyRd = 0.0;
-    param->psyRdoq = 0.0;
+    param->psyRd = 1.0;
+    param->psyRdoq = 1.0;
     param->analysisMode = 0;
     param->analysisFileName = NULL;
     param->bIntraInBFrames = 0;
@@ -963,7 +963,7 @@ int x265_check_params(x265_param *param)
           "x265 was compiled for 8bit encodes, only 8bit internal depth supported");
 #endif
 
-    CHECK(param->rc.qp < -6 * (param->internalBitDepth - 8) || param->rc.qp > 51,
+    CHECK(param->rc.qp < -6 * (param->internalBitDepth - 8) || param->rc.qp > QP_MAX_SPEC,
           "QP exceeds supported range (-QpBDOffsety to 51)");
     CHECK(param->fpsNum == 0 || param->fpsDenom == 0,
           "Frame rate numerator and denominator must be specified");
@@ -979,8 +979,8 @@ int x265_check_params(x265_param *param)
           "subme must be less than or equal to X265_MAX_SUBPEL_LEVEL (7)");
     CHECK(param->subpelRefine < 0,
           "subme must be greater than or equal to 0");
-    CHECK(param->frameNumThreads < 0,
-          "frameNumThreads (--frame-threads) must be 0 or higher");
+    CHECK(param->frameNumThreads < 0 || param->frameNumThreads > X265_MAX_FRAME_THREADS,
+          "frameNumThreads (--frame-threads) must be [0 .. X265_MAX_FRAME_THREADS)");
     CHECK(param->cbQpOffset < -12, "Min. Chroma Cb QP Offset is -12");
     CHECK(param->cbQpOffset >  12, "Max. Chroma Cb QP Offset is  12");
     CHECK(param->crQpOffset < -12, "Min. Chroma Cr QP Offset is -12");
diff -r bf257ba100c5 -r 269cc414f218 source/common/quant.cpp
--- a/source/common/quant.cpp	Thu Jan 29 10:10:30 2015 -0600
+++ b/source/common/quant.cpp	Mon Feb 02 11:51:03 2015 -0600
@@ -169,6 +169,7 @@ bool Quant::init(bool useRDOQ, double ps
     m_resiDctCoeff = X265_MALLOC(int16_t, MAX_TR_SIZE * MAX_TR_SIZE * 2);
     m_fencDctCoeff = m_resiDctCoeff + (MAX_TR_SIZE * MAX_TR_SIZE);
     m_fencShortBuf = X265_MALLOC(int16_t, MAX_TR_SIZE * MAX_TR_SIZE);
+    m_tqBypass = false;
 
     return m_resiDctCoeff && m_fencShortBuf;
 }
@@ -190,13 +191,16 @@ Quant::~Quant()
     X265_FREE(m_fencShortBuf);
 }
 
-void Quant::setQPforQuant(const CUData& ctu)
+void Quant::setQPforQuant(const CUData& cu)
 {
-    m_nr = m_frameNr ? &m_frameNr[ctu.m_encData->m_frameEncoderID] : NULL;
-    int qpy = ctu.m_qp[0];
+    m_tqBypass = !!cu.m_tqBypass[0];
+    if (m_tqBypass)
+        return;
+    m_nr = m_frameNr ? &m_frameNr[cu.m_encData->m_frameEncoderID] : NULL;
+    int qpy = cu.m_qp[0];
     m_qpParam[TEXT_LUMA].setQpParam(qpy + QP_BD_OFFSET);
-    setChromaQP(qpy + ctu.m_slice->m_pps->chromaQpOffset[0], TEXT_CHROMA_U, ctu.m_chromaFormat);
-    setChromaQP(qpy + ctu.m_slice->m_pps->chromaQpOffset[1], TEXT_CHROMA_V, ctu.m_chromaFormat);
+    setChromaQP(qpy + cu.m_slice->m_pps->chromaQpOffset[0], TEXT_CHROMA_U, cu.m_chromaFormat);
+    setChromaQP(qpy + cu.m_slice->m_pps->chromaQpOffset[1], TEXT_CHROMA_V, cu.m_chromaFormat);
 }
 
 void Quant::setChromaQP(int qpin, TextType ttype, int chFmt)
@@ -207,7 +211,7 @@ void Quant::setChromaQP(int qpin, TextTy
         if (chFmt == X265_CSP_I420)
             qp = g_chromaScale[qp];
         else
-            qp = X265_MIN(qp, 51);
+            qp = X265_MIN(qp, QP_MAX_SPEC);
     }
     m_qpParam[ttype].setQpParam(qp + QP_BD_OFFSET);
 }
@@ -326,7 +330,7 @@ uint32_t Quant::transformNxN(const CUDat
                              coeff_t* coeff, uint32_t log2TrSize, TextType ttype, uint32_t absPartIdx, bool useTransformSkip)
 {
     const uint32_t sizeIdx = log2TrSize - 2;
-    if (cu.m_tqBypass[absPartIdx])
+    if (m_tqBypass)
     {
         X265_CHECK(log2TrSize >= 2 && log2TrSize <= 5, "Block size mistake!\n");
         return primitives.cu[sizeIdx].copy_cnt(coeff, residual, resiStride);
@@ -406,11 +410,11 @@ uint32_t Quant::transformNxN(const CUDat
     }
 }
 
-void Quant::invtransformNxN(bool transQuantBypass, int16_t* residual, uint32_t resiStride, const coeff_t* coeff,
+void Quant::invtransformNxN(int16_t* residual, uint32_t resiStride, const coeff_t* coeff,
                             uint32_t log2TrSize, TextType ttype, bool bIntra, bool useTransformSkip, uint32_t numSig)
 {
     const uint32_t sizeIdx = log2TrSize - 2;
-    if (transQuantBypass)
+    if (m_tqBypass)
     {
         primitives.cu[sizeIdx].cpy1Dto2D_shl(residual, coeff, resiStride, 0);
         return;
diff -r bf257ba100c5 -r 269cc414f218 source/common/quant.h
--- a/source/common/quant.h	Thu Jan 29 10:10:30 2015 -0600
+++ b/source/common/quant.h	Mon Feb 02 11:51:03 2015 -0600
@@ -93,6 +93,7 @@ public:
 
     NoiseReduction*    m_nr;
     NoiseReduction*    m_frameNr; // Array of NR structures, one for each frameEncoder
+    bool               m_tqBypass;
 
     Quant();
     ~Quant();
@@ -102,12 +103,12 @@ public:
     bool allocNoiseReduction(const x265_param& param);
 
     /* CU setup */
-    void setQPforQuant(const CUData& ctu);
+    void setQPforQuant(const CUData& cu);
 
     uint32_t transformNxN(const CUData& cu, const pixel* fenc, uint32_t fencStride, const int16_t* residual, uint32_t resiStride, coeff_t* coeff,
                           uint32_t log2TrSize, TextType ttype, uint32_t absPartIdx, bool useTransformSkip);
 
-    void invtransformNxN(bool transQuantBypass, int16_t* residual, uint32_t resiStride, const coeff_t* coeff,
+    void invtransformNxN(int16_t* residual, uint32_t resiStride, const coeff_t* coeff,
                          uint32_t log2TrSize, TextType ttype, bool bIntra, bool useTransformSkip, uint32_t numSig);
 
     /* static methods shared with entropy.cpp */