[x265-commits] [x265] analysis: remove redundant argument in compressIntraCU

Gopu Govindaswamy gopu at multicorewareinc.com
Wed Jan 7 12:52:10 CET 2015


details:   http://hg.videolan.org/x265/rev/c4ec3f22846b
branches:  
changeset: 8998:c4ec3f22846b
user:      Gopu Govindaswamy <gopu at multicorewareinc.com>
date:      Tue Dec 23 11:43:32 2014 +0530
description:
analysis: remove redundant argument in compressIntraCU
Subject: [x265] encoder: allocate memory for inter and intra analysis data based on slicetype

details:   http://hg.videolan.org/x265/rev/9fdab427a191
branches:  
changeset: 8999:9fdab427a191
user:      Gopu Govindaswamy <gopu at multicorewareinc.com>
date:      Tue Dec 23 12:17:08 2014 +0530
description:
encoder: allocate memory for inter and intra analysis data based on slicetype
Subject: [x265] rdcost: unify scaleChromaDist*()

details:   http://hg.videolan.org/x265/rev/5f9f7194267b
branches:  
changeset: 9000:5f9f7194267b
user:      Satoshi Nakagawa <nakagawa424 at oki.com>
date:      Tue Dec 23 17:40:53 2014 +0900
description:
rdcost: unify scaleChromaDist*()
Subject: [x265] entropy: inline codeTransformSkipFlags()

details:   http://hg.videolan.org/x265/rev/1bf769c6953d
branches:  
changeset: 9001:1bf769c6953d
user:      Ashok Kumar Mishra <ashok at multicorewareinc.com>
date:      Wed Dec 24 12:31:27 2014 +0530
description:
entropy: inline codeTransformSkipFlags()
Subject: [x265] nal: VPS startCodeprefix needs 4 bytes

details:   http://hg.videolan.org/x265/rev/3a77bd71239f
branches:  
changeset: 9002:3a77bd71239f
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Mon Dec 29 16:13:16 2014 +0530
description:
nal: VPS startCodeprefix needs 4 bytes

Issue pointed out by Zakk Saito, Pegasys
Subject: [x265] sei: m_lastBPSEI is overwritten each time

details:   http://hg.videolan.org/x265/rev/92624efa63a3
branches:  
changeset: 9003:92624efa63a3
user:      Deepthi Nandakumar <deepthi at multicorewareinc.com>
date:      Mon Dec 29 16:57:13 2014 +0530
description:
sei: m_lastBPSEI is overwritten each time

Issue reported by Zakk Saito, Pegasys
Subject: [x265] refine intra neighbors

details:   http://hg.videolan.org/x265/rev/38d2d0878acd
branches:  
changeset: 9004:38d2d0878acd
user:      Satoshi Nakagawa <nakagawa424 at oki.com>
date:      Thu Dec 25 13:15:56 2014 +0900
description:
refine intra neighbors
Subject: [x265] Added cmake support to pass along build flags to yasm.

details:   http://hg.videolan.org/x265/rev/143f81ed72f1
branches:  
changeset: 9005:143f81ed72f1
user:      David T Yuen <dtyx265 at gmail.com>
date:      Mon Dec 29 13:31:20 2014 -0800
description:
Added cmake support to pass along build flags to yasm.

This is particularly helpful when debugging.
Subject: [x265] asm & testbench: psyCost_pp_4x4 in sse4: improve 2088c->337c

details:   http://hg.videolan.org/x265/rev/32ed3f21039a
branches:  
changeset: 9006:32ed3f21039a
user:      Divya Manivannan <divya at multicorewareinc.com>
date:      Mon Dec 29 13:49:02 2014 +0530
description:
asm & testbench: psyCost_pp_4x4 in sse4: improve 2088c->337c
Subject: [x265] asm: psyCost_pp_8x8 in sse4: improve 6425c->928c

details:   http://hg.videolan.org/x265/rev/5dc8e5bf8770
branches:  
changeset: 9007:5dc8e5bf8770
user:      Divya Manivannan <divya at multicorewareinc.com>
date:      Tue Dec 30 14:35:08 2014 +0530
description:
asm: psyCost_pp_8x8 in sse4: improve 6425c->928c
Subject: [x265] asm: psyCost_pp_8x8 for HIGH_BIT_DEPTH in sse4: improve 6995c->1070c

details:   http://hg.videolan.org/x265/rev/ee5a40def3c9
branches:  
changeset: 9008:ee5a40def3c9
user:      Divya Manivannan <divya at multicorewareinc.com>
date:      Tue Dec 30 17:18:04 2014 +0530
description:
asm: psyCost_pp_8x8 for HIGH_BIT_DEPTH in sse4: improve 6995c->1070c
Subject: [x265] saoCuOrgE0 asm code, improved 500.43 -> 466.58

details:   http://hg.videolan.org/x265/rev/d01d3fad8fbb
branches:  
changeset: 9009:d01d3fad8fbb
user:      Praveen Tiwari
date:      Mon Dec 22 16:42:51 2014 +0530
description:
saoCuOrgE0 asm code, improved 500.43 -> 466.58
Subject: [x265] _upBuff1: scale down from int32_t to int8_t

details:   http://hg.videolan.org/x265/rev/648b0f1de393
branches:  
changeset: 9010:648b0f1de393
user:      Praveen Tiwari
date:      Tue Dec 23 13:07:09 2014 +0530
description:
_upBuff1: scale down from int32_t to int8_t
Subject: [x265] added calSign primitive, improved 2316.99 -> 233.63 (9.92x) over C code

details:   http://hg.videolan.org/x265/rev/2014ed669cbe
branches:  
changeset: 9011:2014ed669cbe
user:      Praveen Tiwari
date:      Tue Dec 30 19:38:00 2014 +0530
description:
added calSign primitive, improved 2316.99 -> 233.63 (9.92x) over C code

The calSign primitive will be used to optimize several switch cases in the SAO algorithm.
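
As a rough illustration of what a sign primitive of this kind computes, here is a minimal C++ sketch. The loopfilter.cpp changes are not part of the truncated diff in this message, so the function name calSignSketch, its parameter list, and the 8-bit pixel type are assumptions for illustration only, not the signature used in x265. The int8_t output matches the "_upBuff1: scale down from int32_t to int8_t" change listed above, since a sign only ever takes the values -1, 0 or +1.

```cpp
#include <cstddef>
#include <cstdint>

typedef uint8_t pixel; // assumption: 8-bit build (x265 selects this at compile time)

// Returns -1, 0 or +1 depending on the sign of x.
inline int8_t signOf(int x)
{
    return (int8_t)((x > 0) - (x < 0));
}

// Hypothetical reference version: store the sign of the element-wise
// difference of two pixel rows into an int8_t buffer.
inline void calSignSketch(int8_t* dst, const pixel* a, const pixel* b, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = signOf((int)a[i] - (int)b[i]);
}
```

SAO edge-offset classification compares each pixel with its neighbours along one direction, so a vectorized routine of this shape can presumably serve several of the edge-offset cases.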
Subject: [x265] asm: fix error in psyCost_pp_8x8

details:   http://hg.videolan.org/x265/rev/4b49370f56b3
branches:  
changeset: 9012:4b49370f56b3
user:      Divya Manivannan <divya at multicorewareinc.com>
date:      Wed Dec 31 11:24:09 2014 +0530
description:
asm: fix error in psyCost_pp_8x8
Subject: [x265] update restrict in weight_pp_c reference code

details:   http://hg.videolan.org/x265/rev/5409290f1a97
branches:  
changeset: 9013:5409290f1a97
user:      Min Chen <chenm003 at 163.com>
date:      Wed Dec 31 14:19:10 2014 +0800
description:
update restrict in weight_pp_c reference code
Subject: [x265] asm: psyCost_pp_16x16 in sse4: improve 27086c->3566c

details:   http://hg.videolan.org/x265/rev/8379ff016b56
branches:  
changeset: 9014:8379ff016b56
user:      Divya Manivannan <divya at multicorewareinc.com>
date:      Wed Dec 31 12:55:54 2014 +0530
description:
asm: psyCost_pp_16x16 in sse4: improve 27086c->3566c
Subject: [x265] common: unify clip templates, no output changes

details:   http://hg.videolan.org/x265/rev/9b553540a49b
branches:  
changeset: 9015:9b553540a49b
user:      Steve Borho <steve at borho.org>
date:      Fri Jan 02 13:51:47 2015 +0530
description:
common: unify clip templates, no output changes

Replace Clip3 users with x265_clip3 and Clip users with x265_clip. This commit
cleans up a few places that were using x265_clip3 when they could have used
x265_clip, and some that were using 0 instead of QP_MIN (code written before
QP_MIN existed).
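
For readers skimming the log, the following C++ sketch shows how the two unified templates behave. The template bodies are taken from the common.h hunk further down in this message; the kDepth constant (standing in for X265_DEPTH), the 8-bit pixel typedef, and the saturateToInt16 helper are assumptions added only to make the sketch self-contained.

```cpp
#include <cstdint>

// Assumption: an 8-bit build. x265 takes the depth from X265_DEPTH.
constexpr int kDepth = 8;
typedef uint8_t pixel;

template<typename T>
inline T x265_min(T a, T b) { return a < b ? a : b; }

template<typename T>
inline T x265_max(T a, T b) { return a > b ? a : b; }

// Clamp a into [minVal, maxVal]. Note that the value is the last argument.
template<typename T>
inline T x265_clip3(T minVal, T maxVal, T a) { return x265_min(x265_max(minVal, a), maxVal); }

// Clamp to the pixel range, 0..255 for the 8-bit depth assumed here.
template<typename T>
inline pixel x265_clip(T x) { return (pixel)x265_min<T>(T((1 << kDepth) - 1), x265_max<T>(T(0), x)); }

// Hypothetical helper mirroring the int16_t saturation pattern seen in dct.cpp.
inline int16_t saturateToInt16(int v) { return (int16_t)x265_clip3(-32768, 32767, v); }
```

With these definitions x265_clip3(-128, 127, 300) yields 127 and x265_clip(-5) yields 0. All three arguments of x265_clip3 must deduce to the same type T, so mixed int and uint32_t arguments need an explicit cast.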
Subject: [x265] intrapred: nits

details:   http://hg.videolan.org/x265/rev/c49a8f17ede7
branches:  
changeset: 9016:c49a8f17ede7
user:      Steve Borho <steve at borho.org>
date:      Fri Jan 02 13:51:58 2015 +0530
description:
intrapred: nits
Subject: [x265] calSign: clarify constness

details:   http://hg.videolan.org/x265/rev/8b077b2b3c46
branches:  
changeset: 9017:8b077b2b3c46
user:      Praveen Tiwari
date:      Fri Jan 02 17:03:33 2015 +0530
description:
calSign: clarify constness
Subject: [x265] _upBuff1: scale down from int32_t to int8_t

details:   http://hg.videolan.org/x265/rev/76a295779186
branches:  
changeset: 9018:76a295779186
user:      Praveen Tiwari
date:      Wed Dec 24 18:27:46 2014 +0530
description:
_upBuff1: scale down from int32_t to int8_t
Subject: [x265] asm: fix denoise assembly following int32->int16 coeff change, re-enable it

details:   http://hg.videolan.org/x265/rev/5363cc9dcb04
branches:  
changeset: 9019:5363cc9dcb04
user:      Steve Borho <steve at borho.org>
date:      Sat Jan 03 11:37:12 2015 +0530
description:
asm: fix denoise assembly following int32->int16 coeff change, re-enable it

Bug fix was from Min Chen
Subject: [x265] SAO_EO_1: sign asm code integration

details:   http://hg.videolan.org/x265/rev/df1b82c03bb7
branches:  
changeset: 9020:df1b82c03bb7
user:      Praveen Tiwari
date:      Fri Jan 02 17:56:53 2015 +0530
description:
SAO_EO_1: sign asm code integration
Subject: [x265] SAO_EO_2: sign asm code integration

details:   http://hg.videolan.org/x265/rev/59fb9845c749
branches:  
changeset: 9021:59fb9845c749
user:      Praveen Tiwari
date:      Fri Jan 02 17:35:36 2015 +0530
description:
SAO_EO_2: sign asm code integration
Subject: [x265] calcSaoStatsCu, SAO_EO_1: sign asm code integration

details:   http://hg.videolan.org/x265/rev/f255e8d06423
branches:  
changeset: 9022:f255e8d06423
user:      Praveen Tiwari
date:      Fri Jan 02 18:22:38 2015 +0530
description:
calcSaoStatsCu, SAO_EO_1: sign asm code integration
Subject: [x265] asm: psyCost_pp_32x32 in sse4: improve 110849c->13373c

details:   http://hg.videolan.org/x265/rev/79b94534c12e
branches:  
changeset: 9023:79b94534c12e
user:      Divya Manivannan <divya at multicorewareinc.com>
date:      Mon Jan 05 10:34:14 2015 +0530
description:
asm: psyCost_pp_32x32 in sse4: improve 110849c->13373c
Subject: [x265] asm: psyCost_pp_64x64 in sse4: improve 417824c->56347c

details:   http://hg.videolan.org/x265/rev/b117e003625b
branches:  
changeset: 9024:b117e003625b
user:      Divya Manivannan <divya at multicorewareinc.com>
date:      Mon Jan 05 10:49:33 2015 +0530
description:
asm: psyCost_pp_64x64 in sse4: improve 417824c->56347c
Subject: [x265] testbench: fix bug in generate weight input data

details:   http://hg.videolan.org/x265/rev/935eb3505548
branches:  
changeset: 9025:935eb3505548
user:      Min Chen <chenm003 at 163.com>
date:      Mon Jan 05 16:20:07 2015 +0800
description:
testbench: fix bug in generate weight input data
Subject: [x265] encoder: initialize analysis data to null

details:   http://hg.videolan.org/x265/rev/e3039bcc217a
branches:  
changeset: 9026:e3039bcc217a
user:      Gopu Govindaswamy <gopu at multicorewareinc.com>
date:      Mon Jan 05 12:25:37 2015 +0530
description:
encoder: initialize analysis data to null
Subject: [x265] saoCuOrgB0: asm code

details:   http://hg.videolan.org/x265/rev/a70b51601009
branches:  
changeset: 9027:a70b51601009
user:      Praveen Tiwari
date:      Mon Jan 05 13:42:27 2015 +0530
description:
saoCuOrgB0: asm code
Subject: [x265] sao.cpp: fixed compiler warnings

details:   http://hg.videolan.org/x265/rev/0b6c83da1747
branches:  
changeset: 9028:0b6c83da1747
user:      Praveen Tiwari
date:      Mon Jan 05 10:57:01 2015 +0530
description:
sao.cpp: fixed compiler warnings
Subject: [x265] sao.cpp: fixed shadow warnings

details:   http://hg.videolan.org/x265/rev/feebd0ecda69
branches:  
changeset: 9029:feebd0ecda69
user:      Praveen Tiwari
date:      Mon Jan 05 18:57:20 2015 +0530
description:
sao.cpp: fixed shadow warnings
Subject: [x265] fix weightCost() [CHANGES OUTPUT]

details:   http://hg.videolan.org/x265/rev/6bbb39f3272e
branches:  
changeset: 9030:6bbb39f3272e
user:      Satoshi Nakagawa <nakagawa424 at oki.com>
date:      Tue Jan 06 11:29:49 2015 +0900
description:
fix weightCost() [CHANGES OUTPUT]
Subject: [x265] encoder: disable WPP if not enough columns

details:   http://hg.videolan.org/x265/rev/aa91ea065999
branches:  
changeset: 9031:aa91ea065999
user:      Steve Borho <steve at borho.org>
date:      Tue Jan 06 12:15:33 2015 +0530
description:
encoder: disable WPP if not enough columns

Prevents deadlocks if there are not enough CTUs to start the second row. This
exposes the next problem, which is a crash in deblocking that needs to be
resolved after this fix.
Subject: [x265] frameencoder: skip active/busy row checks if WPP is disabled

details:   http://hg.videolan.org/x265/rev/95f1e1f0efa4
branches:  
changeset: 9032:95f1e1f0efa4
user:      Steve Borho <steve at borho.org>
date:      Tue Jan 06 12:33:36 2015 +0530
description:
frameencoder: skip active/busy row checks if WPP is disabled

There is no row parallelism, so these flags are not necessarily accurate. This
was causing rows to be skipped when there were not enough columns to set active
flags.
Subject: [x265] slicetype: allow queue to fill past full to prevent bottlenecks

details:   http://hg.videolan.org/x265/rev/d36211d0190f
branches:  
changeset: 9033:d36211d0190f
user:      Steve Borho <steve at borho.org>
date:      Tue Jan 06 15:38:58 2015 +0530
description:
slicetype: allow queue to fill past full to prevent bottlenecks

Allow the lookahead to grow just past full before we begin pulling off output
frames and handing them to frame encoders.  This lag of about one mini-gop
allows slicetypeDecide to stay ahead of the frame encoders and always have
frames in the output queue when they are needed.  It's a non-trivial performance
boost for most presets that used b-adapt 2.
Subject: [x265] saoCuOrgE2: asm code

details:   http://hg.videolan.org/x265/rev/a3c9b1ed90bf
branches:  
changeset: 9034:a3c9b1ed90bf
user:      Praveen Tiwari
date:      Tue Jan 06 16:17:36 2015 +0530
description:
saoCuOrgE2: asm code
Subject: [x265] entropy: modified last coefficient position encoding in codeCoeffNxN()

details:   http://hg.videolan.org/x265/rev/357ec738fb0c
branches:  
changeset: 9035:357ec738fb0c
user:      Ashok Kumar Mishra <ashok at multicorewareinc.com>
date:      Tue Jan 06 15:39:58 2015 +0530
description:
entropy: modified last coefficient position encoding in codeCoeffNxN()
Subject: [x265] asm: remove redundant alias, this is handled by Setup_Alias_Primitives()

details:   http://hg.videolan.org/x265/rev/b73dbd4a68ef
branches:  
changeset: 9036:b73dbd4a68ef
user:      Steve Borho <steve at borho.org>
date:      Wed Jan 07 15:07:35 2015 +0530
description:
asm: remove redundant alias, this is handled by Setup_Alias_Primitives()
Subject: [x265] primitives: avoid alias chain, directly alias base primitive

details:   http://hg.videolan.org/x265/rev/ee358bb8ea82
branches:  
changeset: 9037:ee358bb8ea82
user:      Steve Borho <steve at borho.org>
date:      Wed Jan 07 15:18:13 2015 +0530
description:
primitives: avoid alias chain, directly alias base primitive
Subject: [x265] Quant: modified rate cost calculation of last significant coefficient

details:   http://hg.videolan.org/x265/rev/e1fb89002fa6
branches:  
changeset: 9038:e1fb89002fa6
user:      Ashok Kumar Mishra <ashok at multicorewareinc.com>
date:      Wed Jan 07 15:55:07 2015 +0530
description:
Quant: modified rate cost calculation of last significant coefficient
Subject: [x265] asm: saoCuOrgE1 asm code

details:   http://hg.videolan.org/x265/rev/fa1c745d82c8
branches:  
changeset: 9039:fa1c745d82c8
user:      Nabajit Deka
date:      Wed Jan 07 13:44:23 2015 +0530
description:
asm: saoCuOrgE1 asm code
Subject: [x265] asm: saoCuOrgE3 asm code

details:   http://hg.videolan.org/x265/rev/191b430b2e55
branches:  
changeset: 9040:191b430b2e55
user:      Nabajit Deka
date:      Wed Jan 07 14:18:11 2015 +0530
description:
asm: saoCuOrgE3 asm code
Subject: [x265] sao: merge saoCuOrgE3 asm with encoder along with sign asm code integration

details:   http://hg.videolan.org/x265/rev/ff32d97fe59c
branches:  
changeset: 9041:ff32d97fe59c
user:      Nabajit Deka
date:      Wed Jan 07 14:27:33 2015 +0530
description:
sao: merge saoCuOrgE3 asm with encoder along with sign asm code integration

diffstat:

 source/cmake/CMakeASM_YASMInformation.cmake |   13 +
 source/common/common.h                      |   18 +-
 source/common/constants.cpp                 |   11 +-
 source/common/constants.h                   |    2 +-
 source/common/cudata.cpp                    |   30 +-
 source/common/cudata.h                      |    1 -
 source/common/dct.cpp                       |   40 +-
 source/common/deblock.cpp                   |   38 +-
 source/common/intrapred.cpp                 |   18 +-
 source/common/loopfilter.cpp                |   83 ++
 source/common/param.cpp                     |    2 +-
 source/common/pixel.cpp                     |   16 +-
 source/common/predict.cpp                   |  118 +-
 source/common/predict.h                     |   16 +-
 source/common/primitives.cpp                |    2 +-
 source/common/primitives.h                  |   10 +
 source/common/quant.cpp                     |   40 +-
 source/common/quant.h                       |   22 -
 source/common/x86/asm-primitives.cpp        |   25 +-
 source/common/x86/const-a.asm               |    1 +
 source/common/x86/dct8.asm                  |    4 +-
 source/common/x86/loopfilter.asm            |  342 ++++++++-
 source/common/x86/loopfilter.h              |    6 +
 source/common/x86/pixel-a.asm               |  978 ++++++++++++++++++++++++++++
 source/common/x86/pixel.h                   |    5 +
 source/encoder/analysis.cpp                 |   22 +-
 source/encoder/analysis.h                   |    2 +-
 source/encoder/encoder.cpp                  |   63 +-
 source/encoder/entropy.cpp                  |  143 +--
 source/encoder/entropy.h                    |    5 +-
 source/encoder/frameencoder.cpp             |   13 +-
 source/encoder/nal.cpp                      |    2 +-
 source/encoder/ratecontrol.cpp              |   72 +-
 source/encoder/ratecontrol.h                |    2 +-
 source/encoder/rdcost.h                     |   32 +-
 source/encoder/sao.cpp                      |  246 +++++-
 source/encoder/sao.h                        |    3 +-
 source/encoder/search.cpp                   |  410 ++++++-----
 source/encoder/search.h                     |    6 +-
 source/encoder/slicetype.cpp                |  140 ++-
 source/encoder/slicetype.h                  |   14 +-
 source/encoder/weightPrediction.cpp         |   28 +-
 source/filters/filters.cpp                  |    2 +-
 source/test/ipfilterharness.cpp             |    4 +-
 source/test/pixelharness.cpp                |  275 +++++++-
 source/test/pixelharness.h                  |   10 +
 46 files changed, 2548 insertions(+), 787 deletions(-)

diffs (truncated from 5406 to 300 lines):

diff -r 8d2f418829c8 -r ff32d97fe59c source/cmake/CMakeASM_YASMInformation.cmake
--- a/source/cmake/CMakeASM_YASMInformation.cmake	Sat Dec 20 21:27:14 2014 +0900
+++ b/source/cmake/CMakeASM_YASMInformation.cmake	Wed Jan 07 14:27:33 2015 +0530
@@ -35,6 +35,19 @@ if(HIGH_BIT_DEPTH)
 else()
     list(APPEND ASM_FLAGS -DHIGH_BIT_DEPTH=0 -DBIT_DEPTH=8)
 endif()
+
+list(APPEND ASM_FLAGS "${CMAKE_ASM_YASM_FLAGS}")
+
+if(CMAKE_BUILD_TYPE MATCHES Release)
+    list(APPEND ASM_FLAGS "${CMAKE_ASM_YASM_FLAGS_RELEASE}")
+elseif(CMAKE_BUILD_TYPE MATCHES Debug)
+    list(APPEND ASM_FLAGS "${CMAKE_ASM_YASM_FLAGS_DEBUG}")
+elseif(CMAKE_BUILD_TYPE MATCHES MinSizeRel)
+    list(APPEND ASM_FLAGS "${CMAKE_ASM_YASM_FLAGS_MINSIZEREL}")
+elseif(CMAKE_BUILD_TYPE MATCHES RelWithDebInfo)
+    list(APPEND ASM_FLAGS "${CMAKE_ASM_YASM_FLAGS_RELWITHDEBINFO}")
+endif()
+
 set(YASM_FLAGS ${ARGS} ${ASM_FLAGS} PARENT_SCOPE)
 string(REPLACE ";" " " CMAKE_ASM_YASM_COMPILER_ARG1 "${ARGS}")
 
diff -r 8d2f418829c8 -r ff32d97fe59c source/common/common.h
--- a/source/common/common.h	Sat Dec 20 21:27:14 2014 +0900
+++ b/source/common/common.h	Wed Jan 07 14:27:33 2015 +0530
@@ -146,23 +146,17 @@ typedef int32_t  ssum2_t;      //Signed 
 #define BITS_FOR_POC 8
 
 template<typename T>
-inline pixel Clip(T x)
-{
-    return (pixel)std::min<T>(T((1 << X265_DEPTH) - 1), std::max<T>(T(0), x));
-}
-
-template<typename T>
-inline T Clip3(T minVal, T maxVal, T a)
-{
-    return std::min<T>(std::max<T>(minVal, a), maxVal);
-}
-
-template<typename T>
 inline T x265_min(T a, T b) { return a < b ? a : b; }
 
 template<typename T>
 inline T x265_max(T a, T b) { return a > b ? a : b; }
 
+template<typename T>
+inline T x265_clip3(T minVal, T maxVal, T a) { return x265_min(x265_max(minVal, a), maxVal); }
+
+template<typename T> /* clip to pixel range, 0..255 or 0..1023 */
+inline pixel x265_clip(T x) { return (pixel)x265_min<T>(T((1 << X265_DEPTH) - 1), x265_max<T>(T(0), x)); }
+
 typedef int16_t  coeff_t;      // transform coefficient
 
 #define X265_MIN(a, b) ((a) < (b) ? (a) : (b))
diff -r 8d2f418829c8 -r ff32d97fe59c source/common/constants.cpp
--- a/source/common/constants.cpp	Sat Dec 20 21:27:14 2014 +0900
+++ b/source/common/constants.cpp	Wed Jan 07 14:27:33 2015 +0530
@@ -412,7 +412,16 @@ const uint16_t* const g_scanOrderCG[NUM_
     { g_scan4x4[2], g_scan2x2[0], g_scan4x4[0], g_scan8x8diag }
 };
 
-const uint8_t g_minInGroup[10] = { 0, 1, 2, 3, 4, 6, 8, 12, 16, 24 };
+// Table used for encoding the last coefficient position. The index is the position.
+// The low 4 bits are the number of "1" in the prefix and the high 4 bits are the number
+// of bits in the suffix.
+const uint8_t g_lastCoeffTable[32] =
+{
+    0x00, 0x01, 0x02, 0x03, 0x14, 0x14, 0x15, 0x15,
+    0x26, 0x26, 0x26, 0x26, 0x27, 0x27, 0x27, 0x27,
+    0x38, 0x38, 0x38, 0x38, 0x38, 0x38, 0x38, 0x38,
+    0x39, 0x39, 0x39, 0x39, 0x39, 0x39, 0x39, 0x39,
+};
 
 // Rice parameters for absolute transform levels
 const uint8_t g_goRiceRange[5] = { 7, 14, 26, 46, 78 };
diff -r 8d2f418829c8 -r ff32d97fe59c source/common/constants.h
--- a/source/common/constants.h	Sat Dec 20 21:27:14 2014 +0900
+++ b/source/common/constants.h	Wed Jan 07 14:27:33 2015 +0530
@@ -83,7 +83,7 @@ extern const uint16_t* const g_scanOrder
 extern const uint16_t g_scan8x8diag[8 * 8];
 extern const uint16_t g_scan4x4[NUM_SCAN_TYPE][4 * 4];
 
-extern const uint8_t g_minInGroup[10];
+extern const uint8_t g_lastCoeffTable[32];
 extern const uint8_t g_goRiceRange[5]; // maximum value coded with Rice codes
 
 // CABAC tables
diff -r 8d2f418829c8 -r ff32d97fe59c source/common/cudata.cpp
--- a/source/common/cudata.cpp	Sat Dec 20 21:27:14 2014 +0900
+++ b/source/common/cudata.cpp	Wed Jan 07 14:27:33 2015 +0530
@@ -106,8 +106,8 @@ inline bool lessThanRow(int addr, int va
 
 inline MV scaleMv(MV mv, int scale)
 {
-    int mvx = Clip3(-32768, 32767, (scale * mv.x + 127 + (scale * mv.x < 0)) >> 8);
-    int mvy = Clip3(-32768, 32767, (scale * mv.y + 127 + (scale * mv.y < 0)) >> 8);
+    int mvx = x265_clip3(-32768, 32767, (scale * mv.x + 127 + (scale * mv.x < 0)) >> 8);
+    int mvy = x265_clip3(-32768, 32767, (scale * mv.y + 127 + (scale * mv.y < 0)) >> 8);
 
     return MV((int16_t)mvx, (int16_t)mvy);
 }
@@ -608,7 +608,7 @@ const CUData* CUData::getPUAboveRight(ui
         {
             if (curPartUnitIdx > g_rasterToZscan[absPartIdxRT - s_numPartInCUSize + 1])
             {
-                uint32_t absZorderCUIdx  = g_zscanToRaster[m_absIdxInCTU] + (1 << (m_log2CUSize[0] - LOG2_UNIT_SIZE)) - 1;
+                uint32_t absZorderCUIdx = g_zscanToRaster[m_absIdxInCTU] + (1 << (m_log2CUSize[0] - LOG2_UNIT_SIZE)) - 1;
                 arPartUnitIdx = g_rasterToZscan[absPartIdxRT - s_numPartInCUSize + 1];
                 if (isEqualRowOrCol(absPartIdxRT, absZorderCUIdx, s_numPartInCUSize))
                     return m_encData->getPicCTU(m_cuAddr);
@@ -689,8 +689,6 @@ const CUData* CUData::getPUBelowLeftAdi(
             return NULL;
         }
         blPartUnitIdx = g_rasterToZscan[absPartIdxLB + (1 + partUnitOffset) * s_numPartInCUSize - 1];
-        if (!m_cuLeft || !m_cuLeft->m_slice)
-            return NULL;
         return m_cuLeft;
     }
 
@@ -723,8 +721,6 @@ const CUData* CUData::getPUAboveRightAdi
             return NULL;
         }
         arPartUnitIdx = g_rasterToZscan[absPartIdxRT + NUM_CU_PARTITIONS - s_numPartInCUSize + partUnitOffset];
-        if (!m_cuAbove || !m_cuAbove->m_slice)
-            return NULL;
         return m_cuAbove;
     }
 
@@ -732,8 +728,6 @@ const CUData* CUData::getPUAboveRightAdi
         return NULL;
 
     arPartUnitIdx = g_rasterToZscan[NUM_CU_PARTITIONS - s_numPartInCUSize + partUnitOffset - 1];
-    if ((m_cuAboveRight == NULL || m_cuAboveRight->m_slice == NULL || (m_cuAboveRight->m_cuAddr) > m_cuAddr))
-        return NULL;
     return m_cuAboveRight;
 }
 
@@ -904,7 +898,7 @@ void CUData::getIntraTUQtDepthRange(uint
     tuDepthRange[0] = m_slice->m_sps->quadtreeTULog2MinSize;
     tuDepthRange[1] = m_slice->m_sps->quadtreeTULog2MaxSize;
 
-    tuDepthRange[0] = X265_MAX(tuDepthRange[0], X265_MIN(log2CUSize - (m_slice->m_sps->quadtreeTUMaxDepthIntra - 1 + splitFlag), tuDepthRange[1]));
+    tuDepthRange[0] = x265_clip3(tuDepthRange[0], tuDepthRange[1], log2CUSize - (m_slice->m_sps->quadtreeTUMaxDepthIntra - 1 + splitFlag));
 }
 
 void CUData::getInterTUQtDepthRange(uint32_t tuDepthRange[2], uint32_t absPartIdx) const
@@ -916,7 +910,7 @@ void CUData::getInterTUQtDepthRange(uint
     tuDepthRange[0] = m_slice->m_sps->quadtreeTULog2MinSize;
     tuDepthRange[1] = m_slice->m_sps->quadtreeTULog2MaxSize;
 
-    tuDepthRange[0] = X265_MAX(tuDepthRange[0], X265_MIN(log2CUSize - (quadtreeTUMaxDepth - 1 + splitFlag), tuDepthRange[1]));
+    tuDepthRange[0] = x265_clip3(tuDepthRange[0], tuDepthRange[1], log2CUSize - (quadtreeTUMaxDepth - 1 + splitFlag));
 }
 
 uint32_t CUData::getCtxSkipFlag(uint32_t absPartIdx) const
@@ -1363,14 +1357,6 @@ uint32_t CUData::deriveRightBottomIdx(ui
     return outPartIdxRB;
 }
 
-void CUData::deriveLeftRightTopIdxAdi(uint32_t& outPartIdxLT, uint32_t& outPartIdxRT, uint32_t partOffset, uint32_t partDepth) const
-{
-    uint32_t numPartInWidth = 1 << (m_log2CUSize[0] - LOG2_UNIT_SIZE - partDepth);
-
-    outPartIdxLT = m_absIdxInCTU + partOffset;
-    outPartIdxRT = g_rasterToZscan[g_zscanToRaster[outPartIdxLT] + numPartInWidth - 1];
-}
-
 bool CUData::hasEqualMotion(uint32_t absPartIdx, const CUData& candCU, uint32_t candAbsPartIdx) const
 {
     if (m_interDir[absPartIdx] != candCU.m_interDir[candAbsPartIdx])
@@ -2000,10 +1986,10 @@ void CUData::scaleMvByPOCDist(MV& outMV,
         outMV = inMV;
     else
     {
-        int tdb   = Clip3(-128, 127, diffPocB);
-        int tdd   = Clip3(-128, 127, diffPocD);
+        int tdb   = x265_clip3(-128, 127, diffPocB);
+        int tdd   = x265_clip3(-128, 127, diffPocD);
         int x     = (0x4000 + abs(tdd / 2)) / tdd;
-        int scale = Clip3(-4096, 4095, (tdb * x + 32) >> 6);
+        int scale = x265_clip3(-4096, 4095, (tdb * x + 32) >> 6);
         outMV = scaleMv(inMV, scale);
     }
 }
diff -r 8d2f418829c8 -r ff32d97fe59c source/common/cudata.h
--- a/source/common/cudata.h	Sat Dec 20 21:27:14 2014 +0900
+++ b/source/common/cudata.h	Wed Jan 07 14:27:33 2015 +0530
@@ -212,7 +212,6 @@ public:
 
     void     getAllowedChromaDir(uint32_t absPartIdx, uint32_t* modeList) const;
     int      getIntraDirLumaPredictor(uint32_t absPartIdx, uint32_t* intraDirPred) const;
-    void     deriveLeftRightTopIdxAdi(uint32_t& partIdxLT, uint32_t& partIdxRT, uint32_t partOffset, uint32_t partDepth) const;
 
     uint32_t getSCUAddr() const                  { return (m_cuAddr << g_maxFullDepth * 2) + m_absIdxInCTU; }
     uint32_t getCtxSplitFlag(uint32_t absPartIdx, uint32_t depth) const;
diff -r 8d2f418829c8 -r ff32d97fe59c source/common/dct.cpp
--- a/source/common/dct.cpp	Sat Dec 20 21:27:14 2014 +0900
+++ b/source/common/dct.cpp	Wed Jan 07 14:27:33 2015 +0530
@@ -74,10 +74,10 @@ void inversedst(const int16_t* tmp, int1
         c[2] = tmp[i] - tmp[12 + i];
         c[3] = 74 * tmp[4 + i];
 
-        block[4 * i + 0] = (int16_t)Clip3(-32768, 32767, (29 * c[0] + 55 * c[1]     + c[3]               + rnd_factor) >> shift);
-        block[4 * i + 1] = (int16_t)Clip3(-32768, 32767, (55 * c[2] - 29 * c[1]     + c[3]               + rnd_factor) >> shift);
-        block[4 * i + 2] = (int16_t)Clip3(-32768, 32767, (74 * (tmp[i] - tmp[8 + i]  + tmp[12 + i])      + rnd_factor) >> shift);
-        block[4 * i + 3] = (int16_t)Clip3(-32768, 32767, (55 * c[0] + 29 * c[2]     - c[3]               + rnd_factor) >> shift);
+        block[4 * i + 0] = (int16_t)x265_clip3(-32768, 32767, (29 * c[0] + 55 * c[1]     + c[3]               + rnd_factor) >> shift);
+        block[4 * i + 1] = (int16_t)x265_clip3(-32768, 32767, (55 * c[2] - 29 * c[1]     + c[3]               + rnd_factor) >> shift);
+        block[4 * i + 2] = (int16_t)x265_clip3(-32768, 32767, (74 * (tmp[i] - tmp[8 + i]  + tmp[12 + i])      + rnd_factor) >> shift);
+        block[4 * i + 3] = (int16_t)x265_clip3(-32768, 32767, (55 * c[0] + 29 * c[2]     - c[3]               + rnd_factor) >> shift);
     }
 }
 
@@ -255,10 +255,10 @@ void partialButterflyInverse4(const int1
         E[1] = g_t4[0][1] * src[0] + g_t4[2][1] * src[2 * line];
 
         /* Combining even and odd terms at each hierarchy levels to calculate the final spatial domain vector */
-        dst[0] = (int16_t)(Clip3(-32768, 32767, (E[0] + O[0] + add) >> shift));
-        dst[1] = (int16_t)(Clip3(-32768, 32767, (E[1] + O[1] + add) >> shift));
-        dst[2] = (int16_t)(Clip3(-32768, 32767, (E[1] - O[1] + add) >> shift));
-        dst[3] = (int16_t)(Clip3(-32768, 32767, (E[0] - O[0] + add) >> shift));
+        dst[0] = (int16_t)(x265_clip3(-32768, 32767, (E[0] + O[0] + add) >> shift));
+        dst[1] = (int16_t)(x265_clip3(-32768, 32767, (E[1] + O[1] + add) >> shift));
+        dst[2] = (int16_t)(x265_clip3(-32768, 32767, (E[1] - O[1] + add) >> shift));
+        dst[3] = (int16_t)(x265_clip3(-32768, 32767, (E[0] - O[0] + add) >> shift));
 
         src++;
         dst += 4;
@@ -292,8 +292,8 @@ void partialButterflyInverse8(const int1
         E[2] = EE[1] - EO[1];
         for (k = 0; k < 4; k++)
         {
-            dst[k] = (int16_t)Clip3(-32768, 32767, (E[k] + O[k] + add) >> shift);
-            dst[k + 4] = (int16_t)Clip3(-32768, 32767, (E[3 - k] - O[3 - k] + add) >> shift);
+            dst[k] = (int16_t)x265_clip3(-32768, 32767, (E[k] + O[k] + add) >> shift);
+            dst[k + 4] = (int16_t)x265_clip3(-32768, 32767, (E[3 - k] - O[3 - k] + add) >> shift);
         }
 
         src++;
@@ -343,8 +343,8 @@ void partialButterflyInverse16(const int
 
         for (k = 0; k < 8; k++)
         {
-            dst[k]   = (int16_t)Clip3(-32768, 32767, (E[k] + O[k] + add) >> shift);
-            dst[k + 8] = (int16_t)Clip3(-32768, 32767, (E[7 - k] - O[7 - k] + add) >> shift);
+            dst[k]   = (int16_t)x265_clip3(-32768, 32767, (E[k] + O[k] + add) >> shift);
+            dst[k + 8] = (int16_t)x265_clip3(-32768, 32767, (E[7 - k] - O[7 - k] + add) >> shift);
         }
 
         src++;
@@ -407,8 +407,8 @@ void partialButterflyInverse32(const int
 
         for (k = 0; k < 16; k++)
         {
-            dst[k] = (int16_t)Clip3(-32768, 32767, (E[k] + O[k] + add) >> shift);
-            dst[k + 16] = (int16_t)Clip3(-32768, 32767, (E[15 - k] - O[15 - k] + add) >> shift);
+            dst[k] = (int16_t)x265_clip3(-32768, 32767, (E[k] + O[k] + add) >> shift);
+            dst[k + 16] = (int16_t)x265_clip3(-32768, 32767, (E[15 - k] - O[15 - k] + add) >> shift);
         }
 
         src++;
@@ -630,7 +630,7 @@ void dequant_normal_c(const int16_t* qua
     for (int n = 0; n < num; n++)
     {
         coeffQ = (quantCoef[n] * scale + add) >> shift;
-        coef[n] = (int16_t)Clip3(-32768, 32767, coeffQ);
+        coef[n] = (int16_t)x265_clip3(-32768, 32767, coeffQ);
     }
 }
 
@@ -649,15 +649,15 @@ void dequant_scaling_c(const int16_t* qu
         for (int n = 0; n < num; n++)
         {
             coeffQ = ((quantCoef[n] * deQuantCoef[n]) + add) >> (shift - per);
-            coef[n] = (int16_t)Clip3(-32768, 32767, coeffQ);
+            coef[n] = (int16_t)x265_clip3(-32768, 32767, coeffQ);
         }
     }
     else
     {
         for (int n = 0; n < num; n++)
         {
-            coeffQ   = Clip3(-32768, 32767, quantCoef[n] * deQuantCoef[n]);
-            coef[n] = (int16_t)Clip3(-32768, 32767, coeffQ << (per - shift));
+            coeffQ   = x265_clip3(-32768, 32767, quantCoef[n] * deQuantCoef[n]);
+            coef[n] = (int16_t)x265_clip3(-32768, 32767, coeffQ << (per - shift));
         }
     }
 }
@@ -680,7 +680,7 @@ uint32_t quant_c(const int16_t* coef, co
         if (level)
             ++numSig;
         level *= sign;
-        qCoef[blockpos] = (int16_t)Clip3(-32768, 32767, level);
+        qCoef[blockpos] = (int16_t)x265_clip3(-32768, 32767, level);
     }
 
     return numSig;
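

The comment on g_lastCoeffTable in the constants.cpp hunk above describes a packed layout: the low 4 bits give the number of "1" bins in the prefix and the high 4 bits give the number of suffix bits. The C++ sketch below unpacks one entry. Both tables are copied from that hunk (g_minInGroup from the removed line); the LastPosCode struct and the splitLastPos helper are hypothetical names, and computing the suffix value as the offset from the first position of the group is an assumption based on the removed table, not something shown in the truncated diff.

```cpp
#include <cstdint>

// Copied from the added lines of the constants.cpp hunk.
static const uint8_t g_lastCoeffTable[32] =
{
    0x00, 0x01, 0x02, 0x03, 0x14, 0x14, 0x15, 0x15,
    0x26, 0x26, 0x26, 0x26, 0x27, 0x27, 0x27, 0x27,
    0x38, 0x38, 0x38, 0x38, 0x38, 0x38, 0x38, 0x38,
    0x39, 0x39, 0x39, 0x39, 0x39, 0x39, 0x39, 0x39,
};

// Copied from the removed line of the same hunk: first position of each group.
static const uint8_t g_minInGroup[10] = { 0, 1, 2, 3, 4, 6, 8, 12, 16, 24 };

// Hypothetical container for the unpacked fields.
struct LastPosCode
{
    uint32_t prefixOnes;  // number of "1" bins in the prefix (low nibble)
    uint32_t suffixBits;  // number of bits in the suffix (high nibble)
    uint32_t suffixValue; // assumed: offset of pos within its group
};

// Hypothetical helper: unpack the table entry for one position (0..31).
inline LastPosCode splitLastPos(uint32_t pos)
{
    const uint8_t packed = g_lastCoeffTable[pos];
    LastPosCode c;
    c.prefixOnes  = packed & 15;
    c.suffixBits  = packed >> 4;
    c.suffixValue = pos - g_minInGroup[c.prefixOnes];
    return c;
}
```

For position 13 the entry is 0x27, which gives a prefix of seven "1" bins and a 2-bit suffix. A single lookup therefore supplies both fields, where the old g_minInGroup table gave only the first position of each group.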

