[x265-commits] [x265] testbench(quant): the Round value must be less than (2 ^ ...
Min Chen
chenm003 at 163.com
Tue Sep 9 18:19:26 CEST 2014
details: http://hg.videolan.org/x265/rev/53e0969c605f
branches:
changeset: 8011:53e0969c605f
user: Min Chen <chenm003 at 163.com>
date: Mon Sep 08 19:38:41 2014 -0700
description:
testbench(quant): the Round value must be less than (2 ^ qbits)
Subject: [x265] testbench(quant): the qBits value must be more than or equal to 8
details: http://hg.videolan.org/x265/rev/5dbf9e8f4028
branches:
changeset: 8012:5dbf9e8f4028
user: Min Chen <chenm003 at 163.com>
date: Mon Sep 08 19:38:56 2014 -0700
description:
testbench(quant): the qBits value must be more than or equal to 8
Subject: [x265] asm: improve quant by replace variant shift to fixed shift, 19k cycles -> 16.6k cycles
details: http://hg.videolan.org/x265/rev/277c1e05c247
branches:
changeset: 8013:277c1e05c247
user: Min Chen <chenm003 at 163.com>
date: Mon Sep 08 19:39:14 2014 -0700
description:
asm: improve quant by replace variant shift to fixed shift, 19k cycles -> 16.6k cycles
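
For readers following the change, the fixed-shift rewrite appears to rest on one identity: `level << qBits` is a multiple of `2^(qBits - 8)`, so the right shift can be applied to `tmpLevel` first and the `level` term then needs only a constant shift of 8. The sketch below checks that identity in scalar C++. The function names are hypothetical, and it assumes `qBits >= 8` (as the testbench change above enforces) and an arithmetic right shift on negative `int32_t` values.

```cpp
#include <cstdint>

// Hypothetical helper names; each models the deltaU term for one coefficient.
// Assumes qBits >= 8 and an arithmetic right shift for negative int32_t
// (true on mainstream compilers, guaranteed from C++20).

// Old form: shift left by the run-time qBits, subtract, then shift right.
int32_t deltaU_variable_shift(int32_t tmpLevel, int32_t level, int qBits)
{
    return (tmpLevel - (level << qBits)) >> (qBits - 8);
}

// New form: shift right first, then subtract level shifted by a constant 8.
int32_t deltaU_fixed_shift(int32_t tmpLevel, int32_t level, int qBits)
{
    return (tmpLevel >> (qBits - 8)) - (level << 8);
}

// Returns true when both forms agree for every tmpLevel in [0, limit).
bool forms_agree(int qBits, int32_t add, int32_t limit)
{
    for (int32_t tmpLevel = 0; tmpLevel < limit; ++tmpLevel)
    {
        int32_t level = (tmpLevel + add) >> qBits;
        if (deltaU_variable_shift(tmpLevel, level, qBits) !=
            deltaU_fixed_shift(tmpLevel, level, qBits))
            return false;
    }
    return true;
}
```

Calling `forms_agree(14, 1 << 13, 1 << 20)` sweeps about a million values; the identity is exact because subtracting a multiple of `2^(qBits - 8)` commutes with a floor division by `2^(qBits - 8)`.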
Subject: [x265] asm: avx2 version of quant, improve 16.6k cycles -> 8.4k cycles
details: http://hg.videolan.org/x265/rev/c4fb044c901b
branches:
changeset: 8014:c4fb044c901b
user: Min Chen <chenm003 at 163.com>
date: Mon Sep 08 19:39:34 2014 -0700
description:
asm: avx2 version of quant, improve 16.6k cycles -> 8.4k cycles
Subject: [x265] search: remove warning from MS compiler
details: http://hg.videolan.org/x265/rev/44cb33846e0e
branches:
changeset: 8015:44cb33846e0e
user: Deepthi Nandakumar <deepthi at multicorewareinc.com>
date: Tue Sep 09 10:39:52 2014 +0530
description:
search: remove warning from MS compiler
Subject: [x265] frameencoder: use x265_emms() prior to double QP clipping for VBV
details: http://hg.videolan.org/x265/rev/a414ca1c9067
branches:
changeset: 8016:a414ca1c9067
user: Steve Borho <steve at borho.org>
date: Tue Sep 09 14:24:37 2014 +0200
description:
frameencoder: use x265_emms() prior to double QP clipping for VBV
Subject: [x265] frameencoder: use simple shifts to scale 2-pass CU type counters
details: http://hg.videolan.org/x265/rev/ebd5a0cac758
branches:
changeset: 8017:ebd5a0cac758
user: Steve Borho <steve at borho.org>
date: Tue Sep 09 14:23:20 2014 +0200
description:
frameencoder: use simple shifts to scale 2-pass CU type counters
the cu type counters are summed at the end and turned into percentages, so
it doesn't matter what base unit is used, only that each depth has 4x the
value as depth+1
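
A minimal sketch of the reasoning above, with hypothetical names that are not taken from frameencoder.cpp: each depth is weighted by a power of four computed with a shift, and the resulting percentages do not depend on which depth is chosen as the base unit.

```cpp
#include <cstdint>

// Hypothetical names. A CU at `depth` covers 4x the area of a CU at
// depth + 1, so it is weighted by 4^(maxDepth - depth) using a shift.
uint64_t cuWeight(uint32_t depth, uint32_t maxDepth)
{
    return uint64_t(1) << ((maxDepth - depth) * 2);
}

// Percentage of the total weighted count contributed by `part`.
double percent(uint64_t part, uint64_t total)
{
    return total ? 100.0 * double(part) / double(total) : 0.0;
}
```

For example, two CUs at depth 1 and sixteen CUs at depth 3 give weighted counts of 32 and 16 with `maxDepth = 3`, or 128 and 64 with `maxDepth = 4`; in both cases the first group is two thirds of the total.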
Subject: [x265] copy_cnt_4: enable fast non zero coefficient count path
details: http://hg.videolan.org/x265/rev/0dc2cbc36ee5
branches:
changeset: 8018:0dc2cbc36ee5
user: Praveen Tiwari
date: Tue Sep 09 11:07:59 2014 +0530
description:
copy_cnt_4: enable fast non zero coefficient count path
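
The fast path enabled here (visible in the blockcopy8.asm hunk below) counts non-zero coefficients with a pack, compare-to-zero, byte-mask and popcount sequence. The following is a scalar C++ model of that idea for one 4x4 block, written as an illustration only; the function name is hypothetical and it does not reproduce the register layout of the assembly.

```cpp
#include <bitset>
#include <cstdint>

// Scalar model of the mask-and-popcount idea for 16 coefficients.
uint32_t countNonZero16(const int16_t coeff[16])
{
    uint32_t zeroMask = 0;
    for (int i = 0; i < 16; ++i)
    {
        // packsswb saturates int16 -> int8, so a non-zero word never becomes 0
        int v = coeff[i];
        int8_t packed = int8_t(v > 127 ? 127 : (v < -128 ? -128 : v));
        // pcmpeqb + pmovmskb: one mask bit per byte that equals zero
        if (packed == 0)
            zeroMask |= 1u << i;
    }
    // not + popcnt on the low 16 bits: count the bytes that were not zero
    return uint32_t(std::bitset<16>(~zeroMask & 0xFFFFu).count());
}
```

The saturation step matters: a coefficient such as 256 would truncate to a zero byte, but saturating to 127 keeps it counted.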
Subject: [x265] copy_cnt_4: combine mova and paddb to reduce code size, same speedup
details: http://hg.videolan.org/x265/rev/5edcbcbb338f
branches:
changeset: 8019:5edcbcbb338f
user: Praveen Tiwari
date: Tue Sep 09 11:36:58 2014 +0530
description:
copy_cnt_4: combine mova and paddb to reduce code size, same speedup
Subject: [x265] copy_cnt_4: faster AVX2 code
details: http://hg.videolan.org/x265/rev/f7f8206a70bd
branches:
changeset: 8020:f7f8206a70bd
user: Praveen Tiwari
date: Tue Sep 09 14:07:14 2014 +0530
description:
copy_cnt_4: faster AVX2 code
Subject: [x265] copy_cnt_8 AVX2 asm code, as per new interface
details: http://hg.videolan.org/x265/rev/331ef5121676
branches:
changeset: 8021:331ef5121676
user: Praveen Tiwari
date: Tue Sep 09 17:53:09 2014 +0530
description:
copy_cnt_8 AVX2 asm code, as per new interface
Subject: [x265] search: return distortion from xIntraCodingLumaBlk, do not pass by reference
details: http://hg.videolan.org/x265/rev/a7f4f750e9d4
branches:
changeset: 8022:a7f4f750e9d4
user: Steve Borho <steve at borho.org>
date: Tue Sep 09 16:33:54 2014 +0200
description:
search: return distortion from xIntraCodingLumaBlk, do not pass by reference
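
The shape of this refactor, which the following search commits repeat for other functions, can be sketched as below. The functions are hypothetical stand-ins, not the x265 signatures; they only contrast an accumulating reference parameter with a returned value.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the refactor; not the x265 functions.
// Before: the caller's accumulator is updated through a reference.
void codeBlockByRef(uint32_t blockCost, uint32_t& outDist)
{
    outDist += blockCost;
}

// After: the distortion is returned and the caller decides how to accumulate.
uint32_t codeBlock(uint32_t blockCost)
{
    return blockCost;
}

// Caller-side accumulation with the returning form.
uint32_t sumByReturn(const uint32_t* costs, int n)
{
    uint32_t total = 0;
    for (int i = 0; i < n; ++i)
        total += codeBlock(costs[i]);
    return total;
}
```

Returning the value makes the data flow visible at the call site and removes the need to pre-initialise an output variable before the call.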
Subject: [x265] search: return distortion from xRecurIntraChromaCodingQT, do not pass by ref
details: http://hg.videolan.org/x265/rev/b0a018562d29
branches:
changeset: 8023:b0a018562d29
user: Steve Borho <steve at borho.org>
date: Tue Sep 09 16:39:34 2014 +0200
description:
search: return distortion from xRecurIntraChromaCodingQT, do not pass by ref
Subject: [x265] search: return distortion from xIntraCodingChromaBlk, do not pass by ref
details: http://hg.videolan.org/x265/rev/62f6924be843
branches:
changeset: 8024:62f6924be843
user: Steve Borho <steve at borho.org>
date: Tue Sep 09 16:47:11 2014 +0200
description:
search: return distortion from xIntraCodingChromaBlk, do not pass by ref
Subject: [x265] search: return distortion from xRecurIntraCodingQT
details: http://hg.videolan.org/x265/rev/68ac5ca5d676
branches:
changeset: 8025:68ac5ca5d676
user: Steve Borho <steve at borho.org>
date: Tue Sep 09 17:00:02 2014 +0200
description:
search: return distortion from xRecurIntraCodingQT
Subject: [x265] search: pass depthRange uniformly as uint32_t depthRange[2]
details: http://hg.videolan.org/x265/rev/cead9fe7ff30
branches:
changeset: 8026:cead9fe7ff30
user: Steve Borho <steve at borho.org>
date: Tue Sep 09 17:13:51 2014 +0200
description:
search: pass depthRange uniformly as uint32_t depthRange[2]
effectively the same as uint32_t* but compilers and debuggers can often do more
with the length info. plus it just makes the code more readable
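
As a small illustration of the point above: in C++ a parameter declared as `uint32_t depthRange[2]` is adjusted to a pointer, so the two declarations below have the same function type and the `[2]` serves as documentation for readers and tools. The function names are hypothetical and are not the x265 declarations.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical signatures, not the x265 declarations.
void withArray(uint32_t depthRange[2]);
void withPointer(uint32_t* depthRange);

// The array form is adjusted to a pointer, so both types are identical.
static_assert(std::is_same<decltype(withArray), decltype(withPointer)>::value,
              "array parameters are adjusted to pointers");

// Example consumer: clamps a depth into [depthRange[0], depthRange[1]].
uint32_t clampDepth(uint32_t depth, const uint32_t depthRange[2])
{
    if (depth < depthRange[0]) return depthRange[0];
    if (depth > depthRange[1]) return depthRange[1];
    return depth;
}
```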
Subject: [x265] search: return distortion from xEstimateResidualQT
details: http://hg.videolan.org/x265/rev/7d8e4935c1ca
branches:
changeset: 8027:7d8e4935c1ca
user: Steve Borho <steve at borho.org>
date: Tue Sep 09 17:19:45 2014 +0200
description:
search: return distortion from xEstimateResidualQT
Subject: [x265] search: don't pass a zeroDistortion pointer if you don't want the answer
details: http://hg.videolan.org/x265/rev/84b1d287333f
branches:
changeset: 8028:84b1d287333f
user: Steve Borho <steve at borho.org>
date: Tue Sep 09 17:22:46 2014 +0200
description:
search: don't pass a zeroDistortion pointer if you don't want the answer
Subject: [x265] search: fix camel case of residualQTIntraChroma
details: http://hg.videolan.org/x265/rev/d85792b9f373
branches:
changeset: 8029:d85792b9f373
user: Steve Borho <steve at borho.org>
date: Tue Sep 09 17:24:33 2014 +0200
description:
search: fix camel case of residualQTIntraChroma
Subject: [x265] analysis: modified compressInterCU_rd0_4() with CU-specific information
details: http://hg.videolan.org/x265/rev/2d9eb8cebb71
branches:
changeset: 8030:2d9eb8cebb71
user: Ashok Kumar Mishra <ashok at multicorewareinc.com>
date: Tue Sep 09 20:02:39 2014 +0530
description:
analysis: modified compressInterCU_rd0_4() with CU-specific information
diffstat:
source/common/dct.cpp | 2 +
source/common/x86/asm-primitives.cpp | 2 +
source/common/x86/blockcopy8.asm | 127 ++++++------------
source/common/x86/const-a.asm | 2 +-
source/common/x86/pixel-util.h | 1 +
source/common/x86/pixel-util8.asm | 228 ++++++++++++++++++++++++++++------
source/encoder/analysis.cpp | 43 ++---
source/encoder/analysis.h | 4 +-
source/encoder/frameencoder.cpp | 13 +-
source/encoder/search.cpp | 153 ++++++++++-------------
source/encoder/search.h | 42 +++---
source/test/mbdstharness.cpp | 4 +-
12 files changed, 351 insertions(+), 270 deletions(-)
diffs (truncated from 1249 to 300 lines):
diff -r b5f81a839403 -r 2d9eb8cebb71 source/common/dct.cpp
--- a/source/common/dct.cpp Mon Sep 08 22:40:00 2014 +0200
+++ b/source/common/dct.cpp Tue Sep 09 20:02:39 2014 +0530
@@ -772,6 +772,8 @@ void dequant_scaling_c(const int16_t* qu
uint32_t quant_c(int32_t* coef, int32_t* quantCoeff, int32_t* deltaU, int16_t* qCoef, int qBits, int add, int numCoeff)
{
+ X265_CHECK(qBits >= 8, "qBits less than 8\n");
+ X265_CHECK((numCoeff % 16) == 0, "numCoeff must be multiple of 16\n");
int qBits8 = qBits - 8;
uint32_t numSig = 0;
diff -r b5f81a839403 -r 2d9eb8cebb71 source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp Mon Sep 08 22:40:00 2014 +0200
+++ b/source/common/x86/asm-primitives.cpp Tue Sep 09 20:02:39 2014 +0530
@@ -1441,6 +1441,7 @@ void Setup_Assembly_Primitives(EncoderPr
if (cpuMask & X265_CPU_AVX2)
{
p.dct[DCT_4x4] = x265_dct4_avx2;
+ p.quant = x265_quant_avx2;
p.nquant = x265_nquant_avx2;
p.dequant_normal = x265_dequant_normal_avx2;
}
@@ -1739,6 +1740,7 @@ void Setup_Assembly_Primitives(EncoderPr
p.denoiseDct = x265_denoise_dct_avx2;
p.dct[DCT_4x4] = x265_dct4_avx2;
+ p.quant = x265_quant_avx2;
p.nquant = x265_nquant_avx2;
p.dequant_normal = x265_dequant_normal_avx2;
}
diff -r b5f81a839403 -r 2d9eb8cebb71 source/common/x86/blockcopy8.asm
--- a/source/common/x86/blockcopy8.asm Mon Sep 08 22:40:00 2014 +0200
+++ b/source/common/x86/blockcopy8.asm Tue Sep 09 20:02:39 2014 +0530
@@ -3973,13 +3973,12 @@ cglobal copy_cnt_4, 3,3,3
; get count
; CHECK_ME: Intel documents said POPCNT is SSE4.2 instruction, but just implement after Nehalem
-%if 0
+%if 1
pmovmskb eax, m0
not ax
popcnt ax, ax
%else
- mova m1, [pb_1]
- paddb m0, m1
+ paddb m0, [pb_1]
psadbw m0, m2
pshufd m1, m0, 2
paddw m0, m1
@@ -3991,7 +3990,7 @@ cglobal copy_cnt_4, 3,3,3
INIT_YMM avx2
cglobal copy_cnt_4, 3,3,3
add r2d, r2d
- xorpd xm2, xm2
+ xorpd m2, m2
; row 0 & 1
movq xm0, [r1]
@@ -4005,11 +4004,9 @@ cglobal copy_cnt_4, 3,3,3
vinserti128 m0, m0, xm1, 1
movu [r0], m0
- vextractf128 xm1, m0, 1
- packsswb xm0, xm1
- pcmpeqb xm0, xm2
-
; get count
+ packsswb xm0, xm1
+ pcmpeqb xm0, xm2
pmovmskb eax, xm0
not ax
popcnt ax, ax
@@ -4079,85 +4076,49 @@ cglobal copy_cnt_8, 3,3,6
INIT_YMM avx2
-%if ARCH_X86_64 == 1
-cglobal copy_cnt_8, 3,4,6
- %define tmpd eax
-%else
-cglobal copy_cnt_8, 3,5,6
- %define tmpd r4d
-%endif
+cglobal copy_cnt_8, 3,3,6
add r2d, r2d
- pxor m4, m4
- lea r3, [r2 * 3]
-
- ; row 0
+ xorpd m5, m5
+
+ ; row 0 - 1
movu xm0, [r1]
- mova xm2, xm0
- pmovsxwd m1, xm0
- movu [r0 + 0 * mmsize], m1
-
- ; row 1
- movu xm0, [r1 + r2]
- vinserti128 m2, m2, xm0, 1
- pmovsxwd m1, xm0
- movu [r0 + 1 * mmsize], m1
-
- ; row 2
- movu xm0, [r1 + r2 * 2]
- mova xm5, xm0
- pmovsxwd m1, xm0
- movu [r0 + 2 * mmsize], m1
-
- ; row 3
- movu xm0, [r1 + r3]
- vinserti128 m5, m5, xm0, 1
- packsswb m2, m5
- pcmpeqb m2, m4
- pmovmskb tmpd, m2
- not tmpd
- popcnt tmpd, tmpd
- pmovsxwd m1, xm0
- movu [r0 + 3 * mmsize], m1
-
- add r0, 4 * mmsize
- lea r1, [r1 + r2 * 4]
-
- ; row 4
- movu xm0, [r1]
- mova xm2, xm0
- pmovsxwd m1, xm0
- movu [r0 + 0 * mmsize], m1
-
- ; row 5
- movu xm0, [r1 + r2]
- vinserti128 m2, m2, xm0, 1
- pmovsxwd m1, xm0
- movu [r0 + 1 * mmsize], m1
-
- ; row 6
- movu xm0, [r1 + r2 * 2]
- mova xm5, xm0
- pmovsxwd m1, xm0
- movu [r0 + 2 * mmsize], m1
-
- ; row 7
- movu xm0, [r1 + r3]
- pmovsxwd m1, xm0
- movu [r0 + 3 * mmsize], m1
- vinserti128 m5, m5, xm0, 1
+ movu xm1, [r1 + r2]
+ vinserti128 m0, m0, xm1, 1
+ movu [r0], m0
+
+ ; row 2 - 3
+ movu xm1, [r1 + r2 * 2]
+ lea r1, [r1 + r2 * 2]
+ movu xm2, [r1 + r2]
+ vinserti128 m1, m1, xm2, 1
+ movu [r0 + 32], m1
+
+ ; row 4 - 5
+ movu xm2, [r1 + r2 * 2]
+ lea r1, [r1 + r2 * 2]
+ movu xm3, [r1 + r2]
+ vinserti128 m2, m2, xm3, 1
+ movu [r0 + 64], m2
+
+ ; row 6 - 7
+ movu xm3, [r1 + r2 * 2]
+ lea r1, [r1 + r2 * 2]
+ movu xm4, [r1 + r2]
+ vinserti128 m3, m3, xm4, 1
+ movu [r0 + 96], m3
; get count
- packsswb m2, m5
- pcmpeqb m2, m4
- pmovmskb r0d, m2
- not r0d
- popcnt r0d, r0d
-
-%if ARCH_X86_64 == 1
- add tmpd, r0d
-%else
- add r0d, tmpd
-%endif
+ vpacksswb m0, m1
+ vpcmpeqb m0, m5
+ vpmovmskb eax, m0
+ not eax
+ popcnt eax, eax
+ vpacksswb m2, m3
+ vpcmpeqb m2, m5
+ vpmovmskb r1d, m2
+ not r1d
+ popcnt r1d, r1d
+ add eax, r1d
RET
diff -r b5f81a839403 -r 2d9eb8cebb71 source/common/x86/const-a.asm
--- a/source/common/x86/const-a.asm Mon Sep 08 22:40:00 2014 +0200
+++ b/source/common/x86/const-a.asm Tue Sep 09 20:02:39 2014 +0530
@@ -76,7 +76,7 @@ const pw_ppppmmmm, dw 1,1,1,1,-1,-1,-1,-
const pw_ppmmppmm, dw 1,1,-1,-1,1,1,-1,-1
const pw_pmpmpmpm, dw 1,-1,1,-1,1,-1,1,-1
const pw_pmmpzzzz, dw 1,-1,-1,1,0,0,0,0
-const pd_1, times 4 dd 1
+const pd_1, times 8 dd 1
const pd_2, times 4 dd 2
const pd_4, times 4 dd 4
const pd_8, times 4 dd 8
diff -r b5f81a839403 -r 2d9eb8cebb71 source/common/x86/pixel-util.h
--- a/source/common/x86/pixel-util.h Mon Sep 08 22:40:00 2014 +0200
+++ b/source/common/x86/pixel-util.h Tue Sep 09 20:02:39 2014 +0530
@@ -45,6 +45,7 @@ void x265_transpose32_sse2(pixel *dest,
void x265_transpose64_sse2(pixel *dest, pixel *src, intptr_t stride);
uint32_t x265_quant_sse4(int32_t *coef, int32_t *quantCoeff, int32_t *deltaU, int16_t *qCoef, int qBits, int add, int numCoeff);
+uint32_t x265_quant_avx2(int32_t *coef, int32_t *quantCoeff, int32_t *deltaU, int16_t *qCoef, int qBits, int add, int numCoeff);
uint32_t x265_nquant_sse4(int32_t *coef, int32_t *quantCoeff, int16_t *qCoef, int qBits, int add, int numCoeff);
uint32_t x265_nquant_avx2(int32_t *coef, int32_t *quantCoeff, int16_t *qCoef, int qBits, int add, int numCoeff);
void x265_dequant_normal_sse4(const int16_t* quantCoef, int32_t* coef, int num, int scale, int shift);
diff -r b5f81a839403 -r 2d9eb8cebb71 source/common/x86/pixel-util8.asm
--- a/source/common/x86/pixel-util8.asm Mon Sep 08 22:40:00 2014 +0200
+++ b/source/common/x86/pixel-util8.asm Tue Sep 09 20:02:39 2014 +0530
@@ -54,6 +54,7 @@ cextern pw_1
cextern pw_00ff
cextern pw_2000
cextern pw_pixel_max
+cextern pd_1
cextern pd_32767
cextern pd_n32768
@@ -861,7 +862,6 @@ cglobal getResidual32, 4,5,7
;-----------------------------------------------------------------------------
INIT_XMM sse4
cglobal quant, 5,6,8
-
; fill qbits
movd m4, r4d ; m4 = qbits
@@ -873,52 +873,45 @@ cglobal quant, 5,6,8
movd m5, r5m
pshufd m5, m5, 0 ; m5 = add
+ lea r5, [pd_1]
+
mov r4d, r6m
shr r4d, 3
pxor m7, m7 ; m7 = numZero
.loop:
; 4 coeff
movu m0, [r0] ; m0 = level
- pxor m1, m1
- pcmpgtd m1, m0 ; m1 = sign
- movu m2, [r1] ; m2 = qcoeff
- pabsd m0, m0
- pmulld m0, m2 ; m0 = tmpLevel1
- paddd m2, m0, m5
+ pabsd m1, m0
+ pmulld m1, [r1] ; m0 = tmpLevel1
+ paddd m2, m1, m5
psrad m2, m4 ; m2 = level1
- pslld m3, m2, m4
- psubd m0, m3
- psrad m0, m6 ; m0 = deltaU1
- movu [r2], m0
- pxor m0, m0
- pcmpeqd m0, m2 ; m0 = mask4
- psubd m7, m0
-
- pxor m2, m1
- psubd m2, m1
- packssdw m2, m2
- movh [r3], m2
+
+ pslld m3, m2, 8
+ psrad m1, m6
+ psubd m1, m3 ; m1 = deltaU1
+
+ movu [r2], m1
+ psignd m3, m2, m0
+ pminud m2, [r5]
+ paddd m7, m2
+ packssdw m3, m3
+ movh [r3], m3
+
; 4 coeff
movu m0, [r0 + 16] ; m0 = level
- pxor m1, m1
- pcmpgtd m1, m0 ; m1 = sign
- movu m2, [r1 + 16] ; m2 = qcoeff
- pabsd m0, m0
- pmulld m0, m2 ; m0 = tmpLevel1
- paddd m2, m0, m5
+ pabsd m1, m0
+ pmulld m1, [r1 + 16] ; m0 = tmpLevel1
+ paddd m2, m1, m5
psrad m2, m4 ; m2 = level1
- pslld m3, m2, m4
- psubd m0, m3