<div style="line-height:1.7;color:#000000;font-size:14px;font-family:arial"><div><br></div>At 2015-10-22 19:15:07, "Dnyaneshwar Gorade" <dnyaneshwar@multicorewareinc.com> wrote:<br> <blockquote id="isReplyContent" style="margin: 0px 0px 0px 0.8ex; padding-left: 1ex; border-left-color: rgb(204, 204, 204); border-left-width: 1px; border-left-style: solid;"><div dir="ltr"><div class="gmail_default" style="font-family: arial,helvetica,sans-serif; font-size: small;"></div><div class="gmail_default" style="font-family: arial,helvetica,sans-serif; font-size: small;"><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Oct 21, 2015 at 7:58 AM, chen <span dir="ltr"><<a href="mailto:chenm003@163.com" target="_blank">chenm003@163.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; padding-left: 1ex; border-left-color: rgb(204, 204, 204); border-left-width: 1px; border-left-style: solid;"><div style="color: rgb(0, 0, 0); line-height: 1.7; font-family: arial; font-size: 14px;"><div><div class="h5"><div><br></div><pre><br>At 2015-10-20 18:38:56,<a href="mailto:dnyaneshwar@multicorewareinc.com" target="_blank">dnyaneshwar@multicorewareinc.com</a> wrote:
># HG changeset patch
># User Dnyaneshwar G <<a href="mailto:dnyaneshwar@multicorewareinc.com" target="_blank">dnyaneshwar@multicorewareinc.com</a>>
># Date 1445337446 -19800
># Tue Oct 20 16:07:26 2015 +0530
># Node ID 987b5f8c2c447dc5b0e410d37f6212470feecd1c
># Parent f335a9a7b9083dcb2fc7a1cadc2dbeffdd6388f2
>asm: fix intrapred_planar16x16 sse4 code for main12
>
>diff -r f335a9a7b908 -r 987b5f8c2c44 source/common/x86/asm-primitives.cpp
>--- a/source/common/x86/asm-primitives.cpp Mon Oct 19 12:42:52 2015 +0530
>+++ b/source/common/x86/asm-primitives.cpp Tue Oct 20 16:07:26 2015 +0530
>@@ -1145,8 +1145,9 @@
> p.cu[BLOCK_4x4].intra_pred[PLANAR_IDX] = PFX(intra_pred_planar4_sse4);
> p.cu[BLOCK_8x8].intra_pred[PLANAR_IDX] = PFX(intra_pred_planar8_sse4);
>
>+ p.cu[BLOCK_16x16].intra_pred[PLANAR_IDX] = PFX(intra_pred_planar16_sse4);
>+
> #if X265_DEPTH <= 10
>- p.cu[BLOCK_16x16].intra_pred[PLANAR_IDX] = PFX(intra_pred_planar16_sse4);
> p.cu[BLOCK_32x32].intra_pred[PLANAR_IDX] = PFX(intra_pred_planar32_sse4);
> #endif
> ALL_LUMA_TU_S(intra_pred[DC_IDX], intra_pred_dc, sse4);
>diff -r f335a9a7b908 -r 987b5f8c2c44 source/common/x86/const-a.asm
>--- a/source/common/x86/const-a.asm Mon Oct 19 12:42:52 2015 +0530
>+++ b/source/common/x86/const-a.asm Tue Oct 20 16:07:26 2015 +0530
>@@ -122,6 +122,7 @@
> const pd_2, times 8 dd 2
> const pd_4, times 4 dd 4
> const pd_8, times 4 dd 8
>+const pd_15, times 8 dd 15
> const pd_16, times 8 dd 16
> const pd_31, times 4 dd 31
> const pd_32, times 8 dd 32
>@@ -136,7 +137,8 @@
> const pd_524416, times 4 dd 524416
> const pd_n32768, times 8 dd 0xffff8000
> const pd_n131072, times 4 dd 0xfffe0000
>-
>+const pd_planar16_mul, times 1 dd 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
>+const pd_planar16_mul1, times 1 dd 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
> const trans8_shuf, times 1 dd 0, 4, 1, 5, 2, 6, 3, 7
>
> const popcnt_table
>diff -r f335a9a7b908 -r 987b5f8c2c44 source/common/x86/intrapred16.asm
>--- a/source/common/x86/intrapred16.asm Mon Oct 19 12:42:52 2015 +0530
>+++ b/source/common/x86/intrapred16.asm Tue Oct 20 16:07:26 2015 +0530
>@@ -109,6 +109,7 @@
> cextern pw_16
> cextern pw_31
> cextern pw_32
>+cextern pd_15
> cextern pd_16
> cextern pd_31
> cextern pd_32
>@@ -123,6 +124,8 @@
> cextern pb_unpackwq1
> cextern pb_unpackwq2
> cextern pw_planar16_mul
>+cextern pd_planar16_mul
>+cextern pd_planar16_mul1
> cextern pw_planar32_mul
>
> ;-----------------------------------------------------------------------------------
>@@ -2216,6 +2219,114 @@
> ; void intra_pred_planar(pixel* dst, intptr_t dstStride, pixel*srcPix, int, int filter)
> ;---------------------------------------------------------------------------------------
> INIT_XMM sse4
>+%if ARCH_X86_64 == 1 && BIT_DEPTH == 12
>+cglobal intra_pred_planar16, 3,5,12
>+ add r1d, r1d
>+
>+ pmovzxwd m2, [r2 + 2]
>+ pmovzxwd m7, [r2 + 10]
>+ pmovzxwd m10, [r2 + 18]
>+ pmovzxwd m0, [r2 + 26]
>+
>+ movzx r3d, word [r2 + 34] ; topRight = above[16]
>+ lea r4, [pd_planar16_mul1]
>+
>+ movd m3, r3d
>+ pshufd m3, m3, 0 ; topRight
>+
>+ pmulld m8, m3, [r4 + 3*mmsize] ; (x + 1) * topRight
>+ pmulld m4, m3, [r4 + 2*mmsize] ; (x + 1) * topRight
>+ pmulld m9, m3, [r4 + 1*mmsize] ; (x + 1) * topRight
>+ pmulld m3, m3, [r4 + 0*mmsize] ; (x + 1) * topRight
>+
>+ mova m11, [pd_15]
>+ pmulld m1, m2, m11 ; (blkSize - 1 - y) * above[x]
>+ pmulld m6, m7, m11 ; (blkSize - 1 - y) * above[x]
>+ pmulld m5, m10, m11 ; (blkSize - 1 - y) * above[x]
>+ pmulld m11, m0 ; (blkSize - 1 - y) * above[x]
>+
>+ paddd m4, m5
>+ paddd m3, m1
>+ paddd m8, m11
>+ paddd m9, m6
>+
>+ mova m5, [pd_16]
>+ paddd m3, m5
>+ paddd m9, m5
>+ paddd m4, m5
>+ paddd m8, m5
>+
>+ movzx r4d, word [r2 + 98] ; bottomLeft = left[16]</pre></div></div><pre>r3 is free at this point, so use r3 here instead of r4.</pre><span><pre>>+ movd m6, r4d
>+ pshufd m6, m6, 0 ; bottomLeft
>+
>+ paddd m4, m6
>+ paddd m3, m6
>+ paddd m8, m6
>+ paddd m9, m6
>+
>+ psubd m1, m6, m0 ; column 12-15
>+ psubd m11, m6, m10 ; column 8-11
>+ psubd m10, m6, m7 ; column 4-7
>+ psubd m6, m2 ; column 0-3
>+
>+ add r2, 66
>+ lea r4, [pd_planar16_mul]</pre></span><pre>No need to load the address again once the load above is renamed to use r3.</pre><pre><div><div class="h5"><div class="gmail_default" style="font-family: arial,helvetica,sans-serif; font-size: small; display: inline;"><pre>This constant, "pd_planar16_mul", is different from the one loaded into r4 above, which is "pd_planar16_mul1".</pre></div>That was my mistake. Could you name them _mul0 and _mul1? Everything else is fine.<br></div></div></pre></div></blockquote></div></div></div>
</blockquote></div>