<div dir="ltr"><div class="gmail_default" style="font-family:georgia,serif;color:rgb(0,0,0)"><br></div><div class="gmail_quote">---------- Forwarded message ----------<br>From: <b class="gmail_sendername">chen</b> <span dir="ltr"><<a href="mailto:chenm003@163.com">chenm003@163.com</a>></span><br>Date: Tue, Nov 21, 2017 at 10:07 AM<br>Subject: Re: [x265] [PATCH] intra: sse4 version of strong intra smoothing<br>To: Development for x265 <<a href="mailto:x265-devel@videolan.org">x265-devel@videolan.org</a>><br><br><br><div style="line-height:1.7;color:rgb(0,0,0);font-size:14px;font-family:Arial"><pre><span class="gmail-">>diff -r a7c2f80c18af -r 973560d58dfb source/common/x86/intrapred8.<wbr>asm
>--- a/source/common/x86/<wbr>intrapred8.asm Mon Nov 20 14:31:22 2017 +0530
>+++ b/source/common/x86/<wbr>intrapred8.asm Tue Nov 21 03:10:14 2017 +0800
>@@ -22313,11 +22313,144 @@
> mov [r1 + 64], r3b ; LeftLast
> RET
>
>-INIT_XMM sse4
>-cglobal intra_filter_32x32, 2,4,6
>- mov r2b, byte [r0 + 64] ; topLast
>- mov r3b, byte [r0 + 128] ; LeftLast
>-
>+; this function add strong intra filter
>+INIT_XMM sse4
>+cglobal intra_filter_32x32, 3,8,7
>+ xor r3d, r3d ; R9
>+ xor r4d, r4d ; R10
>+ mov r3b, byte [r0 + 64] ; topLast
<div>>+ mov r4b, byte [r0 + 128] ; LeftLast</div><div><br></div></span><div>xor + mov is equivalent to a single movzx. The xor (zeroing idiom) does not cost an execution cycle, but it still takes a decode slot and lowers the instruction decode rate.</div><span class="gmail-"><div><br></div>>+
>+ ; strong intra filter is diabled
>+ cmp r2m, byte 0
>+ jz .normal_filter32
>+ ; decide to do strong intra filter
>+ xor r5d, r5d ; R11
>+ xor r6d, r6d ; RAX
>+ xor r7d, r7d ; RDI
>+ mov r5b, byte [r0] ; topLeft
>+ mov r6b, byte [r0 + 96] ; leftMiddle
>+ mov r7b, byte [r0 + 32] ; topMiddle
>+
>+ ; threshold = 8
>+ mov r2d, r3d ; R8
>+ add r2d, r5d ; (topLast + topLeft)
>+ shl r7d, 1 ; 2 * topMiddle
<div>>+ sub r2d, r7d</div></span><div>(A+B) - 2 * C <==> (A-C) + (B-C)</div><span class="gmail-"><div><br></div>>+ mov r7d, r2d ; backup r2d
>+ sar r7d, 31
>+ xor r2d, r7d
>+ sub r2d, r7d ; abs(r2d)
<div>>+ cmp r2d, 8</div></span><div>; how about this sequence, or the cdq instruction?</div><div>; abs(x - y)</div><div>mov eax, X
sub eax, Y
sub Y, X
cmovg eax, Y</div><span class="gmail-"><div><br></div><div><br></div>>+ ; bilinearAbove is false
>+ jns .normal_filter32
>+
>+ mov r2d, r5d
>+ add r2d, r4d
>+ shl r6d, 1
>+ sub r2d, r6d
>+ mov r6d, r2d
>+ sar r6d, 31
>+ xor r2d, r6d
>+ sub r2d, r6d
>+ cmp r2d, 8
>+ ; bilinearLeft is false
>+ jns .normal_filter32
>+
>+ ; do strong intra filter shift = 6
>+ mov r2d, r5d
>+ shl r2d, 6
>+ add r2d, 32 ; init
>+ mov r6d, r4d
<div>>+ sub r6w, r5w ; deltaL size is word</div></span><div>A partial-register stall may occur here.</div><span class="gmail-"><div><br></div>>+ mov r7d, r3d
>+ sub r7w, r5w ; deltaR size is word
>+ movd xmm0, r2d
<div>>+ vpbroadcastw xmm0, xmm0</div></span><div>SSE4?</div><div><div class="gmail_default" style="font-family:georgia,serif;color:rgb(0,0,0)">This is an AVX2 instruction, so the initialization at the top (INIT_XMM sse4) is wrong. Also, we generally don't write the xmm/ymm prefixes explicitly; the native names m0, m1 would be better.</div><br></div><span class="gmail-"><div><br></div>>+ mova xmm4, xmm0
>+
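</span><div>For reference, a minimal C sketch of the two scalar suggestions above: the sub / sub / cmovg absolute difference and the (A-C) + (B-C) regrouping of the threshold test. It only models the arithmetic, not the SSE4 code, and the function names and the threshold parameter are hypothetical (they are not part of the patch).

```c
/* Hypothetical model of abs(x - y) as computed by the suggested
   sub / sub / cmovg sequence: keep whichever difference is positive. */
static int abs_diff(int x, int y)
{
    int d = x - y;          /* mov eax, X ; sub eax, Y */
    int e = y - x;          /* sub Y, X                */
    return e > 0 ? e : d;   /* cmovg eax, Y            */
}

/* Threshold test in the patch's form: abs((first + last) - 2 * middle). */
static int is_flat_patch(int first, int last, int middle, int threshold)
{
    return threshold > abs_diff(first + last, 2 * middle);
}

/* Same test after regrouping: abs((first - middle) + (last - middle)). */
static int is_flat_regrouped(int first, int last, int middle, int threshold)
{
    return threshold > abs_diff(first - middle, middle - last);
}
```

Both forms return 1 only when the magnitude is strictly below the threshold, which matches the cmp r2d, 8 / jns .normal_filter32 pair in the patch.</div><span class="gmail-">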
</span></pre></div><br>
<br></div><br></div>