<div dir="ltr"><div dir="ltr"># HG changeset patch</div><div dir="ltr"># User Akil Ayyappan<<a href="mailto:akil@multicorewareinc.com" target="_blank">akil@multicorewareinc.com</a>></div><div dir="ltr"># Date 1554365158 -19800</div><div dir="ltr"># Thu Apr 04 13:35:58 2019 +0530</div><div dir="ltr"># Node ID e7a726d1ca84d59f85cfafb428b8ffc4b9eb7000</div><div dir="ltr"># Parent b36242b9f354b8773e38674b876b0ca5dfc35ad2</div><div dir="ltr">SSIM-RD : 8-bit AVX2 performance improvement</div><div dir="ltr"><br></div><div>Patch has been pushed to x265 public branch.</div><div><br></div><div><br></div><div>Thanks & Regards,</div><div>Dinesh</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Apr 5, 2019 at 3:33 PM Akil <<a href="mailto:akil@multicorewareinc.com">akil@multicorewareinc.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"># HG changeset patch</div><div dir="ltr"># User Akil Ayyappan<<a href="mailto:akil@multicorewareinc.com" target="_blank">akil@multicorewareinc.com</a>></div><div dir="ltr"># Date 1554365158 -19800</div><div dir="ltr"># Thu Apr 04 13:35:58 2019 +0530</div><div dir="ltr"># Node ID e7a726d1ca84d59f85cfafb428b8ffc4b9eb7000</div><div dir="ltr"># Parent b36242b9f354b8773e38674b876b0ca5dfc35ad2</div><div dir="ltr">SSIM-RD : 8-bit AVX2 performance improvement</div><div dir="ltr"><br></div><div dir="ltr">ssimDistortion</div><div dir="ltr">[16x16] 5.44x => 13.52x</div><div dir="ltr">[32x32] 6.01x => 18.99x</div><div dir="ltr">[64x64] 6.70x => 20.78x</div><div dir="ltr"><br></div><div dir="ltr">normFactor</div><div dir="ltr">[16x16] 8.42x => 17.96x</div><div dir="ltr">[32x32] 9.56x => 29.12x</div><div dir="ltr">[64x64] 8.96x => 25.29x</div><div dir="ltr"><br></div><div dir="ltr">diff -r b36242b9f354 -r e7a726d1ca84 source/common/x86/pixel-a.asm</div><div dir="ltr">--- 
a/source/common/x86/pixel-a.asm<span style="white-space:pre-wrap"> </span>Tue Apr 02 15:01:12 2019 +0530</div><div dir="ltr">+++ b/source/common/x86/pixel-a.asm<span style="white-space:pre-wrap"> </span>Thu Apr 04 13:35:58 2019 +0530</div><div dir="ltr">@@ -370,7 +370,7 @@</div><div dir="ltr"> RET</div><div dir="ltr"> %endmacro</div><div dir="ltr"> </div><div dir="ltr">-%macro SSIM_RD_COL 2</div><div dir="ltr">+%macro SSIM_DIST_HIGH 2</div><div dir="ltr"> vpsrld m6, m0, SSIMRD_SHIFT</div><div dir="ltr"> vpsubd m0, m1</div><div dir="ltr"> </div><div dir="ltr">@@ -388,7 +388,7 @@</div><div dir="ltr"> vpaddq m7, m6</div><div dir="ltr"> %endmacro</div><div dir="ltr"> </div><div dir="ltr">-%macro NORM_FACT_COL 1</div><div dir="ltr">+%macro NORM_FACT_HIGH 1</div><div dir="ltr"> vpsrld m1, m0, SSIMRD_SHIFT</div><div dir="ltr"> vpmuldq m2, m1, m1</div><div dir="ltr"> vpsrldq m1, m1, 4</div><div dir="ltr">@@ -398,6 +398,23 @@</div><div dir="ltr"> vpaddq m3, m1</div><div dir="ltr"> %endmacro</div><div dir="ltr"> </div><div dir="ltr">+%macro SSIM_DIST_LOW 2</div><div dir="ltr">+ vpsrlw m6, m0, SSIMRD_SHIFT</div><div dir="ltr">+ vpsubw m0, m1</div><div dir="ltr">+</div><div dir="ltr">+ vpmaddwd m0, m0, m0</div><div dir="ltr">+ vpmaddwd m6, m6, m6</div><div dir="ltr">+</div><div dir="ltr">+ vpaddd m4, m0</div><div dir="ltr">+ vpaddd m7, m6</div><div dir="ltr">+%endmacro</div><div dir="ltr">+</div><div dir="ltr">+%macro NORM_FACT_LOW 1</div><div dir="ltr">+ vpsrlw m1, m0, SSIMRD_SHIFT</div><div dir="ltr">+ vpmaddwd m1, m1, m1</div><div dir="ltr">+ vpaddd m3, m1</div><div dir="ltr">+%endmacro</div><div dir="ltr">+</div><div dir="ltr"> ; FIXME avoid the spilling of regs to hold 3*stride.</div><div dir="ltr"> ; for small blocks on x86_32, modify pixel pointer instead.</div><div dir="ltr"> </div><div dir="ltr">@@ -16014,7 +16031,7 @@</div><div dir="ltr"> %error Unsupported BIT_DEPTH!</div><div dir="ltr"> %endif</div><div dir="ltr"> </div><div dir="ltr">- SSIM_RD_COL m0, m1</div><div 
dir="ltr">+ SSIM_DIST_HIGH m0, m1</div><div dir="ltr"> </div><div dir="ltr"> %if HIGH_BIT_DEPTH</div><div dir="ltr"> lea r0, [r0 + 2 * r1]</div><div dir="ltr">@@ -16047,41 +16064,37 @@</div><div dir="ltr"> vpxor m3, m3</div><div dir="ltr"> vpxor m7, m7 ;ac_k</div><div dir="ltr"> .row:</div><div dir="ltr">+%if HIGH_BIT_DEPTH</div><div dir="ltr"> ;Col 1-8</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr"> vpmovzxwd m0, [r0] ;fenc</div><div dir="ltr"> vpmovzxwd m1, [r2] ;recon</div><div dir="ltr">-%elif BIT_DEPTH == 8</div><div dir="ltr">- vpmovzxbd m0, [r0]</div><div dir="ltr">- vpmovzxbd m1, [r2]</div><div dir="ltr">-%else</div><div dir="ltr">- %error Unsupported BIT_DEPTH!</div><div dir="ltr">-%endif</div><div dir="ltr">-</div><div dir="ltr">- SSIM_RD_COL m0, m1</div><div dir="ltr">+</div><div dir="ltr">+ SSIM_DIST_HIGH m0, m1</div><div dir="ltr"> </div><div dir="ltr"> ;Col 9-16</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr">- vpmovzxwd m0, [r0 + 16] ;fenc</div><div dir="ltr">- vpmovzxwd m1, [r2 + 16] ;recon</div><div dir="ltr">-%elif BIT_DEPTH == 8</div><div dir="ltr">- vpmovzxbd m0, [r0 + 8]</div><div dir="ltr">- vpmovzxbd m1, [r2 + 8]</div><div dir="ltr">-%else</div><div dir="ltr">- %error Unsupported BIT_DEPTH!</div><div dir="ltr">-%endif</div><div dir="ltr">-</div><div dir="ltr">- SSIM_RD_COL m0, m1</div><div dir="ltr">-</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr">+ vpmovzxwd m0, [r0 + 16]</div><div dir="ltr">+ vpmovzxwd m1, [r2 + 16]</div><div dir="ltr">+</div><div dir="ltr">+ SSIM_DIST_HIGH m0, m1</div><div dir="ltr">+</div><div dir="ltr"> lea r0, [r0 + 2 * r1]</div><div dir="ltr"> lea r2, [r2 + 2 * r3]</div><div dir="ltr">-%else</div><div dir="ltr">+%elif BIT_DEPTH == 8</div><div dir="ltr">+;col 1- 16</div><div dir="ltr">+ vpmovzxbw m0, [r0] ;fenc</div><div dir="ltr">+ vpmovzxbw m1, [r2] ;recon</div><div dir="ltr">+</div><div dir="ltr">+ SSIM_DIST_LOW m0, m1</div><div dir="ltr">+</div><div dir="ltr"> lea r0, [r0 + 
r1]</div><div dir="ltr"> lea r2, [r2 + r3]</div><div dir="ltr">+%else</div><div dir="ltr">+ %error Unsupported BIT_DEPTH!</div><div dir="ltr"> %endif</div><div dir="ltr"> dec r5d</div><div dir="ltr"> jnz .row</div><div dir="ltr">+</div><div dir="ltr">+%if HIGH_BIT_DEPTH</div><div dir="ltr"> vextracti128 xm5, m4, 1</div><div dir="ltr"> vpaddq xm4, xm5</div><div dir="ltr"> punpckhqdq xm2, xm4, xm3</div><div dir="ltr">@@ -16091,7 +16104,23 @@</div><div dir="ltr"> vpaddq xm7, xm5</div><div dir="ltr"> punpckhqdq xm2, xm7, xm3</div><div dir="ltr"> paddq xm7, xm2</div><div dir="ltr">-</div><div dir="ltr">+%else</div><div dir="ltr">+ vextracti128 xm5, m4, 1</div><div dir="ltr">+ vpaddd xm4, xm5</div><div dir="ltr">+ punpckhqdq xm2, xm4, xm3</div><div dir="ltr">+ paddd xm4, xm2</div><div dir="ltr">+ punpckldq xm4, xm4, xm3</div><div dir="ltr">+ punpckhqdq xm2, xm4, xm3</div><div dir="ltr">+ paddd xm4, xm2</div><div dir="ltr">+</div><div dir="ltr">+ vextracti128 xm5, m7, 1</div><div dir="ltr">+ vpaddd xm7, xm5</div><div dir="ltr">+ punpckhqdq xm2, xm7, xm3</div><div dir="ltr">+ paddd xm7, xm2</div><div dir="ltr">+ punpckldq xm7, xm7, xm3</div><div dir="ltr">+ punpckhqdq xm2, xm7, xm3</div><div dir="ltr">+ paddd xm7, xm2</div><div dir="ltr">+%endif</div><div dir="ltr"> movq [r4], xm4</div><div dir="ltr"> movq [r6], xm7</div><div dir="ltr"> RET</div><div dir="ltr">@@ -16104,67 +16133,55 @@</div><div dir="ltr"> vpxor m3, m3</div><div dir="ltr"> vpxor m7, m7 ;ac_k</div><div dir="ltr"> .row:</div><div dir="ltr">+%if HIGH_BIT_DEPTH</div><div dir="ltr"> ;Col 1-8</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr"> vpmovzxwd m0, [r0] ;fenc</div><div dir="ltr"> vpmovzxwd m1, [r2] ;recon</div><div dir="ltr">-%elif BIT_DEPTH == 8</div><div dir="ltr">- vpmovzxbd m0, [r0]</div><div dir="ltr">- vpmovzxbd m1, [r2]</div><div dir="ltr">-%else</div><div dir="ltr">- %error Unsupported BIT_DEPTH!</div><div dir="ltr">-%endif</div><div dir="ltr">-</div><div dir="ltr">- SSIM_RD_COL m0, 
m1</div><div dir="ltr">+</div><div dir="ltr">+ SSIM_DIST_HIGH m0, m1</div><div dir="ltr"> </div><div dir="ltr"> ;Col 9-16</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr">- vpmovzxwd m0, [r0 + 16] ;fenc</div><div dir="ltr">- vpmovzxwd m1, [r2 + 16] ;recon</div><div dir="ltr">-%elif BIT_DEPTH == 8</div><div dir="ltr">- vpmovzxbd m0, [r0 + 8]</div><div dir="ltr">- vpmovzxbd m1, [r2 + 8]</div><div dir="ltr">-%else</div><div dir="ltr">- %error Unsupported BIT_DEPTH!</div><div dir="ltr">-%endif</div><div dir="ltr">-</div><div dir="ltr">- SSIM_RD_COL m0, m1</div><div dir="ltr">+ vpmovzxwd m0, [r0 + 16]</div><div dir="ltr">+ vpmovzxwd m1, [r2 + 16]</div><div dir="ltr">+</div><div dir="ltr">+ SSIM_DIST_HIGH m0, m1</div><div dir="ltr"> </div><div dir="ltr"> ;Col 17-24</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr">- vpmovzxwd m0, [r0 + 32] ;fenc</div><div dir="ltr">- vpmovzxwd m1, [r2 + 32] ;recon</div><div dir="ltr">-%elif BIT_DEPTH == 8</div><div dir="ltr">- vpmovzxbd m0, [r0 + 16]</div><div dir="ltr">- vpmovzxbd m1, [r2 + 16]</div><div dir="ltr">-%else</div><div dir="ltr">- %error Unsupported BIT_DEPTH!</div><div dir="ltr">-%endif</div><div dir="ltr">-</div><div dir="ltr">- SSIM_RD_COL m0, m1</div><div dir="ltr">+ vpmovzxwd m0, [r0 + 32]</div><div dir="ltr">+ vpmovzxwd m1, [r2 + 32]</div><div dir="ltr">+</div><div dir="ltr">+ SSIM_DIST_HIGH m0, m1</div><div dir="ltr"> </div><div dir="ltr"> ;Col 25-32</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr">- vpmovzxwd m0, [r0 + 48] ;fenc</div><div dir="ltr">- vpmovzxwd m1, [r2 + 48] ;recon</div><div dir="ltr">-%elif BIT_DEPTH == 8</div><div dir="ltr">- vpmovzxbd m0, [r0 + 24]</div><div dir="ltr">- vpmovzxbd m1, [r2 + 24]</div><div dir="ltr">-%else</div><div dir="ltr">- %error Unsupported BIT_DEPTH!</div><div dir="ltr">-%endif</div><div dir="ltr">-</div><div dir="ltr">- SSIM_RD_COL m0, m1</div><div dir="ltr">-</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr">+ vpmovzxwd m0, [r0 + 
48]</div><div dir="ltr">+ vpmovzxwd m1, [r2 + 48]</div><div dir="ltr">+</div><div dir="ltr">+ SSIM_DIST_HIGH m0, m1</div><div dir="ltr">+</div><div dir="ltr"> lea r0, [r0 + 2 * r1]</div><div dir="ltr"> lea r2, [r2 + 2 * r3]</div><div dir="ltr">-%else</div><div dir="ltr">+%elif BIT_DEPTH == 8</div><div dir="ltr">+;col 1-16</div><div dir="ltr">+ vpmovzxbw m0, [r0] ;fenc</div><div dir="ltr">+ vpmovzxbw m1, [r2] ;recon</div><div dir="ltr">+</div><div dir="ltr">+ SSIM_DIST_LOW m0, m1</div><div dir="ltr">+</div><div dir="ltr">+;col 17-32</div><div dir="ltr">+ vpmovzxbw m0, [r0 + 16]</div><div dir="ltr">+ vpmovzxbw m1, [r2 + 16]</div><div dir="ltr">+</div><div dir="ltr">+ SSIM_DIST_LOW m0, m1</div><div dir="ltr">+</div><div dir="ltr"> lea r0, [r0 + r1]</div><div dir="ltr"> lea r2, [r2 + r3]</div><div dir="ltr">+%else</div><div dir="ltr">+ %error Unsupported BIT_DEPTH!</div><div dir="ltr"> %endif</div><div dir="ltr"> dec r5d</div><div dir="ltr"> jnz .row</div><div dir="ltr">+</div><div dir="ltr">+%if HIGH_BIT_DEPTH</div><div dir="ltr"> vextracti128 xm5, m4, 1</div><div dir="ltr"> vpaddq xm4, xm5</div><div dir="ltr"> punpckhqdq xm2, xm4, xm3</div><div dir="ltr">@@ -16174,7 +16191,23 @@</div><div dir="ltr"> vpaddq xm7, xm5</div><div dir="ltr"> punpckhqdq xm2, xm7, xm3</div><div dir="ltr"> paddq xm7, xm2</div><div dir="ltr">-</div><div dir="ltr">+%else</div><div dir="ltr">+ vextracti128 xm5, m4, 1</div><div dir="ltr">+ vpaddd xm4, xm5</div><div dir="ltr">+ punpckhqdq xm2, xm4, xm3</div><div dir="ltr">+ paddd xm4, xm2</div><div dir="ltr">+ punpckldq xm4, xm4, xm3</div><div dir="ltr">+ punpckhqdq xm2, xm4, xm3</div><div dir="ltr">+ paddd xm4, xm2</div><div dir="ltr">+</div><div dir="ltr">+ vextracti128 xm5, m7, 1</div><div dir="ltr">+ vpaddd xm7, xm5</div><div dir="ltr">+ punpckhqdq xm2, xm7, xm3</div><div dir="ltr">+ paddd xm7, xm2</div><div dir="ltr">+ punpckldq xm7, xm7, xm3</div><div dir="ltr">+ punpckhqdq xm2, xm7, xm3</div><div dir="ltr">+ paddd xm7, xm2</div><div 
dir="ltr">+%endif</div><div dir="ltr"> movq [r4], xm4</div><div dir="ltr"> movq [r6], xm7</div><div dir="ltr"> RET</div><div dir="ltr">@@ -16187,119 +16220,89 @@</div><div dir="ltr"> vpxor m3, m3</div><div dir="ltr"> vpxor m7, m7 ;ac_k</div><div dir="ltr"> .row:</div><div dir="ltr">+%if HIGH_BIT_DEPTH</div><div dir="ltr"> ;Col 1-8</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr"> vpmovzxwd m0, [r0] ;fenc</div><div dir="ltr"> vpmovzxwd m1, [r2] ;recon</div><div dir="ltr">-%elif BIT_DEPTH == 8</div><div dir="ltr">- vpmovzxbd m0, [r0]</div><div dir="ltr">- vpmovzxbd m1, [r2]</div><div dir="ltr">-%else</div><div dir="ltr">- %error Unsupported BIT_DEPTH!</div><div dir="ltr">-%endif</div><div dir="ltr">-</div><div dir="ltr">- SSIM_RD_COL m0, m1</div><div dir="ltr">+</div><div dir="ltr">+ SSIM_DIST_HIGH m0, m1</div><div dir="ltr"> </div><div dir="ltr"> ;Col 9-16</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr">- vpmovzxwd m0, [r0 + 16] ;fenc</div><div dir="ltr">- vpmovzxwd m1, [r2 + 16] ;recon</div><div dir="ltr">-%elif BIT_DEPTH == 8</div><div dir="ltr">- vpmovzxbd m0, [r0 + 8]</div><div dir="ltr">- vpmovzxbd m1, [r2 + 8]</div><div dir="ltr">-%else</div><div dir="ltr">- %error Unsupported BIT_DEPTH!</div><div dir="ltr">-%endif</div><div dir="ltr">-</div><div dir="ltr">- SSIM_RD_COL m0, m1</div><div dir="ltr">+ vpmovzxwd m0, [r0 + 16]</div><div dir="ltr">+ vpmovzxwd m1, [r2 + 16]</div><div dir="ltr">+</div><div dir="ltr">+ SSIM_DIST_HIGH m0, m1</div><div dir="ltr"> </div><div dir="ltr"> ;Col 17-24</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr">- vpmovzxwd m0, [r0 + 32] ;fenc</div><div dir="ltr">- vpmovzxwd m1, [r2 + 32] ;recon</div><div dir="ltr">-%elif BIT_DEPTH == 8</div><div dir="ltr">- vpmovzxbd m0, [r0 + 16]</div><div dir="ltr">- vpmovzxbd m1, [r2 + 16]</div><div dir="ltr">-%else</div><div dir="ltr">- %error Unsupported BIT_DEPTH!</div><div dir="ltr">-%endif</div><div dir="ltr">-</div><div dir="ltr">- SSIM_RD_COL m0, m1</div><div 
dir="ltr">+ vpmovzxwd m0, [r0 + 32]</div><div dir="ltr">+ vpmovzxwd m1, [r2 + 32]</div><div dir="ltr">+</div><div dir="ltr">+ SSIM_DIST_HIGH m0, m1</div><div dir="ltr"> </div><div dir="ltr"> ;Col 25-32</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr">- vpmovzxwd m0, [r0 + 48] ;fenc</div><div dir="ltr">- vpmovzxwd m1, [r2 + 48] ;recon</div><div dir="ltr">-%elif BIT_DEPTH == 8</div><div dir="ltr">- vpmovzxbd m0, [r0 + 24]</div><div dir="ltr">- vpmovzxbd m1, [r2 + 24]</div><div dir="ltr">-%else</div><div dir="ltr">- %error Unsupported BIT_DEPTH!</div><div dir="ltr">-%endif</div><div dir="ltr">-</div><div dir="ltr">- SSIM_RD_COL m0, m1</div><div dir="ltr">+ vpmovzxwd m0, [r0 + 48]</div><div dir="ltr">+ vpmovzxwd m1, [r2 + 48]</div><div dir="ltr">+</div><div dir="ltr">+ SSIM_DIST_HIGH m0, m1</div><div dir="ltr"> </div><div dir="ltr"> ;Col 33-40</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr">- vpmovzxwd m0, [r0 + 64] ;fenc</div><div dir="ltr">- vpmovzxwd m1, [r2 + 64] ;recon</div><div dir="ltr">-%elif BIT_DEPTH == 8</div><div dir="ltr">- vpmovzxbd m0, [r0 + 32]</div><div dir="ltr">- vpmovzxbd m1, [r2 + 32]</div><div dir="ltr">-%else</div><div dir="ltr">- %error Unsupported BIT_DEPTH!</div><div dir="ltr">-%endif</div><div dir="ltr">-</div><div dir="ltr">- SSIM_RD_COL m0, m1</div><div dir="ltr">+ vpmovzxwd m0, [r0 + 64]</div><div dir="ltr">+ vpmovzxwd m1, [r2 + 64]</div><div dir="ltr">+</div><div dir="ltr">+ SSIM_DIST_HIGH m0, m1</div><div dir="ltr"> </div><div dir="ltr"> ;Col 41-48</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr">- vpmovzxwd m0, [r0 + 80] ;fenc</div><div dir="ltr">- vpmovzxwd m1, [r2 + 80] ;recon</div><div dir="ltr">-%elif BIT_DEPTH == 8</div><div dir="ltr">- vpmovzxbd m0, [r0 + 40]</div><div dir="ltr">- vpmovzxbd m1, [r2 + 40]</div><div dir="ltr">-%else</div><div dir="ltr">- %error Unsupported BIT_DEPTH!</div><div dir="ltr">-%endif</div><div dir="ltr">-</div><div dir="ltr">- SSIM_RD_COL m0, m1</div><div dir="ltr">+ 
vpmovzxwd m0, [r0 + 80]</div><div dir="ltr">+ vpmovzxwd m1, [r2 + 80]</div><div dir="ltr">+</div><div dir="ltr">+ SSIM_DIST_HIGH m0, m1</div><div dir="ltr"> </div><div dir="ltr"> ;Col 49-56</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr">- vpmovzxwd m0, [r0 + 96] ;fenc</div><div dir="ltr">- vpmovzxwd m1, [r2 + 96] ;recon</div><div dir="ltr">-%elif BIT_DEPTH == 8</div><div dir="ltr">- vpmovzxbd m0, [r0 + 48]</div><div dir="ltr">- vpmovzxbd m1, [r2 + 48]</div><div dir="ltr">-%else</div><div dir="ltr">- %error Unsupported BIT_DEPTH!</div><div dir="ltr">-%endif</div><div dir="ltr">-</div><div dir="ltr">- SSIM_RD_COL m0, m1</div><div dir="ltr">+ vpmovzxwd m0, [r0 + 96]</div><div dir="ltr">+ vpmovzxwd m1, [r2 + 96]</div><div dir="ltr">+</div><div dir="ltr">+ SSIM_DIST_HIGH m0, m1</div><div dir="ltr"> </div><div dir="ltr"> ;Col 57-64</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr">- vpmovzxwd m0, [r0 + 112] ;fenc</div><div dir="ltr">- vpmovzxwd m1, [r2 + 112] ;recon</div><div dir="ltr">-%elif BIT_DEPTH == 8</div><div dir="ltr">- vpmovzxbd m0, [r0 + 56]</div><div dir="ltr">- vpmovzxbd m1, [r2 + 56]</div><div dir="ltr">-%else</div><div dir="ltr">- %error Unsupported BIT_DEPTH!</div><div dir="ltr">-%endif</div><div dir="ltr">-</div><div dir="ltr">- SSIM_RD_COL m0, m1</div><div dir="ltr">-</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr">+ vpmovzxwd m0, [r0 + 112]</div><div dir="ltr">+ vpmovzxwd m1, [r2 + 112]</div><div dir="ltr">+</div><div dir="ltr">+ SSIM_DIST_HIGH m0, m1</div><div dir="ltr">+</div><div dir="ltr"> lea r0, [r0 + 2 * r1]</div><div dir="ltr"> lea r2, [r2 + 2 * r3]</div><div dir="ltr">-%else</div><div dir="ltr">+%elif BIT_DEPTH == 8</div><div dir="ltr">+;col 1-16</div><div dir="ltr">+ vpmovzxbw m0, [r0] ;fenc</div><div dir="ltr">+ vpmovzxbw m1, [r2] ;recon</div><div dir="ltr">+</div><div dir="ltr">+ SSIM_DIST_LOW m0, m1</div><div dir="ltr">+</div><div dir="ltr">+;col 17-32</div><div dir="ltr">+ vpmovzxbw m0, [r0 + 
16]</div><div dir="ltr">+ vpmovzxbw m1, [r2 + 16]</div><div dir="ltr">+</div><div dir="ltr">+ SSIM_DIST_LOW m0, m1</div><div dir="ltr">+</div><div dir="ltr">+;col 33-48</div><div dir="ltr">+ vpmovzxbw m0, [r0 + 32]</div><div dir="ltr">+ vpmovzxbw m1, [r2 + 32]</div><div dir="ltr">+</div><div dir="ltr">+ SSIM_DIST_LOW m0, m1</div><div dir="ltr">+</div><div dir="ltr">+;col 49-64</div><div dir="ltr">+ vpmovzxbw m0, [r0 + 48]</div><div dir="ltr">+ vpmovzxbw m1, [r2 + 48]</div><div dir="ltr">+</div><div dir="ltr">+ SSIM_DIST_LOW m0, m1</div><div dir="ltr">+</div><div dir="ltr"> lea r0, [r0 + r1]</div><div dir="ltr"> lea r2, [r2 + r3]</div><div dir="ltr"> %endif</div><div dir="ltr"> dec r5d</div><div dir="ltr"> jnz .row</div><div dir="ltr">+</div><div dir="ltr">+%if HIGH_BIT_DEPTH</div><div dir="ltr"> vextracti128 xm5, m4, 1</div><div dir="ltr"> vpaddq xm4, xm5</div><div dir="ltr"> punpckhqdq xm2, xm4, xm3</div><div dir="ltr">@@ -16309,7 +16312,23 @@</div><div dir="ltr"> vpaddq xm7, xm5</div><div dir="ltr"> punpckhqdq xm2, xm7, xm3</div><div dir="ltr"> paddq xm7, xm2</div><div dir="ltr">-</div><div dir="ltr">+%else</div><div dir="ltr">+ vextracti128 xm5, m4, 1</div><div dir="ltr">+ vpaddd xm4, xm5</div><div dir="ltr">+ punpckhqdq xm2, xm4, xm3</div><div dir="ltr">+ paddd xm4, xm2</div><div dir="ltr">+ punpckldq xm4, xm4, xm3</div><div dir="ltr">+ punpckhqdq xm2, xm4, xm3</div><div dir="ltr">+ paddd xm4, xm2</div><div dir="ltr">+</div><div dir="ltr">+ vextracti128 xm5, m7, 1</div><div dir="ltr">+ vpaddd xm7, xm5</div><div dir="ltr">+ punpckhqdq xm2, xm7, xm3</div><div dir="ltr">+ paddd xm7, xm2</div><div dir="ltr">+ punpckldq xm7, xm7, xm3</div><div dir="ltr">+ punpckhqdq xm2, xm7, xm3</div><div dir="ltr">+ paddd xm7, xm2</div><div dir="ltr">+%endif</div><div dir="ltr"> movq [r4], xm4</div><div dir="ltr"> movq [r6], xm7</div><div dir="ltr"> RET</div><div dir="ltr">@@ -16344,7 +16363,7 @@</div><div dir="ltr"> %error Unsupported BIT_DEPTH!</div><div dir="ltr"> 
%endif</div><div dir="ltr"> </div><div dir="ltr">- NORM_FACT_COL m0</div><div dir="ltr">+ NORM_FACT_HIGH m0</div><div dir="ltr"> </div><div dir="ltr"> %if HIGH_BIT_DEPTH</div><div dir="ltr"> lea r0, [r0 + 2 * r1]</div><div dir="ltr">@@ -16367,39 +16386,45 @@</div><div dir="ltr"> vpxor m3, m3 ;z_k</div><div dir="ltr"> vpxor m5, m5</div><div dir="ltr"> .row:</div><div dir="ltr">+%if HIGH_BIT_DEPTH</div><div dir="ltr"> ;Col 1-8</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr"> vpmovzxwd m0, [r0] ;src</div><div dir="ltr">-%elif BIT_DEPTH == 8</div><div dir="ltr">- vpmovzxbd m0, [r0]</div><div dir="ltr">-%else</div><div dir="ltr">- %error Unsupported BIT_DEPTH!</div><div dir="ltr">-%endif</div><div dir="ltr">-</div><div dir="ltr">- NORM_FACT_COL m0</div><div dir="ltr">+</div><div dir="ltr">+ NORM_FACT_HIGH m0</div><div dir="ltr"> </div><div dir="ltr"> ;Col 9-16</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr">- vpmovzxwd m0, [r0 + 16] ;src</div><div dir="ltr">-%elif BIT_DEPTH == 8</div><div dir="ltr">- vpmovzxbd m0, [r0 + 8]</div><div dir="ltr">-%else</div><div dir="ltr">+ vpmovzxwd m0, [r0 + 16]</div><div dir="ltr">+</div><div dir="ltr">+ NORM_FACT_HIGH m0</div><div dir="ltr">+</div><div dir="ltr">+ lea r0, [r0 + 2 * r1]</div><div dir="ltr">+%elif BIT_DEPTH == 8</div><div dir="ltr">+;col 1-16</div><div dir="ltr">+ vpmovzxbw m0, [r0] ;src</div><div dir="ltr">+</div><div dir="ltr">+ NORM_FACT_LOW m0</div><div dir="ltr">+</div><div dir="ltr">+ lea r0, [r0 + r1]</div><div dir="ltr">+%else</div><div dir="ltr"> %error Unsupported BIT_DEPTH!</div><div dir="ltr"> %endif</div><div dir="ltr">-</div><div dir="ltr">- NORM_FACT_COL m0</div><div dir="ltr">-</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr">- lea r0, [r0 + 2 * r1]</div><div dir="ltr">-%else</div><div dir="ltr">- lea r0, [r0 + r1]</div><div dir="ltr">-%endif</div><div dir="ltr"> dec r4d</div><div dir="ltr"> jnz .row</div><div dir="ltr">+</div><div dir="ltr">+%if HIGH_BIT_DEPTH</div><div 
dir="ltr"> vextracti128 xm4, m3, 1</div><div dir="ltr"> vpaddq xm3, xm4</div><div dir="ltr"> punpckhqdq xm2, xm3, xm5</div><div dir="ltr"> paddq xm3, xm2</div><div dir="ltr">+%else</div><div dir="ltr">+ vextracti128 xm4, m3, 1</div><div dir="ltr">+ vpaddd xm3, xm4</div><div dir="ltr">+ punpckhqdq xm2, xm3, xm5</div><div dir="ltr">+ paddd xm3, xm2</div><div dir="ltr">+ punpckldq xm3, xm3, xm5</div><div dir="ltr">+ punpckhqdq xm2, xm3, xm5</div><div dir="ltr">+ paddd xm3, xm2</div><div dir="ltr">+%endif</div><div dir="ltr"> movq [r3], xm3</div><div dir="ltr"> RET</div><div dir="ltr"> </div><div dir="ltr">@@ -16410,61 +16435,59 @@</div><div dir="ltr"> vpxor m3, m3 ;z_k</div><div dir="ltr"> vpxor m5, m5</div><div dir="ltr"> .row:</div><div dir="ltr">+%if HIGH_BIT_DEPTH</div><div dir="ltr"> ;Col 1-8</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr"> vpmovzxwd m0, [r0] ;src</div><div dir="ltr">-%elif BIT_DEPTH == 8</div><div dir="ltr">- vpmovzxbd m0, [r0]</div><div dir="ltr">-%else</div><div dir="ltr">- %error Unsupported BIT_DEPTH!</div><div dir="ltr">-%endif</div><div dir="ltr">-</div><div dir="ltr">- NORM_FACT_COL m0</div><div dir="ltr">+</div><div dir="ltr">+ NORM_FACT_HIGH m0</div><div dir="ltr"> </div><div dir="ltr"> ;Col 9-16</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr">- vpmovzxwd m0, [r0 + 16] ;src</div><div dir="ltr">-%elif BIT_DEPTH == 8</div><div dir="ltr">- vpmovzxbd m0, [r0 + 8]</div><div dir="ltr">-%else</div><div dir="ltr">- %error Unsupported BIT_DEPTH!</div><div dir="ltr">-%endif</div><div dir="ltr">-</div><div dir="ltr">- NORM_FACT_COL m0</div><div dir="ltr">+ vpmovzxwd m0, [r0 + 16]</div><div dir="ltr">+</div><div dir="ltr">+ NORM_FACT_HIGH m0</div><div dir="ltr"> </div><div dir="ltr"> ;Col 17-24</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr">- vpmovzxwd m0, [r0 + 32] ;src</div><div dir="ltr">-%elif BIT_DEPTH == 8</div><div dir="ltr">- vpmovzxbd m0, [r0 + 16]</div><div dir="ltr">-%else</div><div dir="ltr">- %error 
Unsupported BIT_DEPTH!</div><div dir="ltr">-%endif</div><div dir="ltr">-</div><div dir="ltr">- NORM_FACT_COL m0</div><div dir="ltr">+ vpmovzxwd m0, [r0 + 32]</div><div dir="ltr">+</div><div dir="ltr">+ NORM_FACT_HIGH m0</div><div dir="ltr"> </div><div dir="ltr"> ;Col 25-32</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr">- vpmovzxwd m0, [r0 + 48] ;src</div><div dir="ltr">-%elif BIT_DEPTH == 8</div><div dir="ltr">- vpmovzxbd m0, [r0 + 24]</div><div dir="ltr">-%else</div><div dir="ltr">- %error Unsupported BIT_DEPTH!</div><div dir="ltr">-%endif</div><div dir="ltr">-</div><div dir="ltr">- NORM_FACT_COL m0</div><div dir="ltr">-</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr">+ vpmovzxwd m0, [r0 + 48]</div><div dir="ltr">+</div><div dir="ltr">+ NORM_FACT_HIGH m0</div><div dir="ltr">+</div><div dir="ltr"> lea r0, [r0 + 2 * r1]</div><div dir="ltr">-%else</div><div dir="ltr">+%elif BIT_DEPTH == 8</div><div dir="ltr">+;col 1-16</div><div dir="ltr">+ vpmovzxbw m0, [r0] ;src</div><div dir="ltr">+</div><div dir="ltr">+ NORM_FACT_LOW m0</div><div dir="ltr">+;col 17-32</div><div dir="ltr">+ vpmovzxbw m0, [r0 + 16]</div><div dir="ltr">+</div><div dir="ltr">+ NORM_FACT_LOW m0</div><div dir="ltr">+</div><div dir="ltr"> lea r0, [r0 + r1]</div><div dir="ltr">+%else</div><div dir="ltr">+ %error Unsupported BIT_DEPTH!</div><div dir="ltr"> %endif</div><div dir="ltr"> dec r4d</div><div dir="ltr"> jnz .row</div><div dir="ltr">+</div><div dir="ltr">+%if HIGH_BIT_DEPTH</div><div dir="ltr"> vextracti128 xm4, m3, 1</div><div dir="ltr"> vpaddq xm3, xm4</div><div dir="ltr"> punpckhqdq xm2, xm3, xm5</div><div dir="ltr"> paddq xm3, xm2</div><div dir="ltr">+%else</div><div dir="ltr">+ vextracti128 xm4, m3, 1</div><div dir="ltr">+ vpaddd xm3, xm4</div><div dir="ltr">+ punpckhqdq xm2, xm3, xm5</div><div dir="ltr">+ paddd xm3, xm2</div><div dir="ltr">+ punpckldq xm3, xm3, xm5</div><div dir="ltr">+ punpckhqdq xm2, xm3, xm5</div><div dir="ltr">+ paddd xm3, xm2</div><div 
dir="ltr">+%endif</div><div dir="ltr"> movq [r3], xm3</div><div dir="ltr"> RET</div><div dir="ltr"> </div><div dir="ltr">@@ -16475,104 +16498,86 @@</div><div dir="ltr"> vpxor m3, m3 ;z_k</div><div dir="ltr"> vpxor m5, m5</div><div dir="ltr"> .row:</div><div dir="ltr">+%if HIGH_BIT_DEPTH</div><div dir="ltr"> ;Col 1-8</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr"> vpmovzxwd m0, [r0] ;src</div><div dir="ltr">-%elif BIT_DEPTH == 8</div><div dir="ltr">- vpmovzxbd m0, [r0]</div><div dir="ltr">-%else</div><div dir="ltr">- %error Unsupported BIT_DEPTH!</div><div dir="ltr">-%endif</div><div dir="ltr">-</div><div dir="ltr">- NORM_FACT_COL m0</div><div dir="ltr">+</div><div dir="ltr">+ NORM_FACT_HIGH m0</div><div dir="ltr"> </div><div dir="ltr"> ;Col 9-16</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr">- vpmovzxwd m0, [r0 + 16] ;src</div><div dir="ltr">-%elif BIT_DEPTH == 8</div><div dir="ltr">- vpmovzxbd m0, [r0 + 8]</div><div dir="ltr">-%else</div><div dir="ltr">- %error Unsupported BIT_DEPTH!</div><div dir="ltr">-%endif</div><div dir="ltr">-</div><div dir="ltr">- NORM_FACT_COL m0</div><div dir="ltr">+ vpmovzxwd m0, [r0 + 16]</div><div dir="ltr">+</div><div dir="ltr">+ NORM_FACT_HIGH m0</div><div dir="ltr"> </div><div dir="ltr"> ;Col 17-24</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr">- vpmovzxwd m0, [r0 + 32] ;src</div><div dir="ltr">-%elif BIT_DEPTH == 8</div><div dir="ltr">- vpmovzxbd m0, [r0 + 16]</div><div dir="ltr">-%else</div><div dir="ltr">- %error Unsupported BIT_DEPTH!</div><div dir="ltr">-%endif</div><div dir="ltr">-</div><div dir="ltr">- NORM_FACT_COL m0</div><div dir="ltr">+ vpmovzxwd m0, [r0 + 32]</div><div dir="ltr">+</div><div dir="ltr">+ NORM_FACT_HIGH m0</div><div dir="ltr"> </div><div dir="ltr"> ;Col 25-32</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr">- vpmovzxwd m0, [r0 + 48] ;src</div><div dir="ltr">-%elif BIT_DEPTH == 8</div><div dir="ltr">- vpmovzxbd m0, [r0 + 24]</div><div 
dir="ltr">-%else</div><div dir="ltr">- %error Unsupported BIT_DEPTH!</div><div dir="ltr">-%endif</div><div dir="ltr">-</div><div dir="ltr">- NORM_FACT_COL m0</div><div dir="ltr">+ vpmovzxwd m0, [r0 + 48]</div><div dir="ltr">+</div><div dir="ltr">+ NORM_FACT_HIGH m0</div><div dir="ltr"> </div><div dir="ltr"> ;Col 33-40</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr">- vpmovzxwd m0, [r0 + 64] ;src</div><div dir="ltr">-%elif BIT_DEPTH == 8</div><div dir="ltr">- vpmovzxbd m0, [r0 + 32]</div><div dir="ltr">-%else</div><div dir="ltr">- %error Unsupported BIT_DEPTH!</div><div dir="ltr">-%endif</div><div dir="ltr">-</div><div dir="ltr">- NORM_FACT_COL m0</div><div dir="ltr">+ vpmovzxwd m0, [r0 + 64]</div><div dir="ltr">+</div><div dir="ltr">+ NORM_FACT_HIGH m0</div><div dir="ltr"> </div><div dir="ltr"> ;Col 41-48</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr">- vpmovzxwd m0, [r0 + 80] ;src</div><div dir="ltr">-%elif BIT_DEPTH == 8</div><div dir="ltr">- vpmovzxbd m0, [r0 + 40]</div><div dir="ltr">-%else</div><div dir="ltr">- %error Unsupported BIT_DEPTH!</div><div dir="ltr">-%endif</div><div dir="ltr">-</div><div dir="ltr">- NORM_FACT_COL m0</div><div dir="ltr">+ vpmovzxwd m0, [r0 + 80]</div><div dir="ltr">+</div><div dir="ltr">+ NORM_FACT_HIGH m0</div><div dir="ltr"> </div><div dir="ltr"> ;Col 49-56</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr">- vpmovzxwd m0, [r0 + 96] ;src</div><div dir="ltr">-%elif BIT_DEPTH == 8</div><div dir="ltr">- vpmovzxbd m0, [r0 + 48]</div><div dir="ltr">-%else</div><div dir="ltr">- %error Unsupported BIT_DEPTH!</div><div dir="ltr">-%endif</div><div dir="ltr">-</div><div dir="ltr">- NORM_FACT_COL m0</div><div dir="ltr">+ vpmovzxwd m0, [r0 + 96]</div><div dir="ltr">+</div><div dir="ltr">+ NORM_FACT_HIGH m0</div><div dir="ltr"> </div><div dir="ltr"> ;Col 57-64</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr">- vpmovzxwd m0, [r0 + 112] ;src</div><div dir="ltr">-%elif BIT_DEPTH == 8</div><div 
dir="ltr">- vpmovzxbd m0, [r0 + 56]</div><div dir="ltr">-%else</div><div dir="ltr">- %error Unsupported BIT_DEPTH!</div><div dir="ltr">-%endif</div><div dir="ltr">-</div><div dir="ltr">- NORM_FACT_COL m0</div><div dir="ltr">-</div><div dir="ltr">-%if HIGH_BIT_DEPTH</div><div dir="ltr">+ vpmovzxwd m0, [r0 + 112]</div><div dir="ltr">+</div><div dir="ltr">+ NORM_FACT_HIGH m0</div><div dir="ltr">+</div><div dir="ltr"> lea r0, [r0 + 2 * r1]</div><div dir="ltr">-%else</div><div dir="ltr">+%elif BIT_DEPTH == 8</div><div dir="ltr">+;col 1-16</div><div dir="ltr">+ vpmovzxbw m0, [r0] ;src</div><div dir="ltr">+</div><div dir="ltr">+ NORM_FACT_LOW m0</div><div dir="ltr">+;col 17-32</div><div dir="ltr">+ vpmovzxbw m0, [r0 + 16]</div><div dir="ltr">+</div><div dir="ltr">+ NORM_FACT_LOW m0</div><div dir="ltr">+;col 33-48</div><div dir="ltr">+ vpmovzxbw m0, [r0 + 32]</div><div dir="ltr">+</div><div dir="ltr">+ NORM_FACT_LOW m0</div><div dir="ltr">+;col 49-64</div><div dir="ltr">+ vpmovzxbw m0, [r0 + 48]</div><div dir="ltr">+</div><div dir="ltr">+ NORM_FACT_LOW m0</div><div dir="ltr">+</div><div dir="ltr"> lea r0, [r0 + r1]</div><div dir="ltr">+%else</div><div dir="ltr">+ %error Unsupported BIT_DEPTH!</div><div dir="ltr"> %endif</div><div dir="ltr"> dec r4d</div><div dir="ltr"> jnz .row</div><div dir="ltr">+</div><div dir="ltr">+%if HIGH_BIT_DEPTH</div><div dir="ltr"> vextracti128 xm4, m3, 1</div><div dir="ltr"> vpaddq xm3, xm4</div><div dir="ltr"> punpckhqdq xm2, xm3, xm5</div><div dir="ltr"> paddq xm3, xm2</div><div dir="ltr">+%else</div><div dir="ltr">+ vextracti128 xm4, m3, 1</div><div dir="ltr">+ vpaddd xm3, xm4</div><div dir="ltr">+ punpckhqdq xm2, xm3, xm5</div><div dir="ltr">+ paddd xm3, xm2</div><div dir="ltr">+ punpckldq xm3, xm3, xm5</div><div dir="ltr">+ punpckhqdq xm2, xm3, xm5</div><div dir="ltr">+ paddd xm3, xm2</div><div dir="ltr">+%endif</div><div dir="ltr"> movq [r3], xm3</div><div dir="ltr"> RET</div><div 
class="gmail-m_-8931780846236361189m_-4093124441845140407gmail-yj6qo"></div><br class="gmail-m_-8931780846236361189m_-4093124441845140407gmail-Apple-interchange-newline"><div><br></div>-- <br><div dir="ltr" class="gmail-m_-8931780846236361189m_-4093124441845140407gmail_signature"><div dir="ltr"><i><b>Regards,</b></i><div><i><b>Akil R</b></i></div></div></div></div>
_______________________________________________<br>
x265-devel mailing list<br>
<a href="mailto:x265-devel@videolan.org" target="_blank">x265-devel@videolan.org</a><br>
<a href="https://mailman.videolan.org/listinfo/x265-devel" rel="noreferrer" target="_blank">https://mailman.videolan.org/listinfo/x265-devel</a><br>
</blockquote></div>
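<div>For readers following the 8-bit path in the patch above: it widens pixels to 16-bit lanes with vpmovzxbw and squares them with vpmaddwd, accumulating in dword lanes instead of the qword lanes used by the HIGH macros. The following is a minimal Python sketch of that arithmetic, not the x265 implementation. It assumes SSIMRD_SHIFT is 0 for 8-bit builds (the patch does not show its definition), and all function names are hypothetical, chosen only for illustration. It models one SSIM_DIST_LOW step, the row loop, and the worst-case value a single dword lane can reach.</div>

```python
# Scalar model of the 8-bit SSIM_DIST_LOW path from the patch above.
# Assumption (not shown in the patch): SSIMRD_SHIFT is 0 for 8-bit builds.
SSIMRD_SHIFT = 0
LANES = 8             # dword lanes in one ymm register
PIXELS_PER_LOAD = 16  # vpmovzxbw widens 16 bytes to 16 words


def pmaddwd(a, b):
    """Model vpmaddwd: multiply signed 16-bit lanes, add adjacent pairs."""
    return [a[i] * b[i] + a[i + 1] * b[i + 1] for i in range(0, len(a), 2)]


def ssim_dist_low(fenc, recon, ss_acc, ac_acc, shift=SSIMRD_SHIFT):
    """One SSIM_DIST_LOW step on 16 pixels; accumulators hold 8 dword lanes."""
    shifted = [p >> shift for p in fenc]          # vpsrlw m6, m0, SSIMRD_SHIFT
    diff = [f - r for f, r in zip(fenc, recon)]   # vpsubw m0, m1
    sq_diff = pmaddwd(diff, diff)                 # vpmaddwd m0, m0, m0
    sq_src = pmaddwd(shifted, shifted)            # vpmaddwd m6, m6, m6
    for lane in range(LANES):
        ss_acc[lane] += sq_diff[lane]             # vpaddd m4, m0
        ac_acc[lane] += sq_src[lane]              # vpaddd m7, m6


def ssim_dist_block(fenc, recon, size, shift=SSIMRD_SHIFT):
    """Run the row loop over a size x size block, then reduce the lanes."""
    ss_acc, ac_acc = [0] * LANES, [0] * LANES
    for y in range(size):
        for x in range(0, size, PIXELS_PER_LOAD):
            ssim_dist_low(fenc[y][x:x + PIXELS_PER_LOAD],
                          recon[y][x:x + PIXELS_PER_LOAD],
                          ss_acc, ac_acc, shift)
    # The dword adds only stay exact if no lane exceeds 32 bits.
    assert 2 ** 32 > max(ss_acc + ac_acc)
    return sum(ss_acc), sum(ac_acc)


def ssim_dist_scalar(fenc, recon, size, shift=SSIMRD_SHIFT):
    """Plain reference: sum of squared differences, sum of squared source."""
    ss = sum((fenc[y][x] - recon[y][x]) ** 2
             for y in range(size) for x in range(size))
    ac = sum((fenc[y][x] >> shift) ** 2
             for y in range(size) for x in range(size))
    return ss, ac


def worst_case_lane_total(size):
    """Largest value one dword lane can reach for 8-bit input."""
    return (size // PIXELS_PER_LOAD) * size * 2 * 255 * 255


if __name__ == "__main__":
    size = 64
    fenc = [[255] * size for _ in range(size)]
    recon = [[0] * size for _ in range(size)]
    print(ssim_dist_block(fenc, recon, size) == ssim_dist_scalar(fenc, recon, size))
    print(worst_case_lane_total(size), 2 ** 31 > size * size * 255 * 255)
```

<div>Under these assumptions, each dword lane of a 64x64 block sums at most 512 squared 8-bit values, and even the fully reduced total over 4096 pixels stays below 2^31. That is presumably why the 8-bit path can use vpaddd throughout and skip the vpmuldq/vpaddq sequence of the HIGH macros.</div>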