<div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Hi Min,<br><br>The overflow also occurs with 10-bit builds and causes an output mismatch, so we should use pmaddwd only when BIT_DEPTH <= 8.<br><br>You can reproduce the output mismatch with the following CLI:<br>(frame no: 141, check value: m_scratch + 114)<br>CrowdRun_1920x1080_50_10bit_422.yuv --preset fast --aq-mode 0 --sar 2 --range full --no-info --hash=1 --psnr --ssim<br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Nov 11, 2015 at 10:12 PM, Min Chen <span dir="ltr"><<a href="mailto:chenm003@163.com" target="_blank">chenm003@163.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"># HG changeset patch<br>
# User Min Chen <<a href="mailto:chenm003@163.com">chenm003@163.com</a>><br>
# Date 1447258832 21600<br>
# Node ID df66a0f940c87df49318203de0231dca6ad8b4e4<br>
# Parent a74493c5b7ab137c3f082d9a661a7498a883baad<br>
asm: fix Main12 bug in mbtree_propagate_cost, (the IntraCost over 16bits)<br>
---<br>
source/common/x86/mc-a2.asm | 29 +++++++++++++++++++++++++++++<br>
1 files changed, 29 insertions(+), 0 deletions(-)<br>
<br>
diff -r a74493c5b7ab -r df66a0f940c8 source/common/x86/mc-a2.asm<br>
--- a/source/common/x86/mc-a2.asm Wed Nov 11 10:04:57 2015 -0600<br>
+++ b/source/common/x86/mc-a2.asm Wed Nov 11 10:20:32 2015 -0600<br>
@@ -1019,7 +1019,16 @@<br>
por m3, m1<br>
<br>
movd m1, [r1+r5*2] ; prop<br>
+%if (BIT_DEPTH <= 10)<br>
pmaddwd m0, m2<br>
+%else<br>
+ punpckldq m2, m2<br>
+ punpckldq m0, m0<br>
+ pmuludq m0, m2<br>
+ pshufd m2, m2, q3120<br>
+ pshufd m0, m0, q3120<br>
+%endif<br>
+<br>
punpcklwd m1, m4<br>
cvtdq2pd m0, m0<br>
mulpd m0, m6 ; intra*invq*fps_factor>>8<br>
@@ -1063,7 +1072,15 @@<br>
por m3, m1<br>
<br>
movd m1, [r1+r5*2] ; prop<br>
+%if (BIT_DEPTH <= 10)<br>
pmaddwd m0, m2<br>
+%else<br>
+ punpckldq m2, m2 ; DWORD [- 1 - 0]<br>
+ punpckldq m0, m0<br>
+ pmuludq m0, m2 ; QWORD [m1 m0]<br>
+ pshufd m2, m2, q3120<br>
+ pshufd m0, m0, q3120<br>
+%endif<br>
punpcklwd m1, m4<br>
cvtdq2pd m0, m0<br>
mulpd m0, m6 ; intra*invq*fps_factor>>8<br>
@@ -1103,7 +1120,11 @@<br>
pminsd xm3, xm2<br>
<br>
pmovzxwd xm1, [r1+r5*2] ; prop<br>
+%if (BIT_DEPTH <= 10)<br>
pmaddwd xm0, xm2<br>
+%else<br>
+ pmulld xm0, xm2<br>
+%endif<br>
cvtdq2pd m0, xm0<br>
cvtdq2pd m1, xm1 ; prop<br>
%if cpuflag(avx2)<br>
@@ -1145,7 +1166,11 @@<br>
<br>
movd xm1, [r1+r5*2] ; prop<br>
pmovzxwd xm1, xm1<br>
+%if (BIT_DEPTH <= 10)<br>
pmaddwd xm0, xm2<br>
+%else<br>
+ pmulld xm0, xm2<br>
+%endif<br>
cvtdq2pd m0, xm0<br>
cvtdq2pd m1, xm1 ; prop<br>
%if cpuflag(avx2)<br>
@@ -1179,7 +1204,11 @@<br>
<br>
movzx r6d, word [r1+r5*2] ; prop<br>
movd xm1, r6d<br>
+%if (BIT_DEPTH <= 10)<br>
pmaddwd xm0, xm2<br>
+%else<br>
+ pmulld xm0, xm2<br>
+%endif<br>
cvtdq2pd m0, xm0<br>
cvtdq2pd m1, xm1 ; prop<br>
%if cpuflag(avx2)<br>
<br>
_______________________________________________<br>
x265-devel mailing list<br>
<a href="mailto:x265-devel@videolan.org">x265-devel@videolan.org</a><br>
<a href="https://mailman.videolan.org/listinfo/x265-devel" rel="noreferrer" target="_blank">https://mailman.videolan.org/listinfo/x265-devel</a><br>
</blockquote></div><br></div>
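To illustrate the overflow reported at the top of this message, here is a minimal scalar sketch in C of what one 32-bit lane of pmaddwd computes. It is not taken from x265: the helper names `pmaddwd_lane` and `exact_product` and the sample values are hypothetical, chosen only to show that an operand above 32767 gets read as a negative 16-bit word.

```c
#include <stdint.h>

/* Reinterpret the low 16 bits of x as a signed 16-bit word,
 * which is how pmaddwd reads each half of a dword. */
static int32_t as_signed_word(uint32_t x)
{
    int32_t w = (int32_t)(x & 0xFFFFu);
    return (w >= 0x8000) ? w - 0x10000 : w;
}

/* Scalar model of one dword lane of pmaddwd (hypothetical helper):
 * multiply the signed low words, multiply the signed high words,
 * and add the two products. The result is widened to 64 bits here
 * for simplicity; the real instruction wraps only in the corner case
 * where all four words are 0x8000. */
static int64_t pmaddwd_lane(uint32_t a, uint32_t b)
{
    int64_t lo = (int64_t)as_signed_word(a) * as_signed_word(b);
    int64_t hi = (int64_t)as_signed_word(a >> 16) * as_signed_word(b >> 16);
    return lo + hi;
}

/* Reference result: the plain unsigned product of the two dwords. */
static uint64_t exact_product(uint32_t a, uint32_t b)
{
    return (uint64_t)a * (uint64_t)b;
}
```

With both operands below 32768, for example `pmaddwd_lane(20000, 256)`, the model agrees with `exact_product`. Once an operand reaches 32768 or more, for example `pmaddwd_lane(40000, 256)`, the low word is interpreted as -25536 and the result goes negative, which is the kind of mismatch that a full 32-bit multiply (pmulld, or pmuludq plus shuffles) avoids.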