<div dir="ltr">Sorry. Both are the same primitive. I will correct it and resend the two patches. </div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jun 25, 2015 at 3:34 PM, Deepthi Nandakumar <span dir="ltr"><<a href="mailto:deepthi@multicorewareinc.com" target="_blank">deepthi@multicorewareinc.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span class="">On Thu, Jun 25, 2015 at 2:19 PM, <span dir="ltr"><<a href="mailto:rajesh@multicorewareinc.com" target="_blank">rajesh@multicorewareinc.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"># HG changeset patch<br>
# User Rajesh Paulraj<<a href="mailto:rajesh@multicorewareinc.com" target="_blank">rajesh@multicorewareinc.com</a>><br>
# Date 1435219198 -19800<br>
# Thu Jun 25 13:29:58 2015 +0530<br>
# Node ID a03487d6295cf89b065eff36e5c1ec4ee4253243<br>
# Parent b1af4c36f48a4500a4912373ebcda9a5540b5c15<br>
asm: sse4 10bit code for sign primitive<br>
<br>
calSign 6.16x 356.91 2197.63<br>
<br>
diff -r b1af4c36f48a -r a03487d6295c source/common/x86/asm-primitives.cpp<br>
--- a/source/common/x86/asm-primitives.cpp Wed Jun 24 10:36:15 2015 -0500<br>
+++ b/source/common/x86/asm-primitives.cpp Thu Jun 25 13:29:58 2015 +0530<br>
@@ -1097,6 +1097,7 @@<br>
p.saoCuOrgE3[0] = PFX(saoCuOrgE3_sse4);<br>
p.saoCuOrgE3[1] = PFX(saoCuOrgE3_sse4);<br>
p.saoCuOrgB0 = PFX(saoCuOrgB0_sse4);<br>
+ p.sign = x265_calculateSign_sse4;<br>
<br></blockquote></span><div>This should be PFX().<br> <br></div><div><div class="h5"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
LUMA_ADDAVG(sse4);<br>
CHROMA_420_ADDAVG(sse4);<br>
diff -r b1af4c36f48a -r a03487d6295c source/common/x86/loopfilter.asm<br>
--- a/source/common/x86/loopfilter.asm Wed Jun 24 10:36:15 2015 -0500<br>
+++ b/source/common/x86/loopfilter.asm Thu Jun 25 13:29:58 2015 +0530<br>
@@ -40,6 +40,7 @@<br>
cextern pw_2<br>
cextern pw_1023<br>
cextern pb_movemask<br>
+cextern pw_1<br>
<br>
<br>
;============================================================================================================<br>
@@ -1419,3 +1420,49 @@<br>
<br>
.end:<br>
RET<br>
+<br>
+;-----------------------------------------------------------------------------<br>
+; void calSign(int8_t *dst, const pixel *src1, const pixel *src2, const int endX)<br>
+;-----------------------------------------------------------------------------<br>
+%if HIGH_BIT_DEPTH<br>
+INIT_XMM sse4<br>
+cglobal calculateSign, 4, 7, 5<br>
+ mova m0, [pw_1]<br>
+ mov r4d, r3d<br>
+ shr r3d, 4<br>
+ add r3d, 1<br>
+ mov r5, r0<br>
+ movu m4, [r0 + r4]<br>
+.loop:<br>
+ movu m1, [r1] ; m1 = src1[x]<br>
+ movu m2, [r2] ; m2 = src2[x]<br>
+<br>
+ pcmpgtw m3, m1, m2<br>
+ pcmpgtw m2, m1<br>
+ pand m3, m0<br>
+ por m3, m2<br>
+ packsswb m3, m3<br>
+ movh [r0], xm3<br>
+<br>
+ movu m1, [r1 + 16] ; m1 = src1[x + 8]<br>
+ movu m2, [r2 + 16] ; m2 = src2[x + 8]<br>
+<br>
+ pcmpgtw m3, m1, m2<br>
+ pcmpgtw m2, m1<br>
+ pand m3, m0<br>
+ por m3, m2<br>
+ packsswb m3, m3<br>
+ movh [r0 + 8], xm3<br>
+<br>
+ add r0, 16<br>
+ add r1, 32<br>
+ add r2, 32<br>
+ dec r3d<br>
+ jnz .loop<br>
+<br>
+ mov r6, r0<br>
+ sub r6, r5<br>
+ sub r4, r6<br>
+ movu [r0 + r4], m4<br>
+ RET<br>
+%endif<br>
diff -r b1af4c36f48a -r a03487d6295c source/common/x86/loopfilter.h<br>
--- a/source/common/x86/loopfilter.h Wed Jun 24 10:36:15 2015 -0500<br>
+++ b/source/common/x86/loopfilter.h Thu Jun 25 13:29:58 2015 +0530<br>
@@ -37,7 +37,8 @@<br>
void PFX(saoCuOrgB0_ ## cpu)(pixel* rec, const int8_t* offsetBo, int ctuWidth, int ctuHeight, intptr_t stride); \<br>
void PFX(saoCuStatsE2_ ## cpu)(const pixel *fenc, const pixel *rec, intptr_t stride, int8_t *upBuff1, int8_t *upBufft, int endX, int endY, int32_t *stats, int32_t *count); \<br>
void PFX(saoCuStatsE3_ ## cpu)(const pixel *fenc, const pixel *rec, intptr_t stride, int8_t *upBuff1, int endX, int endY, int32_t *stats, int32_t *count); \<br>
- void PFX(calSign_ ## cpu)(int8_t *dst, const pixel *src1, const pixel *src2, const int endX);<br>
+ void PFX(calSign_ ## cpu)(int8_t *dst, const pixel *src1, const pixel *src2, const int endX); \<br>
+ void PFX(calculateSign_ ## cpu)(int8_t *dst, const pixel *src1, const pixel *src2, const int endX);<br>
<br></blockquote></div></div><div>What's the difference between calculateSign_ and calSign_? They have the same function signature and are assigned to the same primitive?<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
DECL_SAO(sse4);<br>
DECL_SAO(avx2);<br>
_______________________________________________<br>
x265-devel mailing list<br>
<a href="mailto:x265-devel@videolan.org" target="_blank">x265-devel@videolan.org</a><br>
<a href="https://mailman.videolan.org/listinfo/x265-devel" rel="noreferrer" target="_blank">https://mailman.videolan.org/listinfo/x265-devel</a><br>
</blockquote></div><br></div></div>
<br></blockquote></div><br></div>
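For reference, the per-pixel operation that the quoted SSE4 loop appears to vectorize can be sketched in scalar C++. This is only a reading of the assembly in the patch (two `pcmpgtw` compares combined into +1, 0 or -1, then packed to bytes), not x265's actual C reference. The names `calSignRef` and `signOf` are hypothetical, and `pixel` is assumed to be `uint16_t`, as in a HIGH_BIT_DEPTH build.

```cpp
#include <cstdint>

// Assumption: HIGH_BIT_DEPTH build, so each pixel is 16 bits wide.
typedef uint16_t pixel;

// Returns +1, 0 or -1 depending on the sign of x.
static inline int8_t signOf(int x)
{
    return static_cast<int8_t>((x > 0) - (x < 0));
}

// Hypothetical scalar equivalent of the quoted asm loop:
// dst[x] = sign(src1[x] - src2[x]) for x in [0, endX).
void calSignRef(int8_t* dst, const pixel* src1, const pixel* src2, const int endX)
{
    for (int x = 0; x < endX; x++)
        dst[x] = signOf(static_cast<int>(src1[x]) - static_cast<int>(src2[x]));
}
```

A scalar version like this writes exactly `endX` bytes. The assembly instead processes 16 pixels per iteration, which is presumably why it saves 16 bytes of `dst` in `m4` before the loop and writes them back afterwards.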