<div style="line-height:1.7;color:#000000;font-size:14px;font-family:arial"><div><br>This version is good.</div>
<div>Of course, you could use %rep to reduce the number of source lines. There is no need to modify this patch; this is just a suggestion for the future.</div>
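<div><br>For reference, a minimal sketch of how the %rep form might look. It is untested and assumes the same x86inc.asm macros (cglobal, INIT_XMM, movu, m0, r0-r3) and register usage as the patch below. The first three groups of four rows are generated by %rep, and the last group is written out so that the final pointer advance is skipped:</div>
<pre>
INIT_XMM sse2
cglobal blockfill_s_16x16, 3, 4, 1, dst, dstStride, val
    add         r1, r1                  ; stride in bytes (int16_t elements)
    lea         r3, [3 * r1]            ; r3 = 3 * stride, reused for every 4th row

    movd        m0, r2d
    pshuflw     m0, m0, 0               ; broadcast val across the low 4 words
    pshufd      m0, m0, 0               ; broadcast to all 8 words

%rep 3                                  ; rows 0-11: 4 rows per repetition
    movu        [r0], m0
    movu        [r0 + 16], m0
    movu        [r0 + r1], m0
    movu        [r0 + r1 + 16], m0
    movu        [r0 + 2 * r1], m0
    movu        [r0 + 2 * r1 + 16], m0
    movu        [r0 + r3], m0
    movu        [r0 + r3 + 16], m0
    lea         r0, [r0 + 4 * r1]       ; advance dst by 4 rows
%endrep

    ; rows 12-15: same stores, no pointer advance needed
    movu        [r0], m0
    movu        [r0 + 16], m0
    movu        [r0 + r1], m0
    movu        [r0 + r1 + 16], m0
    movu        [r0 + 2 * r1], m0
    movu        [r0 + 2 * r1 + 16], m0
    movu        [r0 + r3], m0
    movu        [r0 + r3 + 16], m0
    RET
</pre>
<div>%rep is expanded by the assembler's preprocessor, so the generated instructions should be the same as the hand-unrolled version, with no loop counter or branch at run time.</div>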
<div><br>At 2015-02-02 19:48:10,praveen@multicorewareinc.com wrote:<br>># HG changeset patch<br>># User Praveen Tiwari<br>># Date 1422877414 -19800<br>># Branch stable<br>># Node ID 8d03acd70332ccf642fc7222bf6f9e7f005983ba<br>># Parent 8e1f8ca9d4112d8ad9801bf79518482306ff55ce<br>>blockfill_s_16x16 sse2 asm code optimization<br>><br>>eliminated branch instructions and optimized LEA instruction<br>><br>>diff -r 8e1f8ca9d411 -r 8d03acd70332 source/common/x86/blockcopy8.asm<br>>--- a/source/common/x86/blockcopy8.asm Mon Feb 02 17:03:40 2015 +0530<br>>+++ b/source/common/x86/blockcopy8.asm Mon Feb 02 17:13:34 2015 +0530<br>>@@ -1771,57 +1771,58 @@<br>> RET<br>> <br>> ;-----------------------------------------------------------------------------<br>>-; void blockfill_s_%1x%2(int16_t* dst, intptr_t dstride, int16_t val)<br>>+; void blockfill_s_16x16(int16_t* dst, intptr_t dstride, int16_t val)<br>> ;-----------------------------------------------------------------------------<br>>-%macro BLOCKFILL_S_W16_H8 2<br>> INIT_XMM sse2<br>>-cglobal blockfill_s_%1x%2, 3, 5, 1, dst, dstStride, val<br>>-<br>>-mov r3d, %2/8<br>>+cglobal blockfill_s_16x16, 3, 4, 1, dst, dstStride, val<br>> <br>> add r1, r1<br>>+lea r3, [3 * r1]<br>> <br>> movd m0, r2d<br>>-pshuflw m0, m0, 0<br>>-pshufd m0, m0, 0<br>>-<br>>-.loop:<br>>- movu [r0], m0<br>>- movu [r0 + 16], m0<br>>-<br>>- movu [r0 + r1], m0<br>>- movu [r0 + r1 + 16], m0<br>>-<br>>- movu [r0 + 2 * r1], m0<br>>- movu [r0 + 2 * r1 + 16], m0<br>>-<br>>- lea r4, [r0 + 2 * r1]<br>>- movu [r4 + r1], m0<br>>- movu [r4 + r1 + 16], m0<br>>-<br>>- movu [r0 + 4 * r1], m0<br>>- movu [r0 + 4 * r1 + 16], m0<br>>-<br>>- lea r4, [r0 + 4 * r1]<br>>- movu [r4 + r1], m0<br>>- movu [r4 + r1 + 16], m0<br>>-<br>>- movu [r4 + 2 * r1], m0<br>>- movu [r4 + 2 * r1 + 16], m0<br>>-<br>>- lea r4, [r4 + 2 * r1]<br>>- movu [r4 + r1], m0<br>>- movu [r4 + r1 + 16], m0<br>>-<br>>- lea r0, [r0 + 8 * r1]<br>>-<br>>- dec r3d<br>>- jnz .loop<br>>-<br>>+pshuflw m0, m0, 
0<br>>+pshufd m0, m0, 0<br>>+<br>>+movu [r0], m0<br>>+movu [r0 + 16], m0<br>>+movu [r0 + r1], m0<br>>+movu [r0 + r1 + 16], m0<br>>+movu [r0 + 2 * r1], m0<br>>+movu [r0 + 2 * r1 + 16], m0<br>>+<br>>+movu [r0 + r3], m0<br>>+movu [r0 + r3 + 16], m0<br>>+lea r0, [r0 + 4 * r1]<br>>+movu [r0], m0<br>>+movu [r0 + 16], m0<br>>+<br>>+movu [r0 + r1], m0<br>>+movu [r0 + r1 + 16], m0<br>>+movu [r0 + 2 * r1], m0<br>>+movu [r0 + 2 * r1 + 16], m0<br>>+movu [r0 + r3], m0<br>>+movu [r0 + r3 + 16], m0<br>>+lea r0, [r0 + 4 * r1]<br>>+movu [r0], m0<br>>+movu [r0 + 16], m0<br>>+<br>>+movu [r0 + r1], m0<br>>+movu [r0 + r1 + 16], m0<br>>+movu [r0 + 2 * r1], m0<br>>+movu [r0 + 2 * r1 + 16], m0<br>>+movu [r0 + r3], m0<br>>+movu [r0 + r3 + 16], m0<br>>+lea r0, [r0 + 4 * r1]<br>>+movu [r0], m0<br>>+movu [r0 + 16], m0<br>>+<br>>+movu [r0 + r1], m0<br>>+movu [r0 + r1 + 16], m0<br>>+movu [r0 + 2 * r1], m0<br>>+movu [r0 + 2 * r1 + 16], m0<br>>+movu [r0 + r3], m0<br>>+movu [r0 + r3 + 16], m0<br>> RET<br>>-%endmacro<br>>-<br>>-BLOCKFILL_S_W16_H8 16, 16<br>> <br>> INIT_YMM avx2<br>> cglobal blockfill_s_16x16, 3, 4, 1<br>>_______________________________________________<br>>x265-devel mailing list<br>>x265-devel@videolan.org<br>>https://mailman.videolan.org/listinfo/x265-devel<br></div></div>