<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">FYI,<br>
<br>
I forgot to comment in the commit message:<br>
<br>
I kept the original code for 64-bit builds because, although r2b
also works in 64-bit mode, using it there hurt performance so badly
that the routine ran well below the C code.<br>
<br>
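For anyone reading the diff, here is a rough C model of what both
branches store. It is only an illustration, not x265 code:
store_three_rows, dst, stride and packed are made-up names standing
in for the asm block, r0, r1 and the low 32 bits of m2. The 32-bit
branch is needed because only eax, ebx, ecx and edx have a
byte-addressable low half in 32-bit mode; as far as I recall, x86inc
maps r4 to esi there, which has none, while r2 maps to edx.<br>
<br>
<pre wrap="">
```c
/* Hypothetical C model of the three byte stores in the patched block.
 * packed : low 32 bits of m2 after packuswb (what movd/movq copies out)
 * dst    : stand-in for r0 (destination pointer)
 * stride : stand-in for r1 (distance between rows, in bytes)
 * None of these names come from x265; they are for illustration only. */
void store_three_rows(unsigned char *dst, long stride, unsigned int packed)
{
    dst[0] = (unsigned char)(packed & 0xFF);          /* mov [r0], r2b */
    packed >>= 8;                                     /* shr r2, 8 */
    dst[stride] = (unsigned char)(packed & 0xFF);     /* mov [r0 + r1], r2b */
    packed >>= 8;                                     /* shr r2, 8 */
    dst[2 * stride] = (unsigned char)(packed & 0xFF); /* mov [r0 + r1 * 2], r2b */
}
```
</pre>
Calling it with packed = 0x00332211 and stride = 4 writes 0x11, 0x22
and 0x33 to dst[0], dst[4] and dst[8].<br>
<br>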
This is probably due to the way instruction order, length, and
layout in memory can affect performance (see Agner Fog's
optimization manuals).<br>
<br>
This probably leaves some room for performance improvements, since
most x265 assembly is written primarily around the algorithms that
implement x265 functionality and does not account for every aspect
of how the processor executes it.
While writing optimized assembly for every processor is
unrealistic, the assembly for each SIMD level could be tuned
for the latest processor that supports that level and nothing newer
(i.e. the SSE4 assembly could be tuned for the latest processor
that supports only up to SSE4). <br>
<br>
One downside is that these kinds of optimizations are more likely
to produce code that looks like the jumbled output of a compiler
and is therefore harder to read, understand, and maintain.
Of course, more comments can help here.<br>
<br>
Is x265 interested in such optimizations?<br>
<br>
On 02/25/2015 08:31 PM, Deepthi Nandakumar wrote:<br>
</div>
<blockquote
cite="mid:CAAEo3uhewxeNnxTYc0XxGYEose6bAum+Z3apwge1K-yZt82t+g@mail.gmail.com"
type="cite">
<div dir="ltr">Thanks, pushed.<br>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Feb 26, 2015 at 8:03 AM, <span
dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:dtyx265@gmail.com" target="_blank">dtyx265@gmail.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex"># HG
changeset patch<br>
# User David T Yuen &lt;<a moz-do-not-send="true"
href="mailto:dtyx265@gmail.com">dtyx265@gmail.com</a>&gt;<br>
# Date 1424917924 28800<br>
# Node ID 13346cb90bff040492f0688226f44182bb6b97d8<br>
# Parent 74c716607444c77b9d5ea1dce5b99c875f0b20fe<br>
Fixed 32 bit bug in intrapred dc4 sse2<br>
<br>
Changed register written from to one that supports low byte
access in 32 bit<br>
Also moved pw_257 constant to const-a.asm<br>
<br>
diff -r 74c716607444 -r 13346cb90bff
source/common/x86/const-a.asm<br>
--- a/source/common/x86/const-a.asm Tue Feb 24 13:39:16
2015 +0530<br>
+++ b/source/common/x86/const-a.asm Wed Feb 25 18:32:04
2015 -0800<br>
@@ -37,6 +37,7 @@<br>
const pw_32, times 16 dw 32<br>
const pw_128, times 16 dw 128<br>
const pw_256, times 16 dw 256<br>
+const pw_257, times 16 dw 257<br>
const pw_512, times 16 dw 512<br>
const pw_1023, times 8 dw 1023<br>
const pw_1024, times 16 dw 1024<br>
diff -r 74c716607444 -r 13346cb90bff
source/common/x86/intrapred8.asm<br>
--- a/source/common/x86/intrapred8.asm Tue Feb 24 13:39:16
2015 +0530<br>
+++ b/source/common/x86/intrapred8.asm Wed Feb 25 18:32:04
2015 -0800<br>
@@ -65,8 +65,6 @@<br>
pw_planar32_L: dw 31, 30, 29, 28, 27, 26, 25, 24<br>
pw_planar32_H: dw 23, 22, 21, 20, 19, 18, 17, 16<br>
<br>
-pw_257: times 8 dw 257<br>
-<br>
const ang_table<br>
%assign x 0<br>
%rep 32<br>
@@ -80,6 +78,7 @@<br>
cextern pw_8<br>
cextern pw_16<br>
cextern pw_32<br>
+cextern pw_257<br>
cextern pw_1024<br>
cextern pb_unpackbd1<br>
cextern multiL<br>
@@ -144,12 +143,21 @@<br>
paddw m2, m1<br>
psraw m2, 2<br>
packuswb m2, m2<br>
+%if ARCH_X86_64<br>
movq r4, m2<br>
mov [r0], r4b<br>
shr r4, 8<br>
mov [r0 + r1], r4b<br>
shr r4, 8<br>
mov [r0 + r1 * 2], r4b<br>
+%else<br>
+ movd r2d, m2<br>
+ mov [r0], r2b<br>
+ shr r2, 8<br>
+ mov [r0 + r1], r2b<br>
+ shr r2, 8<br>
+ mov [r0 + r1 * 2], r2b<br>
+%endif<br>
.end:<br>
RET<br>
<br>
_______________________________________________<br>
x265-devel mailing list<br>
<a moz-do-not-send="true"
href="mailto:x265-devel@videolan.org">x265-devel@videolan.org</a><br>
<a moz-do-not-send="true"
href="https://mailman.videolan.org/listinfo/x265-devel"
target="_blank">https://mailman.videolan.org/listinfo/x265-devel</a><br>
</blockquote>
</div>
<br>
</div>
<br>
</blockquote>
<br>
</body>
</html>