[x264-devel] [Christian Heine <sennindemokrit at gmx.net>] [patch] mmxext optimized 8x8 block transforms (sub8_dct8/add8_idct8)
System administration
admin at via.ecp.fr
Thu Aug 25 10:41:34 CEST 2005
The deleted attachment is at:
<http://www.videolan.org/~admin/20050825-videolan/x264-dct8-idct8-mmxext.diff>
----- Forwarded message from Christian Heine <sennindemokrit at gmx.net> -----
From: Christian Heine <sennindemokrit at gmx.net>
Date: Thu, 25 Aug 2005 03:20:21 +0200
To: x264-devel at videolan.org
Subject: [patch] mmxext optimized 8x8 block transforms (sub8_dct8/add8_idct8)
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.8) Gecko/20050511
X-Spam-Status: No, score=-8.4 required=5.0 tests=FORGED_RCVD_HELO,IN_REP_TO,
RCVD_IN_ORBS,UNIFIED_PATCH autolearn=failed version=3.0.3
Hi,
attached is a patch based on rev. 287 that implements
* x264_sub8x8_dct8_mmxext
* x264_add8x8_idct8_mmxext
which are 3.3 and 4.0 times faster than their C counterparts
(respectively) on my AthlonXP.
Of course they produce bit-identical output compared to the C
implementation, and the overall speed gain was 2.23% for my non-synthetic
test inputs. An SSE2-optimized version is also possible (only a typing
exercise) but will only result in a minor speed-up (estimated 3.8/4.5
times faster than C), since only a few parts can be optimized for SSE2.
So far it assembles with nasm. I haven't tested it with other assemblers.
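A bit-identical claim like the one above can be checked by feeding the same
random blocks to both implementations and comparing the raw output bytes.
The following is only a minimal sketch of such a harness: the block_fn
signature and the residual_* functions are hypothetical stand-ins (a plain
pixel difference, not the DCT from the patch) and are not the x264
prototypes.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical signature for illustration; not the actual x264 prototype. */
typedef void (*block_fn)(int16_t out[64], const uint8_t *a, const uint8_t *b);

/* Stand-in "reference": 8x8 residual computed row by row. */
static void residual_ref(int16_t out[64], const uint8_t *a, const uint8_t *b)
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            out[y * 8 + x] = (int16_t)(a[y * 8 + x] - b[y * 8 + x]);
}

/* Stand-in "optimized" version: same result, different loop structure. */
static void residual_opt(int16_t out[64], const uint8_t *a, const uint8_t *b)
{
    for (int i = 0; i < 64; i++)
        out[i] = (int16_t)(a[i] - b[i]);
}

/* Deliberately wrong variant, to show that the harness detects a mismatch. */
static void residual_broken(int16_t out[64], const uint8_t *a, const uint8_t *b)
{
    residual_ref(out, a, b);
    out[0] = (int16_t)(out[0] + 1);
}

/* Runs both functions on `rounds` random blocks and returns the number of
 * blocks whose outputs differ in any byte. */
static int count_mismatches(block_fn ref, block_fn opt, int rounds, unsigned seed)
{
    uint8_t p1[64], p2[64];
    int16_t r[64], o[64];
    int bad = 0;

    srand(seed);
    for (int n = 0; n < rounds; n++) {
        for (int i = 0; i < 64; i++) {
            p1[i] = (uint8_t)(rand() & 0xff);
            p2[i] = (uint8_t)(rand() & 0xff);
        }
        ref(r, p1, p2);
        opt(o, p1, p2);
        if (memcmp(r, o, sizeof r) != 0)
            bad++;
    }
    return bad;
}
```

Calling count_mismatches(residual_ref, residual_opt, 1000, 1) should return
0, while passing residual_broken as the second function should report every
block as a mismatch. Random inputs do not prove equivalence, so edge cases
such as all-0 and all-255 blocks are worth adding by hand.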
btw, I noticed that the C sub8x8_dct8 isn't exactly the inverse of
add8x8_idct8. I wonder if this is really intended (to improve
compressibility with quant/dequant, perhaps) or just a bug.
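As background for this question: H.264-style integer transforms are in
general not orthonormal, so a forward pass followed by a plain transpose
does not return the input, and the leftover per-coefficient gain is normally
absorbed into the quant/dequant scaling. The sketch below shows this for the
well-known 4x4 forward core matrix, used here purely as an analogy. It is
not the 8x8 transform from the patch, and it does not establish whether the
8x8 C code behaves this way for the same reason.

```c
/* 4x4 forward core transform matrix of H.264, used only as an analogy;
 * the 8x8 transform discussed in this message is a different matrix. */
static const int CF[4][4] = {
    { 1,  1,  1,  1 },
    { 2,  1, -1, -2 },
    { 1, -1, -1,  1 },
    { 1, -2,  2, -1 },
};

/* Computes out = CF * CF^T. For an orthonormal transform this would be the
 * identity matrix; here it comes out diagonal but with unequal gains. */
static void gram(int out[4][4])
{
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            int sum = 0;
            for (int k = 0; k < 4; k++)
                sum += CF[i][k] * CF[j][k];
            out[i][j] = sum;
        }
    }
}
```

The result is a diagonal matrix with entries 4, 10, 4, 10: the rows are
mutually orthogonal, but each coefficient carries a different gain that has
to be compensated somewhere else in the pipeline.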
regards,
Christian
----- End forwarded message -----
--
System administration <admin at via.ecp.fr>
VIA, Ecole Centrale Paris, France