[x264-devel] [Christian Heine <sennindemokrit at gmx.net>] [patch] mmxext optimized 8x8 block transforms (sub8_dct8/add8_idct8)

Thu Aug 25 10:41:34 CEST 2005

 The deleted attachment is at:
    <http://www.videolan.org/~admin/20050825-videolan/x264-dct8-idct8-mmxext.diff>

----- Forwarded message from Christian Heine <sennindemokrit at gmx.net> -----

From: Christian Heine <sennindemokrit at gmx.net>
Date: Thu, 25 Aug 2005 03:20:21 +0200
To: x264-devel at videolan.org
Subject: [patch] mmxext optimized 8x8 block transforms (sub8_dct8/add8_idct8)
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.8) Gecko/20050511

Hi,

attached is a patch based on rev. 287 that implements
* x264_sub8x8_dct8_mmxext
* x264_add8x8_idct8_mmxext
which are 3.3 and 4.0 times faster than their C counterparts 
(respectively) on my AthlonXP.

Of course they produce bit-identical output compared to the C
implementation, and the overall speed gain was 2.23% for my
non-synthetic test inputs. An SSE2-optimized version is also possible
(only a typing exercise) but would yield only a minor additional
speedup (estimated 3.8/4.5 times faster than C), since only a few parts
can be optimized for SSE2.
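
A bit-identical claim like the one above can be checked by running both
implementations on the same pseudo-random blocks and comparing the raw
output bytes. The following is only a minimal sketch: the dct8_fn
signature is hypothetical (it is not taken from the patch), and the two
stand-in functions are placeholders that compute a plain residual, not
a real transform.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical signature, chosen for illustration only. */
typedef void (*dct8_fn)(int16_t dct[8][8],
                        const uint8_t *pix1, int stride1,
                        const uint8_t *pix2, int stride2);

/* Stand-in "reference": stores the plain residual, NOT a real DCT. */
void stand_in_ref(int16_t dct[8][8],
                  const uint8_t *pix1, int stride1,
                  const uint8_t *pix2, int stride2)
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            dct[y][x] = (int16_t)(pix1[y * stride1 + x] - pix2[y * stride2 + x]);
}

/* Deliberately wrong variant, to show that the harness detects differences. */
void stand_in_broken(int16_t dct[8][8],
                     const uint8_t *pix1, int stride1,
                     const uint8_t *pix2, int stride2)
{
    stand_in_ref(dct, pix1, stride1, pix2, stride2);
    dct[0][0] = (int16_t)(dct[0][0] + 1);
}

/* Runs both functions on identical pseudo-random 8x8 blocks and returns
 * the number of trials in which the outputs differ in any byte. */
int count_mismatches(dct8_fn ref, dct8_fn opt, int trials, unsigned seed)
{
    uint8_t a[64], b[64];
    int16_t out_ref[8][8], out_opt[8][8];
    int bad = 0;

    assert(ref && opt);
    srand(seed);
    for (int t = 0; t < trials; t++) {
        for (int i = 0; i < 64; i++) {
            a[i] = (uint8_t)(rand() & 0xff);
            b[i] = (uint8_t)(rand() & 0xff);
        }
        memset(out_ref, 0, sizeof out_ref);
        memset(out_opt, 0, sizeof out_opt);
        ref(out_ref, a, 8, b, 8);
        opt(out_opt, a, 8, b, 8);
        if (memcmp(out_ref, out_opt, sizeof out_ref) != 0)
            bad++;
    }
    return bad;
}
```

In a real test, the C function and the mmxext function would take the
place of the two stand-ins; a return value of 0 means no differing
block was found in the sampled inputs.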

So far it assembles with nasm. I haven't tested it for other assemblers.

By the way, I noticed that the C sub8x8_dct8 isn't exactly the inverse
of add8x8_idct8. I wonder whether this is really intended (to improve
compressibility with quant/dequant, perhaps) or just a bug.
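
One possible explanation, which this message does not confirm, is that
the 8-point integer transform used for H.264 8x8 blocks has orthogonal
basis rows of unequal norm. In that case the forward and inverse
butterflies cannot undo each other unless a per-coefficient scaling is
applied somewhere, and codecs commonly fold that scaling into
quant/dequant. The sketch below assumes the commonly cited basis matrix
(scaled by 8); the matrix and all names in it are assumptions for
illustration and are not taken from the patch or from the x264 sources.
It computes T * T^T and checks that the result is diagonal.

```c
#include <stdio.h>

/* Assumed 8-point integer transform basis, scaled by 8. */
const int T8[8][8] = {
    {  8,   8,   8,   8,   8,   8,   8,   8 },
    { 12,  10,   6,   3,  -3,  -6, -10, -12 },
    {  8,   4,  -4,  -8,  -8,  -4,   4,   8 },
    { 10,  -3, -12,  -6,   6,  12,   3, -10 },
    {  8,  -8,  -8,   8,   8,  -8,  -8,   8 },
    {  6, -12,   3,  10, -10,  -3,  12,  -6 },
    {  4,  -8,   8,  -4,  -4,   8,  -8,   4 },
    {  3,  -6,  10, -12,  12, -10,   6,  -3 },
};

/* Computes g = T8 * T8^T, i.e. the dot product of every pair of rows. */
void gram8(int g[8][8])
{
    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 8; j++) {
            int sum = 0;
            for (int k = 0; k < 8; k++)
                sum += T8[i][k] * T8[j][k];
            g[i][j] = sum;
        }
}

/* Returns 1 if every off-diagonal entry is zero (rows are orthogonal). */
int is_diagonal8(int g[8][8])
{
    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 8; j++)
            if (i != j && g[i][j] != 0)
                return 0;
    return 1;
}

/* Prints the squared row norms found on the diagonal of g. */
void print_diagonal8(int g[8][8])
{
    for (int i = 0; i < 8; i++)
        printf("%d ", g[i][i]);
    printf("\n");
}
```

Under this assumption the diagonal comes out as 512 578 320 578 512 578
320 578, so T * T^T is not a multiple of the identity. The right shifts
in an integer implementation also discard low bits, which is a second
reason an exact round trip should not be expected.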

regards,
Christian

----- End forwarded message -----

-- 
System administration <admin at via.ecp.fr>
VIA, Ecole Centrale Paris, France
