[x264-devel] [Patch] zigzag SSE2 Version 2

Axel Zeuner axel.zeuner at gmx.de
Sun Aug 10 14:54:30 CEST 2008


Hello,

The attached patch contains an improved version of the SSE2 zigzag functions.
Loads and stores in the scan_8x8 functions are now aligned, which should
improve performance on Core2.
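For readers unfamiliar with what these functions compute, here is a minimal plain-C sketch of the scalar behaviour that the SSE2 code speeds up. It is not the code from the patch, and it is not x264's own zigzag_*_c functions. It assumes row-major (raster) coefficient order, and the names scan_4x4_frame, scan_4x4_field, zigzag_scan_4x4, zigzag_sub_4x4 and build_zigzag are hypothetical. x264's internal coefficient layout and function signatures may differ.

```c
#include <stdint.h>

/* Hypothetical tables: raster index (y*4 + x) of the coefficient visited at
 * each scan position, assuming row-major storage. These follow the H.264
 * 4x4 frame (zigzag) and field scan orders. */
static const uint8_t scan_4x4_frame[16] = {
    0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15
};
static const uint8_t scan_4x4_field[16] = {
    0, 4, 1, 8, 12, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15
};

/* Hypothetical helper: reorder a raster-order 4x4 coefficient block into
 * scan order. */
static void zigzag_scan_4x4(int16_t level[16], const int16_t dct[16],
                            const uint8_t scan[16])
{
    for (int i = 0; i < 16; i++)
        level[i] = dct[scan[i]];
}

/* Hypothetical "sub" variant: compute the residual src - pred directly in
 * scan order, so no separate subtract-then-reorder pass is needed.
 * Strides are in bytes (one byte per pixel). */
static void zigzag_sub_4x4(int16_t level[16],
                           const uint8_t *src, int src_stride,
                           const uint8_t *pred, int pred_stride,
                           const uint8_t scan[16])
{
    for (int i = 0; i < 16; i++) {
        int x = scan[i] & 3, y = scan[i] >> 2;
        level[i] = (int16_t)(src[y * src_stride + x] - pred[y * pred_stride + x]);
    }
}

/* Hypothetical generator for the classic n x n zigzag (frame) order: walk the
 * anti-diagonals d = x + y, alternating direction. Even diagonals run
 * bottom-left to top-right (y decreasing), odd ones top-right to bottom-left.
 * For n = 4 this reproduces scan_4x4_frame; n = 8 gives the 8x8 order. */
static void build_zigzag(uint8_t *scan, int n)
{
    int i = 0;
    for (int d = 0; d <= 2 * (n - 1); d++) {
        int lo = d < n ? 0 : d - (n - 1);
        int hi = d < n ? d : n - 1;
        for (int k = lo; k <= hi; k++) {
            int y = (d & 1) ? k : d - k;
            int x = d - y;
            scan[i++] = (uint8_t)(y * n + x);
        }
    }
}
```

A table-driven scalar loop like this touches one coefficient at a time. A SIMD version instead loads whole rows of coefficients into registers and permutes them, which is where aligned 16-byte loads and stores (for example, eight of them for a 64-coefficient int16_t block) start to matter.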

x86_64 checkasm --bench results on AMD K10:
zigzag_scan_4x4_field_c: 122
zigzag_scan_4x4_field_mmx: 82
zigzag_scan_4x4_field_sse2: 63
zigzag_scan_4x4_frame_c: 229
zigzag_scan_4x4_frame_sse2: 81
zigzag_scan_8x8_field_c: 751
zigzag_scan_8x8_field_sse2: 265
zigzag_scan_8x8_frame_c: 766
zigzag_scan_8x8_frame_sse2: 343
zigzag_sub_4x4_field_c: 383
zigzag_sub_4x4_field_sse2: 161
zigzag_sub_4x4_frame_c: 397
zigzag_sub_4x4_frame_sse2: 165

x86_64 checkasm --bench results on AMD K8:
zigzag_scan_4x4_field_c: 133
zigzag_scan_4x4_field_mmx: 107
zigzag_scan_4x4_field_sse2: 102
zigzag_scan_4x4_frame_c: 218
zigzag_scan_4x4_frame_sse2: 130
zigzag_scan_8x8_field_c: 692
zigzag_scan_8x8_field_sse2: 449
zigzag_scan_8x8_frame_c: 757
zigzag_scan_8x8_frame_sse2: 550
zigzag_sub_4x4_field_c: 398
zigzag_sub_4x4_field_sse2: 238
zigzag_sub_4x4_frame_c: 403
zigzag_sub_4x4_frame_sse2: 262
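The speedups implied by the figures above can be worked out directly. The short C sketch below divides each C timing by the corresponding SSE2 timing for both CPUs and leaves out the MMX column. The struct and function names are illustrative only. The timings are copied unchanged from the two result lists, where lower numbers are faster.

```c
#include <stdio.h>

/* checkasm --bench timings copied from the results above (lower is better). */
struct bench {
    const char *name;
    int k10_c, k10_sse2;
    int k8_c, k8_sse2;
};

static const struct bench results[] = {
    { "scan_4x4_field", 122,  63, 133, 102 },
    { "scan_4x4_frame", 229,  81, 218, 130 },
    { "scan_8x8_field", 751, 265, 692, 449 },
    { "scan_8x8_frame", 766, 343, 757, 550 },
    { "sub_4x4_field",  383, 161, 398, 238 },
    { "sub_4x4_frame",  397, 165, 403, 262 },
};

/* Speedup of the SIMD version over the C version: C time / SIMD time. */
static double speedup(int c_time, int simd_time)
{
    return (double)c_time / simd_time;
}

/* Print one row per function with the SSE2 speedup on each CPU. */
void print_speedups(void)
{
    size_t n = sizeof results / sizeof results[0];
    printf("%-16s %8s %8s\n", "function", "K10", "K8");
    for (size_t i = 0; i < n; i++)
        printf("%-16s %7.2fx %7.2fx\n", results[i].name,
               speedup(results[i].k10_c, results[i].k10_sse2),
               speedup(results[i].k8_c, results[i].k8_sse2));
}
```

Calling print_speedups() from a main function prints the ratios. For example, scan_8x8_field comes out at about 2.8x on K10 (751/265) but only about 1.5x on K8 (692/449). A plausible explanation, not stated in the post, is that K8 executes each 128-bit SSE2 instruction as two 64-bit halves, while K10 has full-width 128-bit SSE units.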

Regards,
Axel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: x264-zigzag2-sse2-v2.diff
Type: text/x-diff
Size: 12227 bytes
Desc: not available
Url : http://mailman.videolan.org/pipermail/x264-devel/attachments/20080810/d5d53d41/attachment.diff 

