[x264-devel] [PATCH] zigzag SSE2

Guillaume Poirier gpoirier at mplayerhq.hu
Fri May 2 14:30:02 CEST 2008


Hello,

Axel Zeuner wrote:
> Hello,
> two patches against git HEAD are attached:
> - x264-zigzag-sse2.diff contains SSE2 implementations of the zigzag functions. 
> - x264-timeasm.diff contains timeasm, timing code for checking the effects of 
> the changes made. The program is a hack: it does no checks and was tested 
> only on Linux x86/x86-64 using gcc.
>
> I would like to see results on other processors in 32-bit and 64-bit mode 
> before we start discussing inclusion of these functions into git. 
>
> Two results as printed by timeasm follow:
>   
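
For readers unfamiliar with the functions being timed, here is a minimal sketch of what a 4x4 frame zigzag scan does. It is not the code from x264 or from the patch: the function name, the row-major block layout and the lookup-table approach are assumptions made for illustration (x264's own coefficient layout may differ). Only the scan order itself is the standard H.264 frame zigzag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard H.264 frame zigzag order for a 4x4 block, written as raster
 * indices (index = 4*row + col) into a row-major block. */
static const uint8_t zigzag4x4_frame[16] = {
    0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15
};

/* Hypothetical scalar reference: copy the 16 coefficients of `block`
 * into `level` in zigzag order. An SSE2 version would aim to do the
 * same reordering with a few shuffles instead of 16 scalar moves. */
static void zigzag_scan_4x4_frame_sketch(int16_t level[16],
                                         const int16_t block[16])
{
    assert(level != NULL && block != NULL);
    for (int i = 0; i < 16; i++)
        level[i] = block[zigzag4x4_frame[i]];
}
```

Since the scan is a fixed permutation of 16 values, the plain C version is already cheap, which may be why the scan_4x4 timings leave little room for improvement.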
I tested your patch on two Core2 machines:

The first one has a Penryn core:

Architecture: x86-64
model name      : Intel(R) Core(TM)2 Duo CPU     E8200  @ 2.66GHz
x264: using random seed 1704265240
---------------------------------------------
zigzag frame
offset determination: 424 clocks
c - sub_4x4: 40 clocks
ref - sub_4x4: 40 clocks
new - sub_4x4: 16 clocks
offset determination: 424 clocks
c - scan_4x4: 16 clocks
ref - scan_4x4: 16 clocks
new - scan_4x4: 16 clocks
offset determination: 424 clocks
c - scan_8x8: 88 clocks
ref - scan_8x8: 88 clocks
new - scan_8x8: 72 clocks
---------------------------------------------
zigzag field
offset determination: 424 clocks
c - sub_4x4: 40 clocks
ref - sub_4x4: 40 clocks
new - sub_4x4: 16 clocks
offset determination: 424 clocks
c - scan_4x4: 8 clocks
ref - scan_4x4: 8 clocks
new - scan_4x4: 8 clocks
offset determination: 424 clocks
c - scan_8x8: 64 clocks
ref - scan_8x8: 64 clocks
new - scan_8x8: 32 clocks
---------------------------------------------
dct
offset determination: 424 clocks
c - sub8x8_dct: 328 clocks
ref - sub8x8_dct: 120 clocks
new - sub8x8_dct: 80 clocks
offset determination: 424 clocks
c - sub16x16_dct: 1304 clocks
ref - sub16x16_dct: 488 clocks
new - sub16x16_dct: 304 clocks
offset determination: 424 clocks
c - add8x8_idct: 600 clocks
ref - add8x8_idct: 128 clocks
new - add8x8_idct: 88 clocks
offset determination: 424 clocks
c - add16x16_idct: 2392 clocks
ref - add16x16_idct: 512 clocks
new - add16x16_idct: 344 clocks



The second one has a Conroe core:

Architecture: x86-64
model name      : Intel(R) Core(TM)2 Duo CPU     E6550  @ 2.33GHz
x264: using random seed 1918169994
---------------------------------------------
zigzag frame
offset determination: 427 clocks
c - sub_4x4: 42 clocks
ref - sub_4x4: 42 clocks
new - sub_4x4: 35 clocks
offset determination: 427 clocks
c - scan_4x4: 21 clocks
ref - scan_4x4: 21 clocks
new - scan_4x4: 21 clocks
offset determination: 427 clocks
c - scan_8x8: 91 clocks
ref - scan_8x8: 91 clocks
new - scan_8x8: 84 clocks
---------------------------------------------
zigzag field
offset determination: 427 clocks
c - sub_4x4: 42 clocks
ref - sub_4x4: 42 clocks
new - sub_4x4: 35 clocks
offset determination: 427 clocks
c - scan_4x4: 14 clocks
ref - scan_4x4: 14 clocks
new - scan_4x4: 14 clocks
offset determination: 427 clocks
c - scan_8x8: 70 clocks
ref - scan_8x8: 70 clocks
new - scan_8x8: 42 clocks
---------------------------------------------
dct
offset determination: 427 clocks
c - sub8x8_dct: 343 clocks
ref - sub8x8_dct: 133 clocks
new - sub8x8_dct: 119 clocks
offset determination: 427 clocks
c - sub16x16_dct: 1323 clocks
ref - sub16x16_dct: 490 clocks
new - sub16x16_dct: 441 clocks
offset determination: 427 clocks
c - add8x8_idct: 609 clocks
ref - add8x8_idct: 133 clocks
new - add8x8_idct: 119 clocks
offset determination: 427 clocks
c - add16x16_idct: 2408 clocks
ref - add16x16_idct: 511 clocks
new - add16x16_idct: 441 clocks
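
To make the two runs easier to compare, the clock counts can be turned into ref/new ratios. The sketch below does that for a few rows copied from the output above; the struct, the function names and the choice of rows are mine, and it assumes the printed counts can be compared directly (whether timeasm has already subtracted the "offset determination" value from them is not stated here).

```c
#include <assert.h>
#include <stdio.h>

/* Ratio of ref clocks to new clocks; above 1.0 means the new code is faster. */
static double speedup(unsigned ref_clocks, unsigned new_clocks)
{
    assert(new_clocks > 0);
    return (double)ref_clocks / (double)new_clocks;
}

/* Hypothetical container for one row of timeasm output. */
struct result {
    const char *cpu;
    const char *func;
    unsigned ref_clocks;
    unsigned new_clocks;
};

/* Values copied from the two runs above (ref and new lines only). */
static const struct result results[] = {
    { "Penryn", "frame sub_4x4",  40,  16 },
    { "Penryn", "frame scan_8x8", 88,  72 },
    { "Penryn", "field scan_8x8", 64,  32 },
    { "Penryn", "sub16x16_dct",   488, 304 },
    { "Penryn", "add16x16_idct",  512, 344 },
    { "Conroe", "frame sub_4x4",  42,  35 },
    { "Conroe", "frame scan_8x8", 91,  84 },
    { "Conroe", "field scan_8x8", 70,  42 },
    { "Conroe", "sub16x16_dct",   490, 441 },
    { "Conroe", "add16x16_idct",  511, 441 },
};

/* Print one line per row with the ref/new ratio. */
static void print_speedups(void)
{
    for (size_t i = 0; i < sizeof(results) / sizeof(results[0]); i++)
        printf("%-6s %-15s %4u -> %4u clocks (%.2fx)\n",
               results[i].cpu, results[i].func,
               results[i].ref_clocks, results[i].new_clocks,
               speedup(results[i].ref_clocks, results[i].new_clocks));
}
```

Calling print_speedups() from a main() prints one line per row. On these numbers the gain over ref is larger on the Penryn machine than on the Conroe one for every row listed.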


