[x264-devel] [PATCH] zigzag SSE2
Guillaume Poirier
gpoirier at mplayerhq.hu
Fri May 2 14:30:02 CEST 2008
Hello,
Axel Zeuner wrote:
> Hello,
> two patches against git HEAD are attached:
> - x264-zigzag-sse2.diff contains SSE2 implementations of the zigzag functions.
> - x264-timeasm.diff contains timeasm, a timing code to check the effects of
> the changes made. The program is a hack, it does no checks and was tested
> only on linux x86/x86-64 using gcc.
>
> I would like to see results on other processors in 32-bit and 64-bit mode
> before we start discussing inclusion of these functions into git.
>
> Two results as printed by timeasm follow:
>
I tested your patch on two Core2 machines.
The first one has a Penryn core:
Architecture: x86-64
model name : Intel(R) Core(TM)2 Duo CPU E8200 @ 2.66GHz
x264: using random seed 1704265240
---------------------------------------------
zigzag frame
offset determination: 424 clocks
c - sub_4x4: 40 clocks
ref - sub_4x4: 40 clocks
new - sub_4x4: 16 clocks
offset determination: 424 clocks
c - scan_4x4: 16 clocks
ref - scan_4x4: 16 clocks
new - scan_4x4: 16 clocks
offset determination: 424 clocks
c - scan_8x8: 88 clocks
ref - scan_8x8: 88 clocks
new - scan_8x8: 72 clocks
---------------------------------------------
zigzag field
offset determination: 424 clocks
c - sub_4x4: 40 clocks
ref - sub_4x4: 40 clocks
new - sub_4x4: 16 clocks
offset determination: 424 clocks
c - scan_4x4: 8 clocks
ref - scan_4x4: 8 clocks
new - scan_4x4: 8 clocks
offset determination: 424 clocks
c - scan_8x8: 64 clocks
ref - scan_8x8: 64 clocks
new - scan_8x8: 32 clocks
---------------------------------------------
dct
offset determination: 424 clocks
c - sub8x8_dct: 328 clocks
ref - sub8x8_dct: 120 clocks
new - sub8x8_dct: 80 clocks
offset determination: 424 clocks
c - sub16x16_dct: 1304 clocks
ref - sub16x16_dct: 488 clocks
new - sub16x16_dct: 304 clocks
offset determination: 424 clocks
c - add8x8_idct: 600 clocks
ref - add8x8_idct: 128 clocks
new - add8x8_idct: 88 clocks
offset determination: 424 clocks
c - add16x16_idct: 2392 clocks
ref - add16x16_idct: 512 clocks
new - add16x16_idct: 344 clocks
The second one has a Conroe core:
Architecture: x86-64
model name : Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz
x264: using random seed 1918169994
---------------------------------------------
zigzag frame
offset determination: 427 clocks
c - sub_4x4: 42 clocks
ref - sub_4x4: 42 clocks
new - sub_4x4: 35 clocks
offset determination: 427 clocks
c - scan_4x4: 21 clocks
ref - scan_4x4: 21 clocks
new - scan_4x4: 21 clocks
offset determination: 427 clocks
c - scan_8x8: 91 clocks
ref - scan_8x8: 91 clocks
new - scan_8x8: 84 clocks
---------------------------------------------
zigzag field
offset determination: 427 clocks
c - sub_4x4: 42 clocks
ref - sub_4x4: 42 clocks
new - sub_4x4: 35 clocks
offset determination: 427 clocks
c - scan_4x4: 14 clocks
ref - scan_4x4: 14 clocks
new - scan_4x4: 14 clocks
offset determination: 427 clocks
c - scan_8x8: 70 clocks
ref - scan_8x8: 70 clocks
new - scan_8x8: 42 clocks
---------------------------------------------
dct
offset determination: 427 clocks
c - sub8x8_dct: 343 clocks
ref - sub8x8_dct: 133 clocks
new - sub8x8_dct: 119 clocks
offset determination: 427 clocks
c - sub16x16_dct: 1323 clocks
ref - sub16x16_dct: 490 clocks
new - sub16x16_dct: 441 clocks
offset determination: 427 clocks
c - add8x8_idct: 609 clocks
ref - add8x8_idct: 133 clocks
new - add8x8_idct: 119 clocks
offset determination: 427 clocks
c - add16x16_idct: 2408 clocks
ref - add16x16_idct: 511 clocks
new - add16x16_idct: 441 clocks