[x264-devel] SSSE3/SSE4 9-way fully merged i4x4 analysis (sad/satd_x9)
Loren Merritt
git at videolan.org
Wed Sep 21 21:34:41 CEST 2011
x264 | branch: master | Loren Merritt <pengvado at akuvian.org> | Mon Aug 15 18:18:55 2011 +0000| [b7fa2ff50ef74eb8a27e675f8e418754965115e2] | committer: Jason Garrett-Glaser
SSSE3/SSE4 9-way fully merged i4x4 analysis (sad/satd_x9)
i4x4 analysis cycles (per partition):
  penryn    sandybridge
 184-> 75   157-> 54     preset=superfast (sad)
 281->165   225->124     preset=faster (satd with early termination)
 332->165   263->124     preset=medium
 379->165   297->124     preset=slower (satd without early termination)
This is the first code in x264 that intentionally produces different behavior
on different CPUs: satd_x9 is implemented only on SSSE3 and later and checks
all intra directions, whereas the old code (on fast presets) may terminate
early after checking only some of them. There is no systematic difference on
slow presets, though the two code paths still occasionally disagree about
tiebreaks.
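To illustrate why an exhaustive 9-way search and an early-terminating search can pick different modes, here is a minimal C sketch. It is not x264's code: the function names, the flat 9x16 prediction layout, and the "stop once a candidate costs more than twice the best so far" rule are all assumptions made up for this example, and plain SAD stands in for the cost metric.

```c
#include <assert.h>
#include <stdlib.h>

#define N_MODES 9   /* number of 4x4 intra prediction directions */

/* Helper for building example blocks: set all 16 pixels to one value. */
static void fill_4x4(unsigned char *blk, unsigned char v)
{
    for (int i = 0; i < 16; i++)
        blk[i] = v;
}

/* Sum of absolute differences over a 4x4 block stored as 16 contiguous bytes. */
static int sad_4x4(const unsigned char *src, const unsigned char *pred)
{
    int sum = 0;
    for (int i = 0; i < 16; i++)
        sum += abs(src[i] - pred[i]);
    return sum;
}

/* Exhaustive search: score every candidate; the lowest-numbered mode wins ties.
 * preds holds N_MODES predictions back to back, 16 bytes each (assumed layout). */
static int best_mode_exhaustive(const unsigned char *src,
                                const unsigned char *preds, int *best_cost)
{
    assert(src && preds && best_cost);
    int best = 0;
    *best_cost = sad_4x4(src, preds);
    for (int m = 1; m < N_MODES; m++) {
        int cost = sad_4x4(src, preds + 16 * m);
        if (cost < *best_cost) {
            *best_cost = cost;
            best = m;
        }
    }
    return best;
}

/* Early-terminating search. The stopping rule is hypothetical: give up as soon
 * as a candidate costs more than twice the best cost seen so far. */
static int best_mode_early(const unsigned char *src,
                           const unsigned char *preds, int *best_cost)
{
    assert(src && preds && best_cost);
    int best = 0;
    *best_cost = sad_4x4(src, preds);
    for (int m = 1; m < N_MODES; m++) {
        int cost = sad_4x4(src, preds + 16 * m);
        if (cost > 2 * *best_cost)
            break;              /* remaining modes are never examined */
        if (cost < *best_cost) {
            *best_cost = cost;
            best = m;
        }
    }
    return best;
}
```

If mode 0 is a decent match, mode 1 is poor, and mode 8 is a perfect match, the exhaustive search returns mode 8 while the early-terminating one stops at mode 1 and returns mode 0. That is the kind of systematic difference described above for fast presets.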
For ease of debugging, add an option "--cpu-independent" to disable satd_x9
and any analogous future code.
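The gating this option implies can be sketched roughly as follows. This is only an illustration under assumed names: the flag constant, the structs, and the stub function are hypothetical and are not taken from x264's headers. The idea is that the merged routine is installed only when the CPU supports it and the user has not asked for CPU-independent output.

```c
#include <assert.h>
#include <stddef.h>

#define CPU_SSSE3 (1u << 0)   /* hypothetical flag bit, not x264's actual value */

/* Hypothetical signature for a merged 9-way routine: returns the best cost. */
typedef int (*intra_x9_fn)(const unsigned char *src, const unsigned char *neighbors);

typedef struct {
    unsigned cpu_flags;       /* detected CPU capabilities */
    int      cpu_independent; /* nonzero when --cpu-independent is requested */
} encoder_config;

typedef struct {
    intra_x9_fn satd_x9;      /* NULL means: use the portable per-mode path */
} pixel_functions;

/* Stand-in for the assembly routine; a real one would do the 9-way analysis. */
static int satd_x9_ssse3_stub(const unsigned char *src, const unsigned char *neighbors)
{
    (void)src;
    (void)neighbors;
    return 0;
}

/* Install the merged routine only if the CPU has SSSE3 and the user did not
 * request identical output across CPUs. */
static void init_pixel_functions(pixel_functions *pf, const encoder_config *cfg)
{
    assert(pf && cfg);
    pf->satd_x9 = NULL;
    if ((cfg->cpu_flags & CPU_SSSE3) && !cfg->cpu_independent)
        pf->satd_x9 = satd_x9_ssse3_stub;
}

/* The analysis code can then branch on whether the pointer is set. */
static int uses_merged_path(const pixel_functions *pf)
{
    return pf->satd_x9 != NULL;
}
```

With this arrangement, a run with the option set takes the same code path on every machine, which makes it easier to tell an assembly bug from an expected CPU-dependent difference.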
> http://git.videolan.org/gitweb.cgi/x264.git/?a=commit;h=b7fa2ff50ef74eb8a27e675f8e418754965115e2
---
 common/common.c        |    2 +
 common/osdep.h         |    1 +
 common/pixel.c         |   12 ++
 common/pixel.h         |   26 ++--
 common/x86/const-a.asm |    1 +
 common/x86/pixel-a.asm |  455 +++++++++++++++++++++++++++++++++++++++++++++++-
 common/x86/pixel.h     |    6 +
 encoder/analyse.c      |   83 ++++++----
 encoder/encoder.c      |    2 +
 tools/checkasm.c       |   61 ++++++-
 x264.c                 |    3 +
 x264.h                 |    3 +-
 12 files changed, 591 insertions(+), 64 deletions(-)
Diff: http://git.videolan.org/gitweb.cgi/x264.git/?a=commitdiff;h=b7fa2ff50ef74eb8a27e675f8e418754965115e2