[x264-devel] x264 Development Newsletter: Vol. 28

Jason Garrett-Glaser jason at x264.com
Wed Mar 7 03:19:49 CET 2012


This is the twenty-eighth x264 development newsletter. This is a
regular email containing updates on fixes and improvements in the most
recent x264 push, along with updates on what's coming next.  Previous
versions can be found in the mailing list archives.

Fixes:

Fix a register preservation issue in r2141.  Didn't appear to cause
any actual problems.

Fix interlaced encoding with extremely small slice-max-sizes (i.e. that
result in single-supermb slices).

Fix RGB input (BGR/BGRA, which is what most people were using, already
worked correctly).

Add error handling for the out-of-tree build support added last time,
and fix ICL compilation with out-of-tree builds.

Fix a rare overflow in x86 10-bit asm for intra_satd_x3_16x16.  Caused
slightly incorrect analysis.

Fix a possible stack-alignment crash; x264_cavlc_init now needs to be
stack-aligned.

Fix incorrect zero-extension assumptions in 64-bit by converting
strides and such to intptr_t.  This should slightly improve
performance and fix compilation with Clang.  Note that this change is
NOT IN "STABLE", because -- while it is a bugfix -- it touches so much
code that I don't want to drop it into stable yet.
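
To illustrate the idea (this is a minimal sketch, not code from the
x264 tree; sum_block is a made-up helper): when a stride is declared
as a 32-bit int, the x86-64 calling conventions leave the upper half
of the register unspecified, so asm that uses the full 64-bit register
as an address offset is relying on an extension that nobody promised.
Declaring the stride as intptr_t makes the value pointer-width from
the C side onward.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not an x264 function: sums a w x h block of
 * pixels.  The stride is intptr_t, so "src += stride" is a plain
 * pointer-width add with no 32->64-bit extension step, and negative
 * strides (bottom-up images) work as well. */
static unsigned sum_block(const uint8_t *src, intptr_t stride, int w, int h)
{
    assert(w >= 0 && h >= 0);
    unsigned sum = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            sum += src[x];
        src += stride; /* advance one row */
    }
    return sum;
}
```

The same signature style carries over to asm prototypes: if the C
declaration says intptr_t, the callee can use the whole register
without extending it first.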

Improvements:

Support yasm -f win64 in x86inc.asm; not necessary for x264, but useful
for other applications using x86inc.asm that prefer doing that.

Simplify coeff_level_run; it doesn't need to store run lengths any
more, so drop that feature.

Add a small per-MB cost penalty for lowres VBV analysis -- helps to
avoid predictors going nuts on very static/simple frames.

Add row re-encoding support to VBV for improved accuracy.  Without
sliced-threads and slice-max-size/slice-max-mbs, VBV is now incredibly
accurate, possibly 100% so.  With either of those, it's not perfect,
but better than before.  Has a small speed penalty, but should improve
quality in any case with difficult VBV settings.  This is mainly
intended for extremely difficult VBVs, e.g. single-frame.

Use tzcnt (BMI1) instead of bsf where applicable in x86 asm; it's
backwards compatible, because "tzcnt" is actually coded as "rep bsf".
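
A rough C model of what that means in practice (ctz32_model is a
made-up name for illustration, not an x264 function): both
instructions return the index of the lowest set bit for nonzero
input, so on older CPUs that ignore the prefix and execute plain bsf,
the result is the same.  The only difference is a zero input, where
tzcnt returns the operand size and bsf leaves the destination
undefined, so the substitution is only safe where the input is known
to be nonzero or the zero case doesn't matter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model of a 32-bit trailing-zero count. */
static int ctz32_model(uint32_t x)
{
    if (x == 0)
        return 32; /* tzcnt's defined result; bsf gives no guarantee here */
    int n = 0;
    while (!(x & 1)) { /* shift until the lowest set bit reaches bit 0 */
        x >>= 1;
        n++;
    }
    return n;
}

/* Small usage check: nonzero inputs are where tzcnt and bsf agree. */
static void ctz32_model_check(void)
{
    assert(ctz32_model(1) == 0);
    assert(ctz32_model(8) == 3);
    assert(ctz32_model(0x80000000u) == 31);
}
```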

x86inc: switch to using amdnops.  "intelnop" allows extremely large
numbers of prefixes on "nop" instructions, which cause some AMD CPUs
to choke horribly.

Add full-recon API option, so that calling applications can ask for
the fully reconstructed frame (e.g. with deblocking, even if
deblocking isn't necessary for the encoding process, as in
unreferenced B-frames) without using dump-yuv.

Sliced-threads: do hpel and deblock after returning from the main
encode call.  Dramatically lowers latency (~14% in my tests at preset
superfast).  Improves performance even if the next frame is encoded
immediately after, because the hpel/deblock threads from the previous
frame can continue running while the next frame's lookahead
(singlethreaded) runs.

Upcoming:

Google Code-In is done, but a bunch of NEON assembly still needs review.

x262 is under development: a best-in-class MPEG-2 encoder built using
the x264 framework.  It works well enough to be vaguely usable now,
but is still highly experimental and needs more work -- developers
welcome!

Jason Garrett-Glaser

The x264 Team

