[x264-devel] implementing Cluster farming?
Antoine Gerschenfeld
gerschen at gmail.com
Wed Mar 18 09:59:45 CET 2009
Hello,
I've been using a homegrown, Makefile-based (!) implementation of x264
cluster-farming for about a year now across ~70 GHz of Core2 CPUs, with
linear scaling (which, in my case, means 2-3x realtime 720p with high-end
settings). While the details of this implementation (which, at this point,
only runs on OS X) are not really relevant, I think I've encountered a few
issues that apply to the broader case of any x264 cluster-based
implementation (which, ideally, should be MPI-based for portability
reasons).
1) When doing a CRF run, you have to identify "good" scenecut points (those
that would be IDRs in an equivalent serial x264 run), ideally before you
start doing any real encoding. Personally, I've found that having to
discard the beginning of a slow-running encoding job (because it was
started at an arbitrary boundary instead of a scene cut) is rather costly,
so I've opted for a very fast "0-pass", where I start an instance of
"x264 -m 0 --me dia -r 1" every 5000 frames and look for scene cuts in
order to know where to start the real encoding jobs. This ends up being
faster, even though you have to decode the video twice.
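
To illustrate point 1, here is a minimal sketch of how the scene cuts found
by a 0-pass could be turned into job boundaries. It is not the Makefile-based
setup described above: it assumes the scene-cut frame numbers have already
been collected into a list (how they are extracted from x264's output is not
shown), and the function name and the 5000-frame nominal job length are
hypothetical.

```python
from bisect import bisect_left


def pick_split_points(scenecuts, total_frames, target_len=5000):
    """Return job start frames, each one placed on a detected scene cut.

    scenecuts    -- frame numbers where the 0-pass saw a scene cut (assumed input)
    total_frames -- length of the source in frames
    target_len   -- nominal job length; hypothetical default
    """
    # Keep only cuts strictly inside the clip, sorted and de-duplicated.
    cuts = sorted({c for c in scenecuts if 0 < c < total_frames})
    starts = [0]  # the first job always starts at frame 0
    for nominal in range(target_len, total_frames, target_len):
        i = bisect_left(cuts, nominal)
        # The nearest cut is either just before or just after the nominal boundary.
        candidates = cuts[max(i - 1, 0):i + 1]
        if not candidates:
            continue  # no scene cut known at all: leave this boundary out
        best = min(candidates, key=lambda c: abs(c - nominal))
        if best > starts[-1]:  # avoid duplicate or out-of-order boundaries
            starts.append(best)
    return starts


if __name__ == "__main__":
    print(pick_split_points([120, 4900, 5200, 9990, 14000], 15000))
```

Snapping to the nearest cut keeps jobs close to the nominal length while
making sure no job starts at an arbitrary boundary that would later have to
be discarded.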
2) For a second-pass run (using one of the above CRF runs as first pass),
the split-at-scenecuts problem is already taken care of, but you have to
maintain the global rate control state across multiple jobs (x264farm does
this by doing its own 2-pass RC; in a hypothetical x264-mpi, copying some
x264_ratecontrol_t's should be enough).
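
As a rough illustration of what "global rate control state" means for point
2, the sketch below keeps one shared bit budget and hands each job a share
proportional to its first-pass size. This is a deliberately simplified
stand-in: it is neither x264's actual 2-pass algorithm nor what x264farm
does, and the class and method names are hypothetical.

```python
class GlobalBudget:
    """Shared bit budget for second-pass jobs (simplified, hypothetical model)."""

    def __init__(self, first_pass_bits, total_budget_bits):
        # Pending chunks, keyed by chunk index, weighted by first-pass size.
        self.weights = dict(enumerate(first_pass_bits))
        self.remaining = float(total_budget_bits)

    def budget_for(self, chunk):
        """Bits this pending chunk may spend, given what is left overall."""
        pending = sum(self.weights.values())
        return self.remaining * self.weights[chunk] / pending

    def report(self, chunk, actual_bits):
        """Record a finished chunk so later chunks absorb its over/undershoot."""
        del self.weights[chunk]
        self.remaining -= actual_bits


if __name__ == "__main__":
    state = GlobalBudget([100, 300, 600], total_budget_bits=2000)
    print(state.budget_for(0))
    state.report(0, actual_bits=250)  # chunk 0 overshot its share
    print(state.budget_for(1), state.budget_for(2))
```

The point of the shared object is only that a job's overshoot or undershoot
has to be visible to the jobs that follow it; otherwise the final file size
drifts away from the target.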
3) In any case, I've found it useful to have a frame-accurate decoder
compiled within x264 (as opposed to, say, a pipe from avs2yuv), especially
when >100 MB/s of YV12 data has to be passed around. As an alternative to
reading everything from raw YUV, I hacked together a decoder in muxers.c
based on FFMPEGSource (a library developed by Myrsloik which gives
libavcodec frame-accurate seeking on most sources). This tends to be
instrumental in avoiding bottlenecks (from disk reads or on the network).
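
For a sense of scale on point 3, the back-of-the-envelope calculation below
estimates the raw YV12 bandwidth of a cluster encoding 720p faster than
realtime. YV12 is 4:2:0 planar, i.e. 12 bits (1.5 bytes) per pixel; the
25 fps source rate and the 3x realtime factor are assumptions chosen for
illustration, not figures from the setup above.

```python
def yv12_frame_bytes(width, height):
    """Size of one YV12 frame: full-size luma plane plus two quarter-size chroma planes."""
    return width * height * 3 // 2


def raw_bandwidth(width, height, source_fps, realtime_factor):
    """Bytes per second of raw YV12 consumed when encoding at realtime_factor x realtime."""
    return yv12_frame_bytes(width, height) * source_fps * realtime_factor


if __name__ == "__main__":
    # Assumed values: 1280x720 source at 25 fps, cluster running at 3x realtime.
    bps = raw_bandwidth(1280, 720, source_fps=25, realtime_factor=3)
    print(f"{bps / 1e6:.1f} MB/s of raw YV12")
```

Under these assumptions the raw stream lands at roughly the >100 MB/s
mentioned above, which is why shipping compressed source and decoding on
each node is attractive.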
Personally, I've found the performance/reliability (of my cobbled-together
implementation) good enough that I'm considering buying multiple half-1U
Q8200s (at 400€ a pop) in a rack instead of a Nehalem machine...
Best regards,
Antoine Gerschenfeld