[x264-devel] multi-process encoding problem

Sun Jan 29 16:42:42 CET 2012

On Sun, Jan 29, 2012 at 8:32 AM, aviad rozenhek <aviadr1 at gmail.com> wrote:

>
>
> On Sun, Jan 29, 2012 at 13:06, Jason Garrett-Glaser <jason at x264.com> wrote:
>
>> On Sun, Jan 29, 2012 at 2:57 AM, aviad rozenhek <aviadr1 at gmail.com>
>> wrote:
>> > Dear Experts,
>> >
>> > we're using x264 CLI to transcode 1080p video to 240p, using default
>> > settings [--vf resize:432,240].
>> > we noticed that the transcode was slow, achieving only ~70fps on a
>> > 2-cpu, 8-core E5520 "Gainestown" machine, while utilizing only a very
>> > small portion of the available CPU.
>> > my analysis led me to think that the bottleneck is the down-scaling
>> > stage.
>>
>> Transcode means to decode a video, then encode it.
>>
>> Decoding 1080p video is vastly more processor-intensive than encoding
>> 432x240 video.
>>
>> You're probably going to be massively bottlenecked by the decoding
>> step, which x264 doesn't have anything to do with.
>>
>> Jason
>>
>
> very true.
> that's exactly why we are trying to solve the issue by running multiple
> instances of the x264 process, each working independently on a separate
> fragment of the source video. by doing a 5X multi-process encode, we are
> effectively decoding and down-scaling with at least 5 threads, thus
> bypassing the decode/downscale bottleneck.
>
> however, as I mentioned, we ran into issues with concatenation of the
> files, and the issue appears not to be related to timestamps.
> is there anything else [such as some magic flags in the first keyframe]
> that can explain the "jitter" or "lag" that we experience when playing the
> sewn-together output file?
>
>
Splitting an encode into segments with vbv constraints is a bad idea.
Each segment of the file will be compliant within itself, but there is no
guarantee that the concatenation of all the streams will still be compliant.

With these settings x264 is told to assume that the vbv buffer is 90% full
(see the default value for --vbv-init) at the start of each segment.
There's no guarantee that any of the segments will end with this occupancy,
which is then likely to cause underflows or overflows when a segment is
concatenated with the next one.
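
To make the failure mode concrete, here is a rough Python sketch of a
leaky-bucket buffer. It is a toy model, not x264's actual rate control: the
function name, the frame sizes and the buffer/maxrate numbers are all made up
for illustration, and only the 90% starting fill comes from the --vbv-init
default mentioned above.

```python
def simulate_vbv(frame_bits, bufsize, maxrate, fps, init_fill):
    """Toy leaky-bucket model of the decoder-side buffer (hypothetical helper).

    All sizes use the same unit (e.g. kbit). Returns the fill level after the
    last frame and the indices of the frames that underflowed.
    """
    refill = maxrate // fps          # data arriving during one frame interval
    fill = init_fill
    underflows = []
    for i, bits in enumerate(frame_bits):
        if bits > fill:              # frame has not fully arrived in time
            underflows.append(i)
            fill = 0                 # crude assumption: buffer is drained
        else:
            fill -= bits
        fill = min(bufsize, fill + refill)
    return fill, underflows


BUFSIZE, MAXRATE, FPS = 1000, 1000, 25   # illustrative values only
segment = [800, 50, 50, 50, 50]          # made-up sizes: big keyframe, small frames

# Encoded in isolation, the segment assumes a 90% full buffer and is compliant.
end_fill, alone = simulate_vbv(segment, BUFSIZE, MAXRATE, FPS, init_fill=900)

# After concatenation, the next segment really starts at the previous end level.
_, joined = simulate_vbv(segment, BUFSIZE, MAXRATE, FPS, init_fill=end_fill)

print("end fill of first segment:", end_fill)
print("underflows in isolation:", alone)
print("underflows when joined:", joined)
```

In this toy run the segment is fine on its own, but the same segment
underflows on its first (key)frame once it follows a segment that left the
buffer nearly empty.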

I would instead suggest going back to using a single instance, but applying
the demuxer thread patch so you can tell the demuxer that is decoding your
original .mpg file to multithread the decode.
This would alleviate the decode bottleneck, though it would not alleviate a
downscale bottleneck...