[x264-devel] multi-process encoding problem

aviad rozenhek aviadr1 at gmail.com
Mon Jan 30 15:02:03 CET 2012


On Sun, Jan 29, 2012 at 17:42, Steven Walters <kemuri9 at gmail.com> wrote:

> On Sun, Jan 29, 2012 at 8:32 AM, aviad rozenhek <aviadr1 at gmail.com> wrote:
>
>>
>>
>> On Sun, Jan 29, 2012 at 13:06, Jason Garrett-Glaser <jason at x264.com> wrote:
>>
>>> On Sun, Jan 29, 2012 at 2:57 AM, aviad rozenhek <aviadr1 at gmail.com>
>>> wrote:
>>> > Dear Experts,
>>> >
>>> > We're using the x264 CLI to transcode 1080p video to 240p, using the
>>> > default settings [--vf resize:432,240].
>>> > We noticed that the transcode was slow, achieving only ~70 fps on a
>>> > 2-CPU, 8-core E5520 "Gainestown" machine, while utilizing only a very
>>> > small portion of the available CPU.
>>> > My analysis led me to think that the bottleneck is the down-scaling
>>> > stage.
>>>
>>> Transcode means to decode a video, then encode it.
>>>
>>> Decoding 1080p video is vastly more processor-intensive than encoding
>>> 432x240 video.
>>>
>>> You're probably going to be massively bottlenecked by the decoding
>>> step, which x264 doesn't have anything to do with.
>>>
>>> Jason
>>>
>>
>> Very true.
>> That's exactly why we are trying to solve the issue by running multiple
>> instances of the x264 process, each working independently on a separate
>> fragment of the source video. By doing a 5x multi-process encode, we
>> are effectively decoding and down-scaling in at least 5 parallel
>> processes, thus bypassing the decode/downscale bottleneck.
>>
>> However, as I mentioned, we ran into issues with concatenation of the
>> files, and the issue appears not to be related to timestamps.
>> Is there anything else [such as some magic flags in the first keyframe]
>> that can explain the "jitter" or "lag" that we experience when playing
>> the sewn-together output file?
>>
>>
> Splitting an encode into segments with VBV constraints is a bad idea.
> Each segment of the file will be compliant within itself, but there is no
> guarantee that the concatenation of all the segments will still be
> compliant.
>
> With these settings x264 is told to assume that the VBV buffer is 90%
> full at the start of each segment (see the default value of --vbv-init).
> There's no guarantee that any segment will actually end at that
> occupancy, which is likely to cause underflows or overflows where it is
> concatenated with the next segment.
>
> I would instead suggest going back to a single instance, but applying
> the demuxer thread patch, so that you can tell the demuxer decoding your
> original .mpg file to multithread the decode.
> This would alleviate the decode bottleneck, though it would not
> alleviate a downscale bottleneck...
>
>

Hi,
thanks for your input.

We re-transcoded the video without VBV constraints, but still got the
same hiccup in playback :-(
Aside from VBV constraints, are you aware of any additional conceptual
hurdles to doing fragmented, multi-process encodes?
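
For what it's worth, here is a toy simulation (all numbers invented, not
taken from our actual encode) of the occupancy mismatch described above:
two segments that are each VBV-compliant on their own underflow once they
are concatenated.

# Simplified VBV/CPB model: the buffer fills at a constant rate and each
# frame is removed from it instantaneously at decode time.
BUFSIZE = 1_000_000   # VBV buffer size in bits (invented)
MAXRATE = 500_000     # fill rate in bits/sec (invented)
FPS = 25.0

def simulate(frame_bits, start_occupancy):
    """Return per-frame buffer occupancy; negative means underflow."""
    occ = start_occupancy
    history = []
    for bits in frame_bits:
        occ = min(occ + MAXRATE / FPS, BUFSIZE)  # bits arriving in time
        occ -= bits                              # frame removed at decode
        history.append(occ)
    return history

# Segment A happens to end with a fairly drained buffer...
seg_a = [30_000] * 50
occ_a = simulate(seg_a, start_occupancy=0.9 * BUFSIZE)
print("segment A ends at occupancy:", occ_a[-1])   # well below 90%

# ...but segment B was rate-controlled assuming a 90%-full buffer
# (the --vbv-init default), so it opens with a large keyframe.
seg_b = [600_000] + [15_000] * 49

# On its own, segment B never underflows:
assert min(simulate(seg_b, 0.9 * BUFSIZE)) >= 0

# Concatenated after segment A, the very same frames underflow:
print("min occupancy after concatenation:",
      min(simulate(seg_a + seg_b, 0.9 * BUFSIZE)))  # negative -> stall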

Currently, the single-instance method simply isn't fast enough for us.
I am not aware of any demuxer thread patch floating around; I thought
decoding was threaded by default.
I am using builds from http://x264.nl/

I think I heard somewhere that YouTube is also doing fragmented encoding
of video in order to massively speed up transcoding of a single title
across multiple CPUs and indeed machines. Multi-threaded encoding is
great, but sometimes you need the additional horsepower of
multi-processing. A minimal sketch of the kind of driver I mean follows.
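
This sketch is illustrative only: the paths and frame counts are
invented, frame-accurate seeking into the source is assumed, and per the
discussion above it is only plausible without VBV constraints. --seek and
--frames are the standard x264 options for windowing an encode.

import subprocess

SOURCE = "source.mpg"   # invented path
TOTAL = 50_000          # total frame count of the source (invented)
SEGMENTS = 5

# Launch one x264 process per segment; --seek/--frames window the encode.
bounds = [TOTAL * i // SEGMENTS for i in range(SEGMENTS + 1)]
jobs = []
for i in range(SEGMENTS):
    start, end = bounds[i], bounds[i + 1]
    cmd = ["x264", "--seek", str(start), "--frames", str(end - start),
           "--vf", "resize:432,240", "-o", f"part{i}.264", SOURCE]
    jobs.append(subprocess.Popen(cmd))

for job in jobs:
    if job.wait() != 0:
        raise RuntimeError("segment encode failed")

# Each raw .264 segment starts with its own SPS/PPS and an IDR frame, so
# plain byte concatenation is at least structurally valid.
with open("joined.264", "wb") as out:
    for i in range(SEGMENTS):
        with open(f"part{i}.264", "rb") as part:
            out.write(part.read())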

I have attached a link to a very small [3 MB] zip containing the input
sample, the transcoded output sample, and a log file:
http://www.datafilehost.com/download-c04d7c3f.html


Appreciate your help,
A.