[x264-devel] Re: Scalability

Christian Bienia cbienia at CS.Princeton.EDU
Tue Feb 27 21:50:32 CET 2007


Hi,

> Just that x264 uses some statically allocated arrays of size 16.
> There's no particular reason it couldn't support an arbitrarily large
> number of threads, but it was slightly easier this way.
> 
> For testing, change the #define X264_THREAD_MAX 16 (in common.h) to some
> bigger number. Then I'll see about dynamic allocation.

Ok, I've increased the limit and submitted a second run.
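
For reference, the dynamic allocation mentioned above might look roughly like the sketch below. This is only an illustration, not x264's code: thread_ctx_t, encoder_t, encoder_init and encoder_close are hypothetical names, and the real per-thread state in x264 is certainly larger.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-thread state; NOT x264's real structure. */
typedef struct {
    int frame_num;
    int rows_done;
} thread_ctx_t;

/* Hypothetical encoder handle. The array is heap-allocated instead of
 * being declared as thread_ctx_t threads[X264_THREAD_MAX]. */
typedef struct {
    int           n_threads;
    thread_ctx_t *threads;
} encoder_t;

/* Allocate zeroed state for n_threads threads.
 * Returns 0 on success, -1 on a bad argument or allocation failure. */
static int encoder_init(encoder_t *enc, int n_threads)
{
    assert(enc != NULL);
    enc->n_threads = 0;
    enc->threads = NULL;
    if (n_threads < 1)
        return -1;
    enc->threads = calloc((size_t)n_threads, sizeof(*enc->threads));
    if (enc->threads == NULL)
        return -1;
    enc->n_threads = n_threads;
    return 0;
}

/* Release the per-thread state and reset the handle. */
static void encoder_close(encoder_t *enc)
{
    free(enc->threads);
    enc->threads = NULL;
    enc->n_threads = 0;
}
```

With something like this the thread count is bounded only by memory, so no compile-time limit needs to be raised for large machines.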


> BTW, using 8 threads on a lots-of-cpus system should not be confused with 
> using 8 cpus. You need at least 12 threads to get optimal scaling on 8 
> cpus. (Probably the most prominent reason is that not all frames take the 
> same amount of time to encode.)

How can differences in encoding time cause lower scalability? Doesn't a
thread grab a new frame as soon as it is done? So as long as no other
limitation (such as the one you described below for high numbers of
threads) causes a delay, the work distribution mechanism should keep all
threads busy all the time.


> > Yes. X264 encodes several frames in parallel. That means a frame is encoded 
> > while its reference(s) aren't yet completely encoded. That works because x264 
> > checks for the current frame that the motion vector doesn't go in the area of 
> > the reference frame that hasn't been encoded yet.
> >
> > And that is done by waiting until at least a few rows of macroblocks in the
> > reference frame are encoded before starting a new frame. The consequence is
> > that if your video has only 36 rows (576p), you can have at most 36
> > concurrent threads (and actually half that, since you must wait at least 2
> > rows).
> >
> > So 576p won't benefit from more than 18 threads.
> 
> The minimum spacing between threads is 3 rows. 1 row for the macroblock
> in progress, 6 pixels for deblock+mc filters, and then something for
> motion vectors. 10 pixels would be a rather short mv limit, so x264 uses
> at least 26.
> 
> The check for waiting is performed pairwise between reference frames.
> If frame 2 doesn't actually depend on frame 1, it won't wait for the
> thread encoding frame 1. Consequences:
> Using B-frames increases the max scalability, so 576p with 1 consecutive
> B-frame could use 24 threads.
> It will even encode multiple GOPs in parallel if the number of threads
> is greater than the GOP length. The I-frame doesn't depend on anything,
> so it starts encoding as soon as it's assigned a thread. (Though you'd
> have to remove the code which enforces the aforementioned limit. I don't
> remember why I added it in the first place, it should be safe to just
> remove. This might be what tripped up Alex, but anyway it does work for 
> me.)
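
To check that I follow the arithmetic above, here is a small sketch of how I read it. The function names are made up for illustration and are not from x264. The factor of (1 + consecutive B-frames) is my own extrapolation from the single "1 B-frame gives 24 threads" data point, so treat it as an assumption.

```c
#include <assert.h>

/* H.264 macroblocks are 16x16 pixels. */
enum { MB_SIZE = 16 };

/* Number of macroblock rows in a frame of the given height, rounded up. */
static int mb_rows(int height_px)
{
    assert(height_px > 0);
    return (height_px + MB_SIZE - 1) / MB_SIZE;
}

/* Minimum spacing, in macroblock rows, between a frame and its reference:
 * one row for the macroblock in progress, plus the filter margin and the
 * vertical motion vector limit (both in pixels) rounded up to whole rows. */
static int min_row_spacing(int filter_px, int mv_px)
{
    return 1 + (filter_px + mv_px + MB_SIZE - 1) / MB_SIZE;
}

/* Rough upper bound on useful threads. Frames in a reference chain must
 * stay min_row_spacing rows apart; B-frames that nothing references are
 * ASSUMED to multiply that bound by (1 + consecutive_bframes). */
static int max_useful_threads(int height_px, int filter_px, int mv_px,
                              int consecutive_bframes)
{
    int chain = mb_rows(height_px) / min_row_spacing(filter_px, mv_px);
    return chain * (1 + consecutive_bframes);
}
```

For 576p this gives 36 rows and, with 6 pixels of filter margin and a 26 pixel mv limit, a spacing of 3 rows: 12 threads without B-frames and 24 with 1 consecutive B-frame. With a 10 pixel mv limit the spacing drops to 2 rows, which reproduces the earlier figure of 18.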

How difficult would it be to use a finer granularity? Could multiple
threads encode a single frame (without slices), or are there any data
dependencies?

- Chris

-- 
This is the x264-devel mailing-list
To unsubscribe, go to: http://developers.videolan.org/lists.html


