[x264-devel] Re: Scalability

Loren Merritt lorenm at u.washington.edu
Tue Feb 27 21:14:56 CET 2007


On Tue, 27 Feb 2007, Mathieu Monnier wrote:
> On Tue, 27 Feb 2007, Christian Bienia wrote:
>
>> #Threads      1     2     4     8    16   32    64
>>  Time/[s]  2280  1265   662   345   200  202   201
>> Speedup       1   1.8  3.45  6.61 11.39 11.3 11.35
>>
>> The new threading code is indeed better than the old one, but x264 still
>> has problems scaling beyond 16 CPUs. Are there any known reasons for
>> that?

Just that x264 uses some statically allocated arrays of size 16.
There's no particular reason it couldn't support an arbitrarily large
number of threads, but it was slightly easier this way.

For testing, change the #define X264_THREAD_MAX 16 (in common.h) to some
bigger number. Then I'll see about dynamic allocation.
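
To make the static-vs-dynamic difference concrete, here is a minimal sketch, not the actual x264 code. Only X264_THREAD_MAX is a real name; thread_slot_t, clamp_threads and alloc_thread_slots are made up for illustration, and the clamping behaviour is an assumption that would explain why 32 and 64 threads time the same as 16.

```c
#include <assert.h>
#include <stdlib.h>

#define X264_THREAD_MAX 16   /* real constant name; value as quoted above */

/* Hypothetical per-thread state; the real structures are more involved. */
typedef struct {
    int frame_num;
    int rows_done;
} thread_slot_t;

/* Static variant: the array has a fixed size, so any request above
 * X264_THREAD_MAX has to be cut down (assumed behaviour). */
static int clamp_threads(int requested)
{
    assert(requested > 0);
    return requested > X264_THREAD_MAX ? X264_THREAD_MAX : requested;
}

/* Dynamic variant: size the array from the requested thread count.
 * Returns NULL on allocation failure; the caller releases it with free(). */
static thread_slot_t *alloc_thread_slots(int requested)
{
    assert(requested > 0);
    return calloc((size_t)requested, sizeof(thread_slot_t));
}
```

With the static variant a request for 64 threads ends up as 16; with the dynamic one the only limit left is the one imposed by the frame height.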

BTW, using 8 threads on a lots-of-cpus system should not be confused with 
using 8 cpus. You need at least 12 threads to get optimal scaling on 8 
cpus. (Probably the most prominent reason is that not all frames take the 
same amount of time to encode.)

> Yes. X264 encodes several frames in parallel. That means a frame is encoded 
> while its reference(s) aren't yet completely encoded. That works because x264 
> checks for the current frame that the motion vector doesn't go in the area of 
> the reference frame that hasn't been encoded yet.
>
> This is done by waiting until at least a few rows of macroblocks in the
> reference frame are encoded before starting a new frame. The consequence is
> that if your video has only 36 rows (576p), you can have at most 36
> concurrent threads (and actually only half that, since you must wait for at
> least 2 rows).
>
> So 576p won't benefit from more than 18 threads.

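The minimum spacing between threads is 3 rows: 1 row (16 pixels) for the
macroblock in progress, 6 pixels for the deblock+mc filters, and then
something for motion vectors. 10 pixels would be a rather short mv limit,
so x264 uses at least 26, which brings the total to 48 pixels, i.e. 3 rows.
That makes the limit for 576p 12 threads, not 18.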
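
The arithmetic behind that spacing, as a small sketch. The function and macro names are mine, not x264's, and rounding up to whole macroblock rows is my assumption about how the pixel count turns into rows.

```c
#include <assert.h>

#define MB_SIZE        16  /* macroblock height in pixels */
#define FILTER_MARGIN   6  /* reach of the deblock + mc filters, in pixels */

/* Rows a thread must stay behind the thread encoding its reference frame,
 * for a given vertical mv limit in pixels. Rounds up to whole rows. */
static int thread_spacing_rows(int mv_limit_pixels)
{
    assert(mv_limit_pixels >= 0);
    int pixels = MB_SIZE + FILTER_MARGIN + mv_limit_pixels;
    return (pixels + MB_SIZE - 1) / MB_SIZE;
}
```

thread_spacing_rows(26) gives 3, and thread_spacing_rows(10) gives 2, which is the figure the quoted mail assumed.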

The check for waiting is performed pairwise, between a frame and each of
its reference frames. If frame 2 doesn't actually depend on frame 1, it
won't wait for the thread encoding frame 1. Consequences:

* Using B-frames increases the max scalability, so 576p with 1 consecutive
  B-frame could use 24 threads.
* It will even encode multiple GOPs in parallel if the number of threads
  is greater than the GOP length. The I-frame doesn't depend on anything,
  so it starts encoding as soon as it's assigned a thread. (Though you'd
  have to remove the code which enforces the aforementioned limit. I don't
  remember why I added it in the first place; it should be safe to just
  remove. This might be what tripped up Alex, but anyway it does work for
  me.)
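
As a rough sketch of how the thread numbers above come out (the function name is hypothetical, and extending the pattern beyond 1 consecutive B-frame is my extrapolation, not something measured):

```c
#include <assert.h>

/* Rough upper bound on the number of threads that can be kept busy
 * within one GOP.
 *   mb_rows        macroblock rows per frame (36 for 576p)
 *   spacing_rows   minimum rows between dependent threads (3)
 *   consecutive_b  number of consecutive B-frames
 * Assumes B-frames are not used as references, so the frames that
 * nothing waits on add threads on top of the dependency chain. */
static int max_useful_threads(int mb_rows, int spacing_rows, int consecutive_b)
{
    assert(mb_rows > 0 && spacing_rows > 0 && consecutive_b >= 0);
    return (mb_rows / spacing_rows) * (1 + consecutive_b);
}
```

For 576p this gives 12 threads without B-frames and 24 with 1 consecutive B-frame. It ignores the multiple-GOP case, where the bound goes up further.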

--Loren Merritt

-- 
This is the x264-devel mailing-list
To unsubscribe, go to: http://developers.videolan.org/lists.html


