[x264-devel] Re: Scalability
Loren Merritt
lorenm at u.washington.edu
Tue Feb 27 21:14:56 CET 2007
On Tue, 27 Feb 2007, Mathieu Monnier wrote:
> On Tue, 27 Feb 2007, Christian Bienia wrote:
>
>> #Threads     1     2     4     8     16     32     64
>> Time [s]  2280  1265   662   345    200    202    201
>> Speedup      1   1.8  3.45  6.61  11.39   11.3  11.35
>>
>> The new threading code is indeed better than the old one, but x264 still
>> has problems scaling beyond 8 CPUs. Are there any known reasons for
>> that?
Just that x264 uses some statically allocated arrays of size 16.
There's no particular reason it couldn't support an arbitrarily large
number of threads, but it was slightly easier this way.
For testing, change the #define X264_THREAD_MAX 16 (in common.h) to some
bigger number. Then I'll see about dynamic allocation.
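
To make the plateau in the quoted table concrete, here is a small sketch (not x264 code) that recomputes the speedups from the posted timings and models the cap. The clamping in effective_threads is an assumption about how a fixed array size of 16 would show up; the message only says the arrays have size 16.

```python
# Timings copied from the table quoted above (threads -> seconds).
measured = {1: 2280, 2: 1265, 4: 662, 8: 345, 16: 200, 32: 202, 64: 201}

# Mirrors the #define mentioned above; the Python name is only for illustration.
X264_THREAD_MAX = 16


def effective_threads(requested, thread_max=X264_THREAD_MAX):
    # Assumption: a request above the static array size is clamped to it.
    return max(1, min(requested, thread_max))


def speedup(threads):
    # Speedup relative to the single-threaded run.
    return measured[1] / measured[threads]


for n in sorted(measured):
    print(f"{n:>3} requested -> {effective_threads(n):>2} effective, "
          f"speedup {speedup(n):.2f}")
```

Under that assumption the 32- and 64-thread runs are really 16-thread runs, which matches their near-identical times.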
BTW, using 8 threads on a lots-of-cpus system should not be confused with
using 8 cpus. You need at least 12 threads to get optimal scaling on 8
cpus. (Probably the most prominent reason is that not all frames take the
same amount of time to encode.)
> Yes. x264 encodes several frames in parallel. That means a frame is encoded
> while its reference(s) aren't yet completely encoded. That works because x264
> checks, for the current frame, that the motion vector doesn't point into the
> area of the reference frame that hasn't been encoded yet.
>
> This is done by waiting until at least a few rows of macroblocks in the
> reference frame are encoded before starting a new frame. The consequence is
> that if your video has only 36 rows (576p), you can have at most 36
> concurrent threads (and actually half that, since you must wait at least 2
> rows).
>
> So 576p won't benefit from more than 18 threads.
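The minimum spacing between threads is 3 rows: 1 row (16 pixels) for the
macroblock in progress, 6 pixels for deblock+mc filters, and the rest for
motion vectors. With 2 rows of spacing only 10 pixels would be left for
motion vectors, which would be a rather short mv limit, so x264 uses at
least 26, for a total of 16 + 6 + 26 = 48 pixels.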
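
The spacing arithmetic can be written out as a short sketch. The function names are made up for illustration, rounding the spacing up to whole macroblock rows is an assumption, and the frame count is an idealized upper bound for a plain chain of frames that each reference the previous one (the figure of 12 for 576p is this division, not a number stated in the message).

```python
import math

MB_SIZE = 16        # a macroblock row is 16 pixels tall
FILTER_MARGIN = 6   # pixels needed by the deblock + mc filters
MIN_MV_RANGE = 26   # smallest vertical mv limit mentioned above


def spacing_rows(mv_range=MIN_MV_RANGE):
    # One row in progress, plus filter margin, plus mv reach,
    # rounded up to whole macroblock rows (assumption).
    pixels = MB_SIZE + FILTER_MARGIN + mv_range
    return math.ceil(pixels / MB_SIZE)


def max_chained_frames(height, mv_range=MIN_MV_RANGE):
    # Frames in flight at once when each one trails its reference
    # by spacing_rows() rows (idealized, uniform row time).
    rows = math.ceil(height / MB_SIZE)
    return math.ceil(rows / spacing_rows(mv_range))


print(spacing_rows())           # rows of spacing with the 26-pixel mv limit
print(spacing_rows(10))         # rows of spacing if the mv limit were 10 pixels
print(max_chained_frames(576))  # frames in flight for 576p, no B-frames
```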
The check for waiting is performed pairwise, between a frame and each of
its reference frames. If frame 2 doesn't actually depend on frame 1, it
won't wait for the thread encoding frame 1. Consequences:

 * Using B-frames increases the max scalability, so 576p with 1 consecutive
   B-frame could use 24 threads.

 * It will even encode multiple GOPs in parallel if the number of threads
   is greater than the GOP length. The I-frame doesn't depend on anything,
   so it starts encoding as soon as it's assigned a thread. (Though you'd
   have to remove the code which enforces the aforementioned limit. I don't
   remember why I added it in the first place; it should be safe to just
   remove it. This might be what tripped up Alex, but anyway it does work
   for me.)
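
The pairwise wait can be illustrated with a toy model (not x264's scheduler). It assumes every row takes the same time, every frame gets its own thread and CPU, and a frame may start once each of its references is 3 rows ahead. All function names are hypothetical.

```python
def start_times(refs, lag=3):
    # refs[i] lists the coding-order indices of the frames that frame i
    # references; a frame starts once every reference is `lag` rows ahead.
    start = []
    for deps in refs:
        start.append(max((start[d] + lag for d in deps), default=0))
    return start


def peak_concurrency(refs, rows=36, lag=3):
    # Each frame is busy for `rows` time units; the peak number of busy
    # frames is always reached at some frame's start time.
    start = start_times(refs, lag)
    return max(sum(1 for s in start if s <= t < s + rows) for t in set(start))


def p_only(n):
    # One I-frame followed by P-frames that each reference the previous frame.
    return [[]] + [[i - 1] for i in range(1, n)]


def one_b_frame(pairs):
    # Coding order P0, P1, B, P2, B, ...: each B references the P-frames
    # on both sides of it, each P references the previous P.
    refs = [[]]
    prev_p = 0
    for _ in range(pairs):
        p = len(refs)
        refs.append([prev_p])
        refs.append([prev_p, p])
        prev_p = p
    return refs


def independent_gops(gops, length):
    # Each GOP starts with an I-frame that references nothing.
    refs = []
    for _ in range(gops):
        base = len(refs)
        refs.append([])
        refs.extend([base + i - 1] for i in range(1, length))
    return refs


print(peak_concurrency(p_only(40)))
print(peak_concurrency(one_b_frame(30)))
print(peak_concurrency(independent_gops(2, 8)))
```

For 36 rows this model reproduces the 24-thread figure for one consecutive B-frame (and 12 without B-frames), and shows two short GOPs overlapping completely because their I-frames wait for nothing. Real encodes will do worse, since frames take unequal time and threads are limited.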
--Loren Merritt