[x264-devel] Updated (and hopefully final) threaded slicetype patch

David DeHaven dave at sagetv.com
Tue Apr 7 18:43:17 CEST 2009


>>>> Here's a decent article on the Linux 2.6 scheduler:
>>>> http://www.ibm.com/developerworks/linux/library/l-scheduler/
>>>
>>> That article is 3 years old.
>>
>> Has it really changed that much? I'll admit the last time I bashed on
>> the kernel significantly was well... three years ago.
>
> I don't remember when the tickless option was introduced.  It may have
> been in the last three years.

Looks like it started with 2.6.17, which was almost three years ago  
but still after I was messing around in it. Interesting stuff.

One signal per frame isn't a high load, so I can't see why performance
would slow down unless there's a flaw in the condition implementation
or in how it's being used. I'll take a closer look at it when I get a
chance. I've used both spinlocks and conditions to synchronize buffer
management at rates much higher than what's being used here. In some
cases mutexes do cause excessive overhead.
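
To make the "one signal per frame" pattern concrete, here is a minimal sketch of a producer handing frames to a consumer through a condition variable. It is not the code from the patch: it uses Python's threading.Condition as a stand-in for pthread_cond_wait/pthread_cond_signal, and FrameQueue, run and the signals counter are hypothetical names made up for illustration.

```python
import threading
from collections import deque


class FrameQueue:
    """Hypothetical single-producer/single-consumer queue guarded by a condition."""

    def __init__(self):
        self._cond = threading.Condition()  # a mutex and condition variable pair
        self._frames = deque()
        self._done = False
        self.signals = 0  # number of per-frame notifications, for illustration only

    def put(self, frame):
        with self._cond:
            self._frames.append(frame)
            self.signals += 1
            self._cond.notify()  # exactly one signal per queued frame

    def close(self):
        with self._cond:
            self._done = True
            self._cond.notify_all()  # wake the consumer so it can exit

    def get(self):
        with self._cond:
            # Re-check the predicate in a loop, as one would around
            # pthread_cond_wait, to tolerate spurious wakeups.
            while not self._frames and not self._done:
                self._cond.wait()
            if self._frames:
                return self._frames.popleft()
            return None  # queue closed and drained


def run(n_frames):
    """Push n_frames integers through the queue and return what the consumer saw."""
    q = FrameQueue()
    out = []

    def consumer():
        while (frame := q.get()) is not None:
            out.append(frame)

    t = threading.Thread(target=consumer)
    t.start()
    for i in range(n_frames):
        q.put(i)
    q.close()
    t.join()
    return out, q.signals
```

Calling run(497) pushes one item per frame of the test clip through the queue. At roughly 18 frames per second that is about 18 wakeups per second, which is why the signalling itself should not be a measurable cost.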


The patched build seems *marginally* faster on a 1.66 GHz Core Duo Mac
Mini running Mac OS X 10.5.6. I used the regression-test arguments
with --interlaced and --threads 3 so it would run at 100% CPU load
(that usually shows less variance between runs). The clip (497 frames
of NTSC video) has a lot of motion: sports news covering a football
game (that's American football, not soccer :)

$ ./x264 --crf 26 -b2 -m5 -r2 --me hex -8 -w --cqm jvt --nr 100 --interlaced --threads 3 ../capture-720-480-yv12.y4m -o foo.h264

Warning, this sequence might be interlaced
yuv4mpeg: 720x480 at 15712911/524288fps, 8:9
x264 [info]: using SAR=8/9
x264 [info]: using cpu capabilities: MMX2 Cache64
x264 [info]: profile High, level 3.0
x264 [info]: slice I:7     Avg QP:23.95  size: 22357  PSNR Mean Y: 
44.44 U:51.41 V:52.30 Avg:45.20 Global:37.29
x264 [info]: slice P:325   Avg QP:31.76  size: 11338  PSNR Mean Y: 
33.10 U:39.73 V:41.34 Avg:34.45 Global:33.84
x264 [info]: slice B:165   Avg QP:35.35  size:  3402  PSNR Mean Y: 
31.03 U:38.44 V:40.03 Avg:32.45 Global:32.13
x264 [info]: consecutive B-frames: 32.7% 67.3%  0.0%
x264 [info]: mb I  I16..4: 37.8% 23.9% 38.4%
x264 [info]: mb P  I16..4:  8.7% 23.2%  6.4%  P16..4: 40.9% 12.4%   
2.3%  0.0%  0.0%    skip: 6.0%
x264 [info]: mb B  I16..4:  1.1%  0.0%  0.0%  B16..8: 43.9%  3.0%   
1.3%  direct:12.3%  skip:38.4%  L0:30.5% L1:51.4% BI:18.1%
x264 [info]: 8x8 transform  intra:57.8%  inter:39.8%
x264 [info]: ref P L0  46.9% 39.0%  9.6%  4.5%
x264 [info]: ref B L0  38.3% 61.7%
x264 [info]: ref B L1  43.1% 56.9%
x264 [info]: SSIM Mean Y:0.8806004
x264 [info]: PSNR Mean Y:32.569 U:39.466 V:41.060 Avg:33.937 Global: 
33.228 kb/s:2123.89

[first three runs discarded to ensure everything was loaded into Mac OS
X's unified buffer cache]

encoded 497 frames, 17.97 fps, 2124.58 kb/s
encoded 497 frames, 18.30 fps, 2125.28 kb/s
encoded 497 frames, 18.21 fps, 2125.78 kb/s
encoded 497 frames, 18.27 fps, 2124.47 kb/s
encoded 497 frames, 18.18 fps, 2125.64 kb/s
encoded 497 frames, 18.28 fps, 2125.84 kb/s
encoded 497 frames, 18.05 fps, 2125.38 kb/s
encoded 497 frames, 18.27 fps, 2125.23 kb/s
encoded 497 frames, 18.22 fps, 2124.91 kb/s
encoded 497 frames, 18.07 fps, 2124.27 kb/s



With pthread condition patch:

Warning, this sequence might be interlaced
yuv4mpeg: 720x480 at 15712911/524288fps, 8:9
x264 [info]: using SAR=8/9
x264 [info]: using cpu capabilities: MMX2 Cache64
x264 [info]: profile High, level 3.0
x264 [info]: slice I:7     Avg QP:23.95  size: 22357  PSNR Mean Y: 
44.44 U:51.41 V:52.30 Avg:45.20 Global:37.29
x264 [info]: slice P:325   Avg QP:31.76  size: 11342  PSNR Mean Y: 
33.09 U:39.75 V:41.34 Avg:34.45 Global:33.84
x264 [info]: slice B:165   Avg QP:35.36  size:  3413  PSNR Mean Y: 
31.02 U:38.45 V:40.02 Avg:32.45 Global:32.13
x264 [info]: consecutive B-frames: 32.7% 67.3%  0.0%
x264 [info]: mb I  I16..4: 37.8% 23.9% 38.4%
x264 [info]: mb P  I16..4:  8.7% 23.2%  6.4%  P16..4: 40.9% 12.4%   
2.3%  0.0%  0.0%    skip: 6.0%
x264 [info]: mb B  I16..4:  1.1%  0.0%  0.0%  B16..8: 44.2%  2.9%   
1.3%  direct:12.2%  skip:38.3%  L0:30.6% L1:51.2% BI:18.1%
x264 [info]: 8x8 transform  intra:57.8%  inter:39.7%
x264 [info]: ref P L0  46.8% 39.0%  9.6%  4.6%
x264 [info]: ref B L0  38.5% 61.5%
x264 [info]: ref B L1  43.0% 57.0%
x264 [info]: SSIM Mean Y:0.8805641
x264 [info]: PSNR Mean Y:32.565 U:39.480 V:41.054 Avg:33.935 Global: 
33.225 kb/s:2125.37

encoded 497 frames, 18.32 fps, 2124.85 kb/s
encoded 497 frames, 18.46 fps, 2126.22 kb/s
encoded 497 frames, 18.42 fps, 2125.54 kb/s
encoded 497 frames, 18.42 fps, 2126.20 kb/s
encoded 497 frames, 18.34 fps, 2125.46 kb/s
encoded 497 frames, 18.42 fps, 2125.11 kb/s
encoded 497 frames, 18.40 fps, 2125.22 kb/s
encoded 497 frames, 17.92 fps, 2125.40 kb/s
encoded 497 frames, 18.34 fps, 2125.67 kb/s
encoded 497 frames, 18.41 fps, 2125.75 kb/s

The low outliers (17.9x fps) came from Apple Mail checking mail in
the background. This isn't an ideal setup by any means, but it is
representative of a normal user system.
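
To put a number on "marginally", the two sets of ten runs above can be summarized as follows. This is only a rough sketch: the fps values are copied from the output in this mail, the variable and function names are made up for illustration, and ten runs on a desktop with background activity are not a rigorous benchmark. The median is included because it is less sensitive to the two 17.9x fps outliers.

```python
from statistics import mean, median, stdev

# fps figures copied from the "encoded 497 frames" lines above
baseline = [17.97, 18.30, 18.21, 18.27, 18.18, 18.28, 18.05, 18.27, 18.22, 18.07]
patched = [18.32, 18.46, 18.42, 18.42, 18.34, 18.42, 18.40, 17.92, 18.34, 18.41]


def summarize(label, runs):
    """Print mean, median and sample standard deviation for one set of runs."""
    print(f"{label}: mean={mean(runs):.3f} median={median(runs):.3f} "
          f"stdev={stdev(runs):.3f} fps")


def speedup_percent(before, after):
    """Relative change of the mean fps, in percent."""
    return (mean(after) - mean(before)) / mean(before) * 100.0


summarize("baseline", baseline)
summarize("patched ", patched)
print(f"mean speedup: {speedup_percent(baseline, patched):.2f}%")
```

The means come out at roughly 18.18 fps without the patch and 18.35 fps with it, a difference of a bit under 1%, which is of the same order as the run-to-run variation.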

If there's a better way to benchmark, let me know. I'll try it on the  
Linux box when I get the time.

-DrD-


