[x264-devel] Updated (and hopefully final) threaded slicetype patch
David DeHaven
dave at sagetv.com
Tue Apr 7 18:43:17 CEST 2009
>>>> Here's a decent article on the Linux 2.6 scheduler:
>>>> http://www.ibm.com/developerworks/linux/library/l-scheduler/
>>>
>>> That article is 3 years old.
>>
>> Has it really changed that much? I'll admit the last time I bashed on
>> the kernel significantly was well... three years ago.
>
> I don't remember when the tickless option was introduced. It may have
> been in the last three years.
Looks like it started with 2.6.17, which was almost three years ago
but still after I was messing around in it. Interesting stuff.
One signal per frame isn't a high load, so I can't see why it would slow
things down, unless there's a flaw in the condition implementation or in
how it's being used. I'll take a closer look at it when I get a chance.
I've used both spinlocks and conditions to synchronize buffer management
at rates much higher than what's being used here; in some cases mutexes
do cause excessive overhead.
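To make "one signal per frame" concrete, here is a minimal sketch of that handoff pattern. It is written in Python, with threading.Condition standing in for a pthread mutex/condition pair. It is not the code from the patch, and the names (run_pipeline, producer, consumer) are hypothetical, chosen only for illustration.

```python
import threading
from collections import deque


def run_pipeline(num_frames):
    """Hand num_frames frame indices from a producer thread to a consumer."""
    cond = threading.Condition()  # bundles a lock with a condition variable
    queue = deque()               # frames waiting to be picked up
    done = False                  # set once the producer has queued everything
    consumed = []

    def producer():
        nonlocal done
        for i in range(num_frames):
            with cond:
                queue.append(i)
                cond.notify()     # exactly one signal per queued frame
        with cond:
            done = True
            cond.notify()         # final wake-up so the consumer can exit

    def consumer():
        while True:
            with cond:
                # Re-check the predicate in a loop: wake-ups can be spurious.
                while not queue and not done:
                    cond.wait()
                if not queue:
                    return        # producer finished and queue is drained
                frame = queue.popleft()
            consumed.append(frame)  # "encode" outside the lock

    threads = [threading.Thread(target=consumer),
               threading.Thread(target=producer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed


if __name__ == "__main__":
    print(len(run_pipeline(497)), "frames handed off")
```

The lock is held only long enough to touch the queue, so at one signal per frame the synchronization cost should be small next to the per-frame encoding work.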
The patched build seems *marginally* faster on a 1.66 GHz Core Duo Mac
Mini running Mac OS X 10.5.6. I used the regression test arguments plus
--interlaced and --threads 3 so it would run at 100% CPU load (that
usually shows less variance between runs). The clip (497 frames of NTSC
video) has a lot of motion: sports news covering a football game (that's
American football, not soccer :)
$ ./x264 --crf 26 -b2 -m5 -r2 --me hex -8 -w --cqm jvt --nr 100 \
    --interlaced --threads 3 ../capture-720-480-yv12.y4m -o foo.h264
Warning, this sequence might be interlaced
yuv4mpeg: 720x480@15712911/524288fps, 8:9
x264 [info]: using SAR=8/9
x264 [info]: using cpu capabilities: MMX2 Cache64
x264 [info]: profile High, level 3.0
x264 [info]: slice I:7 Avg QP:23.95 size: 22357 PSNR Mean Y:44.44 U:51.41 V:52.30 Avg:45.20 Global:37.29
x264 [info]: slice P:325 Avg QP:31.76 size: 11338 PSNR Mean Y:33.10 U:39.73 V:41.34 Avg:34.45 Global:33.84
x264 [info]: slice B:165 Avg QP:35.35 size: 3402 PSNR Mean Y:31.03 U:38.44 V:40.03 Avg:32.45 Global:32.13
x264 [info]: consecutive B-frames: 32.7% 67.3% 0.0%
x264 [info]: mb I I16..4: 37.8% 23.9% 38.4%
x264 [info]: mb P I16..4: 8.7% 23.2% 6.4% P16..4: 40.9% 12.4% 2.3% 0.0% 0.0% skip: 6.0%
x264 [info]: mb B I16..4: 1.1% 0.0% 0.0% B16..8: 43.9% 3.0% 1.3% direct:12.3% skip:38.4% L0:30.5% L1:51.4% BI:18.1%
x264 [info]: 8x8 transform intra:57.8% inter:39.8%
x264 [info]: ref P L0 46.9% 39.0% 9.6% 4.5%
x264 [info]: ref B L0 38.3% 61.7%
x264 [info]: ref B L1 43.1% 56.9%
x264 [info]: SSIM Mean Y:0.8806004
x264 [info]: PSNR Mean Y:32.569 U:39.466 V:41.060 Avg:33.937 Global:33.228 kb/s:2123.89
[first three runs discarded to ensure everything was loaded into Mac OS
X's unified buffer cache]
encoded 497 frames, 17.97 fps, 2124.58 kb/s
encoded 497 frames, 18.30 fps, 2125.28 kb/s
encoded 497 frames, 18.21 fps, 2125.78 kb/s
encoded 497 frames, 18.27 fps, 2124.47 kb/s
encoded 497 frames, 18.18 fps, 2125.64 kb/s
encoded 497 frames, 18.28 fps, 2125.84 kb/s
encoded 497 frames, 18.05 fps, 2125.38 kb/s
encoded 497 frames, 18.27 fps, 2125.23 kb/s
encoded 497 frames, 18.22 fps, 2124.91 kb/s
encoded 497 frames, 18.07 fps, 2124.27 kb/s
With pthread condition patch:
Warning, this sequence might be interlaced
yuv4mpeg: 720x480@15712911/524288fps, 8:9
x264 [info]: using SAR=8/9
x264 [info]: using cpu capabilities: MMX2 Cache64
x264 [info]: profile High, level 3.0
x264 [info]: slice I:7 Avg QP:23.95 size: 22357 PSNR Mean Y:44.44 U:51.41 V:52.30 Avg:45.20 Global:37.29
x264 [info]: slice P:325 Avg QP:31.76 size: 11342 PSNR Mean Y:33.09 U:39.75 V:41.34 Avg:34.45 Global:33.84
x264 [info]: slice B:165 Avg QP:35.36 size: 3413 PSNR Mean Y:31.02 U:38.45 V:40.02 Avg:32.45 Global:32.13
x264 [info]: consecutive B-frames: 32.7% 67.3% 0.0%
x264 [info]: mb I I16..4: 37.8% 23.9% 38.4%
x264 [info]: mb P I16..4: 8.7% 23.2% 6.4% P16..4: 40.9% 12.4% 2.3% 0.0% 0.0% skip: 6.0%
x264 [info]: mb B I16..4: 1.1% 0.0% 0.0% B16..8: 44.2% 2.9% 1.3% direct:12.2% skip:38.3% L0:30.6% L1:51.2% BI:18.1%
x264 [info]: 8x8 transform intra:57.8% inter:39.7%
x264 [info]: ref P L0 46.8% 39.0% 9.6% 4.6%
x264 [info]: ref B L0 38.5% 61.5%
x264 [info]: ref B L1 43.0% 57.0%
x264 [info]: SSIM Mean Y:0.8805641
x264 [info]: PSNR Mean Y:32.565 U:39.480 V:41.054 Avg:33.935 Global:33.225 kb/s:2125.37
encoded 497 frames, 18.32 fps, 2124.85 kb/s
encoded 497 frames, 18.46 fps, 2126.22 kb/s
encoded 497 frames, 18.42 fps, 2125.54 kb/s
encoded 497 frames, 18.42 fps, 2126.20 kb/s
encoded 497 frames, 18.34 fps, 2125.46 kb/s
encoded 497 frames, 18.42 fps, 2125.11 kb/s
encoded 497 frames, 18.40 fps, 2125.22 kb/s
encoded 497 frames, 17.92 fps, 2125.40 kb/s
encoded 497 frames, 18.34 fps, 2125.67 kb/s
encoded 497 frames, 18.41 fps, 2125.75 kb/s
The low anomalies (17.9x fps) were from Apple Mail checking mail in
the background. This wasn't an ideal setup by any means, but it would
be representative of a normal user system.
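As a rough check on "marginally faster", the ten timed runs from each build can be summarized with a mean and a median. The median is included because it is less sensitive to the occasional background-task dip. This is only a sketch: the fps values are copied from the runs listed in this message, and the helper names (summarize, relative_gain) are made up for illustration.

```python
from statistics import mean, median

# fps figures copied from the ten timed runs of each build in this message
baseline_fps = [17.97, 18.30, 18.21, 18.27, 18.18,
                18.28, 18.05, 18.27, 18.22, 18.07]
patched_fps = [18.32, 18.46, 18.42, 18.42, 18.34,
               18.42, 18.40, 17.92, 18.34, 18.41]


def summarize(samples):
    """Return basic summary statistics for a list of fps samples."""
    return {
        "mean": mean(samples),
        "median": median(samples),
        "min": min(samples),
        "max": max(samples),
    }


def relative_gain(before, after):
    """Percentage change going from `before` to `after`."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    base = summarize(baseline_fps)
    patched = summarize(patched_fps)
    for key in ("mean", "median", "min", "max"):
        print(f"{key:>6}: {base[key]:.3f} -> {patched[key]:.3f} fps")
    print(f"gain by mean:   {relative_gain(base['mean'], patched['mean']):.2f}%")
    print(f"gain by median: {relative_gain(base['median'], patched['median']):.2f}%")
```

On these numbers the mean goes from about 18.18 to 18.35 fps, a gain of roughly 0.9%, and the median from about 18.2 to 18.4 fps. With only ten runs per build on a machine that was not idle, that is suggestive rather than conclusive.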
If there's a better way to benchmark, let me know. I'll try it on the
Linux box when I get the time.
-DrD-