[x264-devel] Re: CABAC performance

Tue May 29 23:26:12 CEST 2007

On Tue, 29 May 2007, Son Minh Tran wrote:
> Loren Merritt wrote:
>
>> Encoding CABAC takes the same amount of time as decoding CABAC. But the 
>> encoder does lots of other computations, most of which are analysis only 
>> and not duplicated in the decoder. So CABAC is a large fraction of the 
>> decode time, and a small fraction of the encode time.
>> 
>> Where did you hear "16% bitrate and 25-30% speed"?
>
> I took these figures from the article "Video coding with H.264/AVC: tools, 
> performance, and complexity" by Jorn Ostermann, Jan Bormans, Peter List, 
> Detlev Marpe, Matthias Narroschke, Fernando Pereira, Thomas Stockhammer, 
> and Thomas Wedi (http://iphome.hhi.de/marpe/download/h264_casm04.pdf); 
> see page 23.

That doesn't really explain what settings they measured with, and they 
measured memory accesses, not CPU cycles. My only guess is that they 
included the CABAC encodes from rate-distortion optimization (RDO). JM 
fully CABAC-encodes each RD candidate. But for the purposes of RDO, an 
encoder only cares how many bits the candidate would take, not what the 
values of those bits are, so during RDO x264 just computes the CAB part 
(context modeling and binarization) and skips the AC part (arithmetic 
coding). That's still not 30%: I measured at most an 8% speedup from that 
patch (r330).
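
To illustrate the idea of counting bits without producing them, here is a 
minimal sketch. It is not x264's code: the 4-bit probability state, the 
adaptation rule, and the names ToyCtx, cost_q8 and estimate_bits_q8 are all 
made up for this example. Each bin adds a precomputed -log2(p) cost and 
updates the context, and no arithmetic-coder state (low/range) is touched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cost, in units of 1/256 bit, of coding a symbol whose probability is
 * k/16, i.e. round(-log2(k/16) * 256). Index 0 is unused. */
static const uint16_t cost_q8[16] = {
    0, 1024, 768, 618, 512, 430, 362, 305,
    256, 212, 174, 138, 106, 77, 49, 24
};

/* Toy context (hypothetical): p0 is P(bin == 0) in sixteenths, 1..15. */
typedef struct { uint8_t p0; } ToyCtx;

/* Estimate the size of a bin string in 1/256 bit. The context adapts as
 * it would in a real coder, but no bitstream is written. */
static uint32_t estimate_bits_q8(ToyCtx *ctx, const uint8_t *bins, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++) {
        assert(ctx->p0 >= 1 && ctx->p0 <= 15);
        if (bins[i] == 0) {
            total += cost_q8[ctx->p0];        /* cost of the "0" symbol */
            if (ctx->p0 < 15) ctx->p0++;      /* 0 becomes more likely */
        } else {
            total += cost_q8[16 - ctx->p0];   /* cost of the "1" symbol */
            if (ctx->p0 > 1) ctx->p0--;       /* 1 becomes more likely */
        }
    }
    return total;
}
```

An RD loop would call something like this once per candidate and compare 
the totals, running the real arithmetic coder only on the winner.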

> Anyway, any CABAC tutorial gives an average bit-rate saving of 9-14%. 
> CABAC is an arithmetic coder, which is time-consuming. Maybe replacing 
> the multiplication with a precalculated table reduces this load?

Nope, replacing multiplication with a table is actually a pessimization.
At least on a Core 2 or a K8, multiplication is as fast as an L1 cache 
access, but H.264 munges the data a bit to reduce the table size, and that 
munging takes another couple of instructions.
Also, multiply allows you to postpone renormalization until you 
read/write a byte to the stream, while the table method requires you to 
renormalize after every decision.
The proposal that introduced the current algorithm (JVT-C061) benchmarked 
it on a P3, but they included several changes (not just the table), and 
didn't compare it to postponed renormalization.
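
For illustration, here is a minimal sketch of a multiply-based binary range 
coder that renormalizes a whole byte at a time. It follows the well-known 
LZMA-style design; it is not the H.264 CABAC engine and not a copy of 
libavcodec/rangecoder.h. The names, the 11-bit probability, and the 
adaptation shift of 5 are assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROB_BITS 11            /* P(bit == 0) = prob / 2048 */
#define TOP (1u << 24)          /* renormalize only below this range */

typedef struct {
    uint64_t low;               /* 32-bit interval base plus a carry bit */
    uint32_t range;
    uint8_t  cache;             /* held back until a carry is ruled out */
    size_t   pending;           /* cache byte plus 0xFF bytes behind it */
    uint8_t *out;
    size_t   pos;
} Enc;

typedef struct {
    uint32_t code, range;
    const uint8_t *in;
    size_t len, pos;
} Dec;

/* Move the top byte of low towards the output, resolving carries. */
static void shift_low(Enc *e)
{
    if ((uint32_t)e->low < 0xFF000000u || (e->low >> 32) != 0) {
        uint8_t carry = (uint8_t)(e->low >> 32);
        uint8_t byte = e->cache;
        do {
            e->out[e->pos++] = (uint8_t)(byte + carry);
            byte = 0xFF;
        } while (--e->pending != 0);
        e->cache = (uint8_t)(e->low >> 24);
    }
    e->pending++;
    e->low = (e->low & 0x00FFFFFFu) << 8;
}

static void enc_bit(Enc *e, uint16_t *prob, int bit)
{
    assert(*prob > 0 && *prob < 2048);
    uint32_t bound = (e->range >> PROB_BITS) * *prob;  /* the multiply */
    if (!bit) { e->range = bound; *prob += (2048 - *prob) >> 5; }
    else { e->low += bound; e->range -= bound; *prob -= *prob >> 5; }
    /* Byte-wise renormalization: usually skipped, never bit by bit. */
    while (e->range < TOP) { e->range <<= 8; shift_low(e); }
}

static uint8_t next_byte(Dec *d)
{
    return d->pos < d->len ? d->in[d->pos++] : 0;
}

static int dec_bit(Dec *d, uint16_t *prob)
{
    uint32_t bound = (d->range >> PROB_BITS) * *prob;
    int bit = d->code >= bound;
    if (!bit) { d->range = bound; *prob += (2048 - *prob) >> 5; }
    else { d->code -= bound; d->range -= bound; *prob -= *prob >> 5; }
    while (d->range < TOP) {
        d->range <<= 8;
        d->code = (d->code << 8) | next_byte(d);
    }
    return bit;
}

/* Encode n bits (one per input byte); out must hold n + 8 bytes. */
size_t rc_encode(const uint8_t *bits, size_t n, uint8_t *out)
{
    Enc e = { 0, 0xFFFFFFFFu, 0, 1, out, 0 };
    uint16_t prob = 1024;
    for (size_t i = 0; i < n; i++) enc_bit(&e, &prob, bits[i]);
    for (int i = 0; i < 5; i++) shift_low(&e);   /* flush low */
    return e.pos;
}

void rc_decode(const uint8_t *in, size_t len, uint8_t *bits, size_t n)
{
    Dec d = { 0, 0xFFFFFFFFu, in, len, 1 };  /* byte 0 is a dummy zero */
    uint16_t prob = 1024;
    for (int i = 0; i < 4; i++) d.code = (d.code << 8) | next_byte(&d);
    for (size_t i = 0; i < n; i++) bits[i] = (uint8_t)dec_bit(&d, &prob);
}
```

Because range is a full 32-bit value scaled by a multiply, it can shrink 
across many decisions before dropping below 2^24, so the renormalization 
loop runs roughly once per output byte. A table indexed by a quantized 
range, as in H.264, needs the range brought back into its fixed window 
after every decision.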

For an alternative to CABAC with the above optimizations, see 
libavcodec/rangecoder.h

> I have a side question. You said that the maximum value of the -r option 
> in x264 is 16. But in the standard it can go up to 31 (the maximum value 
> of num_ref_idx_l0_active_minus1 and num_ref_idx_l1_active_minus1)?

"-r16" uses 16 reference frames.
"-r16 --interlaced" uses 32 reference fields.

--Loren Merritt