[x264-devel] [PATCH] Added support for CABAC zero bytes insertion

Loren Merritt lorenm at u.washington.edu
Wed Apr 17 23:22:10 CEST 2019


On Fri, 12 Apr 2019, Jay N. Shingala wrote:

> Dear x264 developers,
>
> This query is about the patch submitted (quite a while ago) for CABAC zero word insertion, which is a requirement for bitstream conformance.
>
> For reference, here is an excerpt of section 7.4.2.10 of the AVC/H.264 specification describing the need for zero word insertion when the ratio of CABAC bin count to bit count exceeds the constrained limit.
>
> "cabac_zero_word is a byte-aligned sequence of two bytes equal to 0x0000.
>
> Let NumBytesInVclNALunits be the sum of the values of NumBytesInNALunit for all VCL NAL units of a coded picture.
>
> Let BinCountsInNALunits be the number of times that the parsing process function DecodeBin( ), specified in
> clause 9.3.3.2, is invoked to decode the contents of all VCL NAL units of a coded picture. When
> entropy_coding_mode_flag is equal to 1, it is a requirement of bitstream conformance that BinCountsInNALunits shall
> not exceed ( 32 ÷ 3 ) * NumBytesInVclNALunits + ( RawMbBits * PicSizeInMbs ) ÷ 32.
>
> NOTE - The constraint on the maximum number of bins resulting from decoding the contents of the slice layer NAL units can
> be met by inserting a number of cabac_zero_word syntax elements to increase the value of NumBytesInVclNALunits. Each
> cabac_zero_word is represented in a NAL unit by the three-byte sequence 0x000003 (as a result of the constraints on NAL
> unit contents that result in requiring inclusion of an emulation_prevention_three_byte for each cabac_zero_word)."
>
> This patch will be useful for strict bitstream conformance in x264.
> It is important to note that the overall performance impact was negligible, as the latency of the "bin_cnt" increment in cabac_encode_decision() and cabac_encode_bypass() is well hidden.
>
> Could you please provide comments on the conformance requirement and the suitability of this patch for x264?

I can sorta explain why this exists in the standard, and why I always
ignored that clause of it.

A Level is a bundle of "if you want to decode worst-case examples of
streams with this Level, your decoder had better have resources XYZ". One
of those resources is the throughput of the cabac decoder.

For some reason I never saw explained, the standard doesn't include cabac
throughput directly as a Level parameter; instead it uses bitrate and
resolution as a proxy. That is usually fine: bitrate is a pretty good
proxy for the number of cabac decisions.

But for some unusual streams that get an unusually large amount of
compression benefit from cabac, that assumed relation can fail. (I don't
know off-hand how rare this is.) And if such a condition is combined with
a large absolute bitrate, the decoder can be left processing slightly more
cabac decisions than its Level promised it would be capable of.

The standard's perverse solution for this edge case is: tell the encoder
to stop compressing so well, so that the padded bitrate resumes being a
good proxy for the number of cabac decisions. And it doesn't tell you to
just virtually pad it for the purpose of checking Level compliance; it
tells you to actually make the compression worse.
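
For concreteness, the arithmetic that the padding requires looks roughly
like this (a sketch only, not the actual patch; the function and variable
names are made up):

    #include <stdint.h>

    /* Minimum number of cabac_zero_words needed to satisfy 7.4.2.10:
     *   BinCountsInNALunits <= (32/3) * NumBytesInVclNALunits
     *                          + (RawMbBits * PicSizeInMbs) / 32
     * Each cabac_zero_word adds 3 bytes to NumBytesInVclNALunits
     * (0x00 0x00 plus its emulation_prevention_three_byte). */
    static int64_t cabac_zero_words_needed( int64_t bin_count,
                                            int64_t vcl_bytes,
                                            int64_t pic_size_in_mbs,
                                            int raw_mb_bits )
    {
        int64_t allowance = raw_mb_bits * pic_size_in_mbs / 32;
        if( bin_count <= allowance )
            return 0;
        /* Smallest NumBytesInVclNALunits satisfying the constraint,
         * rounded up. */
        int64_t min_bytes = ( 3 * ( bin_count - allowance ) + 31 ) / 32;
        if( min_bytes <= vcl_bytes )
            return 0;
        return ( min_bytes - vcl_bytes + 2 ) / 3;
    }

For scale: at 1080p (PicSizeInMbs = 8160), the RawMbBits term for 8-bit
4:2:0 (RawMbBits = 3072) allows 783360 bins outright, plus about 10.67
bins per VCL byte, so a picture has to be unusually cabac-heavy before any
padding is needed at all.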

All of that could have been an off-by-default option for pedantic
standard-correctness that someone could enable if they ever found a
use-case where it matters. But implementing it requires an extra
instruction in the cabac inner loop, one that you don't get to skip just
because you turn off the padding. That is only a tiny speed cost, but it
is a cost I decided not to pay for a feature whose "benefit" is
"occasionally make the compression worse".
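
The extra instruction in question is just an unconditional per-bin
increment, something like this simplified sketch (not x264's actual cabac
code; the struct and field names are hypothetical):

    #include <stdint.h>

    typedef struct
    {
        int     i_low;
        int     i_range;
        int64_t i_bin_count; /* bins emitted so far, for the 7.4.2.10 check */
        /* ... the rest of the cabac encoder state ... */
    } cabac_sketch_t;

    static inline void cabac_count_bin( cabac_sketch_t *cb )
    {
        /* Executed once per bin in cabac_encode_decision() and
         * cabac_encode_bypass(), whether or not padding is ever emitted. */
        cb->i_bin_count++;
    }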

--Loren Merritt

