[x264-devel] Spec-incompliant rbsp_alignment_zero_bit

Fri Nov 2 16:43:00 CET 2018

Hello,

I want to raise a topic that has already been discussed before: In line
174 of common/cabac.c x264 adds a sort of watermark into the RBSP
trailing bits. Earlier discussions of this topic happened in November
2008 and March 2009 on this mailing list and also on doom9:
https://forum.doom9.org/showthread.php?t=168298
In both cases it has been claimed that it is syntactically impossible
for rbsp_alignment_zero_bit to be anything other than zero:
On Tue Nov 11 19:33:41 CET 2008 Loren Merritt wrote:
> Then you're misparsing where the slice ends. rbsp_alignment_zero_bit
> is not merely required to be 0, it's syntactically impossible to make
> it any other value. The last 1 and any 0s thereafter are the
> rbsp_trailing_bits, and all data before that is part of the payload.
> The whole point of rbsp_trailing is so that you can know (to single-
> bit precision) where the payload ends. If there were another way to
> know which bit is the last bit, the standard wouldn't have included
> rbsp_trailing.
This is wrong. There is another way to know which bit is the last bit.
And nevertheless rbsp_trailing_bits is not superfluous (more on this later).
The decoding process described in the standard starts from NAL units of
known length: nal_unit( NumBytesInNALunit ). The rbsp_bytes are
extracted from the NAL unit and then parsed according to the syntax
that the standard specifies for the nal_unit_type indicated in the
first byte of the NAL unit. For ordinary slices this is a
slice_layer_without_partitioning_rbsp( ), whose syntax reads as follows:

slice_layer_without_partitioning_rbsp( ) {
    slice_header( )
    slice_data( )
    rbsp_slice_trailing_bits( )
}

This syntax structure can simply be parsed from the beginning to the
end. In case of CABAC, the end of the slice_data( ) is explicitly coded
via an end_of_slice_flag. All the bits in the RBSP after the
end_of_slice_flag indicating that the slice_data( ) structure has come
to an end therefore must be the rbsp_slice_trailing_bits( ), i.e. for a
conformant bitstream, they MUST adhere to the syntax and semantics of
rbsp_slice_trailing_bits( ). That syntax essentially means that the next
bit must be 1 (the rbsp_stop_one_bit), followed by zero bits
(rbsp_alignment_zero_bit) until the position is byte-aligned (if it
wasn't already), and in CABAC mode there may be an even number of 0x00
bytes afterwards (each cabac_zero_word is 0x0000). That's it. Setting
one of the rbsp_alignment_zero_bits to 1 doesn't make it the new
rbsp_stop_one_bit (because by the syntax that is the first bit after the
slice_data( ) structure); it just makes the bitstream non-compliant with
the spec. The analyzer complaining in the linked doom9 thread was
completely right to point out that this is out-of-spec.
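
To make the check concrete, here is a minimal sketch in Python (not
taken from x264 or from any analyzer). The function names and the
parameter pos are hypothetical: pos is assumed to be the bit position
of the first bit after slice_data( ), however the parser arrived at it.

```python
def read_bit(data: bytes, pos: int) -> int:
    """Return the bit at absolute bit position pos (MSB first)."""
    return (data[pos // 8] >> (7 - pos % 8)) & 1


def slice_trailing_bits_ok(rbsp: bytes, pos: int, cabac: bool = True) -> bool:
    """Check the bits from pos onward against rbsp_slice_trailing_bits( ).

    pos is a hypothetical input: the bit position right after
    slice_data( ), as found by actually parsing the slice.
    """
    total = len(rbsp) * 8
    # rbsp_stop_one_bit: the very next bit must be 1.
    if pos >= total or read_bit(rbsp, pos) != 1:
        return False
    pos += 1
    # rbsp_alignment_zero_bit: zeros up to the next byte boundary.
    while pos % 8 != 0:
        if read_bit(rbsp, pos) != 0:
            return False
        pos += 1
    rest = rbsp[pos // 8:]
    if not rest:
        return True
    # Only cabac_zero_words (0x0000) may follow, and only with CABAC.
    return cabac and len(rest) % 2 == 0 and not any(rest)
```

With slice_data( ) assumed to end at bit 10, the bytes AB A0 pass the
check, while AB A4 (one alignment bit flipped to 1) fail it.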

And now to the question: why were rbsp_trailing_bits( ) (and its slice
counterpart) added at all when the end of the SODB can be found by
parsing the data from beginning to end?
I cannot read the minds of the standard's authors, but I see several
reasons for this:
1. By using this technique, they can use a more_rbsp_data( ) function in
the syntax. In other words, given the way the syntax currently is, the
rbsp_trailing_bits( ) structures are necessary for correct parsing. In
contrast, MPEG-2 does it differently and therefore has to rely on
flags to indicate the actual end of its structures (by actual end I mean
that the next_start_code() and the unnecessary zero bytes before it are
not part of the actual structure in question). This is done in the
picture_header(), where the extra_bit_picture flags are used to indicate
the end.
2. They probably already had an eye towards their Annex B format. (This
is also the only reason why they included the
emulation_prevention_three_byte in the main process and not only in the
Annex B format; after all, emulation_prevention_three_bytes are only
useful if the NAL unit is framed by Annex B start codes; they are
useless when NAL units are framed by a length prefix, as mp4 and
Matroska do it.) The trailing bits allow zero bytes to be inserted
between the NAL units in Annex B format while keeping it simple to
determine the real end of the important data. The reason for including
a mechanism for padding of arbitrary byte length might be that one
cannot encode a single byte of padding at the transport stream layer of
a transport stream. (The adaptation field usually used for such things
is at least two bytes long in itself.) Other containers might have
similar restrictions, so they included a simple mechanism to fill the
packets.
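
For reference, the extraction of the rbsp_bytes from a NAL unit
mentioned earlier, including the removal of every
emulation_prevention_three_byte, can be sketched roughly as follows in
Python. The function name is hypothetical, and payload is assumed to be
the NAL unit without its header byte(s).

```python
def nal_payload_to_rbsp(payload: bytes) -> bytes:
    """Drop each emulation_prevention_three_byte (the 03 in 00 00 03).

    payload is assumed to be the NAL unit bytes after the NAL header.
    """
    rbsp = bytearray()
    zeros = 0  # number of consecutive 0x00 bytes copied just before
    for b in payload:
        if zeros >= 2 and b == 0x03:
            zeros = 0  # skip the emulation prevention byte itself
            continue
        rbsp.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(rbsp)
```

This step is the same regardless of whether the NAL unit was framed by
a start code or by a length prefix.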

By the way: The more_rbsp_data( ) mentioned above includes the following
in its definition in 7.2:
"Otherwise, the RBSP data is searched for the last (least significant,
right-most) bit equal to 1 that is present in the RBSP. Given the
position of this bit, which is the first bit (rbsp_stop_one_bit) of the
rbsp_trailing_bits( ) syntax structure, the following applies:"
This does not imply that the last bit set to 1 is by definition the
rbsp_stop_one_bit. Remember that the standard only defines the syntax of
spec-compliant bitstreams, and for those the above statement is true (an
easy corollary of the definition, but not the definition itself). This
statement does not exempt anyone from following the syntax described in
7.3 to the letter.
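
The search quoted from 7.2 can be sketched like this in Python (the
function name is my own and purely illustrative). It only locates the
last bit equal to 1; whether that bit really is the rbsp_stop_one_bit
still depends on the bitstream following the syntax.

```python
def last_one_bit_position(rbsp: bytes) -> int:
    """Bit position (MSB first) of the last bit equal to 1, or -1 if none."""
    for i in range(len(rbsp) - 1, -1, -1):
        byte = rbsp[i]
        if byte:
            # Index of the lowest set bit, counted from the LSB.
            lowest = (byte & -byte).bit_length() - 1
            return i * 8 + (7 - lowest)
    return -1
```

For the bytes AB A0 the search returns bit 10. For AB A4 it returns
bit 13, so if parsing slice_data( ) ends at bit 10, the two methods
disagree, which is exactly what marks the stream as non-compliant.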

Greetings
Andreas Rheinhardt

PS: A big thanks to Loren Merritt: I just noticed that he has been
active in x264 development for more than ten years! Lots of people
probably just come and go (thanks to them, too, of course), but he
stayed.