[RFC] New architecture for a video server

Fri Aug 2 00:54:54 CEST 2002

Dear friends,

The needs of streaming technologies are evolving very fast, and we've
seen many requests that we're unable to fulfill with the current VLC
and VLS architectures. The requested features include MPEG-2 to
MPEG-4 transcoding, MPEG-4 streaming, VOD, RTP and RTSP. The purpose
of this mail is to propose a move towards greater modularity,
allowing us to support more input and output formats.

VLS's streaming architecture is that of a streamer, not of an encoder.
That is, it only deals with the MPEG system layer, without touching 
the PES's. This architecture, though quite adequate for broadcasting 
MPEG-2 TS, proves inadequate for the recent protocols we want to 
support.

Indeed, many of the features requested (transcoding, MPEG-4 FlexMux)
compel us to go down to the Elementary Stream level, and to process
each ES separately. In addition, working on the ES's enables us to fix
broken streams on-the-fly (such as the common MPEG-1 sequence header 
problem), and even to directly stream ES files. Consequently, I want 
to push for a new server solution, based on a different architecture, 
closer to an encoder's.

The basic idea is to demultiplex, process each ES separately, 
remultiplex, and send.

+------+    +-------+    +---------------+    +-----+   +--------+
| read | -> | demux | -> | ES processing | -> | mux |-> | output |
+------+    +-------+    +---------------+    +-----+   +--------+
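
To make the chain above a bit more concrete, here is a minimal sketch
of it in Python. It is only an illustration: the (es_id, payload)
tuples, the function names and the round-robin multiplexing are my
assumptions for the example, not existing VLC or VLS code.

```python
from typing import Callable, Dict, Iterable, List, Tuple

EsChunk = Tuple[int, bytes]  # hypothetical (elementary stream id, payload)


def identity(payload: bytes) -> bytes:
    """Default per-ES processing: pass the data through untouched."""
    return payload


def demux(container: Iterable[EsChunk]) -> Dict[int, List[bytes]]:
    """Group payloads by ES id, preserving the order within each ES."""
    streams: Dict[int, List[bytes]] = {}
    for es_id, payload in container:
        streams.setdefault(es_id, []).append(payload)
    return streams


def process(streams: Dict[int, List[bytes]],
            processors: Dict[int, Callable[[bytes], bytes]]
            ) -> Dict[int, List[bytes]]:
    """Apply the processor registered for each ES (identity otherwise)."""
    return {es_id: [processors.get(es_id, identity)(p) for p in payloads]
            for es_id, payloads in streams.items()}


def mux(streams: Dict[int, List[bytes]]) -> List[EsChunk]:
    """Round-robin interleave; a real multiplexer would order by date."""
    queues = {es_id: list(payloads) for es_id, payloads in streams.items()}
    out: List[EsChunk] = []
    while any(queues.values()):
        for es_id in sorted(queues):
            if queues[es_id]:
                out.append((es_id, queues[es_id].pop(0)))
    return out
```

For example, mux(process(demux(source), {2: bytes.upper})) leaves ES 1
untouched and runs only ES 2 through its own processing step.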

Since there is a lot of work to do, we should reuse code from 
existing projects, as much as possible. Libvlc's input module 
understands many more formats than VLS's, so I propose that we plug 
the new server after the input thread of VLC, and take advantage of 
the recent --codec switch (a multiplexer then being a special codec 
type). In addition, having it inside libvlc allows us to use it from
other applications (I'm thinking of an Apache module in particular).
The multiplexing and output part can probably be based on VLS's 
codebase.

This approach has one drawback : when reading a TS file, the TS
stream that is sent only vaguely resembles the packets that were
read. That is, everything is demultiplexed and remultiplexed, so all
existing SIs are lost. For such streams, VLS or VLMS still have an
advantage.

Similarly, VLC is currently unable to decode several programs within
the same input. That is, we couldn't stream more than one program
from a satellite input. Again, VLS will still be useful in that kind
of situation, until VLC's input is extended to support decoding
multiple programs.

The following sections describe how the new server architecture
would be integrated into libvlc. Note that this document doesn't
provide any hook for "scheduling" streams, that is, starting and
stopping streams at some precise time. IMHO, it is the job of an
external program to start and stop servers, and it is thus out of the
scope of this document.

1. Architecture
   ============
1.1 Packetizer

The general idea is to hijack the decoders of VLC with '--codec 
packetizer,none'. This will prevent the spawning of video & audio 
output, and of video & audio decoders, and launch our own threads,
which will take the ES's and packetize them into ES packets.

For each Elementary Stream :
+------------------+     +------------+     +------------------+
| [type] bitstream | --> | ES parsing | --> | ES Packetization |
+------------------+     +------------+     | & CR calculation |
                                            +------------------+

<------------------>     <----------------------------------...>
    input thread      |             packetizer thread
               [decoder_fifo_t]

The packetizer thread is specific to the type of ES we're dealing 
with. We will need a packetizer "decoder" (read : plug-in) for :
  - MPEG-1/2 video
  - MPEG-1/2 audio
  - A/52
  - SPU
  - MPEG-4 video & audio (FlexMux)

We can also have special packetizer plug-ins which would act as 
transcoders (MPEG-2 to MPEG-4 for instance), or even encoders (raw 
YUV -> MPEG).

1.2 Multiplexor and output

The user will also pass '--sout udp/ts:@239.239.0.1:1234' to 
configure a stream output instance, which will run in one of the 
packetizer threads (à la aout3). The instance will load an access 
plug-in (for physical access to the file descriptor), and a TS 
multiplexor (in the future we may choose to support other streaming 
formats).
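
As an illustration of how such an option could be split into its
parts, here is a small sketch. The only syntax known from this mail is
the single example above; the general grammar access/mux:@host:port,
the SoutTarget type and the parse_sout name are assumptions made for
the example.

```python
from typing import NamedTuple


class SoutTarget(NamedTuple):
    access: str  # e.g. "udp"
    mux: str     # e.g. "ts"
    host: str
    port: int


def parse_sout(spec: str) -> SoutTarget:
    """Parse a string shaped like 'udp/ts:@239.239.0.1:1234'.

    Assumed grammar (not confirmed anywhere): access/mux:@host:port
    """
    method, sep, dest = spec.partition(":")
    access, slash, mux = method.partition("/")
    if not sep or not slash or not access or not mux:
        raise ValueError("expected access/mux:@host:port, got %r" % spec)
    if dest.startswith("@"):
        dest = dest[1:]
    host, colon, port = dest.rpartition(":")
    if not colon or not host or not port.isdigit():
        raise ValueError("expected host:port, got %r" % dest)
    return SoutTarget(access, mux, host, int(port))
```

The access name would then select the access plug-in and the mux name
the multiplexer plug-in.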

This last plug-in will be responsible for merging the "packetized"
elementary streams (it's not PES yet ; the PES header hasn't been
written) coming from the packetizer "decoders", and for adding PCR
and SI.

                 PCR  SI (libdvbpsi)
                  |   |
                  v   v
+----------+     +-----+
| Video ES | --> |     |
+----------+     |     |     +-------------------+     +----------+
                 | TS  | --> | Raw packetization | --> | Physical |
+----------+     | MUX |     |  (including RTP)  |     |  layer   |
| Audio ES | --> |     |     +-------------------+     +----------+
+----------+     +-----+
                    ^
                    |
          optional stuffing (CBR)

<...------->     <----->     <------------------------------------>
 packetizer       tsmux                 access plug-in

                 <------------------------------->     <---------->
             running in one of the packetizer threads      optional
                                                        sout thread

We will have the following access plug-ins :
  - udp (raw packets over UDP)
  - rtp
  - file (can be used for PS to TS converter)

A drawback of this architecture : we have a lot of threads. 1 thread
for the input, 1 thread for the interface, 1 thread per ES, and
optionally one thread for the stream output. However, on current
machines VLS and VLMS only take up a small fraction of the CPU power,
so we can afford a little more CPU time.

2. The packetizer plug-in
   ======================
This plug-in takes Elementary Stream data (bitstream) from the input, 
and builds ES packets with it. It scans the bitstream for logical 
structures, to construct convenient packet boundaries.

For instance, the MPEG video packetizer plug-in will scan for
PICTURE_HEADER startcodes, and put one picture per packet. The packet
will be tagged with a CR date. This date defines the latest instant
at which the last "raw" packet (read : TS) coming from this PES must
be sent onto the network. That way we ensure that the decoder has
enough time to decode the frame. Please note that this isn't the PTS
(though it is calculated from the PTS), since in MPEG the
presentation order isn't the decoding order (remember ?).

It is also in this step that the sequence header will be repeated, if 
necessary.
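
The scanning step for MPEG-1/2 video could look roughly like the
sketch below. It relies on the picture start code 00 00 01 00; the
split_pictures name and the choice to cut exactly at each picture
start code are assumptions for the example. CR calculation and
sequence header repetition are left out.

```python
from typing import List

PICTURE_START_CODE = b"\x00\x00\x01\x00"  # MPEG-1/2 video picture header


def split_pictures(es: bytes) -> List[bytes]:
    """Cut a video elementary stream into one chunk per picture.

    Simplification: headers found before the first picture (sequence
    header, GOP header) travel with the first chunk. Headers located
    between two pictures end up at the tail of the previous chunk; a
    real packetizer would rather attach them to the next picture.
    """
    starts = []
    pos = es.find(PICTURE_START_CODE)
    while pos != -1:
        starts.append(pos)
        pos = es.find(PICTURE_START_CODE, pos + len(PICTURE_START_CODE))
    if not starts:
        return [es] if es else []
    starts[0] = 0  # keep any leading headers with the first picture
    bounds = starts + [len(es)]
    return [es[a:b] for a, b in zip(bounds, bounds[1:])]
```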

3. The global TS multiplexer
   =========================
This plug-in doesn't need to run in its own thread, and will
periodically be run in a packetizer thread (when FIFOs are big
enough). It takes ES packets from the incoming FIFOs, constructs the
PES headers, splits each PES into TS packets, and assigns them an
emission date, which will be less than or equal to the date tagged in
the ES packet by the packetizer.
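
Here is a sketch of the splitting step, assuming the PES (header
included) has already been built. The 188-byte packet layout is the
standard MPEG-2 TS one; the pes_to_ts name, its parameters, and the
policy of spacing emission dates evenly so that the last packet leaves
exactly at the deadline are assumptions for the example.

```python
from typing import List, Tuple

TS_PAYLOAD = 184  # 188-byte TS packet minus the 4-byte header


def pes_to_ts(pes: bytes, pid: int, deadline: int, period: int,
              first_cc: int = 0) -> List[Tuple[int, bytes]]:
    """Split one PES into dated TS packets.

    deadline: date tagged by the packetizer (same unit as period).
    period:   assumed spacing between two consecutive TS packets.
    """
    packets = []
    cc = first_cc
    for offset in range(0, len(pes), TS_PAYLOAD):
        chunk = pes[offset:offset + TS_PAYLOAD]
        pusi = 0x40 if offset == 0 else 0x00  # payload_unit_start
        pad = TS_PAYLOAD - len(chunk)
        if pad == 0:
            afc, adaptation = 0x10, b""       # payload only
        elif pad == 1:
            afc, adaptation = 0x30, b"\x00"   # zero-length adaptation
        else:
            # length byte, empty flags byte, then 0xFF stuffing
            afc = 0x30
            adaptation = bytes([pad - 1, 0x00]) + b"\xff" * (pad - 2)
        header = bytes([0x47,
                        pusi | ((pid >> 8) & 0x1F),
                        pid & 0xFF,
                        afc | (cc & 0x0F)])
        packets.append(header + adaptation + chunk)
        cc = (cc + 1) & 0x0F
    last = len(packets) - 1
    return [(deadline - (last - i) * period, pkt)
            for i, pkt in enumerate(packets)]
```

Every emission date produced this way is less than or equal to the
deadline, as required above.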

It will periodically insert PAT/PMT packets (coming from libdvbpsi,
built from info gathered by the packetizer from the input), and
PCR packets (derived from the system clock and the emission dates).
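
For reference, a PCR is carried as a 33-bit base counting at 90 kHz,
6 reserved bits, and a 9-bit extension counting at 27 MHz. The sketch
below encodes one from a date; the encode_pcr name and the assumption
that dates are expressed in microseconds are mine.

```python
def encode_pcr(date_us: int) -> bytes:
    """Encode a date (assumed to be in microseconds) as a 6-byte PCR."""
    ticks_27mhz = date_us * 27
    base = (ticks_27mhz // 300) % (1 << 33)  # 90 kHz part, 33 bits
    extension = ticks_27mhz % 300            # 27 MHz remainder, 9 bits
    value = (base << 15) | (0x3F << 9) | extension  # reserved bits at 1
    return value.to_bytes(6, "big")
```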

Finally, if the user requested the stream to be constant bitrate 
(CBR), the TS multiplexor will add TS packets for stuffing. The 
resulting TS packets are placed in a FIFO, in the right order, with 
emission dates.
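
A rough sketch of the stuffing computation follows. The null packet
(PID 0x1FFF) is the standard TS stuffing packet; the pad_to_cbr name,
the fixed time window and the choice to append all the stuffing at the
end of the window are simplifications assumed for the example (a real
multiplexer would interleave it).

```python
from typing import List

TS_SIZE = 188
NULL_PACKET = bytes([0x47, 0x1F, 0xFF, 0x10]) + b"\xff" * 184


def pad_to_cbr(packets: List[bytes], bitrate_bps: int,
               window_us: int) -> List[bytes]:
    """Fill a time window with null packets up to the target bitrate."""
    slots = bitrate_bps * window_us // (TS_SIZE * 8 * 1_000_000)
    if len(packets) > slots:
        raise ValueError("payload exceeds the requested bitrate")
    return list(packets) + [NULL_PACKET] * (slots - len(packets))
```

For instance, at 1504000 bit/s a window of 10 ms holds 10 TS packets,
so 7 payload packets would be followed by 3 null packets.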

4. The access plug-in
   ==================
This plug-in runs in the same thread as the TS multiplexor, and takes
packets from the multiplexor's output FIFO to write them to the
physical medium. In the case of a file output, the packets are just
written one by one, and the emission dates are not taken into
account.

In the case of a raw UDP output, the access plug-in creates buffers
of 1316 bytes (7 TS packets of 188 bytes) and fills them with TS
packets. Each buffer is then queued until the emission date of its
first TS packet. At that time, the stream output thread (running at a
real-time priority) will pick it up and write it to the network
device. This thread is optional and is very similar to the audio
output thread.
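
The buffer filling can be sketched as follows, with seven 188-byte TS
packets per 1316-byte buffer. The build_datagrams name and the
(date, packet) tuples are assumptions for the example; the actual
queueing and the real-time sending thread are not shown.

```python
from typing import List, Tuple

TS_PER_DATAGRAM = 7  # 7 * 188 = 1316 bytes


def build_datagrams(dated_packets: List[Tuple[int, bytes]]
                    ) -> List[Tuple[int, bytes]]:
    """Group dated TS packets into buffers of up to 1316 bytes.

    Each buffer carries the emission date of its first TS packet.
    The last buffer may be shorter if packets run out.
    """
    datagrams = []
    for i in range(0, len(dated_packets), TS_PER_DATAGRAM):
        group = dated_packets[i:i + TS_PER_DATAGRAM]
        payload = b"".join(pkt for _, pkt in group)
        datagrams.append((group[0][0], payload))
    return datagrams
```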

RTP output will add to this scheme an extra header before the 1316 
bytes, with the emission date of the first TS packet. It will be 
handled by the access plug-in layer.
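
A sketch of that extra header, under a few assumptions : a fixed
12-byte RTP header without CSRC entries, payload type 33 (MPEG-2 TS)
with its 90 kHz clock, and emission dates expressed in microseconds.
The rtp_header name and the way the sequence number and SSRC are
supplied are choices made for the example.

```python
import struct

RTP_PAYLOAD_TYPE_MP2T = 33  # MPEG-2 transport stream, 90 kHz clock


def rtp_header(seq: int, date_us: int, ssrc: int) -> bytes:
    """Build the 12-byte RTP header placed before the TS packets.

    date_us: emission date of the first TS packet, assumed to be in
    microseconds, converted here to 90 kHz ticks.
    """
    timestamp = (date_us * 9 // 100) & 0xFFFFFFFF
    return struct.pack("!BBHII",
                       0x80,                   # version 2, no padding
                       RTP_PAYLOAD_TYPE_MP2T,  # marker bit cleared
                       seq & 0xFFFF,
                       timestamp,
                       ssrc & 0xFFFFFFFF)
```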

5. Conclusion
   ==========
Comments are welcome. From the mails we get on the mailing-list, I
feel that such features are pretty urgent. Therefore, I will start
working on this as soon as aout3 is in the CVS (which is only a
few days away).

-- 
Christophe Massiot.
