[RFC] New audio output architecture

Jean-Paul Saman jpsaman at wxs.nl
Mon May 6 22:36:50 CEST 2002


Way to go!


Christophe Massiot wrote:
> 
> At 23:31 +0200 3/05/2002, Christophe Massiot wrote:
> 
> >I'm updating my document with a few major new ideas I've had today,
> >and after that I think I'm gonna start writing the core functions.
> 
> Enjoy. I rewrote the ending.
> 
> 1. General architecture of aout3
>    =============================
> Whereas aout2 relied on the DSP plug-in behavior, aout3 is designed
> for modern callback-based audio APIs, such as Mac OS X CoreAudio. Of
> course DSP can still be implemented, but through some kind of emulation.
> 
> | The following schematic has changed:
> 
>   +------+    +---------+    +---------+    +---------+
>   | adec | -> |   Pre-  | -> |  Mix &  | -> |  Post-  | -> buffer #0
>   |      |    | filters |    | Downmix |    | filters |    buffer #-1
>   +------+    +---------+    +---------+    +---------+    buffer #-2    +--+
>                                                            buffer #-3 -> |HW|
>                                                                          +--+
>   <--------------------->    <------------------------>    <---------------->
>     audio decoder thread         audio mixer thread         audio output thr
> 
>               <--------->    <--------->    <--------->    <---------------->
> modules :    aout filters    aout mixer    aout filters        aout aal
> 
> [aal = architecture abstraction layer]
> 
> As you see, the major idea behind aout3 is to split the audio output
> thread into two threads. As a matter of fact, the current aout fulfills
> two contradictory missions: heavy calculation for mixing and
> resampling, and waiting for a VERY accurate date to DMA the buffer to
> the hardware.
> 
> The audio mixer thread takes over almost all functions of the current
> audio output. The new audio output IO thread only cares about taking
> the first spooled buffer and DMAing it to the hardware. The latter
> thread's implementation will greatly differ between architectures. For
> instance the Mac OS X CoreAudio plug-in will not need to spawn a
> thread, since this is already done by CoreAudio's IO thread. The DSP
> plug-in, on the contrary, will launch a new thread, and spend its time
> doing blocking write() calls.
> 
> On systems having real-time capabilities, the audio output IO thread
> should be assigned a VERY HIGH priority, so that DMA transfers are
> not even slightly delayed. This is the case in CoreAudio.
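> 
> To fix ideas, on a POSIX system with real-time extensions this could
> boil down to something like the sketch below (the policy and the exact
> priority value will have to be tuned per OS):
> 
> #include <pthread.h>
> #include <sched.h>
> 
> /* Give the audio output IO thread a (very) high real-time priority.
>  * Returns 0 on success. */
> static int SetIOThreadPriority( pthread_t thread )
> {
>     struct sched_param param;
>     param.sched_priority = sched_get_priority_max( SCHED_FIFO ) - 1;
>     return pthread_setschedparam( thread, SCHED_FIFO, &param );
> }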
> 
> With that architecture, the only way we will have buffer underruns is
> if the audio mixer thread doesn't have time to prepare the samples.
> This would imply a very long burst in the CPU load; on real-time
> systems, one will want to assign the audio mixer thread a high priority
> (though not as high as the audio output's).
> 
> And before every DMA transfer we will check the date and print a
> message if we're late. That way, at least we will know when to expect
> an audio underrun. Better than nothing.
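> 
> To give an idea, here is a rough sketch of such an IO thread for the
> DSP plug-in. All names are hypothetical, not final API: aout_FifoPop()
> would block until a buffer is spooled, mdate() would return the
> current date in microseconds.
> 
> #include <stdio.h>
> #include <stdint.h>
> #include <unistd.h>
> 
> /* Hypothetical spooled buffer. */
> typedef struct aout_buffer_s
> {
>     int64_t         i_start_date; /* expected output date, in µs */
>     unsigned char * p_samples;
>     size_t          i_size;
> } aout_buffer_t;
> 
> static void IOThread( int i_fd )
> {
>     for( ; ; )
>     {
>         /* Blocks until the mixer has spooled a buffer. */
>         aout_buffer_t * p_buffer = aout_FifoPop();
> 
>         /* Check the date and complain if we are already late. */
>         if( mdate() > p_buffer->i_start_date )
>             fprintf( stderr, "aout: we are late, expect an underrun\n" );
> 
>         /* Blocking write: returns once the hardware has taken it. */
>         write( i_fd, p_buffer->p_samples, p_buffer->i_size );
> 
>         aout_BufferRelease( p_buffer );
>     }
> }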
> 
> 2. aout3 data format
>    =================
> The schematic in chapter 1 is oversimplified. The audio output needs
> to understand a handful of input and output formats: u8, s8, u16,
> s16, float32, fixed24. It also must deal with an arbitrary number of
> channels (for instance #defined to 6), and with several input streams
> (this is debated [*]). We can no longer have an output per format (as
> is the case in aout2), so we need some simplification.
> 
> I propose that the internal format for samples in the audio output be
> float32. It seems to be a very popular format; for instance it is
> the native output format of liba52, and the native input format of
> Mac OS X CoreAudio. It allows for more precision in the samples, and
> as such I think it is the best choice we have.
> 
> float32 processing may take more CPU time than currently. I may hurt
> your feelings: we do not care. On all machines where VLC runs, the
> audio output takes up 0% CPU. We can afford trading more CPU for
> more precision. I also think we should use more expensive dithering
> algorithms for resampling. Audio output is the main source of
> complaints from our users; we must do something, even at the expense
> of a higher CPU load.
> 
> | However, embedded systems usually do not have a floating-point unit, so
> | we will also have a simpler mode using fixed24, the native format of
> | libmad. Plug-in developers are not required to support both float32
> | and fixed24, since embedded systems are not expected to need as
> | many features as workstations (for instance downmixing will probably
> | never be useful on an embedded system).
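> 
> For instance, assuming fixed24 means a 32-bit integer with 24
> fractional bits (a working hypothesis, the exact layout remains to be
> pinned down), converting to s16 for the hardware is a cheap shift:
> 
> #include <stdint.h>
> 
> /* fixed24 -> s16: drop 9 bits of precision and clamp to s16 range. */
> static int16_t Fixed24ToS16( int32_t i_sample )
> {
>     int32_t i_out = i_sample >> 9; /* 2^24 -> 2^15 */
>     if( i_out >  32767 ) i_out =  32767;
>     if( i_out < -32768 ) i_out = -32768;
>     return (int16_t)i_out;
> }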
> 
> [*] In case VLC reads several streams at once, there may be several
> instances of audio decoders at the same time, and thus several
> streams to mix before output. I'm not sure this is a good idea. In
> another thread I will speak about the multi-stream aspect of VLC,
> which can be debated.
> 
> | 3. aout3 filters
> |    =============
> aout3 is built as a pipeline of filters which have very different
> roles. A filter is a unit converting one stream format to another, a
> stream format being:
> 
> typedef struct audio_sample_format_s
> {
>      int i_type; /* u8, s8, u16, s16, float32, fixed24... */
>      int i_rate;
>      int i_channels; /* 1..6 */
> } audio_sample_format_t;
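> 
> The type field could then be backed by an enum, for instance (constant
> names purely illustrative):
> 
> typedef enum audio_sample_type_e
> {
>     AOUT_FMT_U8, AOUT_FMT_S8, AOUT_FMT_U16, AOUT_FMT_S16,
>     AOUT_FMT_FLOAT32, AOUT_FMT_FIXED24
> } audio_sample_type_t;
> 
> /* A stereo float32 stream at 44.1 kHz: */
> audio_sample_format_t format = { AOUT_FMT_FLOAT32, 44100, 2 };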
> 
> 3.1 Filter plug-ins
> 
> A filter plug-in takes one stream as input, and outputs one stream,
> with one or several parameters changed. Consequently, there will be
> three basic types of filter plug-ins:
> - Converters, from one type to another (e.g. u16 -> float32)
> - Resamplers (change i_rate), either because the hardware doesn't support
> the rate of the stream (48000 -> 44100), or because we have clock
> problems and need to go a little faster or slower (48000 -> 47500)
> - Special effects plug-ins, which change the samples without changing
> the format; for instance attenuation, balance or graphical effects.
> 
> For optimization purposes, a filter plug-in can combine several
> operations. For instance, a filter can convert (u16 -> float32) and
> resample (48000 -> 44100) at the same time.
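> 
> To give an idea, the inner loop of a plain u16 -> float32 converter
> (no resampling) could be as simple as this sketch:
> 
> #include <stddef.h>
> #include <stdint.h>
> 
> /* u16 -> float32: map [0, 65535] onto [-1.0, 1.0). */
> static void ConvertU16ToFloat32( const uint16_t * p_in, float * p_out,
>                                  size_t i_samples )
> {
>     size_t i;
>     for( i = 0; i < i_samples; i++ )
>         p_out[i] = ( (float)p_in[i] - 32768.f ) / 32768.f;
> }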
> 
> When needing a conversion (i.e. for the pre-filters and post-filters
> passes), the aout core functions will look for candidate modules for
> the whole conversion, e.g.:
> { u16, 48000, 2 } -> { float32, 44100, 2 }
> If there is no candidate, the core functions will split the transformation:
> { u16, 48000, 2 } -> { float32, 48000, 2 }
> { float32, 48000, 2 } -> { float32, 44100, 2 }
> The type conversion will occur at the beginning (pre-filters pass) or
> at the end (post-filters pass).
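> 
> In pseudo-C, the pipeline builder could look like the following
> sketch, where FindFilter() stands for the module lookup (not written
> yet):
> 
> /* Try one filter for the whole conversion, else split it in two:
>  * type conversion first, then resampling. */
> static int BuildPipeline( audio_sample_format_t in,
>                           audio_sample_format_t out )
> {
>     audio_sample_format_t mid;
> 
>     if( FindFilter( in, out ) != NULL )
>         return 0; /* one module handles the whole conversion */
> 
>     mid = in;
>     mid.i_type = out.i_type; /* convert the type first... */
>     if( FindFilter( in, mid ) == NULL
>          || FindFilter( mid, out ) == NULL ) /* ...then resample */
>         return -1; /* no candidate */
>     return 0;
> }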
> 
> In all cases, the user will have the ability to modify the pipeline
> and add or delete filters, provided the continuity of the formats is
> preserved.
> 
> 3.2 Mixer plug-in
> 
> The mixer is a special type of filter, in that it takes several
> streams as inputs, and outputs one stream. The input streams must all
> have the same rate and type. Only two types will be supported:
> float32 and fixed24.
> Pre-filters are in charge of converting the streams to fulfill this
> requirement.
> 
> The number of channels in the output stream will depend on the
> hardware capabilities. When necessary, downmixing or upmixing will be
> performed on the fly.
> 
> We can implement several mixers with different complexities. For
> instance workstations can use a float32 mixer with dithering and
> precise downmixing. Embedded systems may use a much faster fixed24
> mixer with limited accuracy.
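> 
> The core of a naive float32 mixer is then just an accumulation; a real
> mixer would also dither and clip more cleverly. Sketch:
> 
> #include <stddef.h>
> 
> /* Mix i_streams float32 inputs (same format) into one output. */
> static void MixFloat32( float ** pp_in, int i_streams, float * p_out,
>                         size_t i_samples )
> {
>     size_t i;
>     int j;
>     for( i = 0; i < i_samples; i++ )
>     {
>         float f_sum = 0.f;
>         for( j = 0; j < i_streams; j++ )
>             f_sum += pp_in[j][i];
>         /* Crude clipping, for the sake of the example. */
>         if( f_sum >  1.f ) f_sum =  1.f;
>         if( f_sum < -1.f ) f_sum = -1.f;
>         p_out[i] = f_sum;
>     }
> }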
> 
> 4. aout data flow
>    ==============
> 
> | 4.1 Pre-filters
> 
>   +------+    +-----------+    +-----------+    +----------+    +-----------+
>   | adec | -> | Converter | -> | Resampler | -> | Optional | -> | amix thr. |
>   |      |    |           |    |           |    | effects  |    |           |
>   +------+    +-----------+    +-----------+    +----------+    +-----------+
> 
>   <-------------------------------------------------------->    <----------->
>                   audio decoder thread                        audio mixer thr
> 
> Pre-filters transform the samples from the arbitrary format of the
> decoder to the native format of the audio mixer.
> 
> Please notice that these operations take place in the audio decoder
> thread, and not in the audio mixer thread. I see three advantages:
> - when working with several streams, the conversion and resampling of
> every stream can be done in parallel for performance reasons (SMP
> machines);
> - the conversion is done immediately, so that the buffer can be
> immediately reused by the decoder; we do not need to set up complex
> buffering services with the decoder, which is free to allocate its
> own buffers;
> - the filters are processed just after the decoding of the samples,
> which improves the processor cache efficiency, thus giving better
> performance.
> 
> And one drawback:
> - the aout_PlayBuffer function doesn't return immediately. This must
> be taken into account when designing decoders.
> 
> When the audio decoder gives samples to output, it assigns them a
> calculated timestamp. The audio output estimates the timestamp at
> which the first sample will actually be output, with information
> provided by the architecture abstraction layer. If the two dates
> aren't equal, the audio output core functions may decide to:
> 1. do nothing if the difference is small;
> 2. resample the current buffer to be in sync at the end of the buffer;
> 3. skip samples if we're _really_ late.
> The resampling is done on a per-stream basis, because some decoders
> may be late while others aren't.
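> 
> In other words, something like the sketch below; the thresholds are
> wild guesses to be tuned, and AdjustResampling() / SkipSamples() are
> placeholders:
> 
> #include <stdint.h>
> 
> /* i_drift = estimated output date - decoder timestamp, in µs. */
> static void HandleDrift( int64_t i_drift )
> {
>     if( i_drift > -10000 && i_drift < 10000 )
>         return;                      /* 1. small: do nothing */
>     if( i_drift > 100000 )
>         SkipSamples( i_drift );      /* 3. really late: skip */
>     else
>         AdjustResampling( i_drift ); /* 2. resample to resync */
> }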
> 
> | 4.2 Mixer
> 
> The mixer combines several input streams into one output stream with
> the same number of channels as the hardware. See § 3.2 for more
> information on the data flow inside the mixer plug-in.
> 
> The mixer runs in its own thread because it can multiplex streams
> coming from several adec threads. When there is only one decoder (the
> majority of cases), it may seem smarter to run the mixer inside
> the audio decoder thread, for the same reasons as the pre-filters,
> and because the audio mixer doesn't have hard real-time constraints.
> We may choose to support such behavior. If a second stream suddenly
> appears, we can either do the mixing in one of the decoder
> threads, or spawn a new mixer thread.
> 
> | 4.3 Post-filters
> 
> Post-filters are run after the mixer, in the same thread, to control
> global properties such as volume or balance. They also convert the
> samples to the hardware type (float32 -> u16).
> 
>   +-------+    +----------+    +-----------+
>   | mixer | -> | Optional | -> | Converter | -> spooled buffer for IO thread
>   |       |    | effects  |    |           |
>   +-------+    +----------+    +-----------+
> 
>   <---------------------------------------->
>              audio mixer thread
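> 
> For instance the volume control is a trivial in-place scaling; a
> sketch of such a post-filter:
> 
> #include <stddef.h>
> 
> /* Scale all samples by f_gain (1.0 = unity); runs in place. */
> static void VolumeFilter( float * p_samples, size_t i_samples,
>                           float f_gain )
> {
>     size_t i;
>     for( i = 0; i < i_samples; i++ )
>         p_samples[i] *= f_gain;
> }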
> 
> | 4.4 Architecture abstraction layer
> 
> The AAL runs a loop which regularly triggers the transfer of the
> oldest spooled buffer to the hardware. No transformation is done. The
> AAL will provide the core functions with information on the playback,
> for instance whether we're late or early compared with the timestamp
> of the buffer.
> 
> The AAL has very strong real-time constraints and will thus run in
> its own thread, either provided by the operating system (Mac OS X) or
> spawned on the fly.
> 
> --
> Christophe Massiot.

-- 
This is the vlc-devel mailing-list, see http://www.videolan.org/vlc/
To unsubscribe, please read http://www.videolan.org/lists.html
If you are in trouble, please contact <postmaster at videolan.org>


