[RFC] New audio output architecture

Thu May 2 12:03:47 CEST 2002

Some comments:

1) Floating point calculations on a CPU with no hardware FPU support are 
not going to work. So vlc needs some integer-based calculations here for 
such systems (even embedded ones).

2) If you say to use float32 as native format, does that automatically 
mean floating point calculations? That is the question here.

3) Sacrificing a bit of CPU load for better sound infrastructure is not 
a big issue for embedded systems. So I see no real problem here.

4) Having buffers between aout subsystem and sound hardware seems like a 
good idea to prevent buffer underruns.

Greetings,
Jean-Paul Saman

Christophe Massiot wrote:
> Dear friends,
> 
> We're having more and more problems with our current audio output 
> architecture, and I am fully convinced that the only option we have is 
> to annihilate it. Nuke it.
> 
> I have spent quite a long time debugging aout_macosx.c, trying to avoid 
> cracks and glitches whenever the CPU load is a little higher than usual. 
> This is pointless. From what I have learned with this experience, the 
> problem is that our architecture is completely f*cked up.
> 
> In this document I would like to propose ideas for a new audio output 
> architecture, code-named aout3. I'm not saying here that _I_ will write 
> it, though if nobody volunteers, I'll eventually end up doing it, with 
> my usual huge latency. Comments from audio specialists are welcome, 
> because I'm definitely not one of them (and, I must say, I'm really dumb 
> when it comes to audio matters).
> 
> 0. Current audio output implementation (code-named aout2)
>    ======================================================
> In this paragraph I'll try to emphasize what's wrong in aout2. First, 
> please bear in mind that aout2 is probably the oldest piece of code in 
> VLC, and was written at the very beginning of the project, when I 
> hadn't even started writing the first MPEG video decoder... We were 
> young and didn't have much experience.
> 
>  +----------+               +----------+     +--------+     +-----------+
>  |   Audio  | ------------> |   Mix &  | --> | UNIQUE | --> | soundcard |
>  | decoders | [aout_fifo_t] | Resample |     | buffer |     +-----------+
>  +----------+               +----------+     +--------+
> 
>                             <------------------------->     <----------->
>                                    audio output                plug-in
> 
> [if you can't see this drawing, try with a fixed-size font such as Courier]
> 
> Here is the course of action of the audio output thread :
> 
>   Mix & Resample aout fifos <------------------------------------------+
>                                                                        |
>   Write a unique buffer of size AOUT_BUFFER_DURATION                   |
>                                                                        |
>   Ask the hardware how many bytes are remaining in the card's buffer,  |
>   and calculate the output date of the next byte [pf_getbufinfo]       |
>                                                                        |
>   Write the unique buffer in the card's buffer [pf_play]               |
>                                                                        |
>   Block until this is done --------------------------------------------+
> 
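
To make the [pf_getbufinfo] step concrete, here is a rough sketch of the date calculation. The function name and parameters are made up for illustration, and mtime_t is assumed to be a 64-bit count of microseconds; this is not the actual aout2 code.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t mtime_t;   /* assumption: dates are microseconds */

/* Hypothetical helper: date at which the next byte we write will be played,
 * given how many bytes are still queued in the card's buffer. */
mtime_t next_byte_date( mtime_t now, int64_t i_bytes_queued,
                        int i_rate, int i_channels, int i_bytes_per_sample )
{
    int64_t i_bytes_per_sec = (int64_t)i_rate * i_channels * i_bytes_per_sample;
    assert( i_bytes_per_sec > 0 && i_bytes_queued >= 0 );

    /* Queued bytes converted to a duration in microseconds */
    return now + i_bytes_queued * 1000000 / i_bytes_per_sec;
}
```

With 44100 Hz, 16-bit stereo, 17640 queued bytes correspond to the 100 ms mentioned further down.
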
> A first problem, which increases the complexity of the audio output, is 
> the multiplicity of the aout_fifo_t formats. We currently support :
>  - unsigned 8-bit
>  - signed 8-bit
>  - unsigned 16-bit
>  - signed 16-bit
> Each in stereo or mono mode. That makes 8 input formats to account for. 
> There is one audio output per format. Incidentally, we don't accept the 
> popular float32 (liba52) or fixed24 (libmad) formats, and multi-channel 
> output (5.1) is not even scheduled.
> 
> The second problem, and the most important, is the unique buffer. Since 
> we only have 100 ms of data in store, if, for one reason or another, the 
> scheduler doesn't schedule the thread for 100 ms, we're lost. There is 
> not much to do for the DSP plug-in used on *NIX-like OSes, but hey, you 
> know what, kernel developers have made some major advances over the last 
> 10 years.
> 
> Take for instance the Mac OS X CoreAudio architecture, of which I am now 
> officially a major fan. Writing the buffer is not done by a simple 
> system call (write), but by a clever callback mechanism. Whenever 
> CoreAudio is starving, it wakes up (with a VERY high priority) a thread 
> called the IO thread which calls your callback, so that data you have 
> prepared in advance can be DMAed immediately. Unfortunately, with the 
> unique buffer, there is no way we could prepare data in advance. The 
> same goes for the DSP output plug-in: while we're stuck in the write() 
> system call, we could already be preparing the next buffer.
> 
> 
> 1. General architecture of aout3
>    =============================
> Whereas aout2 relied on the DSP plug-in behavior, aout3 is designed for 
> modern callback-based audio APIs, such as Mac OS X CoreAudio. Of course 
> DSP can still be implemented, but using some kind of emulation.
> 
>  +------+    +----------+    +---------+    +---------+
>  | adec | -> |   Mix &  | -> | Sound   | -> | Channel | -> buffer #0
>  |      | -> | Resample |    | effects |    | downmix |    buffer #-1
>  +------+    +----------+    +---------+    +---------+    buffer #-2
>                                                            buffer #-3 -> HW
> 
>              <---------------------------------------->    <-------------->
>                         audio mixer thread                 audio output thr
> 
>              <---------->    <------------------------>    <-------------->
>               aout core        aout filter plug-in(s)        aout plug-in
> 
> As you see, the major idea behind aout3 is to split the audio output 
> thread into two threads. As a matter of fact, the current aout fulfills 
> two contradictory missions : heavy calculation for mixing and 
> resampling, and waiting for a VERY accurate date to DMA the buffer to 
> the hardware.
> 
> The audio mixer thread takes over almost all functions of the current 
> audio output. The new audio output IO thread only cares about taking the 
> first spooled buffer and DMAing it to the hardware. The implementation of 
> the latter thread will differ greatly between architectures. For instance 
> the Mac OS X CoreAudio plug-in will not need to spawn a thread, since 
> this is already done by CoreAudio's IO thread. The DSP plug-in, on the 
> contrary, will launch a new thread and spend its time doing blocking 
> write() calls.
> 
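
The hand-over between the two threads could look something like the sketch below: a small queue of prepared buffers, filled by the mixer thread and emptied by the IO thread or callback. All names are hypothetical, the four slots simply mirror the four buffers in the drawing, and C11 atomics are my assumption; it only works with exactly one producer and one consumer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_SLOTS 4   /* hypothetical; must be a power of two */

typedef struct
{
    void *slots[QUEUE_SLOTS];
    atomic_size_t i_head;   /* next slot to read, advanced by the IO side */
    atomic_size_t i_tail;   /* next slot to write, advanced by the mixer */
} buffer_queue_t;

void queue_init( buffer_queue_t *q )
{
    atomic_init( &q->i_head, 0 );
    atomic_init( &q->i_tail, 0 );
}

/* Mixer side: returns false if the queue is full (mixer is far ahead). */
bool queue_push( buffer_queue_t *q, void *p_buffer )
{
    assert( p_buffer != NULL );   /* NULL is reserved for "queue empty" */
    size_t tail = atomic_load_explicit( &q->i_tail, memory_order_relaxed );
    size_t head = atomic_load_explicit( &q->i_head, memory_order_acquire );
    if( tail - head == QUEUE_SLOTS )
        return false;
    q->slots[tail % QUEUE_SLOTS] = p_buffer;
    atomic_store_explicit( &q->i_tail, tail + 1, memory_order_release );
    return true;
}

/* IO side: never blocks; returns NULL on underrun (nothing prepared). */
void *queue_pop( buffer_queue_t *q )
{
    size_t head = atomic_load_explicit( &q->i_head, memory_order_relaxed );
    size_t tail = atomic_load_explicit( &q->i_tail, memory_order_acquire );
    if( head == tail )
        return NULL;
    void *p_buffer = q->slots[head % QUEUE_SLOTS];
    atomic_store_explicit( &q->i_head, head + 1, memory_order_release );
    return p_buffer;
}
```

The point of doing it without a lock is that the high-priority IO side never waits on the mixer; a mutex-protected list would also work if the plug-in can tolerate it.
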
> On systems having real-time capabilities, the audio output IO thread 
> should be assigned a VERY HIGH priority, so that DMA transfers are not 
> even slightly delayed. This is the case in CoreAudio.
> 
> With that architecture, the only way we will have buffer underruns is if 
> the audio mixer thread doesn't have time to prepare the samples. This 
> implies a very long burst in the CPU load; on real-time systems, one 
> will want to assign the audio mixer thread a high (but not as high as 
> audio output's) priority.
> 
> And before every DMA transfer we will check the date and print a message 
> if we're late. That way, at least we will know when we have to expect an 
> audio underrun. Better than nothing.
> 
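
The lateness check before each transfer is cheap. A sketch, with a made-up function name and assuming dates are microseconds stored in a 64-bit mtime_t:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef int64_t mtime_t;   /* assumption: dates are microseconds */

/* Hypothetical check run just before handing a buffer to the hardware.
 * Returns how late we are in microseconds, or 0 if we are on time. */
mtime_t check_play_date( mtime_t now, mtime_t play_date )
{
    assert( now >= 0 && play_date >= 0 );   /* dates assumed non-negative */
    if( now > play_date )
    {
        fprintf( stderr, "aout: buffer is %lld us late\n",
                 (long long)( now - play_date ) );
        return now - play_date;
    }
    return 0;
}
```
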
> 
> 2. aout3 data flow
>    ===============
> The schematic in chapter 1 is oversimplified. The audio output needs to 
> understand a handful of input and output formats : u8, s8, u16, s16, 
> float32, fixed24. It also must deal with an arbitrary number of channels 
> (for instance #defined to 6), and with several input streams (this is 
> debated [**]). We can no longer have an output per format (as is the 
> case in aout2), so we need some simplification.
> 
> I propose that the internal format for samples in the audio output be 
> float32. It seems to be a very popular format ; for instance it is the 
> native output format of liba52, and the native input format of Mac OS X 
> CoreAudio. It allows for more precision in the samples, and as such I 
> think it is the best choice that we have.
> 
> float32 processing may take more CPU time than it does currently. I may 
> hurt your feelings : we do not care. On all machines where VLC runs, the 
> audio output takes up 0 % CPU. We can afford trading more CPU for more 
> precision. I also think we should use more expensive dithering 
> algorithms for resampling. Audio output is the main source of complaints 
> from our users, we must do something, even at the expense of a higher 
> CPU load. [*]
> 
> Since all internal calculations will be done on float32, we need a bunch 
> of converters to and from float32. The flow of operations on data is as 
> follows :
> 
>  -------------------------------------------------+------------------------
>  Conversion from the adec format to float32       | AOUT_CONVERTER plug-in
>                                                   +------------------------
>  Mixing several input streams & resampling        | main audio mixer module
>                                                   +------------------------
>  Sound effects [optional]                         | AOUT_FILTER plug-in
>                                                   +------------------------
>  Downmixing if the output plug-in doesn't support | AOUT_FILTER plug-in #2
>  as many channels                                 |
>                                                   +------------------------
>  Conversion from float32 to native output format  | AOUT_CONVERTER plug-in
>  -------------------------------------------------+------------------------
> 
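
For the first and last rows of that table, the signed 16-bit case might look like this. It is only a sketch: the function names are invented, samples are assumed to map to the range [-1.0, 1.0), and out-of-range values are clipped on the way out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical converter: signed 16-bit to float32 in [-1.0, 1.0). */
void s16_to_float32( const int16_t *p_in, float *p_out, size_t i_count )
{
    assert( p_in != NULL && p_out != NULL );
    for( size_t i = 0; i < i_count; i++ )
        p_out[i] = (float)p_in[i] / 32768.0f;
}

/* Hypothetical converter: float32 back to signed 16-bit, with clipping
 * and rounding to the nearest integer value. */
void float32_to_s16( const float *p_in, int16_t *p_out, size_t i_count )
{
    assert( p_in != NULL && p_out != NULL );
    for( size_t i = 0; i < i_count; i++ )
    {
        float f = p_in[i] * 32768.0f;
        if( f > 32767.0f )  f = 32767.0f;
        if( f < -32768.0f ) f = -32768.0f;
        /* The cast truncates toward zero, so add or subtract 0.5 first */
        p_out[i] = (int16_t)(int32_t)( f >= 0.0f ? f + 0.5f : f - 0.5f );
    }
}
```

A 16-bit sample survives the round trip unchanged, since float32 has more than 16 bits of mantissa.
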
> Notes
> [*] I'm not completely sure about that. The consequences of float32 on 
> embedded systems without hardware FPU support need to be evaluated. For 
> these systems, fixed24 (the native format of libmad) may be a smarter 
> choice. Perhaps it would be a good idea to have a version of the audio 
> mixer using fixed24 for embedded systems. Caution, I'm not saying that we 
> should support two native formats at the same time, I'm just saying that 
> there could be a #define AOUT_FORMAT fixed24 for some architectures. 
> This implies adapting the sound effects and downmixing modules too, but 
> embedded systems probably do not need such complicated things...
> 
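
A compile-time switch of that kind could be sketched as below. Everything here is hypothetical: the macro AOUT_FORMAT_FIXED24, the type aout_sample_t and the helpers are invented names, and the choice of 24 fractional bits in a 32-bit integer is my assumption for the example, not necessarily what libmad uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef AOUT_FORMAT_FIXED24
/* Integer-only build, for CPUs without a hardware FPU */
typedef int32_t aout_sample_t;
#define AOUT_FRAC_BITS  24   /* assumed number of fractional bits */
#define AOUT_SAMPLE_ONE ( (aout_sample_t)1 << AOUT_FRAC_BITS )
static inline aout_sample_t aout_mul( aout_sample_t a, aout_sample_t b )
{
    /* Widen to 64 bits so the intermediate product cannot overflow */
    return (aout_sample_t)( (int64_t)a * b / AOUT_SAMPLE_ONE );
}
#else
/* Default build: float32 samples */
typedef float aout_sample_t;
#define AOUT_SAMPLE_ONE 1.0f
static inline aout_sample_t aout_mul( aout_sample_t a, aout_sample_t b )
{
    return a * b;
}
#endif

/* Example of mixer code written once against the abstract sample type:
 * halve the volume of a buffer in place. */
void aout_halve( aout_sample_t *p_samples, size_t i_count )
{
    assert( p_samples != NULL || i_count == 0 );
    for( size_t i = 0; i < i_count; i++ )
        p_samples[i] = aout_mul( p_samples[i], AOUT_SAMPLE_ONE / 2 );
}
```

Code written against aout_sample_t and aout_mul() compiles either way, so float32 stays the native format on desktops while an FPU-less target builds with -DAOUT_FORMAT_FIXED24.
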
> [**] In case VLC reads several streams at once, there may be several 
> instances of audio decoders at the same time, and thus several streams 
> to mix before output. I'm not sure this is a good idea. In another 
> thread I will speak about the multi-stream aspect of VLC, which can be 
> debated.
> 
> 
> 3. APIs
>    ====
> 
> None of these APIs pretends to be exhaustive. This is just a quick look 
> at what needs to be done.
> 
> 3.1 Decoder API (aout_ext-dec.h)
> 
> I suggest that we use the same approach as for the video output. It is 
> indeed far easier to understand than having a single shared structure 
> without any function at all.
> 
> void * aout_NewStream( int i_format, int i_channels, int i_rate );
> void aout_EndStream( void * p_stream );
> void aout_PlaySound( void * p_stream, byte_t * p_samples, size_t i_size,
>                      mtime_t play_date );
> [play_date is MANDATORY]
> 
> When the audio mixer thread has got all samples for all streams, it can 
> start mixing the streams. If the streams do not all have the same 
> number of channels, the highest number will be chosen. The mixer may 
> also change the sample rate if necessary.
> 
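
Once every stream is float32, the mixing itself is basically an addition. A minimal sketch, assuming the streams have already been brought to the same rate and channel layout, and that samples are clipped to [-1.0, 1.0]; the function name is made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mixer core: sum i_streams buffers of i_samples float32
 * samples each into p_out, clipping the result to [-1.0, 1.0]. */
void mix_float32( float *p_out, const float *const *pp_streams,
                  size_t i_streams, size_t i_samples )
{
    assert( p_out != NULL && pp_streams != NULL );
    for( size_t i = 0; i < i_samples; i++ )
    {
        float f_sum = 0.0f;
        for( size_t s = 0; s < i_streams; s++ )
            f_sum += pp_streams[s][i];

        if( f_sum > 1.0f )  f_sum = 1.0f;
        if( f_sum < -1.0f ) f_sum = -1.0f;
        p_out[i] = f_sum;
    }
}
```

Hard clipping is the crudest possible policy; scaling each stream down before summing is another option.
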
> In order to avoid too many malloc()s, it might be a good idea to have a 
> buffer cache system, such as what we did for the input buffers.
> 
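
A buffer cache can be as simple as a free list. The sketch below uses invented names and a fixed buffer size chosen arbitrarily; it is not modelled on the actual input buffer code and is not thread-safe as written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_BUFFER_SIZE 4096   /* hypothetical fixed size, in bytes */

typedef struct cached_buffer_t
{
    struct cached_buffer_t *p_next;          /* link in the free list */
    unsigned char p_data[CACHE_BUFFER_SIZE];
} cached_buffer_t;

typedef struct
{
    cached_buffer_t *p_free;   /* released buffers, ready for reuse */
} buffer_cache_t;

/* Reuse a released buffer if there is one, otherwise fall back to malloc().
 * Returns NULL only if malloc() fails. */
cached_buffer_t *cache_get( buffer_cache_t *p_cache )
{
    cached_buffer_t *p_buf = p_cache->p_free;
    if( p_buf != NULL )
    {
        p_cache->p_free = p_buf->p_next;
        return p_buf;
    }
    return malloc( sizeof( *p_buf ) );
}

/* Give a buffer back to the cache instead of calling free(). */
void cache_put( buffer_cache_t *p_cache, cached_buffer_t *p_buf )
{
    assert( p_buf != NULL );
    p_buf->p_next = p_cache->p_free;
    p_cache->p_free = p_buf;
}
```

In steady state no malloc() happens at all; a real version would need a lock and probably a cap on the number of cached buffers.
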
> 3.2 AOUT_CONVERTER plug-in
> 
> An aout converter module shall be a very simple object :
> 
> int Probe( int * i_input_format, int i_output_format, int i_channels );
> byte_t * Convert( int i_input_format, int i_output_format, int i_channels,
>                   byte_t * p_input_samples, byte_t * p_output_samples,
>                   size_t i_input_size, size_t * pi_output_size );
> [if p_output_samples is too small, returns a pointer to the first sample 
> of p_input_samples which hasn't been converted]
> 
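
To check that the Convert() contract is workable, here is a sketch for one fixed pair of formats (unsigned 8-bit in, signed 16-bit out, so the format and channel arguments are dropped). Two things are my own assumptions, since the proposal does not spell them out: *pi_output_size holds the capacity of p_output_samples on entry and the number of bytes written on return, and NULL is returned when everything was converted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint8_t byte_t;   /* assumption: byte_t is an unsigned 8-bit type */

/* Hypothetical u8 -> s16 converter; i_input_size is in bytes, which for
 * u8 is also the number of samples. */
byte_t *ConvertU8ToS16( byte_t *p_input_samples, byte_t *p_output_samples,
                        size_t i_input_size, size_t *pi_output_size )
{
    assert( pi_output_size != NULL );

    /* How many 16-bit samples fit in the output buffer */
    size_t i_capacity = *pi_output_size / sizeof( int16_t );
    size_t i_count = i_input_size < i_capacity ? i_input_size : i_capacity;

    for( size_t i = 0; i < i_count; i++ )
    {
        /* Remove the 128 bias, then scale 8 bits up to 16 bits */
        int16_t i_sample = (int16_t)( ( (int)p_input_samples[i] - 128 ) * 256 );
        /* memcpy avoids alignment problems with the byte-typed buffer */
        memcpy( p_output_samples + i * sizeof( int16_t ),
                &i_sample, sizeof( i_sample ) );
    }

    *pi_output_size = i_count * sizeof( int16_t );
    return i_count < i_input_size ? p_input_samples + i_count : NULL;
}
```

The caller just loops, feeding the returned pointer back in, until it gets NULL.
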
> 3.3 AUDIO_OUTPUT plug-in
> 
> An audio output plug-in deals with the hardware. It is also very simple :
> int aout_Init( ... ); -> spawns the audio output IO thread if necessary
> static void IOCallback( ... ) -> takes the first buffer prepared by the 
> audio mixer and DMAs it to the hardware
> 
> 3.4 AOUT_FILTER plug-in
> 
> An AOUT_FILTER takes a buffer with n channels and returns a buffer with 
> m channels and maybe another transformation (such as displaying a graph, 
> or equalizing, or whatever...).
> 
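
The simplest filter of that kind is a stereo to mono downmix. A sketch, assuming interleaved float32 samples (left, right, left, right, ...) and a plain average of both channels; the function name is invented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical AOUT_FILTER body: n = 2 channels in, m = 1 channel out.
 * p_in holds 2 * i_frames interleaved samples, p_out holds i_frames. */
void downmix_stereo_to_mono( const float *p_in, float *p_out, size_t i_frames )
{
    assert( p_in != NULL && p_out != NULL );
    for( size_t i = 0; i < i_frames; i++ )
        p_out[i] = 0.5f * ( p_in[2 * i] + p_in[2 * i + 1] );
}
```

A 5.1 to stereo downmix would follow the same pattern with one weight per input channel.
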
> 
> 4. Suggested actions
>    =================
> This document is a request for comments. So for now, comments are 
> welcome. The next step will be a call for volunteers and we'll see who 
> does what.
> 
