[RFC] New audio output architecture
Christophe Massiot
massiot at via.ecp.fr
Wed May 1 23:44:19 CEST 2002
Dear friends,
We're having more and more problems with our current audio output
architecture, and I am fully convinced that the only option we have
is to annihilate it. Nuke it.
I have spent quite a long time debugging aout_macosx.c, trying to
avoid crackles and glitches whenever the CPU load is a little higher
than usual. This is pointless. From what I have learned from this
experience, the problem is that our architecture is completely f*cked
up.
In this document I would like to propose ideas for a new audio output
architecture, code-named aout3. I'm not saying here that _I_ will
write it, though if nobody volunteers, I'll eventually end up doing
it, with my usual huge latency. Comments from audio specialists are
welcome, because I'm definitely not one of them (and, I must say, I'm
really dumb when it comes to audio matters).
0. Current audio output implementation (code-named aout2)
   ======================================================
In this section I'll try to highlight what's wrong with aout2.
First, please bear in mind that aout2 is probably the oldest piece of
code in VLC, and has been written at the very beginning of the
project, when I hadn't even started writing the first MPEG video
decoder... We were young and hadn't much experience.

+----------+               +----------+     +--------+     +-----------+
|  Audio   | ------------> |  Mix &   | --> | UNIQUE | --> | soundcard |
| decoders | [aout_fifo_t] | Resample |     | buffer |     +-----------+
+----------+               +----------+     +--------+
                           <------------------------->     <----------->
                                  audio output                plug-in

[if you can't see this drawing, try with a fixed-size font such as Courier]
Here is the course of action of the audio output thread :

Mix & Resample aout fifos <------------------------------------------+
                                                                     |
Write a unique buffer of size AOUT_BUFFER_DURATION                   |
                                                                     |
Ask the hardware how many bytes remain in the card's buffer,         |
and calculate the output date of the next byte [pf_getbufinfo]       |
                                                                     |
Write the unique buffer in the card's buffer [pf_play]               |
                                                                     |
Block until this is done --------------------------------------------+

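The "calculate the output date of the next byte" step is plain
arithmetic. Here is a minimal sketch of it. It assumes dates are
microseconds stored in a 64-bit integer, and NextByteDate is a
hypothetical helper name, not the actual pf_getbufinfo interface :

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: dates are microseconds in a signed 64-bit integer. */
typedef int64_t mtime_t;

/* Hypothetical helper: date at which the next byte written to the card
 * will actually be played, given how many bytes are still queued. */
static mtime_t NextByteDate( mtime_t now, int64_t i_bytes_queued,
                             int i_rate, int i_channels,
                             int i_bytes_per_sample )
{
    int64_t i_bytes_per_sec =
        (int64_t)i_rate * i_channels * i_bytes_per_sample;

    assert( i_bytes_per_sec > 0 && i_bytes_queued >= 0 );

    /* Time needed to drain the queue, converted to microseconds. */
    return now + i_bytes_queued * 1000000 / i_bytes_per_sec;
}
```

With 16-bit stereo at 44100 Hz, a full 100 ms buffer is 17640 bytes,
so a card that still holds 17640 bytes will play the next byte 100 ms
from now.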
A first problem, which increases the complexity of the audio output, is
the multiplicity of the aout_fifo_t formats. We currently support :
- unsigned 8-bit
- signed 8-bit
- unsigned 16-bit
- signed 16-bit
Each in stereo or mono mode. That makes 8 input formats to account
for. There is one audio output per format. Incidentally, we don't
accept the popular float32 (liba52) or fixed24 (libmad) formats, and
multi-channel output (5.1) is not even scheduled.
The second problem, and the most important, is the unique buffer.
Since we only have 100 ms of data in store, if, for one reason or
another, the scheduler doesn't schedule the thread for 100 ms, we're
lost. There is not much we can do about that for the DSP plug-in used
on *NIX-like OSes, but hey, you know what, kernel developers have made
some major advances over the last 10 years.
Take for instance the Mac OS X CoreAudio architecture, of which I am
now officially a major fan. Writing the buffer is not done by a simple
system call (write), but by a clever callback mechanism. Whenever
CoreAudio is starving, it wakes up (with a VERY high priority) a
thread called the IO thread, which calls your callback, so that data
you have prepared in advance can be DMAed immediately.
Unfortunately, with the unique buffer, there is no way we could
prepare data in advance. The same goes for the DSP output plug-in :
while we're stuck in the write() system call, we could already be
preparing the next buffer.
1. General architecture of aout3
   =============================
Whereas aout2 relied on the DSP plug-in behavior, aout3 is designed
for modern callback-based audio APIs, such as Mac OS X CoreAudio. Of
course DSP can still be implemented, but using some kind of emulation.

+------+    +----------+    +---------+    +---------+
| adec | -> |  Mix &   | -> |  Sound  | -> | Channel | -> buffer #0
|      | -> | Resample |    | effects |    | downmix |    buffer #-1
+------+    +----------+    +---------+    +---------+    buffer #-2
                                                          buffer #-3 -> HW
            <---------------------------------------->    <-------------->
                        audio mixer thread                audio output thr
            <---------->    <------------------------>    <-------------->
             aout core        aout filter plug-in(s)        aout plug-in

As you see, the major idea behind aout3 is to split the audio output
thread into two threads. As a matter of fact, the current aout fulfills
two contradictory missions : heavy calculation for mixing and
resampling, and waiting for a VERY accurate date to DMA the buffer to
the hardware.
The audio mixer thread takes over almost all functions of the current
audio output. The new audio output IO thread only cares about taking
the first spooled buffer and DMAing it to the hardware. The
implementation of the latter thread will differ greatly between
architectures. For instance, the Mac OS X CoreAudio plug-in will not
need to spawn a thread, since this is already done by CoreAudio's IO
thread. The DSP plug-in, on the contrary, will launch a new thread and
spend its time doing blocking write() calls.
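
The hand-off between the two threads could be a small spool of
prepared buffers. The sketch below is only one possible way to do it :
a single-producer, single-consumer ring where the mixer pushes and the
IO thread pops without ever blocking. All names (aout_spool_t,
SpoolPush, SpoolPop, AOUT_SPOOL_SIZE) are hypothetical, and C11
atomics are an assumption standing in for whatever primitives we end
up using :

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define AOUT_SPOOL_SIZE 4   /* hypothetical depth; must be a power of two */

typedef struct aout_spool_t
{
    void *      pp_buffers[AOUT_SPOOL_SIZE];
    atomic_uint i_write;    /* only advanced by the audio mixer thread */
    atomic_uint i_read;     /* only advanced by the audio output IO thread */
} aout_spool_t;

static void SpoolInit( aout_spool_t * p_spool )
{
    atomic_init( &p_spool->i_write, 0 );
    atomic_init( &p_spool->i_read, 0 );
}

/* Mixer side: returns 0 on success, -1 if the spool is full. */
static int SpoolPush( aout_spool_t * p_spool, void * p_buffer )
{
    unsigned i_write = atomic_load_explicit( &p_spool->i_write,
                                             memory_order_relaxed );
    unsigned i_read = atomic_load_explicit( &p_spool->i_read,
                                            memory_order_acquire );
    assert( p_buffer != NULL );

    if( i_write - i_read == AOUT_SPOOL_SIZE )
        return -1;

    p_spool->pp_buffers[i_write % AOUT_SPOOL_SIZE] = p_buffer;
    /* Publish the slot only after it has been filled. */
    atomic_store_explicit( &p_spool->i_write, i_write + 1,
                           memory_order_release );
    return 0;
}

/* IO side: returns the oldest prepared buffer, or NULL on underrun.
 * Never blocks, so it is safe to call from a high-priority callback. */
static void * SpoolPop( aout_spool_t * p_spool )
{
    unsigned i_read = atomic_load_explicit( &p_spool->i_read,
                                            memory_order_relaxed );
    unsigned i_write = atomic_load_explicit( &p_spool->i_write,
                                             memory_order_acquire );
    void * p_buffer;

    if( i_read == i_write )
        return NULL;

    p_buffer = p_spool->pp_buffers[i_read % AOUT_SPOOL_SIZE];
    atomic_store_explicit( &p_spool->i_read, i_read + 1,
                           memory_order_release );
    return p_buffer;
}
```

The point of the design is that SpoolPop() never waits : a CoreAudio
callback or a DSP write() loop can call it at any time, and a NULL
return is exactly the underrun case.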
On systems having real-time capabilities, the audio output IO thread
should be assigned a VERY HIGH priority, so that DMA transfers are
not even slightly delayed. This is the case in CoreAudio.
With that architecture, the only way we will have buffer underruns is
if the audio mixer thread doesn't have time to prepare the samples.
That would take a very long burst of CPU load ; on real-time
systems, one will want to assign the audio mixer thread a high (but
not as high as audio output's) priority.
And before every DMA transfer we will check the date and print a
message if we're late. That way, at least we will know when we have
to expect an audio underrun. Better than nothing.
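
The lateness check itself is tiny. A minimal sketch, assuming dates
are microseconds in a 64-bit integer ; CheckLate and the message text
are hypothetical :

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: dates are microseconds in a signed 64-bit integer. */
typedef int64_t mtime_t;

/* Hypothetical helper, called right before each DMA transfer.
 * Returns how late the buffer is in microseconds (0 if on time)
 * and prints a warning when it is late. */
static mtime_t CheckLate( mtime_t now, mtime_t play_date )
{
    mtime_t lateness;

    assert( now >= 0 && play_date >= 0 );

    lateness = now > play_date ? now - play_date : 0;
    if( lateness > 0 )
        fprintf( stderr, "aout: buffer is %lld us late, expect an underrun\n",
                 (long long)lateness );
    return lateness;
}
```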
2. aout3 data flow
   ===============
The schematic in chapter 1 is oversimplified. The audio output needs
to understand a handful of input and output formats : u8, s8, u16,
s16, float32, fixed24. It also must deal with an arbitrary number of
channels (for instance #defined to 6), and with several input streams
(this is debated [**]). We can no longer have an output per format
(as is the case in aout2), so we need some simplification.
I propose that the internal format for samples in the audio output be
float32. It seems to be a very popular format ; for instance it is
the native output format of liba52, and the native input format of
Mac OS X CoreAudio. It allows for more precision in the samples, and
as such I think it is the best choice that we have.
float32 processing may take more CPU time than what we do today. This
may hurt your feelings : we do not care. On all machines where VLC
runs, the audio output takes up 0 % CPU. We can afford to trade more
CPU for more precision. I also think we should use more expensive
dithering algorithms for resampling. Audio output is the main source
of complaints from our users ; we must do something, even at the
expense of a higher CPU load. [*]
Since all internal calculations will be done on float32, we need a
bunch of converters to and from float32. The flow of operations on
data is as follows :

-------------------------------------------------+------------------------
Conversion from the adec format to float32       | AOUT_CONVERTER plug-in
                                                 +------------------------
Mixing several input streams & resampling        | main audio mixer module
                                                 +------------------------
Sound effects [optional]                         | AOUT_FILTER plug-in
                                                 +------------------------
Downmixing if the output plug-in doesn't support | AOUT_FILTER plug-in #2
as many channels                                 |
                                                 +------------------------
Conversion from float32 to native output format  | AOUT_CONVERTER plug-in
-------------------------------------------------+------------------------

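To give an idea of how cheap the last step is, here is a minimal
sketch of a float32 to s16 conversion. It assumes float32 samples are
normalized to [-1.0, 1.0), clips anything outside that range, and
truncates without any dithering. Float32ToS16 is a hypothetical name :

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical converter: normalized float32 -> signed 16-bit.
 * Out-of-range samples are clipped; no dithering is applied. */
static void Float32ToS16( const float * p_in, int16_t * p_out,
                          size_t i_samples )
{
    size_t i;

    assert( p_in != NULL && p_out != NULL );

    for( i = 0; i < i_samples; i++ )
    {
        float f = p_in[i] * 32768.0f;

        if( f > 32767.0f )
            f = 32767.0f;
        else if( f < -32768.0f )
            f = -32768.0f;

        p_out[i] = (int16_t)f;   /* truncates toward zero */
    }
}
```
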
Notes
[*] I'm not completely sure about that. The consequences of float32
on embedded systems without hardware FPU support need to be
evaluated. For these systems, fixed24 (the native format of libmad)
may be a smarter choice. Perhaps it would be a good idea to have a version
of the audio mixer using fixed24 for embedded systems. Caution, I'm
not saying that we should support two native formats at the same
time, I'm just saying that there could be a #define AOUT_FORMAT
fixed24 for some architectures. This implies adapting the sound
effects and downmixing modules too, but embedded systems probably do
not need such complicated things...
[**] In case VLC reads several streams at once, there may be several
instances of audio decoders at the same time, and thus several
streams to mix before output. I'm not sure this is a good idea. In
another thread I will speak about the multi-stream aspect of VLC,
which can be debated.
3. APIs
   ====
None of these APIs pretends to be exhaustive. This is just a quick
look at what needs to be done.
3.1 Decoder API (aout_ext-dec.h)
I suggest that we use the same approach as for the video output. It
is indeed far easier to understand than having a single shared
structure without any function at all.
void * aout_NewStream( int i_format, int i_channels, int i_rate );
void aout_EndStream( void * p_stream );
void aout_PlaySound( void * p_stream, byte_t * p_samples, size_t i_size,
                     mtime_t play_date );
[play_date is MANDATORY]
When the audio mixer thread has got all samples for all streams, it
can start mixing the streams. If the streams do not all have the
same number of channels, the highest number will be chosen. The mixer
may also change the sound frequency if necessary.
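
A minimal sketch of that mixing rule on float32 samples follows. The
names are hypothetical, and so is the way a stream with fewer channels
is spread over the output (output channel c reads input channel
c modulo the stream's channel count, so mono ends up on every
channel). No clipping or resampling is done here :

```c
#include <assert.h>
#include <stddef.h>

/* The mix uses the highest channel count among all streams. */
static int MixOutputChannels( const int * pi_channels, int i_streams )
{
    int i, i_max = 0;

    assert( pi_channels != NULL && i_streams > 0 );

    for( i = 0; i < i_streams; i++ )
        if( pi_channels[i] > i_max )
            i_max = pi_channels[i];
    return i_max;
}

/* Adds one interleaved float32 stream into the interleaved mix buffer.
 * Hypothetical upmix rule: output channel c takes input channel
 * c % i_in_channels. */
static void MixAdd( float * p_mix, int i_mix_channels,
                    const float * p_in, int i_in_channels,
                    size_t i_frames )
{
    size_t f;
    int c;

    assert( i_in_channels > 0 && i_in_channels <= i_mix_channels );

    for( f = 0; f < i_frames; f++ )
        for( c = 0; c < i_mix_channels; c++ )
            p_mix[f * i_mix_channels + c] +=
                p_in[f * i_in_channels + c % i_in_channels];
}
```

The caller is expected to zero the mix buffer first, then call
MixAdd() once per stream.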
In order to avoid too many malloc()s, it might be a good idea to have
a buffer cache, such as what we did for the input buffers.
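
Such a cache could be as simple as a small stack of released buffers.
This is only a sketch : the names and the fixed buffer size are
hypothetical, it is not modelled on the actual input buffer code, and
it is not thread-safe as written (a real one would need a lock) :

```c
#include <assert.h>
#include <stdlib.h>

#define AOUT_CACHE_MAX 8   /* hypothetical limit on cached buffers */

typedef struct aout_cache_t
{
    void * pp_free[AOUT_CACHE_MAX];   /* released buffers, ready for reuse */
    int    i_free;
    size_t i_buffer_size;             /* all buffers share this size */
} aout_cache_t;

static void CacheInit( aout_cache_t * p_cache, size_t i_buffer_size )
{
    p_cache->i_free = 0;
    p_cache->i_buffer_size = i_buffer_size;
}

/* Reuses a cached buffer when there is one, else falls back to malloc(). */
static void * CacheGet( aout_cache_t * p_cache )
{
    if( p_cache->i_free > 0 )
        return p_cache->pp_free[--p_cache->i_free];
    return malloc( p_cache->i_buffer_size );
}

/* Keeps the buffer for later reuse, or frees it if the cache is full. */
static void CachePut( aout_cache_t * p_cache, void * p_buffer )
{
    assert( p_buffer != NULL );

    if( p_cache->i_free < AOUT_CACHE_MAX )
        p_cache->pp_free[p_cache->i_free++] = p_buffer;
    else
        free( p_buffer );
}

/* Releases everything still held by the cache. */
static void CacheEnd( aout_cache_t * p_cache )
{
    while( p_cache->i_free > 0 )
        free( p_cache->pp_free[--p_cache->i_free] );
}
```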
3.2 AOUT_CONVERTER plug-in
An aout converter module shall be a very simple object :
int Probe( int * i_input_format, int i_output_format, int i_channels );
byte_t * Convert( int i_input_format, int i_output_format, int i_channels,
                  byte_t * p_input_samples, byte_t * p_output_samples,
                  size_t i_input_size, size_t * pi_output_size );
[if p_output_samples is too small, returns a pointer to the first
sample of p_input_samples which hasn't been converted]
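
As an illustration, here is what an s16 to float32 converter following
that prototype might look like. Several things are assumptions on my
side and not part of the proposal : byte_t is taken to be an unsigned
8-bit type, the format codes are made up, *pi_output_size is treated
as "room available" on input and "bytes written" on output, a partial
conversion stops on a whole frame, and NULL is returned when
everything has been converted :

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint8_t byte_t;                           /* assumption */
enum { AOUT_FMT_S16 = 1, AOUT_FMT_FLOAT32 = 2 };  /* hypothetical codes */

static byte_t * ConvertS16ToFloat32( int i_input_format, int i_output_format,
                                     int i_channels,
                                     byte_t * p_input_samples,
                                     byte_t * p_output_samples,
                                     size_t i_input_size,
                                     size_t * pi_output_size )
{
    size_t i_in = i_input_size / sizeof(int16_t);    /* samples available */
    size_t i_room = *pi_output_size / sizeof(float); /* samples that fit */
    size_t i_done = i_in < i_room ? i_in : i_room;
    size_t i;

    assert( i_input_format == AOUT_FMT_S16 );
    assert( i_output_format == AOUT_FMT_FLOAT32 );
    assert( i_channels > 0 );

    /* On a partial conversion, stop on a frame boundary. */
    if( i_done < i_in )
        i_done -= i_done % (size_t)i_channels;

    for( i = 0; i < i_done; i++ )
    {
        int16_t s;
        float f;

        /* memcpy avoids alignment problems with byte_t buffers. */
        memcpy( &s, p_input_samples + i * sizeof(s), sizeof(s) );
        f = (float)s / 32768.0f;
        memcpy( p_output_samples + i * sizeof(f), &f, sizeof(f) );
    }

    *pi_output_size = i_done * sizeof(float);
    return i_done < i_in ? p_input_samples + i_done * sizeof(int16_t)
                         : NULL;
}
```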
3.3 AUDIO_OUTPUT plug-in
An audio output plug-in deals with the hardware. It is also very simple :
int aout_Init( ... ); -> spawns the audio output IO thread if necessary
static void IOCallback( ... ) -> takes the first buffer prepared by
                                 the audio mixer and DMAs it to the hardware
3.4 AOUT_FILTER plug-in
An AOUT_FILTER takes a buffer with n channels and returns a buffer
with m channels, possibly applying another transformation (such as
displaying a graph, or equalizing, or whatever...).
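
For the n to m channels part, a deliberately naive sketch on
interleaved float32 samples could look like this. DownmixFold is a
hypothetical name, and the folding rule (output channel o is the
average of input channels o, o+m, o+2m, ...) is only an assumption for
illustration ; a real 5.1 downmix would use proper per-channel
coefficients :

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical n -> m downmix on interleaved float32 samples.
 * Output channel o is the average of input channels o, o+m, o+2m, ... */
static void DownmixFold( const float * p_in, int i_in_channels,
                         float * p_out, int i_out_channels,
                         size_t i_frames )
{
    size_t f;

    assert( i_out_channels > 0 && i_out_channels <= i_in_channels );

    for( f = 0; f < i_frames; f++ )
    {
        int o;

        for( o = 0; o < i_out_channels; o++ )
        {
            float sum = 0.0f;
            int i_count = 0;
            int c;

            for( c = o; c < i_in_channels; c += i_out_channels )
            {
                sum += p_in[f * i_in_channels + c];
                i_count++;
            }
            p_out[f * i_out_channels + o] = sum / (float)i_count;
        }
    }
}
```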
4. Suggested actions
   =================
This document is a request for comments. So for now, comments are
welcome. The next step will be a call for volunteers and we'll see
who does what.
--
Christophe Massiot.
--
This is the vlc-devel mailing-list, see http://www.videolan.org/vlc/