Integrating Sound & Graphics: Systems Issues
This note is abstracted from my 1996 SIGGRAPH tutorial. The media library it is based on is documented separately.
Real-Time Issues
A major advantage of using dedicated hardware such as sound cards and MIDI
modules is that real-time performance and reliability issues are taken care of in
the boxes. However, the compromises in synthesis control and timbre are often too
great, and for many applications we have to do our own synthesis. Controlling
external devices also introduces delays.
In this section we discuss issues associated with making sound synthesis and
graphics coexist on the same computer.
First, we review the operating system services required.
Resource Allocation and Management
  memory locking and bounded allocation and release times
  CPU cycles (process and thread scheduling)
  I/O (network, serial port, sound hardware and graphics hardware)
Synchronization
  clocks (time references)
  event synchrony
Bounded-Latency Input/Output
  scheduler guarantees
Resource Monitoring
  I/O queue size
  processor utilization
simplesynth.c
Now we introduce a simple example program developed for sound synthesis on
the SGI IRIX platform.
The IRIX audio library implements a FIFO queue for sound output to the D/A
converters. To avoid audible clicks, this queue must always contain a minimum
number of samples. We refer to this as the low water mark. It is tempting to
prevent buffer underrun by stuffing as many sound samples into the buffer as we
can. This greedy strategy has an important failing: unbounded
latency. What we want for interactive sound synthesis and synchronization is
bounded latency. Although application requirements vary, the 10±1 ms rule of
thumb is good to keep in mind. So we have to keep samples in the buffer, but no
more than a certain high water mark.
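To make the latency arithmetic concrete, here is a minimal sketch (not from the
original tutorial; the file name, the 10 ms budget and the half-way low water
mark are illustrative assumptions) that converts a latency budget into water
marks at 44100 Hz:

/* watermarks.c: convert a latency budget into FIFO water marks.
   Illustrative sketch; the 10 ms budget is the rule of thumb from
   the text, and the half-way low water mark is an arbitrary choice. */
#include <stdio.h>

#define SRATE 44100.0 /* Hz */

int main(void)
{
    double budget = 0.010;            /* 10 ms latency budget */
    int hwm = (int)(budget * SRATE);  /* ~441 samples */
    int lwm = hwm / 2;                /* wake well before underrun */
    printf("high water mark: %d samples (%.1f ms)\n",
           hwm, 1000.0 * hwm / SRATE);
    printf("low water mark:  %d samples (%.1f ms)\n",
           lwm, 1000.0 * lwm / SRATE);
    return 0;
}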
What should the synthesis process do when it has filled the queue to the high
water mark? It should release control of the CPU so that another process
(perhaps a graphics animation) can run. Now there is a real danger that other
processes will occupy the CPU so long that the FIFO will drain below the low
water mark. This has been deftly handled in the SGI audio library by providing a
semaphore for the FIFO queue. The sound driver takes note when the sample count
drains below a defined low water mark. By careful use of UNIX's group semaphore
mechanism, the select(2) system call, it is possible to request the CPU when the
low-water-mark event occurs.
For this to work reliably, however, the synthesis process priority has to be
set so that it is guaranteed to be the next process to run when the operating
system reschedules.
These ideas are embodied in the simple sound synthesizer listed below:
/*
  simplesynth.c

  Example of friendly scheduling of a real-time sound synthesizer.

  Adrian Freed
  Copyright 1996. UC Regents. All Rights Reserved.

  Build and run (IRIX):
    cc -O2 -mips2 -o simplesynth simplesynth.c -lm -laudio
    chown root simplesynth
    chmod +s simplesynth
    simplesynth
*/

/* for select(2) */
#include <unistd.h>
#include <sys/types.h>
#include <bstring.h>
#include <sys/time.h>

/* SGI audio library */
#include <audio.h>

/* for scheduler control */
#include <sys/schedctl.h>

/* for memory locking */
#include <sys/lock.h>

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define PI 3.14159265358979323
#define SRATE 44100.0 /* Hz */
#define OUTPUTQUEUESIZE 512 /* samples */

int main(void)
{
    /* initialize the audio driver for a 16-bit, 44100 Hz
       monophonic sample source to the DACs */
    ALport alp;
    ALconfig alc;
    int dacfd;

    alc = ALnewconfig();
    ALsetwidth(alc, AL_SAMPLE_16);
    ALsetqueuesize(alc, OUTPUTQUEUESIZE);
    ALsetchannels(alc, (long)1);
    alp = ALopenport("obuf", "w", alc);
    {
        long pvbuf[2];
        long pvlen = 2;
        pvbuf[0] = AL_OUTPUT_RATE;
        pvbuf[1] = AL_RATE_44100;
        ALsetparams(AL_DEFAULT_DEVICE, pvbuf, pvlen);
    }

    /* obtain a file descriptor associated with the sound output port */
    dacfd = ALgetfd(alp);

    /* set process priority high */
    if (schedctl(NDPRI, getpid(), NDPHIMIN) < 0)
        perror("schedctl NDPHIMIN");

    /* lock memory to avoid paging */
    plock(PROCLOCK);

    /* schedctl requires set-user-id root, so we return to the
       regular user id to avoid security problems */
    setuid(getuid());

    /* synthesize */
    {
        /* time */
        double t = 0.0;
        /* sine wave frequency */
        double f = 440.0;
        /* high and low water marks */
        int hwm = 300, lwm = 256;
        fd_set read_fds, write_fds;
        /* one more than the largest file descriptor to watch */
        int nfds = dacfd + 1;
        /* vector size */
#define VSIZE 32
        short samplebuffer[VSIZE];

        for (;;) {
            /* compute sine wave samples while the sound output
               buffer is below the high water mark */
            while (ALgetfilled(alp) < hwm) {
                int i;
                for (i = 0; i < VSIZE; ++i) {
                    /* appropriately scaled sine wave */
                    samplebuffer[i] = (short)(32767.0f * sin(2.0 * PI * f * t));
                    t += 1.0 / SRATE; /* the march of time */
                }
                /* send samples out the door */
                ALwritesamps(alp, samplebuffer, VSIZE);
            }

            /* set the low water mark, i.e. when we want
               control back from select(2) */
            ALsetfillpoint(alp, OUTPUTQUEUESIZE - lwm);

            /* set up select */
            FD_ZERO(&read_fds);  /* clear read_fds */
            FD_ZERO(&write_fds); /* clear write_fds */
            FD_SET(dacfd, &write_fds);
            FD_SET(0, &read_fds);

            /* give control back to the OS scheduler to put us to
               sleep until the DAC queue drains and/or a character
               is available from standard input */
            if (select(nfds, &read_fds, &write_fds,
                       (fd_set *)0,
                       (struct timeval *)0) < 0) {
                /* select reported an error */
                ALcloseport(alp);
                perror("bad select");
                exit(1);
            }

            /* is there a character in the queue? */
            if (FD_ISSET(0, &read_fds)) {
                /* this will never block */
                char c = getchar();
                if (c == 'q') /* quit */
                    break;
                else if ((c <= '9') && (c >= '0')) /* tweak frequency */
                    f = 440.0 + 100.0 * (c - '0');
            }
        }
        ALcloseport(alp);
    }
    return 0;
}
The select(2) call blocks on two devices: the DAC FIFO and standard input.
This illustrates how synthesis control may be integrated into this user-level
real-time scheduling. Note that most I/O on UNIX systems is coordinated through
file descriptors that may be used in such a select(2) call. The SGI MIDI system
works this way, so it is simple to extend the above example into a
MIDI-controlled software synthesizer, as sketched below.
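Here is a minimal sketch of that extension, shown standing alone for clarity;
in simplesynth the MIDI file descriptor would simply join the DAC descriptor in
the same select(2) set. The mdInit/mdOpenInPort/mdGetFd/mdReceive calls and the
MDevent msg field are my recollection of the IRIX MIDI library; check midi(3dm)
for the exact signatures on your system.

/* midiselect.c: select(2)-driven MIDI input.  A sketch, not a
   tested program; verify the library calls against midi(3dm). */
#include <unistd.h>
#include <sys/types.h>
#include <bstring.h>
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <dmedia/midi.h>

int main(void)
{
    MDport inport;
    int midifd, nfds;
    fd_set read_fds;

    if (mdInit() <= 0) {          /* count available MIDI interfaces */
        fprintf(stderr, "no MIDI interfaces\n");
        exit(1);
    }
    inport = mdOpenInPort(NULL);  /* NULL requests the default interface */
    midifd = mdGetFd(inport);
    nfds = midifd + 1;

    for (;;) {
        FD_ZERO(&read_fds);
        FD_SET(midifd, &read_fds);
        /* sleep until a MIDI event arrives */
        if (select(nfds, &read_fds, (fd_set *)0,
                   (fd_set *)0, (struct timeval *)0) < 0) {
            perror("bad select");
            exit(1);
        }
        if (FD_ISSET(midifd, &read_fds)) {
            MDevent ev;
            if (mdReceive(inport, &ev, 1) > 0)
                printf("MIDI status byte: 0x%x\n",
                       (unsigned)(unsigned char)ev.msg[0]);
        }
    }
}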
Note that SGI has documented a more complicated example of real-time media
programming in their IRIX Media Guide, available at:
http://www.sgi.com/Technology/TechPubs/dynaweb_bin/0530/bin/nph-dynaweb....
It Still Clicks
If you try the simple synthesizer above on your own SGI machine, you may be
disappointed to still hear some clicks from time to time. This is usually due to
interference from the many daemons running alongside your program.
The following things commonly disturb real-time performance:
screen savers
disk reorganizers
virus checkers
screen recalibration
networks
media insertion (diskette, CD-ROM, etc.)
time daemons
power saving shutdown
directory and file servers
NFS traffic
poorly written interrupt handlers in drivers
file system shutdowns
printer error reporting
e-mail delivery
paging for virtual memory
Many of these daemons and services are controlled on SGI machines in
/etc/config. We reboot our machines with as few daemons as possible, e.g. in
single user mode, for critical real-time performance.
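On IRIX, many of these services can be inspected and controlled with
chkconfig(1M): running chkconfig with no arguments lists the configuration
flags, and, for example, chkconfig nfs off disables NFS service at the next
boot. The available flags vary from system to system.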
Macintosh and PC Software Synthesis
The Mac OS, like Windows, does not have pre-emptive scheduling. Programs
explicitly pass control to each other through a coroutine mechanism. The only
preemption that occurs is through I/O interrupts, so the only way to achieve
real-time performance is to schedule code to run at interrupt time.
On SGI machines, samples are pushed by the user process into a
FIFO. On PCs and Macintoshes, interrupt-level code pulls samples that a
user-supplied function provides. The complication of the pull scheme is that the
user-supplied callback function is constrained because it runs at interrupt
level. For example, on the Macintosh it cannot allocate memory. It is also
unwise to spend too much time computing in this routine; otherwise pending I/O
may fail.
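A minimal sketch of the pull model follows. The callback registration is
hypothetical, standing in for, e.g., the Macintosh Sound Manager's
double-buffer callbacks; the point is the shape of the interrupt-level routine:
preallocated state, no allocation, bounded work per call.

/* Sketch of a pull-model synthesis callback.  The registration
   call named in main() is hypothetical. */
#include <math.h>

#define PI 3.14159265358979323
#define SRATE 44100.0 /* Hz */

static double t = 0.0;   /* preallocated state: no malloc at interrupt time */
static double f = 440.0; /* sine wave frequency */

/* called at interrupt level: must be fast and must not allocate */
void audio_pull_callback(short *out, int nframes)
{
    int i;
    for (i = 0; i < nframes; ++i) {
        out[i] = (short)(32767.0 * sin(2.0 * PI * f * t));
        t += 1.0 / SRATE;
    }
}

int main(void)
{
    short block[64];
    /* a real driver would call the callback at interrupt time after
       something like install_audio_callback(audio_pull_callback);
       (hypothetical); here we just exercise it once */
    audio_pull_callback(block, 64);
    return 0;
}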
In the pull scheme, latency is controlled by the operating system, not by the
user process. On the Power Macintosh, for example, it is hard to achieve
latencies better than 30 ms.
Synchronization with Gesture and Graphics
Synchronization on a single machine
Synchronization itself depends on the ability to accurately time when
gestures occur, when samples are heard, and when images are seen. It is amazing
that most computers and operating systems don't provide any way to time these
three things against a single clock source. Again the culprit is buffering. We
may be able to time accurately when we have finished computing an image or
sound, but we can't achieve synchronization if we don't know how long the OS and
hardware will take to deliver the media to the user. Again we have to turn to
SGI systems to see how to do this properly. The basic idea is to reference
everything to a highly accurate, dependable hardware clock source.
Here is how SGI describes the clock:
dmGetUST(2) returns a high-resolution, unsigned 64-bit
number to processes using the digital media subsystem. The value of UST is the
number of nanoseconds since the system was booted. Though the resolution is 1
nanosecond, the actual accuracy may be somewhat lower and varies from system to
system. Unlike other representations of time under UNIX, the value from which
this timestamp derives is never adjusted; therefore it is guaranteed to be
monotonically increasing.
Then there is a synchronization primitive for each media type. For audio it
works as follows:
ALgetframetime(2) returns an atomic pair of (fnum, time).
For an input port, the time returned is the time at which the returned sample
frame arrived at the electrical input on the machine. For an output port, the
time returned is the time at which the returned sample frame will arrive at the
electrical output on the machine. ALgetframetime therefore accounts for all the
latency within the machine, including the group delay of the A/D and D/A
converters.
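For example, the pair lets us predict when a frame we are about to write will
actually be heard. A hedged sketch follows; the exact ALgetframetime and
dmGetUST types and signatures vary across IRIX releases, so verify them against
the man pages before relying on this:

/* Sketch: predict the UST at which the next frame written to an
   output port will reach the electrical output.  Types and
   signatures here are assumptions to check against audio(3a). */
#include <audio.h>
#include <dmedia/dmedia.h> /* dmGetUST */

unsigned long long
when_next_frame_plays(ALport alp, unsigned long long frames_written)
{
    unsigned long long fnum, time; /* atomic (frame number, UST) pair */
    ALgetframetime(alp, &fnum, &time);
    /* frames_written - fnum frames are queued ahead of us; each
       lasts 1e9/44100 nanoseconds at a 44.1 kHz output rate */
    return time + (unsigned long long)
        ((frames_written - fnum) * (1e9 / 44100.0));
}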
The SGI video subsystem provides the analogous primitive:
vlGetUSTMSCPair(2).
Gestures communicated using MIDI events are also tagged with the same UST
clock.
With these primitives in place, the application programmer's job is simplified
to scheduling the requisite delay between the various media types so that they
are perceived together by the user.
Synchronization between machines
In the common situation that there is not enough horsepower in a single
machine to do both graphics and audio, we have to achieve synchronization
between machines without a common hardware clock. The key is to have the
machines on the same LAN and use a time daemon (e.g. timed(1)) to synchronize
their clocks.
On a very local area network, this can be achieved to within 1 ms. Although
SGI appears to have omitted a system call that provides an atomic (system
clock, UST) pair, a reasonable pair can be found on a quiet system by comparing
a few dozen repeated requests for these individual times. These pairs then have
to be communicated amongst the cooperating machines (since each machine booted
at a different time). Now each machine can coordinate media streams with a
common and fairly accurate clock. There are many applications, however, where
1 ms of slop is too long. The human ear can easily discern relative delays in
audio streams of a mere sample. If any correlation is expected between audio
streams (such as 3D audio, spatialization, and stereo), all such streams should
be computed on the same machine.
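Here is a sketch of that pairing step. dmGetUST is as described above; the
selection criterion, keeping the pair bracketed by the shortest interval, is my
own illustration of "comparing a few dozen repeated requests":

/* Sketch: estimate an atomic (system clock, UST) pair by bracketing
   each gettimeofday() with two UST reads and keeping the tightest
   bracket.  Run on a quiet system, as the text suggests. */
#include <sys/time.h>
#include <dmedia/dmedia.h>

void estimate_pair(unsigned long long *ust_out, struct timeval *tv_out)
{
    unsigned long long best = ~0ULL;
    int i;
    for (i = 0; i < 50; ++i) {       /* a few dozen repeated requests */
        unsigned long long before, after;
        struct timeval tv;
        dmGetUST(&before);
        gettimeofday(&tv, (struct timezone *)0);
        dmGetUST(&after);
        if (after - before < best) { /* tightest bracket so far */
            best = after - before;
            *ust_out = before + (after - before) / 2;
            *tv_out = tv;
        }
    }
}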