Integrating Sound & Graphics: Systems Issues
This note is abstracted from my 1996 SIGGRAPH tutorial. The media library it is based on is documented separately.
Real-Time Issues
A major advantage of using dedicated hardware such as sound cards and MIDI
modules is that real-time performance and reliability issues are taken care of in
the boxes. However, the compromises in synthesis control and timbre are often too
great, and for many applications we have to do our own synthesis. Controlling
external devices also introduces delays.
In this section we discuss issues associated with making sound synthesis and
graphics coexist on the same computer.
First, we review the operating system services required.
Resource Allocation and Management
  memory locking and bounded allocation and release times
  CPU cycles (process and thread scheduling)
  I/O (network, serial port, sound hardware and graphics hardware)
Synchronization
  clocks (time references)
  event synchrony
Bounded-Latency Input/Output
  scheduler guarantees
Resource Monitoring
  I/O queue size
  processor utilization
simplesynth.c
Now we introduce a simple example program developed for sound synthesis on
the SGI IRIX platform.
The IRIX audio library implements a FIFO queue for sound output to the D/A
converters. To avoid audible clicks, this queue must always contain a minimum
number of samples. We refer to this as the low water mark. It is tempting to
prevent buffer underrun by stuffing as many sound samples into the buffer as we
can. This greedy strategy has an important failing: unbounded
latency. What we want for interactive sound synthesis and synchronization is
bounded latency. Although application requirements vary, the 10±1 ms rule of
thumb is good to keep in mind. So we have to keep samples in the buffer, but no
more than a certain high water mark.
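To make the latency arithmetic concrete, here is a minimal sketch (not from the
original tutorial; the file name, the 10 ms budget and the half-way low water
mark are illustrative assumptions) that converts a latency budget into water
marks at 44100 Hz:

/* watermarks.c: convert a latency budget into FIFO water marks.
   Illustrative sketch; the 10 ms budget is the rule of thumb from
   the text, and the half-way low water mark is an arbitrary choice. */
#include <stdio.h>

#define SRATE 44100.0 /* Hz */

int main(void)
{
    double budget = 0.010;            /* 10 ms latency budget */
    int hwm = (int)(budget * SRATE);  /* ~441 samples */
    int lwm = hwm / 2;                /* wake well before underrun */
    printf("high water mark: %d samples (%.1f ms)\n",
           hwm, 1000.0 * hwm / SRATE);
    printf("low water mark:  %d samples (%.1f ms)\n",
           lwm, 1000.0 * lwm / SRATE);
    return 0;
}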
What should the synthesis process do when it has filled the queue to the high
water mark? It should release control of the CPU so that another process
(perhaps a graphics animation) can run. Now there is a real danger that other
processes will occupy the CPU so long that the FIFO will drain below the low
water mark. This has been deftly handled in the SGI audio library by providing a
semaphore for the FIFO queue. The sound driver takes note when the sample count
drains below a defined low water mark. By careful use of UNIX's group semaphore
mechanism, the select(2) system call, it is possible to request the CPU when the
low-water-mark event occurs.
For this to work reliably, however, the synthesis process priority has to be
set so that it is guaranteed to be the next process to run when the operating
system reschedules.
These ideas are embodied in the simple sound synthesizer listed below:
/*
  simplesynth.c

  Example of friendly scheduling of a real-time sound synthesizer.

  Adrian Freed
  Copyright 1996. UC Regents. All Rights Reserved.

  Build and run (IRIX):
    cc -O2 -mips2 -o simplesynth simplesynth.c -lm -laudio
    chown root simplesynth
    chmod +s simplesynth
    simplesynth
*/

/* for select(2) */
#include <unistd.h>
#include <sys/types.h>
#include <bstring.h>
#include <sys/time.h>

/* SGI audio library */
#include <audio.h>

/* for scheduler control */
#include <sys/schedctl.h>

/* for memory locking */
#include <sys/lock.h>

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define PI 3.14159265358979323
#define SRATE 44100.0 /* Hz */
#define OUTPUTQUEUESIZE 512 /* samples */

int main(void)
{
    /* initialize the audio driver for a 16-bit, 44100 Hz
       monophonic sample source to the DACs */
    ALport alp;
    ALconfig alc;
    int dacfd;

    alc = ALnewconfig();
    ALsetwidth(alc, AL_SAMPLE_16);
    ALsetqueuesize(alc, OUTPUTQUEUESIZE);
    ALsetchannels(alc, (long)1);
    alp = ALopenport("obuf", "w", alc);
    {
        long pvbuf[2];
        long pvlen = 2;
        pvbuf[0] = AL_OUTPUT_RATE;
        pvbuf[1] = AL_RATE_44100;
        ALsetparams(AL_DEFAULT_DEVICE, pvbuf, pvlen);
    }

    /* obtain a file descriptor associated with the sound output port */
    dacfd = ALgetfd(alp);

    /* set process priority high */
    if (schedctl(NDPRI, getpid(), NDPHIMIN) < 0)
        perror("schedctl NDPHIMIN");

    /* lock memory to avoid paging */
    plock(PROCLOCK);

    /* schedctl requires set-user-id root, so we return to the
       regular user id to avoid security problems */
    setuid(getuid());

    /* synthesize */
    {
        /* time */
        double t = 0.0;
        /* sine wave frequency */
        double f = 440.0;
        /* high and low water marks */
        int hwm = 300, lwm = 256;
        fd_set read_fds, write_fds;
        /* one more than the largest file descriptor to watch */
        int nfds = dacfd + 1;
        /* vector size */
#define VSIZE 32
        short samplebuffer[VSIZE];

        for (;;) {
            /* compute sine wave samples while the sound output
               buffer is below the high water mark */
            while (ALgetfilled(alp) < hwm) {
                int i;
                for (i = 0; i < VSIZE; ++i) {
                    /* appropriately scaled sine wave */
                    samplebuffer[i] = (short)(32767.0f * sin(2.0 * PI * f * t));
                    t += 1.0 / SRATE; /* the march of time */
                }
                /* send samples out the door */
                ALwritesamps(alp, samplebuffer, VSIZE);
            }

            /* set the low water mark, i.e. when we want
               control back from select(2) */
            ALsetfillpoint(alp, OUTPUTQUEUESIZE - lwm);

            /* set up select */
            FD_ZERO(&read_fds);  /* clear read_fds */
            FD_ZERO(&write_fds); /* clear write_fds */
            FD_SET(dacfd, &write_fds);
            FD_SET(0, &read_fds);

            /* give control back to the OS scheduler to put us to
               sleep until the DAC queue drains and/or a character
               is available from standard input */
            if (select(nfds, &read_fds, &write_fds,
                       (fd_set *)0,
                       (struct timeval *)0) < 0) {
                /* select reported an error */
                ALcloseport(alp);
                perror("bad select");
                exit(1);
            }

            /* is there a character in the queue? */
            if (FD_ISSET(0, &read_fds)) {
                /* this will never block */
                char c = getchar();
                if (c == 'q') /* quit */
                    break;
                else if ((c <= '9') && (c >= '0')) /* tweak frequency */
                    f = 440.0 + 100.0 * (c - '0');
            }
        }
        ALcloseport(alp);
    }
    return 0;
}
The select(2) call blocks on two devices: the DAC FIFO and standard input.
This illustrates how synthesis control may be integrated into this user-level
real-time scheduling. Note that most I/O on UNIX systems is coordinated through
file descriptors that may be used in such a select(2) call. The SGI MIDI system
works this way, so it is simple to extend the above example into a
MIDI-controlled software synthesizer, as sketched below.
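Here is a minimal sketch of that extension, shown standing alone for clarity;
in simplesynth the MIDI file descriptor would simply join the DAC descriptor in
the same select(2) set. The mdInit/mdOpenInPort/mdGetFd/mdReceive calls and the
MDevent msg field are my recollection of the IRIX MIDI library; check midi(3dm)
for the exact signatures on your system.

/* midiselect.c: select(2)-driven MIDI input.  A sketch, not a
   tested program; verify the library calls against midi(3dm). */
#include <unistd.h>
#include <sys/types.h>
#include <bstring.h>
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <dmedia/midi.h>

int main(void)
{
    MDport inport;
    int midifd, nfds;
    fd_set read_fds;

    if (mdInit() <= 0) {          /* count available MIDI interfaces */
        fprintf(stderr, "no MIDI interfaces\n");
        exit(1);
    }
    inport = mdOpenInPort(NULL);  /* NULL requests the default interface */
    midifd = mdGetFd(inport);
    nfds = midifd + 1;

    for (;;) {
        FD_ZERO(&read_fds);
        FD_SET(midifd, &read_fds);
        /* sleep until a MIDI event arrives */
        if (select(nfds, &read_fds, (fd_set *)0,
                   (fd_set *)0, (struct timeval *)0) < 0) {
            perror("bad select");
            exit(1);
        }
        if (FD_ISSET(midifd, &read_fds)) {
            MDevent ev;
            if (mdReceive(inport, &ev, 1) > 0)
                printf("MIDI status byte: 0x%x\n",
                       (unsigned)(unsigned char)ev.msg[0]);
        }
    }
}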
Note that SGI has documented a more complicated example of real-time media
programming in their IRIX Media Guide, available at:
http://www.sgi.com/Technology/TechPubs/dynaweb_bin/0530/bin/nph-dynaweb....
It Still Clicks
If you try the simple synthesizer above on your own SGI machine, you may be
disappointed to still hear some clicks from time to time. This is usually due to
interference from the many daemons running alongside your program.
The following things commonly disturb real-time performance:
screen savers
disk reorganizers
virus checkers
screen recalibration
networks
media insertion (diskette, CD-ROM, etc.)
time daemons
power saving shutdown
directory and file servers
NFS traffic
poorly written interrupt handlers in drivers
file system shutdowns
printer error reporting
e-mail delivery
paging for virtual memory
Many of these daemons and services are controlled on SGI machines in
/etc/config. We reboot our machines with as few daemons as possible, e.g. in
single user mode, for critical real-time performance.
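On IRIX, many of these services can be inspected and controlled with
chkconfig(1M): running chkconfig with no arguments lists the configuration
flags, and, for example, chkconfig nfs off disables NFS service at the next
boot. The available flags vary from system to system.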
Macintosh and PC Software Synthesis
The Mac OS, like Windows, does not have pre-emptive scheduling. Programs
explicitly pass control to each other through a coroutine mechanism. The only
preemption that occurs is through I/O interrupts, so the only way to achieve
real-time performance is to schedule code to run at interrupt time.
On SGI machines, samples are pushed by the user process into a
FIFO. On PCs and Macintoshes, interrupt-level code pulls samples that a
user-supplied function provides. The complication of the pull scheme is that the
user-supplied callback function is constrained because it runs at interrupt
level. For example, on the Macintosh it cannot allocate memory. It is also
unwise to spend too much time computing in this routine; otherwise pending I/O
may fail.
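A minimal sketch of the pull model follows. The callback registration is
hypothetical, standing in for, e.g., the Macintosh Sound Manager's
double-buffer callbacks; the point is the shape of the interrupt-level routine:
preallocated state, no allocation, bounded work per call.

/* Sketch of a pull-model synthesis callback.  The registration
   call named in main() is hypothetical. */
#include <math.h>

#define PI 3.14159265358979323
#define SRATE 44100.0 /* Hz */

static double t = 0.0;   /* preallocated state: no malloc at interrupt time */
static double f = 440.0; /* sine wave frequency */

/* called at interrupt level: must be fast and must not allocate */
void audio_pull_callback(short *out, int nframes)
{
    int i;
    for (i = 0; i < nframes; ++i) {
        out[i] = (short)(32767.0 * sin(2.0 * PI * f * t));
        t += 1.0 / SRATE;
    }
}

int main(void)
{
    short block[64];
    /* a real driver would call the callback at interrupt time after
       something like install_audio_callback(audio_pull_callback);
       (hypothetical); here we just exercise it once */
    audio_pull_callback(block, 64);
    return 0;
}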
In the pull scheme, latency is controlled by the operating system, not by the
user process. On the Power Macintosh, for example, it is hard to achieve
latencies better than 30 ms.
Synchronization with Gesture and Graphics
Synchronization on a single machine
Synchronization itself depends on the ability to accurately time when
gestures occur, when samples are heard, and when images are seen. It is amazing
that most computers and operating systems don't provide any way to time these
three things against a single clock source. Again the culprit is buffering. We
may be able to time accurately when we have finished computing an image or
sound, but we can't achieve synchronization if we don't know how long the OS and
hardware will take to deliver the media to the user. Again we have to turn to
SGI systems to see how to do this properly. The basic idea is to reference
everything to a highly accurate, dependable hardware clock source.
Here is how SGI describes the clock:
dmGetUST(2) returns a high-resolution, unsigned 64-bit
number to processes using the digital media subsystem. The value of UST is the
number of nanoseconds since the system was booted. Though the resolution is 1
nanosecond, the actual accuracy may be somewhat lower and varies from system to
system. Unlike other representations of time under UNIX, the value from which
this timestamp derives is never adjusted; therefore it is guaranteed to be
monotonically increasing.
Then there is a synchronization primitive for each media type. For audio it
works as follows:
ALgetframetime(2) returns an atomic pair of (fnum, time).
For an input port, the time returned is the time at which the returned sample
frame arrived at the electrical input on the machine. For an output port, the
time returned is the time at which the returned sample frame will arrive at the
electrical output on the machine. ALgetframetime therefore accounts for all the
latency within the machine, including the group delay of the A/D and D/A
converters.
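For example, the pair lets us predict when a frame we are about to write will
actually be heard. A hedged sketch follows; the exact ALgetframetime and
dmGetUST types and signatures vary across IRIX releases, so verify them against
the man pages before relying on this:

/* Sketch: predict the UST at which the next frame written to an
   output port will reach the electrical output.  Types and
   signatures here are assumptions to check against audio(3a). */
#include <audio.h>
#include <dmedia/dmedia.h> /* dmGetUST */

unsigned long long
when_next_frame_plays(ALport alp, unsigned long long frames_written)
{
    unsigned long long fnum, time; /* atomic (frame number, UST) pair */
    ALgetframetime(alp, &fnum, &time);
    /* frames_written - fnum frames are queued ahead of us; each
       lasts 1e9/44100 nanoseconds at a 44.1 kHz output rate */
    return time + (unsigned long long)
        ((frames_written - fnum) * (1e9 / 44100.0));
}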
The SGI video subsystem provides the analogous primitive:
vlGetUSTMSCPair(2).
Gestures communicated using MIDI events are also tagged with the same UST
clock.
With these primitives in place, the application programmer's job is simplified
to scheduling the requisite delay between the various media types so that they
are perceived together by the user.
Synchronization between machines
In the common situation that there is not enough horsepower in a single
machine to do both graphics and audio, we have to achieve synchronization
between machines without a common hardware clock. The key is to have the
machines on the same LAN and use a time daemon (e.g. timed(1)) to synchronize
their clocks.
On a very local area network, this can be achieved to within 1 ms. Although
SGI appears to have omitted a system call that provides an atomic (system
clock, UST) pair, a reasonable pair can be found on a quiet system by comparing
a few dozen repeated requests for these individual times. These pairs then have
to be communicated amongst the cooperating machines (since each machine booted
at a different time). Now each machine can coordinate media streams with a
common and fairly accurate clock. There are many applications, however, where
1 ms of slop is too long. The human ear can easily discern relative delays in
audio streams of a mere sample. If any correlation is expected between audio
streams (such as 3D audio, spatialization, and stereo), all such streams should
be computed on the same machine.
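Here is a sketch of that pairing step. dmGetUST is as described above; the
selection criterion, keeping the pair bracketed by the shortest interval, is my
own illustration of "comparing a few dozen repeated requests":

/* Sketch: estimate an atomic (system clock, UST) pair by bracketing
   each gettimeofday() with two UST reads and keeping the tightest
   bracket.  Run on a quiet system, as the text suggests. */
#include <sys/time.h>
#include <dmedia/dmedia.h>

void estimate_pair(unsigned long long *ust_out, struct timeval *tv_out)
{
    unsigned long long best = ~0ULL;
    int i;
    for (i = 0; i < 50; ++i) {       /* a few dozen repeated requests */
        unsigned long long before, after;
        struct timeval tv;
        dmGetUST(&before);
        gettimeofday(&tv, (struct timezone *)0);
        dmGetUST(&after);
        if (after - before < best) { /* tightest bracket so far */
            best = after - before;
            *ust_out = before + (after - before) / 2;
            *tv_out = tv;
        }
    }
}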