Unfinished Business: The Elusive Guitar Synthesizer

When I left for Paris after my undergraduate degree in Australia, I brought along the beginnings of a guitar synthesizer to develop. My modest progress on the challenge at that time was a hexaphonic optical pickup and the unfortunate discovery that the Phase-Locked Loop (PLL) chips of the day would not reliably track any slightly interesting guitar playing I could come up with. Forty years later the situation is not much better. If you are a fingerstyle player of acoustic guitars, commercial guitar synthesizers feel very constraining. Focussing on how to get around their tracking errors and latency is extremely distracting.

We worked on this periodically at CNMAT and got a good boost when Gibson funded our Guitar Innovation Group.

One approach we tackled was to build a 3-axis pickup that could capture the longitudinal waves along each string. These travel faster than the lateral waves, so there was hope they could speed up pitch estimation. Separating the lateral and longitudinal waves from this pickup eluded us. The problem may be soluble with current DSP performance, and it would be interesting to confirm my suspicion that there is useful data there on a wide range of guitars. Little has been studied about these longitudinal waves on the guitar. They have been observed on classical nylon-string guitars and extensively studied on the piano.
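To give a feel for the speed difference, here is a back-of-envelope comparison, a sketch only: the material constants (Young's modulus and density of steel) and the scale length are illustrative assumptions, not measurements of any particular string.

```python
import math

# Assumed material values for a solid steel string (illustrative, not measured).
E = 200e9        # Young's modulus, Pa
rho = 7850.0     # density, kg/m^3

# Longitudinal wave speed depends only on the material: c = sqrt(E / rho).
c_long = math.sqrt(E / rho)

# Transverse (lateral) wave speed follows from the tuning: a string of
# speaking length L with fundamental f0 carries transverse waves at c = 2*L*f0.
L = 0.648        # m, a common guitar scale length (assumed)
f0 = 82.41       # Hz, low E
c_trans = 2 * L * f0

print(f"longitudinal: {c_long:.0f} m/s, transverse: {c_trans:.0f} m/s")
print(f"longitudinal waves are roughly {c_long / c_trans:.0f}x faster")
```

With these assumed numbers the longitudinal waves come out tens of times faster than the lateral ones, which is the basis for the hope that they could shorten the pitch estimation delay.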

Hearing about these challenges, Andy Schmeder embarked on an interesting study which resulted in this very promising paper:
Schmeder, A. W., "Mapping Spectral Frames to Pitch with the Support Vector Machine," Proceedings of the International Computer Music Conference (ICMC), 2004.

The contribution here was that machine learning showed promise at producing a pitch prediction before the lateral waves had time to travel to the nut or fret and back again. It is an interesting question how this might even be possible, given the information-theoretic limits: it would seem that you simply have to wait for the string to settle into oscillating at its fundamental frequency for a while before you have a chance to estimate it. There is mature and solid theory about these constraints, i.e., the Cramér-Rao bounds and the Heisenberg-style uncertainties of the DFT. For the bass guitar and the low strings of the guitar the inherent delay is inconveniently long: >20 ms. You can hear the impact of this in what is probably the best guitar tracking and resynthesis readily available today: https://www.cycfi.com/2018/06/fast-and-efficient-pitch-detection-synth-t...
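Where that >20 ms figure comes from can be sketched in a few lines: a conventional fundamental-frequency estimator needs to observe at least a period or two of the waveform, and for low strings a period is already long. The two-periods rule of thumb here is my assumption; real estimators vary.

```python
# Fundamental frequencies of some low open strings (standard tunings).
notes = {
    "five-string bass low B": 30.87,   # Hz
    "bass low E": 41.20,
    "guitar low E": 82.41,
}

# Assume (as a rule of thumb) the estimator needs about two full periods.
min_window_ms = {name: 2 * 1000.0 / f0 for name, f0 in notes.items()}

for name, ms in min_window_ms.items():
    print(f"{name}: ~{ms:.0f} ms to observe two periods")
```

Even the guitar's low E needs on the order of 24 ms under this assumption, and the bass strings considerably more, which is why the latency is so audible in conventional trackers.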

Here is why there is useful information for pitch prediction early in the sound recorded at the pickup: the distance from the pick point to the pickup is short, usually much shorter than the distance from the pick point to the nut or fret. This means information about the physical configuration of the string is accessible earlier. That configuration is established as the finger or pick pulls the string in preparation for the pluck.
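The timing asymmetry can be made concrete with assumed geometry: the distances below (pick point 5 cm from the pickup, 55 cm from the nut) are illustrative, not measurements, and the wave speed is derived from the low E tuning as before.

```python
# Transverse wave speed on the low E string: c = 2 * L * f0 (assumed geometry).
c = 2 * 0.648 * 82.41        # m/s, scale length 0.648 m, f0 = 82.41 Hz

d_pickup = 0.05              # m, pick point to pickup (assumed, typically short)
d_nut = 0.55                 # m, pick point to nut (assumed)

t_pickup_ms = d_pickup / c * 1000        # first arrival at the pickup
t_nut_rt_ms = 2 * d_nut / c * 1000       # round trip to the nut and back

print(f"wavefront reaches pickup after ~{t_pickup_ms:.1f} ms")
print(f"reflection from the nut returns after ~{t_nut_rt_ms:.1f} ms")
```

Under these assumptions the pickup sees the initial disturbance an order of magnitude sooner than the nut reflection that a period-based estimator implicitly waits for, which is the window a learned predictor could exploit.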

Now we come to the unfinished business. The machine learning technique Andy used for the pitch prediction was off-line and based on recordings. This approach becomes expensive when asked to represent a broad range of playing techniques on many different guitars. What would be better is to use continual learning to train this estimator/classifier against ground truth established 20+ ms later, when the string manifests its actual fundamental frequency. Where this becomes even more interesting is when we use the pitch prediction to disambiguate pitch estimations, especially the all-too-common octave errors. Currently it is common to use a median filter to mitigate these errors, which adds further delay to the synthesis.
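The shape of that double-training loop can be sketched as follows. This is a toy stand-in, not Andy's SVM: the class name, the perceptron-style update, and the one-hot "spectral frames" are all my illustrative assumptions. The essential structure is that an early prediction is issued immediately, buffered, and corrected once the conventional estimator delivers ground truth ~20 ms later.

```python
from collections import deque
import numpy as np

class DelayedLabelTrainer:
    """Toy online pitch-class predictor trained with labels that arrive later.

    Early spectral frames get an immediate prediction from a cheap linear
    model; each frame is buffered until a conventional estimator, having seen
    enough signal (~20 ms later), supplies ground truth for an update.
    """

    def __init__(self, n_features, n_classes, lr=0.1):
        self.W = np.zeros((n_classes, n_features))
        self.lr = lr
        self.pending = deque()   # (frame, predicted_class) awaiting ground truth

    def predict(self, frame):
        k = int(np.argmax(self.W @ frame))
        self.pending.append((frame, k))
        return k

    def ground_truth_arrived(self, true_class):
        frame, predicted = self.pending.popleft()
        if predicted != true_class:
            # perceptron-style update: reward the true class, punish the guess
            self.W[true_class] += self.lr * frame
            self.W[predicted] -= self.lr * frame

# Toy demo: two one-hot "frames" standing in for two pitch classes.
frames = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
clf = DelayedLabelTrainer(n_features=2, n_classes=2)
for _ in range(3):                               # a few rounds of playing
    for true_class, frame in enumerate(frames):
        early_guess = clf.predict(frame)         # usable immediately
        clf.ground_truth_arrived(true_class)     # arrives ~20 ms later
final = [clf.predict(f) for f in frames]
```

The same buffered predictions could also vote against a later estimate that lands an octave off, which is the disambiguation role described above.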

If I am right that the resulting unsupervised double-training loop works well, we will have further unfinished business. To explain this I have to dispel the common confusion that there is even such a thing as "machine learning." Machine learning systems have a "human in the loop": whether it is distant people who don't drive classifying images to train the machine learning for automobile automata, the data scientists cleaning dirty data, or the programmers selecting ML algorithms, human judgements and biases are involved. The ethical and political considerations I will reserve for another note, but the biases and assumptions in this guitar synthesizer situation are interesting.

The first is a confusion between fundamental frequency estimation and perceived pitch. Perceived pitch is complicated, especially for strings that sound inharmonically, which they usually do when struck or plucked (rather than bowed). In the case of the low strings the fundamental is often missing, so inference from the stretched upper partials is involved. We made a little progress on this when Nicolas Obin visited CNMAT and worked on an extensive sound database for pitch detection and an evaluation of the common algorithms. We tried to estimate the fundamental frequency and the exponent of the usual approximation to stretched partials from the musical acoustics literature.
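The usual approximation from the musical acoustics literature models the stretched partials as f_n = n·f0·sqrt(1 + B·n²), where B is an inharmonicity coefficient. A small sketch, with an illustrative B value that is an assumption rather than a measurement of any string:

```python
import math

def stretched_partial(n, f0, B):
    """Usual stretched-partial approximation: f_n = n * f0 * sqrt(1 + B * n^2)."""
    return n * f0 * math.sqrt(1 + B * n * n)

f0, B = 82.41, 1e-4    # low E; B is an illustrative inharmonicity coefficient
for n in (1, 2, 5, 10):
    fn = stretched_partial(n, f0, B)
    sharp_cents = 1200 * math.log2(fn / (n * f0))
    print(f"partial {n}: {fn:.2f} Hz ({sharp_cents:+.1f} cents sharp of harmonic)")
```

The stretch grows with partial number, so a pitch estimator that assumes exactly harmonic partials is systematically misled by the very upper partials it must rely on when the fundamental is missing.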

The second hazard and piece of unfinished business concerns assumptions about what tuning the guitar might be in. The common enthusiasm for rounding things to MIDI note numbers and equal temperament should be avoided. It is surprising how little work has been done on how people actually tune their guitars. It is certainly far from equal temperament. I did a quick survey of guitar tuning moments in on-line videos of well-established guitarists. Stories of how electronic tuners don't work are very common. I often observe careful tuning with an electronic tuner followed by fine adjustments away from equal temperament during the first verse of the song. Some performers simply tune entirely by ear from perhaps a tuning fork or a single reference.
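What rounding to MIDI note numbers discards is easy to show. The helper below converts a frequency to the nearest equal-tempered note plus a deviation in cents; the 82.0 Hz example (a low E tuned slightly flat of equal temperament) is an illustrative value, not survey data.

```python
import math

def cents_from_equal_temperament(f, ref_a4=440.0):
    """Return (nearest MIDI note number, deviation in cents) for a frequency.

    The deviation is the interesting part: rounding to the nearest
    equal-tempered pitch throws away exactly the fine tuning adjustments
    players make by ear.
    """
    midi_exact = 69 + 12 * math.log2(f / ref_a4)
    midi = round(midi_exact)
    return midi, 100 * (midi_exact - midi)

# A low E tuned a touch flat of equal temperament (illustrative).
note, cents = cents_from_equal_temperament(82.0)
print(f"nearest MIDI note {note}, {cents:+.1f} cents")
```

A deviation of several cents is well within what players adjust deliberately, yet it vanishes entirely once the output is quantized to a note number.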

It's important to me that further complications of this situation be taken into account, because I see the bar being set rather low in evaluations of new pitch detectors for guitar. Unless you are only interested in transcription, you have to work harder to characterize the sounds carefully and not assume the guitar is in tune, is played in tune, or has stable tuning. So before I reveal this final complication and more unfinished business, I need to explain why I built a concrete guitar at CNMAT.

Osman Ishvan and I developed a new magnetic pickup for guitars which could separate the horizontal and vertical components of the lateral wave. The idea was that the horizontal wave would be a better signal to analyze for fundamental frequency estimation because it would have less cross-talk induced by the vertical component moving the top plate of the guitar. (In that account the guitar is assumed to be played lap-style.) I found a block of concrete at the base of a wall outside the 1750 Arch Street building and installed strings and our new pickup there on concrete anchors. The idea was to eliminate the mechanical cross-talk and confirm that we had designed the magnetic circuit well enough to eliminate cross-talk from magnetic coupling between the strings. This was successful. What we discovered when installing the pickup on actual guitars was that more than half the cross-talk (communication between strings) came from motion of the neck, which is not something you want to engineer out of your guitar.

So here is the rub: strings coupled this way entrain each other to new vibrational frequencies. How much and when depends on the chord you might be playing, the tuning of the guitar, and which strings are coupling. This is another reason why a double-training loop might be interesting, but we may have to include techniques akin to those used for audio source separation to mitigate or exploit the important string coupling.