When I left for Paris after my undergraduate degree in Australia, I brought along the
beginnings of a guitar synthesizer to develop. My modest progress
on the challenge at that time was a hexaphonic optical pickup and the unfortunate
discovery that the phase-locked loop (PLL) chips of the day would not reliably
track any slightly interesting guitar playing I could come up with.
Forty years later the situation is not much better.
If you are a fingerstyle player of acoustic guitars, commercial guitar synthesizers
feel very constraining. Focussing on how to get around their tracking errors
and latency is extremely distracting.
We worked on this periodically at CNMAT and got a good boost when Gibson funded
our Guitar Innovation Group.
One approach we tackled was to build a 3-axis pickup that could capture the longitudinal
waves along each string. They travel faster than the lateral waves so there was hope
they would speed up the pitch estimation problem. Separating the lateral and longitudinal
waves from this pickup eluded us. This problem may be soluble with current DSP
performance, and it would be interesting to confirm my suspicion that there is useful
data there on a wide range of guitars. Little has been studied about these longitudinal
waves on the guitar: they have been observed on classical nylon-string guitars and extensively studied on the piano.
Hearing about these challenges, Andy Schmeder embarked on an interesting study which resulted in this very
promising paper:
Mapping Spectral Frames to Pitch with the Support Vector Machine, Schmeder, AW, ICMC 2004
The contribution here was that machine learning showed promise in producing a pitch prediction before the lateral
waves had time to travel to the nut or fret and back again. It’s an interesting question how
this might even be possible, given the information-theoretic limits. It would seem that you just have to wait for the string
to settle into oscillating at its fundamental frequency for a while before you have a chance to estimate it.
There is mature and solid theory about these constraints, i.e., the Cramér-Rao bounds and the Heisenberg uncertainties of the DFT.
For the bass guitar and the low strings of the guitar the inherent delay is inconveniently long: more than 20 ms.
You can hear the impact of this in what is probably the best guitar tracking and resynthesis readily available today:
https://www.cycfi.com/2018/06/fast-and-efficient-pitch-detection-synth-t...
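The arithmetic behind that 20 ms figure is simple: a conventional estimator has to observe at least one full period of the fundamental, and in practice several. A quick sketch with standard open-string frequencies:

```python
def period_ms(f0_hz: float) -> float:
    """Length of one fundamental period in milliseconds."""
    return 1000.0 / f0_hz

# Standard-tuning open-string fundamentals (Hz).
bass_low_E = 41.2    # E1, lowest string of a 4-string bass
guitar_low_E = 82.4  # E2, guitar 6th string

print(f"bass E1:   {period_ms(bass_low_E):.1f} ms per period")
print(f"guitar E2: {period_ms(guitar_low_E):.1f} ms per period")
# One period of the bass low E alone already exceeds 20 ms,
# before any of the averaging a practical estimator needs.
```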
Here is why there is useful information for pitch prediction early on in the sounds recorded at the pickup: the distance from the pick point to the
pickup is short - usually much shorter than the distance from the pick point to the nut/fret. This means information about the physical configuration
of the string is accessible earlier. This configuration is being established as the finger or pick pulls the string in preparation for the pluck.
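To put rough numbers on this: on an ideal string the transverse wave speed is c = 2·L·f0, so the disturbance from the pluck reaches a bridge-side pickup well before the first reflection returns from the nut. The distances below are illustrative assumptions, not measurements of any particular guitar:

```python
# Rough round-trip timing on an idealized string.
scale_L = 0.648       # m, a typical guitar scale length
f0 = 82.4             # Hz, open low E (E2)
c = 2 * scale_L * f0  # m/s, transverse wave speed on an ideal string

d_pick = 0.12         # m, pick point measured from the bridge (assumed)
d_pickup = 0.05       # m, pickup position from the bridge (assumed)

# Direct disturbance traveling from pick point to pickup:
t_direct = abs(d_pick - d_pickup) / c
# Wave traveling from pick point to the nut and back to the pickup:
t_reflected = (2 * (scale_L - d_pick) + (d_pick - d_pickup)) / c

print(f"direct arrival:   {t_direct * 1000:.2f} ms")
print(f"via nut and back: {t_reflected * 1000:.2f} ms")
```

Under these assumptions the pickup sees information about the string's configuration roughly ten milliseconds before the nut reflection arrives, which is the window the predictor can exploit.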
Now we come to the unfinished business: the machine learning technique Andy used for the pitch prediction was off-line and based on recordings. This
approach becomes expensive when representing a broad range of playing techniques on many different guitars. What would be better is to
use continual learning to train this estimator/classifier using ground truth established 20+ ms later, when the string manifests
its actual fundamental frequency. Where this becomes even more interesting is when we use the pitch prediction to disambiguate
pitch estimations, especially the all-too-common octave errors. Currently it is common to use a median filter to mitigate
these errors, which adds further delay to the synthesis.
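A minimal sketch of that continual-learning loop, with a toy nearest-centroid classifier standing in for the real estimator (the original study used an SVM; this is not that method) and synthetic vectors standing in for real spectral frames:

```python
import numpy as np

class OnlineCentroidPitchPredictor:
    """Toy stand-in for the fast pitch predictor: an online
    nearest-centroid classifier over early spectral frames.
    Only the delayed-ground-truth loop structure is the point."""

    def __init__(self):
        self.centroids = {}  # pitch label -> running mean of frames
        self.counts = {}

    def predict(self, frame):
        """Fast path: guess the pitch from an early frame."""
        if not self.centroids:
            return None  # nothing learned yet
        return min(self.centroids,
                   key=lambda p: np.linalg.norm(frame - self.centroids[p]))

    def update(self, frame, true_pitch):
        """Slow path, called 20+ ms later, once a conventional
        estimator on the settled waveform supplies ground truth."""
        n = self.counts.get(true_pitch, 0)
        old = self.centroids.get(true_pitch, np.zeros_like(frame))
        self.centroids[true_pitch] = (old * n + frame) / (n + 1)
        self.counts[true_pitch] = n + 1

# The double loop, run on purely synthetic "frames":
rng = np.random.default_rng(0)
predictor = OnlineCentroidPitchPredictor()
for _ in range(200):
    pitch = int(rng.integers(40, 44))                 # pretend ground truth
    frame = np.eye(8)[pitch - 40] + 0.1 * rng.standard_normal(8)
    early_guess = predictor.predict(frame)            # available immediately
    predictor.update(frame, pitch)                    # delayed supervision
```

The structural point is that `predict` can fire within the first frame or two, while `update` arrives at its leisure; no median filter, with its added latency, sits in the fast path.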
If I am right that the resulting unsupervised double-training loop works well, we will have further unfinished business.
To explain this I have to be sure to dispel the common confusion that there is even such a thing as “machine learning.”
Machine learning systems have a “human in the loop”. Whether it is distant people who don’t drive classifying images
to train the machine learning for automobile automata, or data scientists cleaning dirty data, or programmers
selecting ML algorithms, human judgements and biases are involved. The ethical and political considerations I will
reserve for another note, but the biases and assumptions in this guitar synthesizer situation are interesting.
The first one is a confusion between fundamental frequency estimation and perceived pitch. Perceived pitch is complicated,
especially for strings, which usually sound inharmonic when struck or plucked (rather than bowed). In the case of the low
strings the fundamental is often missing, so inference from the stretched upper partials is involved. We made a little progress on
this when Nicolas Obin visited CNMAT and worked on an extensive sound database for pitch detection and an evaluation
of the common algorithms. We tried to estimate the fundamental frequency and the exponent of the usual approximation to stretched partials
from the musical acoustics literature.
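For reference, the usual stiff-string approximation gives partial frequencies f_n = n·f0·√(1 + B·n²), where B is the inharmonicity coefficient. A small sketch of how far the upper partials stretch; the value of B here is an illustrative assumption, not a measurement:

```python
import math

def stretched_partial(n: int, f0: float, B: float) -> float:
    """Stiff-string approximation: f_n = n * f0 * sqrt(1 + B * n^2).
    B is the inharmonicity coefficient, which depends on the string's
    stiffness, gauge, and length."""
    return n * f0 * math.sqrt(1 + B * n * n)

f0, B = 82.4, 3e-4  # open low E; B is an assumed, plausible value
for n in (1, 2, 5, 10):
    fn = stretched_partial(n, f0, B)
    cents = 1200 * math.log2(fn / (n * f0))
    print(f"partial {n:2d}: {fn:7.1f} Hz  (+{cents:.1f} cents sharp of harmonic)")
```

Even a modest B pushes the tenth partial tens of cents sharp of the harmonic series, which is why inferring a missing fundamental from upper partials cannot assume exact integer ratios.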
The second hazard and unfinished business concerns assumptions about what tuning the guitar might be in. The common enthusiasm
to round things to MIDI note numbers and equal temperament should be avoided. It is surprising how little work has been
done on how people actually tune their guitars. It is certainly far from equal temperament. I did a quick survey
of guitar-tuning moments in on-line videos of well-established guitarists. Stories of how electronic tuners don’t work
are very common. I often observe careful tuning with an electronic tuner followed by fine adjustments away from equal temperament
during the first verse of the song. Some performers simply tune entirely by ear, perhaps from a tuning fork or a single
reference.
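When characterizing how far a real guitar sits from equal temperament, the natural unit is signed cents relative to the 12-TET grid. A minimal sketch; the 440 Hz reference and the measured frequency below are assumptions for illustration, and real performers' references vary:

```python
import math

A4 = 440.0  # Hz reference pitch (assumed; performers' references vary)

def cents_from_equal_temperament(f_hz: float, midi_note: int) -> float:
    """Signed deviation (cents) of a measured frequency from the
    12-TET pitch of the given MIDI note number."""
    f_et = A4 * 2 ** ((midi_note - 69) / 12)
    return 1200 * math.log2(f_hz / f_et)

# A G string measured at 196.8 Hz against MIDI note 55 (G3 in 12-TET):
print(f"{cents_from_equal_temperament(196.8, 55):+.1f} cents")
```

Rounding such a measurement to the nearest MIDI note number throws this deviation away, which is exactly the information a careful evaluation needs to keep.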
It’s important for me that these further complications be taken into account, because I see the bar
being set rather low in evaluations of new pitch detectors for guitar. Unless you are only interested in transcription, you have to
work harder: characterize the sounds carefully, and do not assume the guitar is in tune, is played in tune, or has stable tuning.
So before I reveal this final complication and more unfinished business I need to explain why I built a concrete guitar at CNMAT.
Osman Ishvan and I developed a new magnetic pickup for guitars which could separate the horizontal and vertical components of the
lateral wave. The idea was that the horizontal wave would be a better thing to analyze for fundamental frequency estimation because it
would have less cross-talk induced by the vertical component moving the top plate of the guitar. (In that account the guitar is
considered to be played lapstyle.) I found a block of concrete at the base of a wall outside the 1750 Arch Street building
and installed strings and our new pickup there on concrete anchors. The idea was to eliminate the mechanical cross-talk
and confirm that we had designed the magnetic circuit well enough to eliminate cross-talk from magnetic coupling between
the strings. This was successful. What we discovered when installing the pickup on actual guitars was that more than half the
cross-talk (communication between strings) came from motion of the neck, which is not a good idea to engineer out of
your guitar. So here is the rub: strings coupled this way entrain each other to new vibrational frequencies. How much and when
depends on the chord you might be playing, the tuning of the guitar, and which strings are coupling. This is another
reason why a double training loop might be interesting, but we may have to include techniques akin to those used for
audio source separation to mitigate or exploit this important string coupling.