The neat thing about this particular singing synthesizer is that it used a surprisingly sophisticated (especially for the 60s) physical model of the human vocal tract [1], and it was perhaps the first use of physical modeling sound synthesis. Vowel shapes were obtained by physically measuring an actual vocal tract via X-rays. The measured vowels happened to be Russian, but they were close enough for English.
While this particular kind of speech synthesis [2] isn't really used anymore, it's still fun to play around with. Pink Trombone [3] is a good example of a fun toy that uses a waveguide physical model similar to the Kelly-Lochbaum model above. I've adapted some of the DSP in Pink Trombone a few times [4][5][6], and used it in some music [7] and projects [8] of mine.
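To give a flavor of what that waveguide actually looks like in code, here's a rough C sketch of one sample of Kelly-Lochbaum scattering, loosely following the structure of Pink Trombone's tract DSP. The section count (44) and the reflection/damping constants are the values I remember Pink Trombone using; treat all of it as illustrative rather than a drop-in implementation:

    #define N 44  /* number of cylindrical tube sections */

    typedef struct {
        float R[N];    /* right-going wave (glottis -> lips) */
        float L[N];    /* left-going wave  (lips -> glottis) */
        float k[N];    /* reflection coefficients at the junctions */
    } Tract;

    /* reflection coefficients from the tube section areas */
    void tract_set_areas(Tract *t, const float area[N]) {
        for (int i = 1; i < N; i++)
            t->k[i] = (area[i - 1] - area[i]) / (area[i - 1] + area[i]);
    }

    /* advance the waveguide one sample; returns the pressure at the lips */
    float tract_step(Tract *t, float glottal_in) {
        float jR[N + 1], jL[N + 1];  /* junction outputs */

        jR[0] = t->L[0] * 0.75f + glottal_in;  /* partial reflection at glottis */
        jL[N] = t->R[N - 1] * -0.85f;          /* open lip end inverts the wave */

        /* one-multiply scattering at each interior junction */
        for (int i = 1; i < N; i++) {
            float w = t->k[i] * (t->R[i - 1] + t->L[i]);
            jR[i] = t->R[i - 1] - w;
            jL[i] = t->L[i] + w;
        }

        /* each section is a one-sample delay, so junction outputs
           become the next sample's traveling waves (with damping) */
        for (int i = 0; i < N; i++) {
            t->R[i] = jR[i] * 0.999f;
            t->L[i] = jL[i + 1] * 0.999f;
        }
        return t->R[N - 1];
    }

All of the vowel information lives in the area function; changing area[] over time is what "articulation" means in this model, which is why measured X-ray areas were enough to make the 60s machine sing.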
For more in-depth information about doing singing synthesis specifically (as opposed to general speech synthesis) with waveguide physical models, Perry Cook's dissertation [9] is still considered a seminal work. In the early 2000s, there was a handful of follow-up work on physically-based singing synthesis at CCRMA; Hui-Ling Lu's dissertation [10] on glottal source modelling for singing comes to mind.
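The glottal source is its own rabbit hole (that's what Lu's dissertation digs into). Just for flavor, a Rosenberg-style pulse is about the simplest source that sounds voiced; this is not the model from that work or from Pink Trombone, just the classic textbook pulse, and the timing parameters below are made-up defaults:

    #include <math.h>

    #define PI 3.14159265358979f

    /* one sample of a Rosenberg-style glottal pulse; phase is in [0,1) */
    float glottis_step(float *phase, float f0, float sr) {
        const float t1 = 0.40f;  /* end of opening phase (fraction of period) */
        const float t2 = 0.58f;  /* moment of glottal closure */
        float p = *phase, out;

        if (p < t1)      out = 0.5f * (1.0f - cosf(PI * p / t1));
        else if (p < t2) out = cosf(0.5f * PI * (p - t1) / (t2 - t1));
        else             out = 0.0f;  /* closed phase */

        *phase += f0 / sr;
        if (*phase >= 1.0f) *phase -= 1.0f;
        return out;
    }

Feed that into tract_step() above and you have the whole signal path at a sketch level: source -> tube -> lips. (IIRC Pink Trombone actually runs its tract at twice the audio rate, but that's a detail.)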
Another excellent, but quite dense, resource I've found helpful for implementing my own waveguide models is Physical Audio Signal Processing, a book available in hard copy and online [1]. There's also an absolute ton of research papers on these topics that have never been summarized anywhere or cited outside a small circle of researchers, so a lot of institutional knowledge about physical modeling is locked up in academic papers and isn't super accessible.
1: https://ccrma.stanford.edu/~jos/pasp/Singing_Kelly_Lochbaum_...
2: https://en.wikipedia.org/wiki/Articulatory_synthesis
3: https://dood.al/pinktrombone/
4: https://pbat.ch/proj/voc/
5: https://pbat.ch/sndkit/tract/
6: https://pbat.ch/sndkit/glottis/
7: https://soundcloud.com/patchlore/sets/looptober-2021
8: https://pbat.ch/wiki/vocshape/
9: https://www.cs.princeton.edu/~prc/SingingSynth.html
10: https://web.archive.org/web/20080725195347/http://ccrma-www....
I've been fascinated by the simplicity of this since I ran into SAM (Software Automatic Mouth) on the C64, but never really taken the time to delve into it. Your links are an amazing resource...