Matt Montag - EEN 540 Speech Signal Processing - Project 2
In this project, the vocal tract is approximated as a series of connected lossless tubes. Cross-sectional area data for the vocal tract were provided for the vowels /a/, /e/, /i/, /o/, and /u/. The provided tubes were downsampled with linear interpolation to obtain a new series of tubes, with the number of tubes determined by the sample rate and the simulated vocal tract length.
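As a rough illustration of the resampling step, the sketch below interpolates an area function onto a new number of equal-length tubes. It assumes the common textbook relation N = 2·L·fs/c for the tube count, a speed of sound of 35000 cm/s, and a 17.5 cm tract; none of these values or the function name `resample_area_function` come from the project itself, so treat them as placeholders.

```python
import numpy as np

def resample_area_function(areas, fs, tract_length_cm=17.5, c_cm_per_s=35000.0):
    """Linearly interpolate an area function onto N equal-length tubes.

    Assumption: N = 2 * L * fs / c, so that each tube contributes half a
    sample of one-way delay. The project may have chosen N differently.
    """
    areas = np.asarray(areas, dtype=float)
    n_tubes = int(round(2.0 * tract_length_cm * fs / c_cm_per_s))

    # Place each original and each new tube at the center of its segment,
    # with positions normalized to the range 0..1 along the tract.
    x_old = (np.arange(len(areas)) + 0.5) / len(areas)
    x_new = (np.arange(n_tubes) + 0.5) / n_tubes

    # np.interp holds the end values constant outside the original centers.
    return np.interp(x_new, x_old, areas)
```

Under these assumptions, a 44100 Hz sample rate gives 44 tubes and 8000 Hz gives 8.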
The connected tube model was analyzed to determine the reflection coefficient at each tube boundary. These coefficients were then converted to an IIR transfer function from the glottal source to the output at the lips, which could be applied to a glottal pulse signal to obtain a synthetic human vowel sound. The audio files were generated at a sample rate of 44100 Hz.
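The conversion from tube areas to a filter can be sketched as follows. This follows the standard textbook formulation for lossless tubes, r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k), with the denominator built by the recursion D_k(z) = D_{k-1}(z) + r_k z^{-k} D_{k-1}(z^{-1}). It assumes a fully reflecting glottis and a single lip reflection coefficient `r_lips`, whose default of 0.7 is a hypothetical value. The function names are illustrative, the pure delay in the numerator is dropped, and the project's own implementation may differ.

```python
import numpy as np

def reflection_coefficients(areas, r_lips=0.7):
    """Reflection coefficient at each tube junction, plus one at the lips."""
    a = np.asarray(areas, dtype=float)
    r = (a[1:] - a[:-1]) / (a[1:] + a[:-1])
    return np.append(r, r_lips)  # r_lips is an assumed termination value

def tube_transfer_function(areas, r_lips=0.7):
    """Return (gain, d) for the all-pole model V(z) = gain / D(z)."""
    r = reflection_coefficients(areas, r_lips)
    d = np.array([1.0])
    for rk in r:
        # D_k(z) = D_{k-1}(z) + r_k * z^{-k} * D_{k-1}(1/z)
        d = np.concatenate((d, [0.0])) + rk * np.concatenate(([0.0], d[::-1]))
    gain = np.prod(1.0 + r)
    return gain, d

def all_pole_filter(gain, d, x):
    """Direct-form recursion y[n] = gain*x[n] - sum_k d[k]*y[n-k]."""
    x = np.asarray(x, dtype=float)
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = gain * x[n]
        for k in range(1, min(n, len(d) - 1) + 1):
            acc -= d[k] * y[n - k]
        y[n] = acc
    return y
```

As a sanity check, four equal-area tubes with `r_lips=1.0` give D(z) = 1 + z^{-4}, whose poles sit at odd multiples of fs/8, the quarter-wavelength resonances expected of a uniform tube closed at one end. For long signals, the coefficients can be passed to a library IIR routine instead of the slow Python loop.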
I made a few tweaks to the system to achieve a more natural vowel sound:
Voiced onset with startup pulse, amplitude envelope, and visible turbulent noise.
Voiced release with rapid pitch and amplitude decay.
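One way such an onset and release could be shaped is sketched below: an impulse train whose pitch falls during the release, multiplied by an attack/release amplitude envelope, with a little white noise standing in for turbulence. Every parameter value and the function name `glottal_source` are hypothetical. The startup pulse is not modeled, and this is not a reconstruction of the glottal pulse shape actually used in the project.

```python
import numpy as np

def glottal_source(fs=44100, duration=0.6, f0=110.0, attack=0.05,
                   release=0.15, release_f0_drop=0.5, noise_level=0.02, seed=0):
    """Impulse-train excitation with an amplitude envelope and pitch decay.

    All defaults are illustrative assumptions, not values from the project.
    """
    n = int(round(duration * fs))
    n_att = int(round(attack * fs))
    n_rel = int(round(release * fs))

    # Amplitude envelope: linear attack, flat sustain, linear release to zero.
    env = np.ones(n)
    env[:n_att] = np.linspace(0.0, 1.0, n_att, endpoint=False)
    env[n - n_rel:] = np.linspace(1.0, 0.0, n_rel)

    # Pitch contour: constant, then falling linearly during the release.
    f0_t = np.full(n, f0)
    f0_t[n - n_rel:] = np.linspace(f0, f0 * release_f0_drop, n_rel)

    # Accumulate phase in cycles; emit a pulse whenever it passes an integer.
    phase = np.cumsum(f0_t) / fs
    pulses = np.zeros(n)
    pulses[np.flatnonzero(np.diff(np.floor(phase), prepend=0.0) > 0)] = 1.0

    # White noise as a crude stand-in for turbulent (aspiration) noise.
    rng = np.random.default_rng(seed)
    noise = noise_level * rng.standard_normal(n)
    return env * (pulses + noise)
```

The output would then be passed through the vocal tract filter to produce the vowel.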
Hear what it would sound like to speak through a "concatenated tube" tunnel with perfect transmission at the tube boundaries - a reverberant, metallic sound. This was constructed from tubes of random sizes concatenated at regular intervals. In this experiment, I noted that sharper discontinuities at the tube boundaries led to a more pronounced, ringing resonance. I used a running average to smooth the tube boundaries and make them more lifelike.
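The effect of smoothing on the tube boundaries can be illustrated with a small sketch. It draws random tube areas, applies a running average, and compares the junction reflection coefficients before and after. The area range, window width, tube count, and function names are all assumptions made for illustration.

```python
import numpy as np

def running_average(areas, width=5):
    """Moving average with edge padding; `width` is assumed to be odd."""
    areas = np.asarray(areas, dtype=float)
    kernel = np.ones(width) / width
    padded = np.pad(areas, width // 2, mode="edge")
    # 'valid' on the padded signal returns the original number of tubes.
    return np.convolve(padded, kernel, mode="valid")

def junction_reflections(areas):
    """Reflection coefficient at each boundary between adjacent tubes."""
    a = np.asarray(areas, dtype=float)
    return (a[1:] - a[:-1]) / (a[1:] + a[:-1])

# Hypothetical random tunnel: 64 tubes with areas between 0.5 and 5.0.
rng = np.random.default_rng(0)
random_areas = rng.uniform(0.5, 5.0, size=64)
smooth_areas = running_average(random_areas, width=5)

mean_raw = np.mean(np.abs(junction_reflections(random_areas)))
mean_smooth = np.mean(np.abs(junction_reflections(smooth_areas)))
print(f"mean |r| raw: {mean_raw:.3f}, smoothed: {mean_smooth:.3f}")
```

Smaller reflection magnitudes mean less energy is reflected at each boundary, which is consistent with the weaker ringing heard after smoothing.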