To: Distribution
From: David Kahaner ONRFE [kahaner@xroads.cc.u-tokyo.ac.jp]
Re: Studies on a Vocal Tract Model for Speech Synthesis and Analysis
3 April 1990

Researches of the Electrotechnical Laboratory, Number 905, December 1989.
"Studies on a Vocal Tract Model for Speech Synthesis and Analysis"
(126 pages, including a 124-item bibliography)
by
Hiroshi Ohmura
Speech Processing Section
Machine Understanding Division
Electrotechnical Laboratory
1-1-4 Umezono, Tsukuba-shi
Ibaraki 305, JAPAN
Phone: 0298-58-5933

In Japanese, with an English summary, given below.

Summary:
Both the articulatory and acoustical sides of speech research are essential for understanding speech sounds, which are generated by complicated movements of the vocal organs. However, our knowledge of these articulatory-acoustical relationships is quite limited. A vocal tract model that integrates speech synthesis and analysis is therefore needed to clarify and systematize these relationships.

The first purpose of these studies is to show that articulatory data such as vocal tract shapes are effective in continuous speech synthesis when a simplified vocal tract model is controlled dynamically by a compact set of rules. Further improvement of synthetic speech requires putting the articulatory information in real speech to practical use. The second purpose, therefore, is to propose a new vocal tract model that provides a common framework for articulatory-acoustical descriptions of speech under more general conditions. Through the combined use of this model for synthesis and analysis, we hope to contribute to a complete description of the relations between these two realms.

This monograph consists of an introduction and six chapters.

Chapter 2 deals with speech synthesis by rules using a dynamic vocal tract simulator, which is a kind of analog computer system. The control data for the simulator are generated from input sentences by a programming system.
This system consists of two sets of rules, articulatory and prosodic. Articulatory data such as area functions and durations are stored in a table, together with all the intermediate symbols, for each phoneme. Speech synthesis experiments with Japanese and English fairy tales confirmed that the articulatory representation of speech functions adequately as an effective description of its features.

Chapter 3 deals with the current state of speech wave analysis for extracting articulatory information. This chapter consists of three parts. The first part describes an adaptive inverse filtering method for estimating the vocal tract area function. The assumptions of the vocal tract model are as follows: lossless tube sections, zero-impedance termination at the lips, purely resistive termination at the glottis, and a sound source located at the end of the tube sections. Under these assumptions, an adaptive area function estimation method has been developed. Its merit is almost real-time estimation of area functions without any prior information about formants or vocal tract length. However, because its transfer function is all-pole, the model is limited to vowels and vowel-like sounds. The second part describes an iterative method for consonantal area function estimation based on an extended vocal tract model in which the sound source can be located at any section inside the vocal tract; its transfer function then becomes a pole-zero function. The area function estimates for voiceless stops show that modeling the sound source characteristics of consonants remains an open problem, one that depends on appropriately estimating source locations that conform to the points of articulation. In the third part, acoustic correlates of the manner of articulation of dentals in Japanese are discussed. Like vocal tract shapes, these acoustic quantities are important for specifying the features of speech sounds.
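The first part of Chapter 3 estimates area functions under an all-pole, lossless-tube model. The memo does not describe the author's adaptive algorithm, so the sketch below substitutes the classic batch linear-prediction (PARCOR) route: the Levinson-Durbin recursion yields reflection coefficients, which are then converted to ratios of adjacent section areas. The pre-emphasis coefficient, Hamming window, model order, and the numbering and sign conventions (sections counted from the lips, lip area normalized to 1, A[i+1] = A[i](1 - k_i)/(1 + k_i)) are all assumptions for illustration; conventions differ between texts, so a given reference may require the profile to be flipped or inverted.

```python
import math


def autocorrelation(x, order):
    """Biased autocorrelation r[0..order] of the frame x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    Returns (a, k): a[0] == 1, and k holds the reflection (PARCOR)
    coefficients k_1..k_p.
    """
    a = [1.0] + [0.0] * order
    k = []
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        ki = -acc / err
        k.append(ki)
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + ki * prev[i - j]
        a[i] = ki
        err *= 1.0 - ki * ki  # prediction error shrinks at every stage
    return a, k


def area_function(frame, order=10, preemph=0.98):
    """Relative area function (lips first, lip area = 1) from one speech frame.

    ASSUMPTIONS: the pre-emphasis coefficient, Hamming window, model order, and
    the area-ratio convention A[i+1] = A[i] * (1 - k) / (1 + k) are illustrative
    choices; they are not taken from the report.
    """
    # Pre-emphasis roughly removes the glottal-source and radiation spectral tilt.
    x = [frame[0]] + [frame[n] - preemph * frame[n - 1] for n in range(1, len(frame))]
    size = len(x)
    # Hamming window before computing the autocorrelation.
    x = [v * (0.54 - 0.46 * math.cos(2 * math.pi * n / (size - 1))) for n, v in enumerate(x)]
    _, k = levinson_durbin(autocorrelation(x, order), order)
    areas = [1.0]
    for ki in k:
        areas.append(areas[-1] * (1.0 - ki) / (1.0 + ki))
    return areas
```

Because the autocorrelation method guarantees |k_i| < 1 for a nonzero frame, every area ratio is positive, which is one reason this formulation is a common starting point. Like the model in the text, it is all-pole and therefore only meaningful for vowels and vowel-like sounds.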
Chapter 4 deals with a new vocal tract model constructed under generalized acoustic conditions, together with recursive methods for calculating transfer functions for typical articulations such as voiceless stops, nasals, and liquids. A digital circuit of this model is represented by reflection coefficients and propagation constants, both of which are frequency dependent. This model can be used in both speech synthesis and speech analysis.

Chapter 5 deals with a speech production model that includes the vocal tract model described in the previous chapter, and introduces a basic application algorithm of a generalized vocal tract ARMA model for consonantal area function estimation. The speech production model consists of the sound source frequency characteristic, the vocal tract transfer characteristic, and the transfer characteristic from the lip end to an external point.

Chapter 6 deals with the computation of nomograms of systematic articulatory variations and their power spectra, as a primary application of the new vocal tract model. Using these nomograms to interpret the spectra of vowels and several consonants in real speech in articulatory terms, we may conclude that this model is effective in elucidating articulatory-spectral relationships in real speech.

Chapter 7 is the conclusion of this report. The articulatory-spectral nomograms are expected to be very useful in adding to our knowledge of internal acoustic phenomena and in providing a norm that separates consonantal vocal tract characteristics from sound source characteristics. In the future, functional elements of articulation need to be introduced into the presented vocal tract model. Using nomographic information for many different situations in real speech should make it possible to find a new approach to consonantal area function estimation and to develop a new method of feature extraction.

The report is in Japanese, but the author speaks reasonably good English.
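Chapter 4 represents the vocal tract by reflection coefficients and frequency-dependent propagation constants, and Chapter 6 uses the model to compute nomograms of how spectra change under systematic articulatory variation. The sketch below is a much simpler stand-in, not the author's model: a lossless concatenated-tube model evaluated with 2x2 chain (ABCD) matrices, in which the propagation constant is just j*omega/c. It assumes an ideal volume-velocity source at the glottis and zero radiation impedance at the lips, so the transfer function is U_lips/U_glottis = 1/D(f), where D is the lower-right chain-matrix element, and the resonances are the zeros of D. The constants RHO and C, all function names, and the constriction parameters in the hypothetical constriction_nomogram are illustrative choices, not taken from the report.

```python
import math

RHO = 1.14e-3  # density of air in g/cm^3 (assumed value)
C = 35000.0    # speed of sound in cm/s (assumed value)


def section_matrix(area, length, f):
    """Chain (ABCD) matrix of one lossless cylindrical section at frequency f (Hz)."""
    k = 2.0 * math.pi * f / C   # lossless case: propagation constant is j*k
    zc = RHO * C / area         # characteristic acoustic impedance of the section
    c, s = math.cos(k * length), math.sin(k * length)
    return [[c, 1j * zc * s], [1j * s / zc, c]]


def matmul(a, b):
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)] for i in range(2)]


def chain(areas, length, f):
    """Product of section matrices from glottis (areas[0]) to lips (areas[-1])."""
    m = [[1.0, 0.0], [0.0, 1.0]]
    for area in areas:
        m = matmul(m, section_matrix(area, length, f))
    return m


def transfer(areas, length, f):
    """U_lips / U_glottis, assuming P_lips = 0 and an ideal glottal flow source."""
    return 1.0 / chain(areas, length, f)[1][1]


def formants(areas, length, fmax=5000.0, df=10.0):
    """Resonances = sign changes of D(f) = chain[1][1], refined by bisection.

    For a lossless tube D(f) is real, so its zeros are the poles of transfer().
    """
    def d(f):
        return chain(areas, length, f)[1][1].real

    n = int(fmax / df)
    grid = [(i + 0.5) * df for i in range(n + 1)]  # offset grid avoids landing on a root
    vals = [d(f) for f in grid]
    found = []
    for i in range(n):
        if vals[i] * vals[i + 1] < 0:
            lo, hi, dlo = grid[i], grid[i + 1], vals[i]
            for _ in range(40):
                mid = 0.5 * (lo + hi)
                dmid = d(mid)
                if dlo * dmid <= 0:
                    hi = mid
                else:
                    lo, dlo = mid, dmid
            found.append(0.5 * (lo + hi))
    return found


def constriction_nomogram(n_sections=35, length=0.5, open_area=5.0,
                          narrow_area=0.5, width=3, step=4):
    """Hypothetical nomogram: F1-F3 as a narrow constriction moves from glottis to lips.

    Returns (distance of constriction from glottis in cm, [F1, F2, F3]) rows.
    """
    rows = []
    for start in range(0, n_sections - width + 1, step):
        areas = [open_area] * n_sections
        for i in range(start, start + width):
            areas[i] = narrow_area
        rows.append((start * length, formants(areas, length)[:3]))
    return rows
```

As a sanity check, a uniform 17.5 cm tube (35 sections of 0.5 cm) gives D(f) = cos(2*pi*f*L/c), so formants([5.0] * 35, 0.5) should return values near the quarter-wavelength resonances (2n - 1)c/4L, about 500, 1500, 2500, 3500 and 4500 Hz with c = 35000 cm/s. A lossy model with frequency-dependent propagation constants, as in Chapter 4, would replace section_matrix and make D(f) complex, so resonances would then be located from spectral peaks rather than sign changes.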
The author was not familiar with the Office of Naval Research and was initially reluctant to discuss his work with the "military". This reaction has occurred before, but it is usually a minor problem that resolves itself once the type of research that ONR supports is explained. Nevertheless, American researchers (especially those from national labs) who wish to communicate with Japanese scientists should be sensitive to this concern.

-------- END OF MEMO--------------------------------------------------