Path: news1.ucsd.edu!ihnp4.ucsd.edu!swrinde!newsfeed.internetmci.com!bloom-beacon.mit.edu!senator-bedfellow.mit.edu!faqserv
From: andrew@itl.atr.co.jp (Andrew Hunt)
Newsgroups: comp.speech,comp.answers,news.answers
Subject: comp.speech Frequently Asked Questions - part 1/3
Supersedes: <comp-speech-faq/part1_817055504@rtfm.mit.edu>
Followup-To: comp.speech
Date: 22 Dec 1995 14:10:43 GMT
Organization: ATR International, Japan
Lines: 1813
Approved: news-answers-request@MIT.Edu
Expires: 14 Feb 1996 14:10:32 GMT
Message-ID: <comp-speech-faq/part1_819641432@rtfm.mit.edu>
Reply-To: andrew@itl.atr.co.jp (Andrew Hunt)
NNTP-Posting-Host: bloom-picayune.mit.edu
Summary: Information on Speech Technology
X-Last-Updated: 1995/12/19
Originator: faqserv@bloom-picayune.MIT.EDU
Xref: news1.ucsd.edu comp.speech:6602 comp.answers:13224 news.answers:51624

Archive-name: comp-speech-faq/part1
Last-modified: 1995/12/19
URL: http://www.speech.su.oz.au/comp.speech/


                   COMP.SPEECH FAQ POSTING - PART 1/3


[Note: this document has been automatically extracted from a WWW site:
        http://www.speech.su.oz.au/comp.speech
This may introduce some formatting errors.]


                       COMP.SPEECH FREQUENTLY ASKED QUESTIONS

   The Frequently Asked Questions (FAQ) is a regular posting to comp.speech
   which attempts to answer some of the regular questions in the comp.speech
   newsgroup. The FAQ is not meant to discuss any topic exhaustively. It will
   hopefully provide readers with pointers on where to find useful
   information, especially material available on the Internet.

   If you have not already read the Usenet introductory material posted to
   news.announce.newusers, please do. For help with FTP (file transfer
   protocol) look for a regular posting of anonymous FTP FAQ in comp.misc,
   comp.archives.admin or news.answers.

   This FAQ is posted every 4 weeks to comp.speech, comp.answers and
   news.answers.

   It is also available for ftp from the comp.speech archive site:
     *  ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/FAQ-complete

   Or from the news.answers ftp site (and its mirrors):
     *  ftp://rtfm.mit.edu/pub/usenet/comp.speech/*

   Or on the World Wide Web:
     * Australia: http://www.speech.su.oz.au/comp.speech/
     * Britain: http://svr-www.eng.cam.ac.uk/comp.speech/
     * Japan: http://www.itl.atr.co.jp/comp.speech/

   Or by sending email to mail-server@rtfm.mit.edu with the following line in
   the body of the message:
     * send usenet/news.answers/comp-speech-faq/*

   Finally, if you only have email access to the internet, then I suggest you
   obtain the Internet-by-email guide. Send email to mail-server@rtfm.mit.edu
   with the following line in the body of the message:
     * send usenet/news.answers/internet-services/access-via-email
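
   As a rough illustration of the two mail-server requests above, the
   following Python sketch builds (but does not send) such a message. The
   function name build_request and the sender address are hypothetical; only
   the server address and the "send" lines are taken from this FAQ.

```python
from email.message import EmailMessage

MAIL_SERVER = "mail-server@rtfm.mit.edu"  # address given in this FAQ


def build_request(archive_path, sender="you@example.org"):
    """Return a message whose body is a single 'send <path>' line.

    The default sender is a placeholder, not a real address.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = MAIL_SERVER
    # The request line goes in the body of the message.
    msg.set_content("send " + archive_path)
    return msg


if __name__ == "__main__":
    # Print the message text; actually sending it is left to your mail program.
    print(build_request("usenet/news.answers/comp-speech-faq/*"))
```

   The same function can be called with
   "usenet/news.answers/internet-services/access-via-email" to request the
   Internet-by-email guide.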

Admin

   About 20 of the 190 WWW pages for the FAQ have been updated in the last
   month. Thanks to the many people who sent in information and new entries.
   Nothing else to report.

Acknowledgements

   Hundreds of people have made contributions to the comp.speech FAQ over the
   last three years; there are too many to name individually. Special thanks
   go to Tony Robinson and Joe Campbell who have been particularly helpful. I
   am grateful to the people at Sydney University, Cambridge University and
   ATR ITL for supporting the FAQ on their WWW sites.

Disclaimer

   The comp.speech WWW pages are provided as is without any express or implied
   warranties. While every effort has been taken to ensure the accuracy of the
   information contained in this article, the author assumes no responsibility
   for errors or omissions, or for damages resulting from the use of the
   information contained herein.

Copyright and Reproduction

   Copyright (c) 1995 by Andrew Hunt, all rights reserved.
   The comp.speech WWW pages may not be distributed for financial gain.
   The comp.speech WWW pages may not be included in any collections or
   compilations without express permission from the author.
   Hyperlinks to the comp.speech WWW pages are encouraged.

Maintainer

   The FAQ posting and the Comp.Speech WWW Site are maintained by

    Andrew Hunt
    ATR Interpreting Telecommunications Research Laboratories
    Hikari-dai 2-2, Seika-cho, Kyoto 619-02, Japan
    andrew@itl.atr.co.jp


___________________________________________________________________________

                                 TABLE OF CONTENTS

  FAQ SECTION 1: GENERAL INFORMATION ON SPEECH TECHNOLOGY

          * Q1.1: What is comp.speech?
          * Q1.2: comp.speech ftp site
          * Q1.3: Common abbreviations and jargon
          * Q1.4: Related newsgroups and mailing lists
          * Q1.5: Related journals and conferences
          * Q1.6: Handicap Aids
          * Q1.7: Speech Databases
          * Q1.8: Speech File Formats and Conversion
          * Q1.9: Speech Laboratory Environments and Audio Editors
          * Q1.10: Speech Research Sites
          * Q1.11: Miscellaneous Software and Resources

  FAQ SECTION 2: SIGNAL PROCESSING

          * Q2.1: What sampling do I need for speech?
          * Q2.2: Finding the pitch of a speech signal
          * Q2.3: How do I find the start and end points of a speech signal?
          * Q2.4: Where can I find FFT software?
          * Q2.5: Signal processing in speech technology
          * Q2.6: Speech sampling and signal processing hardware
          * Q2.7: How do I convert to/from mu-law format?

  FAQ SECTION 3: SPEECH CODING AND COMPRESSION

          * Q3.1: Speech compression techniques
          * Q3.2: References on coding/compression
          * Q3.3: Compression and Coding Software

  FAQ SECTION 4: NATURAL LANGUAGE PROCESSING

          * Q4.1: NLP References and Books
          * Q4.2: NLP Software

  FAQ SECTION 5: SPEECH SYNTHESIS

          * Q5.1: What is speech synthesis?
          * Q5.2: How can speech synthesis be performed?
          * Q5.3: References/Books on Synthesis
          * Q5.4: Speech Synthesis on the WWW
          * Q5.5: Speech Synthesis Software/Hardware

  FAQ SECTION 6: SPEECH RECOGNITION

          * Q6.1: What is speech recognition?
          * Q6.2: How is speech recognition performed?
          * Q6.3: How can I build a simple speech recogniser?
          * Q6.4: References & books on speech recognition
          * Q6.5: Speech Recognition Hardware/Software


___________________________________________________________________________

                       LIST OF SOFTWARE/HARDWARE/INFORMATION

    The comp.speech FAQ provides information on a range of software, hardware
   and resources.

Q1.7: Speech Data

          * Bavarian Archive for Speech Signals
          * BUPT Spoken Digit Database (Chinese)
          * Center for Spoken Language Understanding (CSLU)
          * Examples of IPA Symbols
          * Linguistic Data Consortium (LDC)
          * NOISEX
          * Oxford Acoustic Phonetic Database
          * Phonemic Samples
          * RELATOR project

Q1.9: Speech Processing Environments

          * CSRE: Canadian Speech Research Environment
          * Entropic Signal Processing System (ESPS) and Waves
          * GoldWave
          * Kay Elemetrics Computer Speech Lab
          * Khoros
          * Matlab plus Signal Processing Toolbox
          * MacSpeech Lab II
          * N!Power
          * OGI Speech Tools
          * Ptolemy
          * Signalyze 3.0
          * SoundScope

Q1.11: Miscellaneous Software and Resources

  NETWORK "PHONE" SOFTWARE

          * CyberPhone
          * FAQ: How can I use the Internet as a telephone?
          * NetPhone from Electric Magic Company
          * NEVOT (1.4v) from AT&T BL
          * Internet Phone from VocalTec

  AUDIO PROCESSING SOFTWARE

          * AF version AF3R1
          * MixViews
          * Network Audio System Release 1.1
          * NIST Software - SPHERE and SCORE
          * Sound Processing Kit

  HUMAN AUDIO PERCEPTION

          * Auditory Modeller 1
          * Auditory Modeller 2
          * Auditory Toolbox for Matlab
          * Human Audio Perception Document

  DICTIONARIES AND OTHER LEXICAL TOOLS

          * BEEP dictionary
          * CMU dictionary
          * CUVOALD dictionary
          * Dictionary
          * Homophone List
          * MRC database
          * Dictionaries on the WWW

  PHONETIC FONTS

          * Summer Institute of Linguistics IPA Fonts
          * Yamada Language Center

Q2.6: Audio Hardware

          * Macintosh Audio Hardware
          * PC Audio Hardware
          * Unix Audio Hardware

Q3.3: Compression Software and Hardware

          * 32 kbps ADPCM
          * CELP 3.2a & LPC
          * 8 Kbit/s CELP on the TMS320C5x family of DSP chips
          * File format conversion
          * G.711/721/723 Compression
          * G.728 LD-CELP vocoder
          * G.728 Compression
          * GSM 06.10 Compression
          * Lernout & Hauspie Speech Coding (5 products)
          * Lernout & Hauspie Speech Coding SDK
          * shorten - a lossless compressor for speech signals
          * TrueSpeech from DSP Group
          * U.S.F.S. 1016 CELP vocoder for DSP56001
          * ToolVox from Voxware

Q4.2: Natural Language Processing

     * Natural Language Software Registry (NLSR) - NLP Tools
     * Part of Speech Tagger

Q5.5: Speech Synthesis

          * AsTeR
          * TheBigMouth
          * CSRE: Canadian Speech Research Environment
          * DECTalk
          * Eloquence
          * Emacspeak - A Speech Output Subsystem For Emacs
          * Infovox Product Range
          * JSRU
          * Klatt-style synthesiser
          * KPE80 - A Klatt Synthesiser and Parameter Editor
          * "learph": Trainable text-to-phoneme software by Antonio Lucca 
          * Lernout and Hauspie Text-To-Speech (3 products)
          * Lernout and Hauspie Text-To-Speech Windows SDK
          * Various Mac Speech Output Applications
          * MacinTalk
          * Monologue for Windows from First Byte
          * Narrator Translator Library
          * Narrator
          * TextToSpeech Kit (NeXT)
          * Orator from Bellcore
          * PAM - A Text-To-Speech Application
          * ProVerbe Speech Engine for Windows
          * ProVoice Developer's Speech Toolkit from First Byte
          * RC Systems V8600/V8601 Text to Speech synthesizers 
          * rsynth
          * SENSYN speech synthesizer
          * SGI Developers Toolbox Synthesiser
          * SIMTEL
          * Sound Bytes Developer's Kit
          * spchsyn.exe
          * Speak
          * Speech Manager and PlainTalk
          * Text to Phoneme Program 1
          * Text to phoneme program 2
          * Text to phoneme program 3
          * Tinytalk
          * TrueTalk
          * TruVoice from Centigram

Q6.5: Speech Recognition

          * AbbotDemo
          * BBN Hark Telephony Recognizer
          * Corona Speech Recognition System
          * Custom Voice(TM) by A&G Graphics Interface
          * D6006 Voice Control Processor
          * DATAVOX - French
          * Digital Dreams Speech Recognition Plug-Ins
          * DragonDictate version 3.0
          * DragonDictate for Windows
          * DragonVoiceTools
          * DSP Semiconductor Recognition Chip
          * EARS: Single Word Recognition Package
          * HM2007 - Speech Recognition Chip
          * Hidden Markov Model Toolkit (HTK) from Entropic 
          * IBM VoiceType Dictation
          * ICSS system from IBM
          * IN3 Voice Command
          * IN3 Voice Command for Windows
          * Kurzweil Voice for Windows
          * Lernout & Hauspie ASR (3 products)
          * Lernout & Hauspie ASR SDK
          * Listen for Windows 2.0 - Verbex Voice Systems
          * Lotec Speech Recognition Package
          * Myers' Hidden Markov Model software
          * NCC Dictate
          * OKI VRP6679 - Speech Recognition Chip
          * Speech Systems Phonetic Engine 500 (PE500)
          * PowerSecretary
          * ProNotes Voice Tools (due late '95)
          * PureSpeech
          * recnet
          * SayIt
          * Simon Says - for NeXT
          * Speech Commander - Verbex Voice Systems
          * 'Speech Recognition Expert' Toolkit for Windows
          * Visual Voice from Stylus Innovation
          * Voice Command Line Interface
          * Voice Control Systems Recognition
          * Visus SpeechKit
          * VCS 2030 & 2060 Voice Dialer
          * Voice-Trek 2.0
          * Creative VoiceAssist
          * Voice Blaster Ver. 4.0
          * VoiceServer for Windows
          * Votan
          * Voice Processing Corporation Speech Recognition Product Line


___________________________________________________________________________

                              FAQ SECTION 1 - GENERAL

          * Q1.1: What is comp.speech?
          * Q1.2: comp.speech ftp site
          * Q1.3: Common abbreviations and jargon
          * Q1.4: Related newsgroups and mailing lists
          * Q1.5: Related journals and conferences
          * Q1.6: Handicap Aids
          * Q1.7: Speech Databases
          * Q1.8: Speech File Formats and Conversion
          * Q1.9: Speech Laboratory Environments and Audio Editors
          * Q1.10: Speech Research Sites
          * Q1.11: Miscellaneous Software and Resources



                             Q1.1: WHAT IS COMP.SPEECH?

   Comp.speech is an unmoderated newsgroup for discussion of speech technology
   and speech science. It covers a wide range of issues from the application
   of speech technology, to research, to products and lots more. By its
   nature, speech technology is an inter-disciplinary field and the newsgroup
   reflects this. However, computer application is the basic theme of the
   group.

   Note: If you don't know what a newsgroup is, then talk to your local system
   administrator about how to get access. A useful newsgroup for beginners
   is news.announce.newusers. You might also find the following documents
   useful.

          ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/What_is_Usenet?

          ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/Answers_to_Frequently_Asked_Questions_about_Usenet

          ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/Rules_for_posting_to_Usenet

          ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/FAQs_about_FAQs

   The following is a list of some of the topics covered by comp.speech.
     * Speech Recognition - discussion of methodologies, training, techniques,
       results and applications. This should cover the application of
       techniques including HMMs, neural-nets and so on to the field.

     * Speech Synthesis - discussion concerning theoretical and practical
       issues associated with the design of speech synthesis systems.

     * Speech Coding and Compression - both research and application matters.

     * Phonetic/Linguistic Issues - coverage of linguistic and phonetic issues
       which are relevant to speech technology applications. Could cover
       parsing, natural language processing, phonology and prosodic work.

     * Speech System Design - issues relating to the application of speech
       technology to real-world problems. Includes the design of user
       interfaces, the building of real-time systems and so on.

     * Other matters - relevant conferences, jobs, books, software, hardware,
       and products.



                          Q1.2: COMP.SPEECH FTP SITE

   Tony Robinson maintains the comp.speech ftp site. The ftp site is a
   comprehensive repository of software and information related to speech
   technology. The site is
     * ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/

  COMP.SPEECH ARCHIVES

   The comp.speech ftp site provides full archives of the comp.speech
   newsgroup dating back to the creation of the group in 1991. The postings
   are stored in the order in which they arrive. Each batch of 1000 articles
   is grouped into a gzip'ed tar file. Matching files listing the subjects
   are also provided.
     * ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/archive/
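
   Once a batch has been downloaded, its contents can be listed without
   unpacking it. The following Python sketch shows one way to do this with
   the standard tarfile module. The function names and the demo article
   names are assumptions for illustration; the FAQ does not describe the
   file names or layout used inside the real archives.

```python
import io
import tarfile


def list_articles(archive):
    """Return the names of regular files in a gzip'ed tar archive.

    `archive` may be a path (str) or an open binary file object.
    """
    if isinstance(archive, str):
        tar = tarfile.open(name=archive, mode="r:gz")
    else:
        tar = tarfile.open(fileobj=archive, mode="r:gz")
    with tar:
        return [m.name for m in tar.getmembers() if m.isfile()]


def make_demo_archive(n=3):
    """Build a tiny in-memory archive so the sketch runs without a download."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for i in range(1, n + 1):
            data = ("Subject: demo article %d\n" % i).encode("ascii")
            info = tarfile.TarInfo(name=str(i))  # hypothetical article name
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(list_articles(make_demo_archive()))
```

   For a real batch, pass the path of the downloaded file to list_articles
   instead of the demo archive.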

  SOFTWARE AND OTHER RESOURCES

   The comp.speech ftp site includes a wide range of useful software and
   resources. Tony has arranged it into a series of sub-directories:

   /analysis : Speech analysis software
          FFT code, a pitch tracker, RASTA code, and IEEE DSP code.

   /auditory : Auditory model software
          AIM, Auditory Toolbox and Lutear.

   /coding : Speech coding software
          ADPCM, CELP 3.2a, G711, G721, G723, GSM, LDCELP, LPC10, Shorten.

   /data : Repository for (small) speech-related databases
          BEEP, CMUDict, Homophone list, hVd database, Peterson Barney
          database

   /dictionaries : Phonetic dictionaries
          BEEP, CMUDict, CUVOALD, Homophone list, MRC database

   /info : Key postings to comp.speech archives by subject
          Lots of interesting info!

   /recognition : Speech recognition software
          AbbotDemo, Ears, Lotec, recnet, sound blaster recognition, whistle

   /simtel_sound : Mirror of the simtel/msdos/sound directory
          Range of useful software

   /simtel_voice : Mirror of the simtel/msdos/voice directory
          Another range of useful software

   /synthesis : Speech synthesis software
          Klatt synthesis software, Klatt parameter editor and rsynth.

   /tools : Miscellaneous tools
          Part-of-speech tagger, OGI speech tools, sox audio file format
          conversion, SPHERE software and more.



                    Q1.3: COMMON ABBREVIATIONS AND JARGON.

     * ANN - Artificial Neural Network.
     * ASR - Automatic Speech Recognition.
     * ASSP - Acoustics Speech and Signal Processing
     * AVIOS - American Voice I/O Society
     * CELP - Code-book Excited Linear Prediction.
     * COLING - COmputational LINGuistics
     * DTW - Dynamic Time Warping.
     * FAQ - Frequently Asked Questions.
     * HMM - Hidden Markov Model.
     * IEEE - Institute of Electrical and Electronics Engineers
     * JASA - Journal of the Acoustical Society of America
     * LPC - Linear Predictive Coding.
     * LVQ - Learning Vector Quantisation.
     * NLP - Natural Language Processing.
     * NN - Neural Network.
     * TI - Texas Instruments.
     * TIMIT - A large speech corpus from TI and MIT - see Q1.7
     * TTS - Text-To-Speech (i.e. synthesis).
     * VQ - Vector Quantisation.



                 Q1.4: RELATED NEWSGROUPS AND MAILING LISTS.

Newsgroups

   comp.ai - Artificial Intelligence newsgroup.
          Postings on general AI issues, language processing and AI
          techniques. The comp.ai FAQ covers NLP, NN and other AI information.

   comp.ai.nat-lang - Natural Language Processing Group
          Postings regarding Natural Language Processing. Set up to cover a
          broad range of related issues and different viewpoints. A
          comp.ai.nat-lang FAQ posting is available.

   comp.ai.nlang-know-rep - Natural Language Knowledge Representation
          Moderated group.

   comp.ai.neural-nets - discussion of Neural Networks and related issues.
          There are often postings on speech related matters - phonetic
          recognition, connectionist grammars and so on. A comp.ai.neural-nets
          FAQ posting is available.

   comp.compression - occasional articles on compression of speech.
          The comp.compression FAQ has some info on audio compression
          standards.

   comp.dcom.telecom - Telecommunications newsgroup.
          Has occasional articles on voice products.

   comp.dsp - discussion of signal processing - hardware and algorithms and
          more.
          Has a good FAQ posting which is also available on the WWW and by ftp
          (addresses below). Has a regular posting of a comprehensive list of
          Audio File Formats.

          + http://www.bdti.com/dsp_faq.htm
          + ftp://rtfm.mit.edu/pub/usenet/comp.dsp/

   comp.multimedia - Multi-Media discussion group.
          Has occasional articles on voice I/O.

   sci.lang - Language.
          Discussion about phonetics, phonology, grammar, etymology and lots
          more. A sci.lang FAQ is available.

   alt.sci.physics.acoustics
          Some discussion of speech production & perception.

   alt.binaries.sounds.* - posting and discussion of sound samples.

Mailing Lists

   [There are many other mailing lists which are not mentioned here. If you
   know of one which should be included in the list, then please submit it.]

   ECTL - Electronic Communal Temporal Lobe
          Founder & Moderator: David Leip. Moderated mailing list for
          researchers with interests in computer speech interfaces. This list
          serves a broad community including persons from signal processing,
          AI, linguistics and human factors. To subscribe, send your name,
          institute, department, daytime phone and email address to:

          + ectl-request@snowhite.cis.uoguelph.ca

   The ECTL archive site is

                 ftp://snowhite.cis.uoguelph.ca/pub/ectl

   Prosody Mailing List
          Unmoderated mailing list for discussion of prosody. The aim is to
          facilitate the spread of information relating to the research of
          prosody by creating a network of researchers in the field. If you
          want to participate, send the following one-line message to

          + listserv@msu.edu
          + subscribe prosody Your Name

   foNETiks
          A moderated monthly newsletter distributed by e-mail. It carries job
          advertisements, notices of conferences, and other news of general
          interest to phoneticians, speech scientists and others. The editors
          are Linda Shockey and Gerry Docherty. To subscribe, send the
          following one-line message to

          + mailbase@mailbase.ac.uk
          + join fonetiks your_first_name your_second_name

   Digital Mobile Radio
          Covers many areas, including some speech topics such as speech
          coding and speech compression. Mail Peter Decker
          (dec@dfv.rwth-aachen.de) to subscribe.



                    Q1.5: RELATED JOURNALS AND CONFERENCES

   [Note: Also see the list provided in Shikano's WWW site on Speech and
   Acoustics:
   http://www.aist-nara.ac.jp/IS/Shikano-lab/database/internet-resource/e-www-site.html.]

Product Oriented Magazines

     * Voice News - monthly industry newsletter
       Stoneridge Technical Services
       PO Box 1891, Rockville, MD, 20850, USA
       Phone: (301) 424-0114
     * Voice Technology News
     * Voice Processing Magazine (1-800-854-3112)
     * Speech Technology (no longer published)

Technical Journals

   (There are some contact addresses below.)
     * Computer Speech and Language
     * Speech Communication
     * IEEE Transactions on Speech and Audio Processing
     * IEEE Signal Processing Magazine
     * IEEE Transactions on Acoustics, Speech, and Signal Processing (ASSP)
       (now obsolete)
     * Computational Linguistics (COLING)
     * Journal of the Acoustical Society of America (JASA)
     * AVIOS Journal
     * ASR News

Conferences

     * ICASSP: Intl. Conference on Acoustics Speech and Signal Processing
       (IEEE)
     * ICSLP: Intl. Conference on Spoken Language Processing
     * EUROSPEECH: European Conference on Speech Communication and Technology
     * AVIOS: American Voice I/O Society Conference
     * SST: Australian Speech Science and Technology Conference

Some Contact Addresses

    Institute of Electrical and Electronics Engineers (IEEE) 

   For IEEE Transactions on Speech and Audio Processing (from Jan 93)
   and IEEE Transactions on Acoustics, Speech, and Signal Processing (ASSP) -
   now obsolete.

    IEEE Service Center
    445 Hoes Lane, PO Box 1331, Piscataway, NJ 08855, USA
    Phone: 1-800-678-IEEE or (201)981-0060

    Harcourt Brace and Company Ltd. 

   For Computer Speech and Language

   Price: $US170 (Institutions), $US75 (Individuals), 4 times per year.

    High Street, Foots Cray, Sidcup
    Kent, DA14 5HP, England

    Association for Computational Linguistics 

   For Computational Linguistics

    MIT Press Journals
    55 Hayward St, Cambridge, MA 02142, USA
    Phone: (617)253-2889



                             Q1.6: HANDICAP AIDS

   Can anyone provide information on speech technology aids for the deaf,
   blind, speech impaired, physically impaired or others who may benefit from
   speech technology?

    SpeechViewer II
     * Platform: IBM Machines from Mod 25 on.
     * Description: SpeechViewer II is a speech therapy tool. It provides
       graphical feedback of various speech features so that speech impaired
       individuals can improve their speech. It works with an audio bandwidth
       of 7.3 kHz and thus allows the therapist to work with sustained vowels
       and fricatives. A wide range of graphics is used to provide adequate
       variability to hold client interest. An extensive set of statistics is
       gathered, which allows a therapist to do research or keep therapy
       records. The speech therapy modules are:
          + Awareness - Sound, Loudness, Pitch, Voicing Onset, Voicing
          + Skill Building - Pitch, Voicing, Phonology
          + Patterning - Pitch & Loudness - Waveform & Spectrogram, Spectra
          + Clinical Management - Profiles, Models, Client Data
     * Hardware: Requires an IBM M-ACPA (Multimedia-Audio Capture Playback
       Adapter). It has a TI TMS320C25 DSP chip. The input sampling rate is
       44.1 kHz stereo, 88.2 kHz mono. This is a 16-bit card. It has the
       following jacks: mic in, stereo line in, stereo line out, speaker out.
       Note: This card is being replaced by Mwave technology. For more info on
       Mwave contact Texas Instruments.
     * Price:
          + The software is $2130 list, $1491 educational, part number
            92F2066.
          + The M-ACPA is $370 list, $222 educational, part number 92F3378.
          + The MicroChannel adapter part number is 92F3379 (same price).
     * Contact: The Psychological Corporation (TPC) [IBM Authorized
       Remarketer]

    Phone: 1-800-228-0752 or contact IBM on 1-800-426-4832.



                            Q1.7: SPEECH DATABASES

   A wide range of speech databases have been collected. These databases are
   primarily for the development of speech synthesis/recognition and for
   linguistic research.

   Some databases are free but most are not. The databases normally require
   lots of storage space (hundreds of MBytes is not unusual). Do not expect to be
   able to ftp large amounts of speech data.

   In addition to the descriptions of speech databases and speech database
   providers below, information can be obtained from

    LDC: Linguistic Data Consortium
          Provides a very wide range of speech and text data to research and
          commercial users: see below.

    COCOSDA Home Page: http://www.itl.atr.co.jp/cocosda/
          The International Committee for the Co-ordination and
          Standardisation of Speech Databases and Assessment Techniques for
          Speech Input/Output.

    Shikano's WWW site on Speech and Acoustics
          http://www.aist-nara.ac.jp/IS/Shikano-lab/database/internet-resource/e-www-site.html

    RELATOR Project
          European resource initiative: see below.

   The following speech data resources are described in the FAQ.

          * Bavarian Archive for Speech Signals
          * BUPT Spoken Digit Database (Chinese)
          * Center for Spoken Language Understanding (CSLU)
          * Examples of IPA Symbols
          * Linguistic Data Consortium (LDC)
          * NOISEX
          * Oxford Acoustic Phonetic Database
          * Phonemic Samples
          * RELATOR project



Bavarian Archive for Speech Signals

     * Description: The Bavarian Archive for Speech Signals (BAS) was founded
       in January 1995 as an initiative of the Institute of Phonetics at the
       University of Munich, Germany. The BAS will develop, validate,
       administrate and disseminate corpora of spoken German to the speech
       community as well as to speech engineering industry. Presently the
       following German speech corpora are available on ISO 9660 CDROM:

        Siemens 1000 - SI1000
                5 CDROMs, newspaper corpus, read speech, 10 speakers x 1000
                utterances

        Siemens 100 - SI100
                7 CDROMs, read speech, 101 speakers x 100 sentences

        PhonDat 1 - PD1
                6 CDROMs, new edition in preparation, read speech, 201
                speakers x 450+ sentences

        PhonDat 2 - PD2
                1 CDROM, read speech, 2nd edition, 16 speakers x 200
                sentences, various labelled information

        Verbmobil
                Spontaneous speech recorded in a dialog task (appointment
                scheduling). More information on the VERBMOBIL project:
                http://www.dfki.uni-sb.de/verbmobil

   Corpora in Preparation

        PhonDat I - PD1: 2nd extended edition (Jul 1995)

        Strange Corpora - SC
                Reference Corpora that reflect certain well known problems in
                speech processing, like accents, repair, breaks, hesitations,
                repetitions, extreme F0, background noise, pathological speech,
                speaker adaptation. The first SC corpus (SC1 Accents) will be
                edited in Jul 1995.

        BAS Edition of Verbmobil Corpora - VM: 2nd extended edition

        Articulatory data - AD: EMA data of speakers of SI1000 corpus

        ERBA: 10000 utterances from a train inquiry task

     * Misc: BAS is currently developing tools for the automatic annotation
       and segmentation of very large speech corpora. This includes the
       automatic detection of variants of pronunciation, a statistically based
       alignment and a rule-based refinement of the outcome. The BAS seeks to
       cooperate with public institutions as well as with industrial partners
       to further develop new German speech databases. BAS can be a platform
       to re-distribute existing German speech.
     * Contact and More Information: The BAS is located at the University of
       Munich, Germany.

    BAS c/o Institut fuer Phonetik
    Schellingstr. 3/II
    80799 Muenchen
    Germany
    Ph: +49-89-21802758 Fax: +49-89-2800362
    email: bas@sun1.phonetik.uni-muenchen.de
    WWW: http://www.phonetik.uni-muenchen.de/BASSeng.html



BUPT Spoken Digit Database (Chinese)

     * Vocabulary : {0, 1/yi/, 2, 3, 4, 5, 6, 7, 8, 9, 1/yao/, /dui/, /cuo/ },
       13 words in total.
     * Size: 1202 speakers in total, 789 Males and 413 Females. Each speaker
       utters each word 2 times. Total of 31252 utterances.
     * Format: 8000 Hz, 14-bit sampling. One utterance per file.
     * Contact:

    GLuck Co.
    195 Berlioz 1C, Nun's Island
    Verdun H3E 1C1, Canada
    e-mail: weigang@zaphod.math.mcgill.ca
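
   The figures quoted above are easy to cross-check, and they give a feel
   for the storage such a corpus needs. The Python sketch below is only an
   illustration: the speaker, word, repetition and sampling figures come
   from the entry above, while the storage of each 14-bit sample in a
   16-bit word and the mean utterance duration are assumptions, since the
   entry states neither.

```python
SPEAKERS = 1202          # 789 male + 413 female speakers
WORDS = 13               # vocabulary size
REPETITIONS = 2          # each speaker utters each word twice
SAMPLE_RATE_HZ = 8000
BYTES_PER_SAMPLE = 2     # assumption: 14-bit samples stored in 16-bit words


def total_utterances():
    """Number of utterances implied by the speaker and vocabulary counts."""
    return SPEAKERS * WORDS * REPETITIONS


def estimated_megabytes(mean_duration_s):
    """Rough corpus size for a hypothetical mean utterance duration."""
    total_bytes = (total_utterances() * mean_duration_s
                   * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
    return total_bytes / 1e6


if __name__ == "__main__":
    print(total_utterances())          # 31252, matching the stated total
    print(estimated_megabytes(0.5))    # size if utterances average 0.5 s
```

   Under these assumptions the corpus runs to a few hundred MBytes, which
   is consistent with the general remark in Q1.7 about database sizes.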



Center for Spoken Language Understanding (CSLU)

     * The ISOLET speech database of spoken letters of the English alphabet.
       The speech is high quality (16 kHz with a noise cancelling microphone).
       150 speakers x 26 letters of the English alphabet twice in random
       order. The ISOLET data base can be purchased for $100 by sending an
       email request to vincew@cse.ogi.edu. (This covers handling, shipping
       and medium costs). The data base comes with a technical report
       describing the data.
     * CSLU has a telephone speech corpus of 1000 English alphabets. Callers
       recite the alphabet with brief pauses between letters. This database is
       available to not-for-profit institutions for $100. The data base is
       described in the proceedings of the International Conference on Spoken
       Language Processing.
          + Contact vincew@cse.ogi.edu if interested.
     * CSLU has released for universities its Continuous English Speech
       Corpus. The corpus contains recorded speech from 690 different
       speakers, with label files at various levels - including word level and
       phonetic labels. The data were collected as part of the OGI
       Multi-language telephone corpus. CSLU provides speech corpora to all
       universities without charge. To order a corpus, print the license
       agreement/order form, complete it, and fax it to the CSLU. A
       description of the corpora and an order form are available by anonymous
       ftp:

                 ftp://speech.cse.ogi.edu/pub/releases 

     * Contact: Mike Noel

    email: noel@cse.ogi.edu Phone: (503) 690-1309



Examples of IPA Symbols

  UCLA SOUNDS OF THE WORLD'S LANGUAGES
     * Description: The UCLA Sounds of the World's Languages are available for
       Macintosh users (no DOS based system currently available). The sounds
       are stored in a Hypercard database developed at the UCLA Phonetics
       Laboratory. The aim is to illustrate and teach about the range of
       sounds used in human languages with material on more than 80 languages.
       The set demonstrates particular highlights of the sound systems
       focusing especially on rarer sounds that students may not otherwise
       have a chance to hear from a native speaker. The recordings are based
       on the archives of recordings collected at UCLA, with additional
       contributions from outside collaborators. All the languages can be
       accessed from the list of language names, or by clicking on the
       language name in a set of maps. Support for part of this work was
       provided by NSF. The database currently includes examples of languages
       from Agul and Akan to Zulu.
     * Availability: 15 DSDD disks, requiring about 35 MB of disk space when
       expanded. Available for $50 (individuals) or $100 (institutions).
       Prepayment in US dollars (checks or international money orders payable
       to "UC Regents") must accompany all orders.
     * Contact: The UCLA Phonetics Laboratory
       Linguistics Department, UCLA, Los Angeles, CA 90095-1543
       Tel: (310) 825-1254
       E-mail: oldfogey@ucla.edu

  JOHN ESLING'S "IPA LABELS"
     * Description: A HyperCard stack which is available for free or a nominal
       fee.
     * Contact: John Esling can be reached by email: pdb@uvvm.uvic.ca.



Linguistic Data Consortium (LDC)

   The LDC was established to broaden the collection and distribution of
   speech and natural language data bases for the purposes of research and
   technology development in automatic speech recognition, natural language
   processing and other areas where large amounts of linguistic data are
   needed. Detailed information on the LDC is now available on the WWW:
   http://www.cis.upenn.edu/~ldc/home.html. The LDC WWW server provides
   information on membership agreements, license agreements, and summaries of
   speech and text corpora available.

    Speech Corpora
     * TIMIT Acoustic-Phonetic Continuous Speech Corpora and NYNEX Telephone
       Version of TIMIT Corpus (NTIMIT)
     * Resource Management Corpora
     * Air Travel Information System (ATIS) Corpora (multiple)
     * ARPA Continuous Speech Recognition Corpora (WSJ etc)
     * Switchboard Corpus of Recorded Telephone Conversations and Switchboard
       Corpus Excerpts (Credit Card Conversations)
     * Texas Instruments 46-Word Speaker-Dependent Isolated Word Corpus (TI46)
     * Texas Instruments Speaker-Independent Connected-Digit Corpus (TIDIGITS)
     * Road Rally Conversational Speech Corpus
     * HCRC Map Task Corpus
     * Air Traffic Control Corpus (ATC0)
     * SPIDRE Speaker Identification Corpus
     * YOHO Speaker Verification Corpus
     * OGI Multi-Language Corpus and OGI Spelled and Spoken Telephone Corpus
     * BRAMSHILL
     * MACROPHONE
     * King Corpus for Speaker Verification Research
     * WSJCAM0: Cambridge Read News Corpus
     * TRAINS Spoken dialog corpus
     * NYNEX PhoneBook Database
     * Frontiers in Speech Processing

    Text Corpora
     * Association for Computational Linguistics Data Collection Initiative
       (ACL/DCI)
     * The Penn Treebank Project - Release 2
     * TIPSTER Information Retrieval Text Research Collection
     * United Nations Parallel Text Corpus (English, French, Spanish)
     * Japanese Language Financial News
     * European Corpus Initiative-1

    Lexical Databases
     * CELEX Lexical Database
     * COMLEX : COMmon LEXical Database of English (English syntax and
       pronunciation)

    For more information:

   Contact:

    Linguistic Data Consortium
    441 Williams Hall, University of Pennsylvania, Philadelphia, PA
    19104-6305, USA
    Phone: +1 (215) 898-0464 Fax: +1 (215) 573-2175
    e-mail: ldc@unagi.cis.upenn.edu

   WWW:
          http://www.cis.upenn.edu/~ldc/home.html

   Anonymous ftp:
           ftp://ftp.cis.upenn.edu/pub/ldc/



NOISEX-92

     * Description: Database of recordings of various noises available on 2
       CDROMs. Some material from the same source is available by anonymous
       ftp in the IEEE's Signal Processing Information Base. The samples
       include
          + Voice babble
          + Factory noise
          + HF radio channel noise, pink noise, white noise
          + Various military noises; fighter jets (Buccaneer, F16), destroyer
            noises (engine room, operations room), tank noise (Leopard, M109),
            machine gun
          + Volvo 340
     * Availability 1: The cost of this database is 135 Pounds Sterling for
       the set of two CD-ROMs. Send payment with order to:
       The Speech Research Unit,
       Ex1, DRA Malvern, St.Andrew's Road,
       Malvern, Worcestershire, WR14 3PS, UK
       Tel +44-684-894074 Fax +44-684-894384
       Note: The supply of CD-ROMs is limited so please check that they are
       still available before placing an order. The only acceptable methods of
       payment are cheques (from the UK only) or bank drafts in Pounds
       Sterling drawn on a UK bank. They should be made payable to:-
       Public Sub Account HMG 4768.
     * Availability 2: Information on how to obtain a copy of the NATO RSG.10
       NOISE-ROM-0 can be obtained from the DRA Speech Research Unit (address
       above) or from:
       Dr. Herman Steeneken,
       TNO Institute for Perception,
       P.O. Box 23, 3769 ZG Soesterberg,
       The Netherlands.
     * Examples: The IEEE samples of the NOISEX database are available by
       anonymous ftp (the data files average around 10MB).
       ftp://bellona.cs.rice.edu/spib/data/noise/



Oxford Acoustic Phonetic Database

     * Available on compact disc, from J. Pickering and B. Rosner. It contains
       data on vowel-consonant and consonant-vowel combinations in both
       stressed and unstressed locations. The languages covered include French,
       German, Hungarian, Italian, Japanese, British English, Spanish and
       English. For further information write to

    Electronic Publishing, Oxford University
    Press, Walton Street, Oxford OX2 6DP, UK.
    The ISBN is 0-19-268086-2
     * Contact:

    Prof. B. Rosner
    Dept. of Experimental Psychology
    South Parks Rd, Oxford, OX1 3UD, UK
    email: burton.rosner@wolfson.ox.ac.uk



Phonemic Samples

     * Some basic data. The following ftp sites have samples of English
       phonemes (American accent I believe) in Sun audio format files. See
       Question 1.8 for information on audio file formats.

          ftp://sounds.sdsu.edu/.1/phonemes: This ftp site appears to be
          obsolete. Does anyone know a new address?

          ftp://phloem.uoregon.edu/pub/Sun4/lib/phonemes: There appears to be
          some config problem with this ftp server.

          ftp://sunsite.unc.edu/pub/multimedia/sun-sounds/phonemes



The RELATOR project

     * Description: RELATOR is a European-wide consortium of researchers who,
       with the support of the European Commission, are striving to establish
       a European repository of linguistic resources. Linguistic resources
       comprise a variety of spoken and written language materials, including
       lexicons, grammars, corpora, and spoken language databases. RELATOR
       will ensure that the requirements of the European language processing
       community receive attention.
       The RELATOR WWW pages provide information on the consortium. The
       languages currently covered by the RELATOR consortium include Danish,
       Dutch, English, French, German, Greek, Italian, Portuguese, Spanish
       plus multilingual resources. The resources include both text and
       speech.
     * WWW: http://cristal.icp.grenet.fr/Relator/homepage.html



                   Q1.8: SPEECH FILE FORMATS AND CONVERSION

   Q2.7 of this FAQ has information on mu-law coding.

   A very good and very comprehensive list of audio file formats is prepared
   by Guido van Rossum. The list is posted regularly to comp.dsp and
   alt.binaries.sounds.misc, amongst others. It includes information on
   sampling rates, hardware, compression techniques, file format definitions,
   format conversion, standards, programming hints and lots more. It is also
   available by ftp from

           ftp://ftp.cwi.nl/pub/audio/AudioFormats.part1,2 
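
   As a small illustration of what a file format definition looks like, here
   is a Python sketch that reads the header of a Sun audio (.au) file, the
   format used by the phoneme samples listed in Q1.7. It assumes the commonly
   documented layout of six big-endian 32-bit fields (magic number, data
   offset, data size, encoding, sample rate, channel count). The function
   name and the returned dictionary keys are made up for illustration, and
   only a few encoding codes are listed; consult the audio formats list above
   for the full definition.

```python
import struct

AU_MAGIC = 0x2E736E64  # the four ASCII bytes ".snd"

# A few of the encoding codes; others exist and are reported as "unknown".
AU_ENCODINGS = {
    1: "8-bit mu-law",
    2: "8-bit linear PCM",
    3: "16-bit linear PCM",
}


def parse_au_header(data):
    """Decode the fixed 24-byte part of a Sun .au header from raw bytes."""
    if len(data) < 24:
        raise ValueError("need at least 24 bytes for a .au header")
    magic, offset, size, encoding, rate, channels = struct.unpack(">6I", data[:24])
    if magic != AU_MAGIC:
        raise ValueError("not a Sun .au file (bad magic number)")
    return {
        "data_offset": offset,        # where the samples start in the file
        "data_size": size,            # 0xFFFFFFFF means "unknown"
        "encoding": AU_ENCODINGS.get(encoding, "unknown (%d)" % encoding),
        "sample_rate": rate,
        "channels": channels,
    }
```

   To use it on a real file, read the first 24 bytes in binary mode and pass
   them to parse_au_header(). The samples themselves begin at data_offset.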



            Q1.9: SPEECH LABORATORY ENVIRONMENTS AND AUDIO EDITORS

   First, what is a Speech Laboratory Environment? A speech lab is a software
   package which provides the capability of recording, playing, analysing,
   processing, displaying and storing speech. Your computer will require audio
   input/output capability. The different packages vary greatly in features
   and capability - best to know what you want before you start looking
   around.

   Most general purpose audio editing packages will be able to process speech
   but do not necessarily have the specialised capabilities needed for speech
   (e.g. formant analysis).

   The following article provides a good survey.
     * Read, C., Buder, E., & Kent, R. "Speech Analysis Systems: An
       Evaluation" Journal of Speech and Hearing Research, pp 314-332, April
       1992.

   The following is a list of the speech labs described in the FAQ.

          * CSRE: Canadian Speech Research Environment
          * Entropic Signal Processing System (ESPS) and Waves
          * GoldWave
          * Kay Elemetrics Computer Speech Lab
          * Khoros
          * Matlab plus Signal Processing Toolbox
          * MacSpeech Lab II
          * N!Power
          * OGI Speech Tools
          * Ptolemy
          * Signalyze 3.0
          * SoundScope



CSRE: Canadian Speech Research Environment

     * Platform: IBM/AT-compatibles
     * Description: CSRE is a microcomputer-based system designed to support
       speech research. CSRE provides a low-cost facility in support of speech
       research, using mass-produced and widely-available hardware. The
       project is non-profit, and relies on the cooperation of researchers at
       a number of institutions and fees generated when the software is
       distributed. Functions include speech capture, editing, and replay;
       several alternative spectral analysis procedures, with color and
       surface/3D displays; parameter extraction/tracking and tools to
       automate measurement and support data logging; alternative
       pitch-extraction systems; parametric speech (KLATT80) and non-speech
       acoustic synthesis, with a variety of supporting productivity tools;
       and an experiment generator, to support behavioral testing using a
       variety of common testing protocols. A paper about the whole package
       can be found in:
          + Jamieson D.G. et al, "CSRE: A Speech Research Environment", Proc.
            of the Second Intl. Conf. on Spoken Language Processing Edmonton:
            University of Alberta, pp. 1127-1130.
     * Hardware: Can use a range of data acquisition/DSP hardware
     * Cost: Distributed on a cost recovery basis.
     * Availability: For more information on availability contact

    AVAAZ Innovations Inc.
    P.O.Box 8040
    1225 Wonderland Rd. N
    London, Ontario, CANADA, N6G 2B0
    Tel : (519) 472-7944 Fax : (519) 472-7814
    Email: info@avaaz.com
     * Note: Also included in Q5.5 on speech synthesis packages.



Entropic Signal Processing System (ESPS) and Waves

     * Platform: Range of Unix platforms.
     * Description: ESPS is a comprehensive set of speech analysis/processing
       tools for the UNIX environment. The package includes UNIX commands, and
       a comprehensive C library (which can be accessed from other languages).
       Waves is a graphical front-end for speech processing. Speech waveforms,
       spectrograms, pitch traces etc can be displayed, edited and processed
       in X windows and Openwindows (versions 2 & 3). Waves also includes a
       signal labelling utility which provides multiple feature labelling and
       useful features for fast labelling of large speech databases. Other
       Entropic products are HTK (see Q6.5) and TrueTalk (see Q5.5).
     * Misc: A more detailed description is provided on the Entropic WWW pages
       (http://www.entropic.com/esps.html).
     * Cost: On request.
     * Contact:

    Entropic Research Laboratory, Washington Research Laboratory
    600 Pennsylvania Ave, S.E. Suite 202, Washington, D.C. 20003
    (202) 547-1420
    email: info@entropic.com
    WWW: http://www.entropic.com/



GoldWave

     * Platform: Windows
     * Description: GoldWave is a digital audio editor for Microsoft Windows.
       It features realtime amplitude/spectrum oscilloscopes, large file
       editing, effects, and support for a wide variety of sound formats.
          + Editing of multiple waveforms and large waveforms
          + Realtime amplitude/spectrum oscilloscopes
          + Resizable device controls window for accessing audio devices
          + Realtime fast forward and rewind playback
          + Effects: distortion, Doppler, echo, filter, mechanize, offset,
            pan, volume shaping, invert, resample, transpose, etc
          + Multiple file formats and conversions: .WAV, .AU, .IFF, .VOC,
            .SND, .MAT, .AIFF, and raw data
          + CD-ROM controls window
   More information is available on the GoldWave home page.
     * Cost: Shareware
     * Availability: Through the GoldWave home page:
       http://web.cs.mun.ca/~chris3/goldwave/goldwave.html
     * Contact: Chris Craig: chris3@cs.mun.ca



Kay Elemetrics CSL (Computer Speech Lab) 4300

     * Platform: Minimum IBM PC-AT compatible with extended memory (min 2MB)
       with at least VGA graphics. Optimal would be 386 or 486 machine with
       more RAM for handling larger amounts of data.
     * Description: Speech analysis package, with optional separate LPC
       program for analysis/synthesis. Uses its own file format for data, but
       has some ability to export data as ascii. The main editing/analysis
       program (but not the LPC part) has its own macro language, making it easy
       to perform repetitive tasks. Probably not much use without the extra
       LPC program, which also allows manipulation of pitch, formant and
       bandwidth parameters.

       Hardware includes an internal DSP board for the PC (requires ISA slot),
       and an external module containing signal processing chips which does
       A/D and D/A conversion.
     * Misc: A programmer's kit is available for programming signal processing
       chips (experts only). A speaker and microphone are supplied. Manuals
       are included.
     * Cost: Recently approx 6000 pounds sterling.
     * Contact: 

    UK distributors are Wessex Electronics,
    114-116 North Street, Downend, Bristol, B16 5SE
    Tel: 0272 571404.
   In the USA contact:

    Kay Elemetrics Corp,
    12 Maple Avenue, PO Box 2025, Pine Brook, NJ 07058-9798
    Tel:(201) 227-7760



Khoros

     * Description: Public domain image processing package with a basic DSP
       library. Not particularly applicable to speech, but not bad for the
       price.
     * Cost: Free
     * Availability: By anonymous ftp from

                 ftp://pprg.eece.unm.edu



Matlab plus Signal Processing Toolbox

     * Platform: Wide range
     * Description: Matlab (MATrix LABoratory) is a technical computing
       environment for numerical computation and visualization based on a
       matrix oriented, interpreted programming language. The programming
       environment provides support for the development of customized
       operations, along with debugging facilities and a graphical user
       interface toolkit. Audio output is provided.

       A specialised Signal Processing Toolbox is available which provides
       many functions which are useful for speech analysis. It includes filter
       design, spectral estimation, statistical signal processing, waveform
       generation, and signal and spectrogram display.

       A specialised Auditory Toolbox is available which contains functions
       useful to people interested in auditory/cochlear models. A more
       detailed description is given in Q1.10.
     * Price: On request.
     * Contact: The Math Works Inc.

    24 Prime Park Way, Natick, MA 01760-1500 USA
    Ph: 1-508-653 1415 Fax: 1-508-653 6284
    Email: info@mathworks.com

           ftp://ftp.mathworks.com

          WWW: http://www.mathworks.com/



MacSpeech Lab II (MSL II)

     * Platform: Macintosh
     * Description: A sound analysis and acquisition package for Macs. MSL II
       delivers the most common functions for speech analysis (FFTs, LPCs, f0
       extraction, etc.) & produces grayscale spectrographic displays. Can be
       used for various speech technology and phonetic training tasks.
     * Hardware: Requires MacADIOS ("Macintosh Analog/Digital Input/Output
       System") hardware for speech I/O at 12/16 bits.
     * Misc: Software no longer updated by GW Instruments; MSL soft/hardware
       will not perform input/output on Quadras, for example, though analysis
       seems fine. Known to operate properly on systems as high as IIcx &
       IIfx.
     * Availability: MSL has been replaced by SoundScope; see the SoundScope
       entry for more detail.
     * Contact:

    GW Instruments
    35 Medford Street, Somerville, MA 02143, USA
    Phone: (617) 625-4096 Fax: (617) 625-1322



N!Power

     * Platform: SUN, DEC and HP workstations.
     * Description: An object-oriented software package with a MOTIF GUI
       interface and a range of functionality for data analysis/editing,
       signal analysis, speech processing, real-time A/D and D/A, and 2D/3D
       interactive graphics. N!Power replaces ILS.

       N!Power can provide a Block Diagram user interface, menus, pop-ups, and
       a high-level IEEE standard symbolic scripting language. You can
       customize the blocks, menus and pop-ups with mouse point-and-click
       operations.
     * Contact:

    Signal Technology, Inc.
    104 W. Anapamu, Suite J, Santa Barbara, CA 93101-3126
    Phone: 805-899-8300 FAX: 805-899-4344
    email: larry@signal.com



OGI Speech Tools

     * Developers from the Center for Spoken Language Understanding (CSLU) at
       the Oregon Graduate Institute of Science and Technology (Portland
       Oregon)
     * Platform: Unix
     * Description: The OGI Speech tools include :
          + An X windows display tool (LYRE) for displaying data in a time
            synchronous fashion for a. the speech signal b. spectrograms c.
            phoneme labels, and other information.
          + A Neural Network (NOPT) training package.
          + A set of C library routines (LIBNSPEECH) for the manipulation of
            speech data, including: a. PLP Analysis, b. Rasta PLP Analysis, c.
            Linear Predictive Coding, d. Mel Cepstrum Coding, e. Fast Fourier
            Transform
          + A set of utilities for converting file formats such as ADC, NIST,
            mu-law, binary files, and ascii. Includes filtering.
          + A database utility (find_phone) to automate speech database
            related enquiries. It allows the user to specify a particular
            label or set of labels in a given context, display all occurrences
            of the label, and relabel the occurrences if desired.
          + A Vector-Quantizer based on the Linde Buzo and Gray (LBG)
            algorithm.
          + A set of PERL Scripts which have been used mainly to automate the
            use of the OGI Speech Tools.
          + MAN Pages for all routines and programs developed, as well as a
            User manual in both PostScript and TeX format.
     * Misc: Software is written in ANSI C.
     * Availability: By anonymous ftp from

                 ftp://speech.cse.ogi.edu/pub/tools/

     * Contact: Try tools@cse.ogi.edu
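
   For readers unfamiliar with Linear Predictive Coding, one of the analyses
   listed above, the following Python sketch shows the textbook
   autocorrelation method with the Levinson-Durbin recursion. It is not taken
   from the OGI Speech Tools; the function names are made up for
   illustration, and a real analysis would first window the speech frame.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for LPC coefficients a[0..order] (a[0] == 1) from autocorrelation r.

    The predictor is x[n] ~ -(a[1]*x[n-1] + ... + a[order]*x[n-order]).
    Returns (a, err) where err is the final prediction error energy.
    """
    if r[0] <= 0.0:
        raise ValueError("frame has no energy")
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this stage.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        # Update the coefficients found so far, then append the new one.
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err


def lpc(frame, order):
    """LPC coefficients of a frame by the autocorrelation method."""
    return levinson_durbin(autocorrelation(frame, order), order)
```

   For example, a decaying signal built as x[n] = 0.9 * x[n-1] should give a
   first coefficient close to -0.9 and higher coefficients close to zero.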



Ptolemy

     * Platform: Sun SPARC, DecStation (MIPS), HP (hppa).
     * Description: Ptolemy provides a highly flexible foundation for the
       specification, simulation, and rapid prototyping of systems. It is an
       object oriented framework within which diverse models of computation
       can co-exist and interact. Ptolemy can be used to model entire systems.

       Ptolemy has been used for a broad range of applications including
       signal processing, telecommunications, parallel processing, wireless
       communications, network design, radio astronomy, real time systems, and
       hardware/software co-design. Ptolemy has also been used as a lab for
       signal processing and communications courses. Ptolemy has been
       developed at UC Berkeley over the past 3 years. Further information,
       including papers and the complete release notes, is available from the
       FTP site.
     * Cost: Free
     * Availability: The source code, binaries, and documentation are
       available by anonymous ftp from

                 ftp://ptolemy.berkeley.edu/pub/README



Signalyze 3.0 from InfoSignal

     * Platform: Macintosh
     * Description: Signalyze's basic conception revolves around up to 100
       signals, displayed synchronously in HyperCard fashion on "cards". The
       program offers a complement of signal editing features, quite a few
       spectral analysis tools, manual scoring tools, pitch extraction
       routines, a good set of signal manipulation tools, and extensive
       input-output capacity.

       Handles multiple file formats: Signalyze, MacSpeech Lab, AudioMedia,
       SoundDesigner II, SoundEdit/MacRecorder, SoundWave, three sound
       resource formats, and ASCII-text.

       Sound I/O: Direct sound input from MacRecorder and similar devices,
       AudioMedia, AudioMedia II and AD IN, some MacADIOS boards and devices,
       Apple sound input (built-in microphone). Sound output via Macintosh
       internal sound, via SoundManager 3.0, some MacADIOS boards and devices
       as well as via the Digidesign 16-bit boards.

       It has a range of capabilities for creating, editing and manipulating
       label files with flexibility in labelling format.
     * Compatibility: MacPlus and higher (including II, IIx, IIcx, IIci, IIfx,
       IIvx, IIvi, Portable, all PowerBooks, Centris and Quadras). Takes
       advantage of large and multiple screens and 16/256 color/grayscales.
       System 7.0 compatible. Runs in background with adjustable priority.
     * Misc: A demo available upon request. Manuals and tutorial included. It
       is available in English, French, and German. An UPDATER to version 2.48
       is now available in:
          + The UNIL Gopher server (see last page of InfoSignal News 8):
            gopher.agoralang.com
          + The LAIP FTP server. Address: MACFL4082.unil.ch [130.223.104.31]
   Also available are a demo program, and current questions and answers.
     * Cost: Individual licence US$350, site license US$500, plus shipping.
       Upgrades from version 2.0 are available.
     * Contact: 

    North America - Network Technology Corporation
    91 Baldwin St., Charlestown MA 02129
    Fax: 617-241-5064 Phone: 617-241-9205
   Elsewhere, contact

    InfoSignal Inc.
    C.P. 73, 1015 LAUSANNE, Switzerland,
    FAX: +41 21 691-1372,
    Email: 76357.1213@COMPUSERVE.COM.



SoundScope

     * Platform: Macintosh: 68K and PowerPC native
     * Description: The SoundScope product family is used primarily in speech
       teaching & research, with some applications in animal sounds,
       forensics, and general acoustic analysis. It can record, view, analyze,
       play, copy, paste, store and print sound waveforms. Analysis functions
       include spectrogram, fundamental frequency (Fo), Linear Predictive
       Coding (LPC) including formant tracking, LPC residual, jitter (pitch
       perturbation), shimmer (amplitude perturbation), HNR, frequency
       spectrum, spectral slice, envelope, energy and zero crossing. Includes
       limited built-in filtering, runs any filter created with WLFDAP. An
       integrated text editor stores notes and calculation results. SoundScope
       lets you design your own custom "instrument" screen, tasks (macros) and
       menus. Supplied instruments include 1 channel analyser (dual snap, dual
       time, spectrogram, spectrum), 2 channel analyser, segment analyser,
       multi-channel recorder, etc.
     * Note: Supersedes MacSpeech Lab II.
     * Price: $490 to $4990, less educational discount
     * Availability: In North America, directly from GW Instruments. Contact
       the company for international distributors.
     * Contact:

    GW Instruments
    35 Medford Street, Somerville, MA 02143, USA
    Phone: (617) 625-4096 Fax: (617) 625-1322
    Email: D0268@Applelink.Apple.COM
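
   Two of the simpler measures named in the entries above, energy and zero
   crossings, are easy to compute directly. The following Python sketch shows
   one common frame-based formulation; it is not how SoundScope or any other
   listed package implements them, and the function names are made up for
   illustration.

```python
def split_frames(signal, size, hop):
    """Cut a list of samples into overlapping frames of `size` samples."""
    return [signal[start:start + size]
            for start in range(0, len(signal) - size + 1, hop)]


def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)


def zero_crossings(frame):
    """Number of sign changes between neighbouring samples in one frame."""
    count = 0
    for prev, cur in zip(frame, frame[1:]):
        if (prev >= 0) != (cur >= 0):
            count += 1
    return count
```

   High energy with few zero crossings is typical of voiced speech, while low
   energy with many zero crossings is typical of fricatives or noise. This is
   only a rough rule of thumb.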



                         Q1.10: SPEECH RESEARCH SITES

   Rather than try to list the places round the world which perform speech
   research this FAQ lists sites on the WWW where other comprehensive lists
   are maintained. Try the following:

    Shikano's WWW site on Speech and Acoustics
          http://www.aist-nara.ac.jp/IS/Shikano-lab/database/internet-resource/e-www-site.html
          Lists of speech research sites by country. Currently includes around
          100 sites. The list of Japanese sites is particularly comprehensive.

    Mambo Speech Research List
          http://mambo.ucsc.edu/psl/speech.html
          Lists about 50 speech research sites and related information
          sources. Very nice presentation!

    ESCA: European Speech Communication Association
          http://ophale.icp.grenet.fr/esca/labos.html
          Links to around 15 European speech research sites and around 15
          related sources of information.

    Russ Wilcox's list of Commercial Speech Recognition
          http://www.tiac.net/users/rwilcox/speech.html
          Links to information on speech technology vendors, speech research
          labs, speech resources, on-line demos and more.

   Most speech research sites have links to other speech research sites
   somewhere in their WWW pages. You can keep following those links (till you
   go round in circles).



                 Q1.11: MISCELLANEOUS SOFTWARE AND RESOURCES.

   SPEECH INTERFACE STANDARDS: APIS ETC (ANY ADDITIONS?)

          * Microsoft Speech API

   NETWORK "PHONE" SOFTWARE

          * CyberPhone
          * FAQ: How can I use the Internet as a telephone?
          * NetPhone from Electric Magic Company
          * NEVOT (1.4v) from AT&T BL
          * Internet Phone from VocalTec

   AUDIO PROCESSING SOFTWARE

          * AF version AF3R1
          * MixViews
          * Network Audio System Release 1.1
          * NIST Software - SPHERE and SCORE
          * Sound Processing Kit

   HUMAN AUDIO PERCEPTION

          * Auditory Modeller 1
          * Auditory Modeller 2
          * Auditory Toolbox for Matlab
          * Human Audio Perception Document

   DICTIONARIES AND OTHER LEXICAL TOOLS

          * BEEP dictionary
          * CMU dictionary
          * CUVOLAD dictionary
          * Dictionary
          * Homophone List
          * MRC database
          * Dictionaries on the WWW

   PHONETIC FONTS

          * Summer Institute of Linguistics IPA Fonts
          * Yamada Language Center



AF version AF3R1

     * Platforms: DEC workstations (Alpha and MIPS), SparcStation, SGI
     * Description: The AF System is a device-independent network-transparent
       system including client applications and audio servers. With AF,
       multiple audio applications can run simultaneously, sharing access to
       the actual audio hardware.

       The AF3R1 distribution of AF includes server support for Digital RISC
       systems running Ultrix, Digital Alpha AXP systems running OSF/1, SGI
       Indigo running IRIX 4.0.5, Sun Microsystems SPARCstations running SunOS
       4.1.3, and Sun Microsystems SPARCstations running Solaris 2.3. The
       servers support audio hardware ranging from the built-in CODEC audio on
       SPARCstations and Personal DECstations to 48 kHz stereo audio using the
       DECaudio TURBOchannel module or the SPARCstation DBRI interface.
     * Availability: The source kit is distributed by anonymous ftp from

                 ftp://crl.dec.com/pub/DEC/AF

                WWW: http://www.research.digital.com/CRL/projects/AF/home.html

     * Contact: af-request@crl.dec.com



MixViews

     * Description: A Unix/X sound editor. Does waveform play/record, and
       cut/splice. Has various filters, handles native file formats, FFT, LPC
       and more
     * Availability: by anonymous ftp including SunOS 4 and IRIX 5 binaries.

                 ftp://foxtrot.ccmrc.ucsb.edu/pub/MixViews



Network Audio System Release 1.1

     * Platforms: Various (includes SunOS, Solaris, SGI)
     * Description: A device-independent mechanism for transferring, playing
       and recording audio signals over a network. Has a range of features
       suited to networks.
     * Cost: Free
     * Availability: By anonymous ftp from

                 ftp://ftp.x.org:/contrib/audio/nas/netaudio-1.2.tar.gz

       Also available in the same directory are documentation files and some
       sample sounds.



NIST SPeech HEader REsources Package (SPHERE)

     * Description: Standard speech header software from the National
       Institute of Standards & Technology (NIST). SPHERE headers represent
       information about sample frequency, sample format, etc.
     * Availability: By anonymous ftp from

        Readme File
                ftp://jaguar.ncsl.nist.gov/pub/sphere.README 

        Source Code
                ftp://jaguar.ncsl.nist.gov/pub/sphere_2.5.tar.Z 

NIST Speech Recognition Scoring Package (SCORE)

     * Description: Software for scoring results of speech recognition systems
       from the National Institute of Standards & Technology (NIST).
     * Availability: By anonymous ftp from

        README File
                ftp://jaguar.ncsl.nist.gov/pub/score.README 

        Source Code
                ftp://jaguar.ncsl.nist.gov/pub/score_3.6.2.tar.Z 



Sound Processing Kit

     * Platforms: UNIX
     * Description: Sound Processing Kit (SPKit) is an object-oriented class
       library for audio signal processing. SPKit includes classes for various
       signal processing tasks and a way of implementing sound processing
       algorithms in a simple object-oriented manner. Sound Processing Kit is
       implemented in C++ and is designed to be portable. The current version
       requires a bare-bones C++ 2.0 compatible compiler (templates and
       exceptions are not needed). ANSI C standard libraries are required.
       SPKit includes classes for
          + Sound input and output
          + Basic signal processing
          + Dynamics processing (compressor, gating etc)
          + Filtering
          + Delay and reverberation
          + Distortion
          + Signal routing
     * Availability: 

        Full documentation on the WWW:
                http://www.music.helsinki.fi/research/spkit/documentation/SPKit.html

        Software distribution:
                http://www.music.helsinki.fi/research/spkit/distribution/spkit.tar.Z

     * Contact: Kai Lassfolk
       University of Helsinki Music Research Laboratory
       Email: spkit@elisir.helsinki.fi



Auditory Modeller 1

     * Description: John Holdsworth's implementation of a gammatone filter
       bank and Roy Patterson's spiral model, in C (with X-window display).
     * Availability: By anonymous ftp from

                 ftp://ftp.mrc-apu.cam.ac.uk/pub/aim



Auditory Modeller 2

     * Description: Lowel O'Mard's implementation of peripheral filtering, Ray
       Meddis's hair cell model and other stuff in C (as a library of
       routines).
     * Availability: By anonymous ftp from

                 ftp://suna.lut.ac.uk/public/hulpo/lutear 



Auditory Toolbox for Matlab

     * Description: This toolbox provides extensions to Matlab which are
       useful to people interested in auditory/cochlear modeling. [Matlab is
       described in the previous section.] This toolbox has been tested on
       both Macintosh and Unix computers. It includes the following major
       models:
          + Lyon's Passive Long Wave Cochlear Model (our conventional model)
          + Patterson-Holdsworth ERB Filter bank with Meddis Hair cell
          + Seneff's Auditory Model (Stages I and II)
          + MFCC (Mel-scale frequency cepstral coefficients from the ASR
            world)
          + Spectrogram
          + Correlogram generation and pitch modeling
          + Simple vowel synthesis
     * Availability: By anonymous FTP from the following site:

                 ftp://ftp.apple.com/pub/malcolm

   The following files are available:
          + AuditoryToolbox.mif.Z
          + AuditoryToolbox.psc.Z
          + AuditoryToolbox.sea.hqx
          + AuditoryToolbox.tar
          + AuditoryToolbox.tar.Z
   The ".mif.Z" file is a Unix compressed version of the FrameMaker
       documentation. The ".psc.Z" file is a Unix compressed version of the
       Postscript documentation. The ".tar" and ".tar.Z" files are Unix TAR
       archives containing all of the m-functions and C-MEX source code.
       Finally, the ".sea.hqx" file is a Macintosh self-extracting archive
       that has been encoded using BinHex. There is a precompiled version of
       the three MEX functions for the Macintosh.
     * Misc: Our lawyers ask us to remind you that there is no warranty.
       We've done some testing but we undoubtedly missed things.
     * Contact:

    Malcolm Slaney: Interval Research.
    Email: malcolm@interval.com



Human Audio Perception Document

     * Description: Document prepared by Argiris Kranidiotis on the human
       audio perception system. It lists a number of references, gives plenty
       of numbers and some equations.
     * Availability: by anonymous ftp from the comp.speech archive site

                ftp://svr-ftp.eng.cam.ac.uk/comp.speech/info/HumanAudioPerception

     * Contact: Argiris A. Kranidiotis
       University Of Athens, Informatics Department
       email: akra@zeus.di.uoa.ariadne-t.gr



BEEP dictionary

     * Description: Phonemic transcriptions of 150,000 English words. (British
       English pronunciations)
     * Availability: By anonymous ftp from the files

        BEEP dictionary README file
                ftp://svr-ftp.eng.cam.ac.uk/comp.speech/dictionaries/beep-0.6.README

        BEEP Dictionary (1.1M)
                ftp://svr-ftp.eng.cam.ac.uk/comp.speech/dictionaries/beep-0.6.tar.gz



CMU dictionary

     * Description: Phonemic transcriptions of 100,000 words with American
       English pronunciation.
     * Availability: By anonymous ftp from the directory

                 ftp://ftp.cs.cmu.edu/project/fgdata/dict

   with the files README, cmudict.0.2.Z, cmulex.0.1.Z, phoneset.0.1



CUVOLAD dictionary

     * Description: Computer Usable Version of the Oxford Advanced Learner's
       Dictionary. Has British English pronunciations and parts of speech.
     * Availability: By anonymous ftp from the directory

                 ftp://black.ox.ac.uk/ota/dicts/710



Dictionary

     * Description: A comprehensive word list which should contain most common
       American words, abbreviations, hyphenations, and even incorrect
       spellings. The word lists were compiled from a number of sources:
       commercial news services, UseNet news postings, existing dictionaries,
       name lists, company lists, UNIX man pages, project Gutenberg's E-texts,
       project Wordnet, received mailings, etc. The current size is 460,000
       words.
     * Availability: By anonymous ftp from

                 ftp://wocket.vantage.gte.com/pub/standard_dictionary

       Note 1: There seems to be some sort of network problem reaching the
       server.
       Note 2: There is a README file which explains the file formats.



Homophone List

     * A list of homophones in General American English is available by
       anonymous FTP from the comp.speech archive site:

                ftp://svr-ftp.eng.cam.ac.uk/comp.speech/dictionaries/homophones-1.01.txt



MRC database

     * Description: The Medical Research Council Psycholinguistic Database.
       Has British English pronunciations, parts of speech, word frequency and
       lots of other information.
     * Availability: By anonymous ftp from the directory

                 ftp://black.ox.ac.uk/ota/dicts/1054



Dictionaries on the WWW

   For a while, there was a range of dictionaries and other lexical resources
   on the WWW and elsewhere on the Internet. However, for copyright reasons,
   fewer sites are now publishing dictionary information. When last checked,
   the following sites provided dictionaries or links to dictionaries on the
   net:
     * A comprehensive list of dictionaries, acronym lists, translation
       resources, and a Thesaurus.

                http://galaxy.einet.net/galaxy/Reference-and-Interdisciplinary-Information/Dictionaries-etc.html

     * Webster's dictionary online

                http://c.gp.cs.cmu.edu:5103/prog/webster


___________________________________________________________________________

   Copyright (c) 1995 by Andrew Hunt, all rights reserved.
   This FAQ may be posted to any USENET newsgroup, on-line service, or BBS as
   long as it is posted in its entirety and includes this copyright statement.

   This FAQ may not be distributed for financial gain.
   This FAQ may not be included in any collections or compilations
   without express permission from the author.



 ---

Andrew Hunt
ATR Interpreting Telecommunications Research Labs
Hikari-dai 2-2, Seika-cho, Kyoto, 619-02, Japan
Tel: +81-774-95 1390   Fax: +81-774-95 1308
Email: andrew@itl.atr.co.jp

----------------------------------------------------------------------

Path: news1.ucsd.edu!ihnp4.ucsd.edu!swrinde!newsfeed.internetmci.com!news.kei.com!bloom-beacon.mit.edu!senator-bedfellow.mit.edu!faqserv
From: andrew@itl.atr.co.jp (Andrew Hunt)
Newsgroups: comp.speech,comp.answers,news.answers
Subject: comp.speech Frequently Asked Questions - part 2/3
Supersedes: <comp-speech-faq/part2_817055504@rtfm.mit.edu>
Followup-To: comp.speech
Date: 22 Dec 1995 14:10:45 GMT
Organization: ATR International, Japan
Lines: 1185
Approved: news-answers-request@MIT.Edu
Expires: 2 Feb 1996 14:10:32 GMT
Message-ID: <comp-speech-faq/part2_819641432@rtfm.mit.edu>
References: <comp-speech-faq/part1_819641432@rtfm.mit.edu>
Reply-To: andrew@itl.atr.co.jp (Andrew Hunt)
NNTP-Posting-Host: bloom-picayune.mit.edu
Summary: Information on Speech Technology
X-Last-Updated: 1995/12/19
Originator: faqserv@bloom-picayune.MIT.EDU
Xref: news1.ucsd.edu comp.speech:6603 comp.answers:13225 news.answers:51626

Archive-name: comp-speech-faq/part2
Last-modified: 1995/12/19
URL: http://www.speech.su.oz.au/comp.speech/


                   COMP.SPEECH FAQ POSTING - PART 2/3


[Note: this document has been automatically extracted from a WWW site:
        http://www.speech.su.oz.au/comp.speech
This may introduce some formatting errors.]


                    FAQ SECTION 2 - SIGNAL PROCESSING FOR SPEECH

          * Q2.1: What sampling do I need for speech?
          * Q2.2: Finding the pitch of a speech signal
          * Q2.3: How do I find the start and end points of a speech signal?
          * Q2.4: Where can I find FFT software?
          * Q2.5: Signal processing in speech technology
          * Q2.6: Speech sampling and signal processing hardware
          * Q2.7: How do I convert to/from mu-law format?


___________________________________________________________________________

                  Q2.1: WHAT SAMPLING DO I NEED FOR SPEECH?

   For recorded speech to be understood by humans you need a sampling rate of
   8kHz or more and at least 8 bits per sample. This produces poor quality
   speech - but it can be understood.

   Improvements can be achieved by increasing the number of bits per sample
   to 12 or 16, or by using a non-linear encoding technique such as mu-law
   or A-law (see Q2.7). This improves the signal-to-noise ratio.

   Increasing the sampling rate above 8kHz, say to 10kHz, 16kHz or 20kHz,
   improves the frequency response: the higher the sampling frequency, the
   more high frequency content is preserved. A 16kHz sampling rate is a
   reasonable target for high quality speech recording and playback.

   When doing speech recognition you need to remember that your computer is
   not as good as your ear, so it will have trouble with poor quality
   sounds. The choice of an appropriate sampling setup depends very much on
   the speech recognition task and the amount of computer power available.
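
   As a rough illustration of the figures above, the following C sketch
   works out the highest representable frequency (half the sampling rate),
   the uncompressed data rate, and a rule-of-thumb signal-to-noise ratio for
   linear samples. The function names are made up for this example, and the
   SNR formula (about 6 dB per bit) assumes an ideal linear quantiser and a
   full-scale sine wave, so treat the result as a guide only.

```c
/* Hypothetical helper names; not taken from any library. */

/* Highest signal frequency (Hz) that a given sampling rate can represent. */
long nyquist_hz(long sample_rate_hz)
{
    return sample_rate_hz / 2;
}

/* Uncompressed data rate in bits per second. */
long data_rate_bps(long sample_rate_hz, int bits_per_sample, int channels)
{
    return sample_rate_hz * bits_per_sample * channels;
}

/* Rule-of-thumb SNR (dB) of ideal linear PCM for a full-scale sine wave. */
double linear_pcm_snr_db(int bits_per_sample)
{
    return 6.02 * bits_per_sample + 1.76;
}
```

   For example, 8kHz mono with 8 bits per sample covers frequencies up to
   4kHz at 64000 bits per second, while 16kHz mono with 16 bits per sample
   covers up to 8kHz at 256000 bits per second.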



                  Q2.2: FINDING THE PITCH OF A SPEECH SIGNAL

   This topic comes up regularly in the comp.dsp newsgroup. Question 2.5 of
   the FAQ posting for comp.dsp gives a comprehensive list of references on
   the definition, perception and processing of pitch. The comp.dsp FAQ
   posting is posted regularly to the comp.dsp newsgroup, and is also
   available by ftp and on the WWW:
     * http://www.bdti.com/dsp_faq.htm
     * ftp://rtfm.mit.edu/pub/usenet/comp.dsp/



       Q2.3: HOW DO I FIND THE START AND END POINTS OF A SPEECH SIGNAL?

   A large number of papers have been presented on this task. Try the
   following papers:
     * Rabiner LR, Sambur MR, "An Algorithm for Determining the Endpoints of
       Isolated Utterances", Bell System Technical Journal, Vol 54, No. 2, pp
       297-315, 1975.
     * Drago, P.G. et al. "Digital Dynamic Speech Detectors." IEEE Trans on
       Communications, Vol 26, No 1, Jan 78, pp. 140-145.
     * Newman, W.C. "Detecting Speech with an Adaptive Neural Network."
       Electronic Design. 22 March 1990.
     * Taboada. J et al "Explicit Estimation of Speech Boundaries" IEE Proc.
       Sci. Meas. Technol., Vol 141, No.3, May 1994 pp153-159.
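
   None of the algorithms in these papers is reproduced here. As a minimal
   sketch of the general idea only, the C code below splits a signal into
   fixed-length frames and reports the span from the first to the last frame
   whose energy exceeds a threshold. The function names, the fixed frame
   length and the single fixed threshold are assumptions made for
   illustration; published methods such as Rabiner and Sambur's also use
   zero-crossing rates and set their thresholds from the background noise.

```c
#include <stddef.h>

/* Sum of squared sample values in one frame. */
static double frame_energy(const short *x, size_t n)
{
    double e = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        e += (double)x[i] * (double)x[i];
    return e;
}

/*
 * Hypothetical energy-threshold endpoint finder.
 * Returns 1 and sets *start (inclusive) and *end (exclusive), both as
 * sample indices, if any frame exceeds the threshold; returns 0 otherwise.
 * Samples left over after the last complete frame are ignored.
 */
int find_endpoints(const short *x, size_t nsamples, size_t frame_len,
                   double threshold, size_t *start, size_t *end)
{
    size_t nframes, f;
    int found = 0;

    if (frame_len == 0)
        return 0;
    nframes = nsamples / frame_len;

    for (f = 0; f < nframes; f++) {
        if (frame_energy(x + f * frame_len, frame_len) > threshold) {
            if (!found) {
                *start = f * frame_len;   /* first loud frame */
                found = 1;
            }
            *end = (f + 1) * frame_len;   /* most recent loud frame */
        }
    }
    return found;
}
```

   A fixed threshold like this only works for clean recordings. With
   background noise the threshold needs to be estimated from a stretch of
   signal known to contain no speech.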



                     Q2.4: WHERE CAN I FIND FFT SOFTWARE?

   The most comprehensive list of FFT software I know of is available on the
   WWW. It contains links to about 65 different pieces of one-dimensional FFT
   code.

          http://tjev.tel.etf.hr/josip/DSP/fft.html

   You might also try the following file available by anonymous ftp. It
   contains a series of optimised fft routines, including mixed-radix
   algorithms.

           ftp://usc.edu/pub/C-numanal/fft-stuff.tar.gz



                 Q2.5: SIGNAL PROCESSING IN SPEECH TECHNOLOGY

   This question is far too big to be answered in a FAQ posting. Here are some
   WWW resources and books which cover the area well.

   Tony Robinson has put his Speech Analysis course notes on the web. The root
   page is http://svr-www.eng.cam.ac.uk/~ajr/SA95. There is information on the
   following:
     * Sampling theory
     * Filter bank analysis
     * Short-term Fourier analysis
     * Linear prediction analysis
     * Formant analysis and voicing analysis
     * Speech coding
     * and more....

   The Signal Processing Home page has information on a range of DSP issues.
   It includes references to a range of software and much more. (Note: the
   page is in Croatia and is quite slow.)

          http://tjev.tel.etf.hr/josip/DSP/sigproc.html

   There are many good books which discuss signal processing for speech:
     * Digital processing of speech signals; L. R. Rabiner, R. W. Schafer.
       Englewood Cliffs; London: Prentice-Hall, 1978
     * Voice and Speech Processing; T. W. Parsons. New York; McGraw Hill 1986
     * Computer Speech Processing; ed Frank Fallside, William A. Woods
       Englewood Cliffs: Prentice-Hall, c1985
     * Digital speech processing : speech coding, synthesis, and recognition
       edited by A. Nejat Ince; Kluwer Academic Publishers, Boston, c1992
     * Speech science and technology; edited by Shuzo Saito pub. Ohmsha,
       Tokyo, c1992
     * Speech analysis; edited by Ronald W. Schafer, John D. Markel, New York,
       IEEE Press, c1979
     * Speech Communication: Human and Machine Douglas O'Shaughnessy, Addison
       Wesley series in Electrical Engineering: Digital Signal Processing,
       1987.
     * Discrete-time processing of speech signals; John R Deller, John G
       Proakis, John H L Hansen; Macmillan 1993.
     * Signal processing of speech; F J Owens; Macmillan 1993.



             Q2.6: SPEECH SAMPLING AND SIGNAL PROCESSING HARDWARE

   In addition to the following information, have a look at the Audio File
   format document prepared by Guido van Rossum (see details in Section 1.8).

   Information is included on hardware for the following systems:

          * Macintosh Audio Hardware
          * PC Audio Hardware
          * Unix Audio Hardware

   Can anyone provide information for SGI, NeXT, other UNIX hardware and any
   other PC soundcards?



 Macintosh Audio Hardware - an overview

     * Description: ALL Macintosh computers come with the ability to play back
       sounds at any sample rate (sample rate conversion is done in software.)
       Older machines have 8 bit stereo output (hardware runs at 22254
       samples/second). The newer machines have 16 bit stereo hardware running
       at 44100 samples/second.

       Most of the recent Macintosh computers come with sound input hardware.
       There are probably exceptions to this, but the older and some of the
       current low-end machines have 8 bit (linear) mono hardware running at
       22254.54 samples/second. All of the PowerPC, AV, and the 500 series
       notebook computers come with 16 bit 44kHz stereo sampling hardware.
       They can also record at 22050 samples/second. The sound manager
       implements an AGC (Automatic Gain Control) function for the 8 bit
       hardware. The drivers have a switch to turn off the AGC.

       There are a number of DSP vendors that support high quality audio.
       Generally this means quieter analog sections, and more IO formats
       (AES/EBU, for example). Try DigiDesign and Spectral Innovations.

       The software drivers for sound are described in "Inside Macintosh:
       Sound". If you want to see some sample code check out the sources for
       the Matlab "Sound and Image Toolbox". They can be found at

                ftp://ftp.apple.com/pub/malcolm/SoundAndImageToolbox.cpt.hqx

       Routines that play and record sounds using the toolbox are included
       (and interfaced to Matlab).



 PC Audio Hardware

   Note: new soundcards are becoming available all the time - the information
   below is definitely not up to date. Check out the following newsgroups for
   up-to-date information.
     * comp.sys.ibm.pc.soundcard
     * comp.sys.ibm.pc.soundcard.GUS
     * comp.sys.ibm.pc.soundcard.advocacy
     * comp.sys.ibm.pc.soundcard.games
     * comp.sys.ibm.pc.soundcard.misc
     * comp.sys.ibm.pc.soundcard.music
     * comp.sys.ibm.pc.soundcard.tech

   An excellent source of programs and information for soundcards is
   available on SimTel:

          http://www.acs.oakland.edu/oak/SimTel/win3/sound.html

   Additional information on PC soundcards is available by anonymous ftp from:

          ftp://rtfm.mit.edu/pub/usenet/comp.sys.ibm.pc.soundcard.misc/Aria_Soundcard_FAQ_v1.05

          ftp://rtfm.mit.edu/pub/usenet/comp.sys.ibm.pc.soundcard.misc/Aria_Soundcard_Support_List_v2.09

          ftp://rtfm.mit.edu/pub/usenet/comp.sys.ibm.pc.soundcard.misc/Midi_files_software_archives_on_the_Internet

          ftp://rtfm.mit.edu/pub/usenet/comp.sys.ibm.pc.soundcard.misc/Turtle_Beach_sound_cards_FAQ

    IBM RS/6000 ACPA (Audio Capture and Playback Adapter)
     * Description: The card supports PCM, Mu-Law, A-Law and ADPCM at 44.1kHz
       (& 22.05, 11.025, 8kHz) with 16-bits of resolution in stereo. The card
       has a built-in DSP (don't know which one). The device also supports
       various formats for the output data, like big-endian, twos complement,
       etc. Good noise immunity.

       The card is used for IBM's VoiceServer (they use the DSP for speech
       recognition). Apparently, the IBM VoiceServer has a speaker-independent
       vocabulary of over 20,000 words and each ACPA can support two
       independent sessions at once.
     * Cost: $US495
     * Contact: ?

    Sound Galaxy NX , Aztech Systems
     * Platform: PC - DOS,Windows 3.1
     * Cost: ?
     * Input: 8bit linear, 4-22 kHz.
     * Output: 8bit linear, 4-44.1 kHz
     * Misc: 11-voice FM Music Synthesizer YM3812; Built-in power amplifier;
       DSP signal processing support - ST70019SB; Hardware ADPCM decompression
       (2:1,3:1,4:1); "AdLib" and "Sound Blaster" compatibility.

    Dicon DSProto
     * Description: DSP/PC card (ISA bus) with TI TMS320C31 (40 or 50MHz),
       32Kx32 zero wait state SRAM, external bus and serial port. Provided
       with C3X assembler/linker and PC/DSP Utility Program (C/DSP code
       library) which include routines for FFTs, IIRs, FIRs and Mu-law (CELP
       and JPEG also - but licensed).
     * Cost: $US419.95 industry, $US399.95 education
     * See also: DSProto Codec below
     * Contact: 

    Dicon Lab
    1810 NW 23rd Blvd., Suite 164
    Gainesville, FL 32605
    phone: 904-372-6160 fax: 904-376-7215
    email: diconlab@aol.com

    Dicon DSProto Codec
     * Platform: PC
     * Description: External board which attaches to the DSProto serial port.
       16 bit, dual-channel, 7.35-44.1kHz sampling A/D and D/A. Includes
       drivers for DSProto and a demo program which provides echo, bass and
       LPF effects.
     * Cost: $US159.95 industry, $US149.95 education
     * See also: DSProto above
     * Contact: 

    Dicon Lab
    1810 NW 23rd Blvd., Suite 164
    Gainesville, FL 32605
    phone: 904-372-6160 fax: 904-376-7215
    email: diconlab@aol.com

    Sound Galaxy NX PRO, Aztech Systems
     * Platform: PC - DOS,Windows 3.1
     * Cost: ?
     * Input: 2 * 8bit linear, 4-22.05 kHz(stereo), 4-44.1 kHz(mono).
     * Output: 2 * 8bit linear, 4-44.1 kHz(stereo/mono)
     * Misc: 20-voice FM Music Synthesizer; Built-in power amplifier; Stereo
       Digital/Analog Mixer; Configuration in EEPROM. Hardware ADPCM
       decompression (2:1,3:1,4:1). Includes DSP signal processing support.
       "AdLib" and "Sound Blaster Pro II" compatibility. Software includes a
       simple Text-to-Speech program and Sampling laboratory for Windows 3.1:
       WinDAT.
     * Contact: USA (510) 623-8988

    ATI Stereo F/X Sound Board
     * Platform: PC XT or AT - DOS, Windows 3.0, 3.1
     * Cost: $120 Canadian
     * Description: Input - 8 bit ADC, 44.1 kHz mono, 22.05 kHz Stereo. Output
       - Dynamic range = 48 dB, 32 anti-aliasing filters. Adds Stereo effect
       to existing mono Adlib or Sound Blaster apps. 11-voice YAMAHA FM Music
       Synthesizer. Built-in 8 watt power amplifier, 4 watts per channel.
       Volume ctrl on rear. 2 Joystick input, software setup (no switches),
       software included. "AdLib" and "Sound Blaster" compatibility. DMA
       support for high speed digital audio. ADPCM decomp @ 4:1, 3:1, 2:1.
       Will play .WAV files. Optional MIDI I/O port $79. (MIDI IN, OUT, THRU,
       and sequencer).
     * Contact:

    ATI Technologies Inc.
    3761 Victoria Park Avenue, Scarborough, Ontario
    CANADA, M1W 3S2
    Ph: (416) 756-0711 Fax: (416) 756-0720
    BBS: (416) 764-9404 (9600 baud N.8.1)

    Ariel Signal Processors
     * Description: A range of signal I/O, A/D, D/A and DSP products are
       available. There are too many to list.
     * Contact:

    Ariel Corp.
    433 River Road, Highland Park, NJ 08904.
    Ph: 908-249-2900 Fax: 908-249-2123 DSP BBS: 908-249-2124

    Other PC Sound Cards
============================================================================
sound          stereo/mono              compatible     included   voices
card           & sample rate            with           ports
============================================================================
Adlib Gold     stereo: 8-bit 44.1khz    Adlib ?        audio      20 (opl3)
1000                  16-bit 44.1khz                   in/out,    +2 digital
               mono: 8-bit 44.1khz                     mic in,    channels
                    16-bit 44.1khz                     joystick,
                                                       MIDI

Sound Blaster  mono: 8-bit 22.1khz      Adlib          audio       11 synth.
               FM synth with                           in/out,
               2 operators                             joystick,

Sound Blaster  stereo: 8-bit 22.05khz   Adlib          audio       22
Pro Basic      mono: 8-bit 44.1khz      Sound Blaster  in/out,
                                                       joystick,

Sound Blaster  stereo: 8-bit 22.05khz   Adlib          audio       11
Pro            mono: 8-bit 44.1khz      Sound Blaster  in/out
                                                       joystick,
                                                       MIDI, SCSI

Sound Blaster  stereo: 8-bit 4-44.1khz  Sound Blaster  audio       20
16 ASP         stereo: 16-bit 4-44.1khz                in/out,
                                                       joystick,
                                                       MIDI

Audio Port     mono: 8-bit 22.05khz     Adlib          audio       11
                                        Sound Blaster  in/out,
                                                       joystick

Pro Audio      stereo: 8-bit 44.1khz    Adlib          audio,      20
Spectrum +                              Pro Audio      in/out,
                                        Spectrum       joystick

Pro Audio      stereo: 16-bit 44.1khz   Adlib          audio       20
Spectrum 16                             Pro Audio      in/out,
                                        Spectrum       joystick,
                                        Sound Blaster  MIDI, SCSI

Thunder Board  stereo: 8-bit 22khz      Adlib          audio       11
                                        Sound Blaster  in/out,
                                                       joystick

Gravis         stereo: 8-bit 44.1khz    Adlib,         audio line  32 sampled
Ultrasound     mono: 8-bit 44.1khz      Sound Blaster  in/out,     32 synth.
                                                       amplified
                                                       out,
               (w/16-bit daughtercard)                 mic in, CD
               stereo: 16-bit 44.1khz                  audio in,
               mono: 16-bit 44.1khz                    daughterboard
                                                       ports (for
                                                       SCSI and
                                                       16-bit)

MultiSound     stereo: 16-bit 44.1kHz   Nothing        audio       32 sampled
               64x oversampling                        in/out,
                                                       joystick,
                                                       MIDI

=============================================================================



 Unix Audio Hardware

   Could someone please provide information on the audio capabilities of
   DECstations, SGI and other Unix platforms?

    Sun standard audio port: SPARC I & II
     * Input and Output: 1 channel, 8 bit mu-law encoded, 8kHz sample rate.
       This provides telephone quality sampling.

    Sun DBRI audio port (SPARC 10 & 20)
     * Input and Output: Stereo (2 channels). 16-bit linear sampling. Multiple
       sample rates (48000, 44100, 37800, 32000, 22050, 18900, 16000, 11025,
       9600, 8000 Hz)

    Ariel Signal Processors
     * Platform: Various
     * Description: A range of signal I/O, A/D, D/A and DSP products are
       available. There are too many to list.
     * Contact:

    Ariel Corp.
    433 River Road, Highland Park, NJ 08904.
    Ph: 908-249-2900 Fax: 908-249-2123 DSP BBS: 908-249-2124



                Q2.7: HOW DO I CONVERT TO/FROM MU-LAW FORMAT?

   Mu-law coding is a form of compression for audio signals including speech.
   It is widely used in the telecommunications field because it improves the
   signal-to-noise ratio without increasing the amount of data. Typically,
   mu-law compressed speech is carried in 8-bit samples. It is a companding
   technique. That means that it carries more information about the smaller
   signals than about larger signals.

   On SUN Sparc systems have a look in the directory /usr/demo/SOUND. Included
   are table lookup macros for ulaw conversions. [Note however that not all
   systems will have /usr/demo/SOUND installed as it is optional - see your
   system admin if it is missing.]

   OR, here is some sample conversion code in C.
/**
 ** Signal conversion routines for use with Sun4/60 audio chip
 **/

#include <stdio.h>

unsigned char linear2ulaw(/* int */);
int ulaw2linear(/* unsigned char */);

/*
** This routine converts from linear to ulaw
**
** Craig Reese: IDA/Supercomputing Research Center
** Joe Campbell: Department of Defense
** 29 September 1989
**
** References:
** 1) CCITT Recommendation G.711  (very difficult to follow)
** 2) "A New Digital Technique for Implementation of Any
**     Continuous PCM Companding Law," Villeret, Michel,
**     et al. 1973 IEEE Int. Conf. on Communications, Vol 1,
**     1973, pg. 11.12-11.17
** 3) MIL-STD-188-113,"Interoperability and Performance Standards
**     for Analog-to_Digital Conversion Techniques,"
**     17 February 1987
**
** Input: Signed 16 bit linear sample
** Output: 8 bit ulaw sample
*/

#define ZEROTRAP    /* turn on the trap as per the MIL-STD */
#define BIAS 0x84   /* define the add-in bias for 16 bit samples */
#define CLIP 32635

unsigned char
linear2ulaw(sample)
int sample; {
  static int exp_lut[256] = {0,0,1,1,2,2,2,2,3,3,3,3,3,3,3,3,
                             4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
                             5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
                             5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
                             6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
                             6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
                             6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
                             6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
                             7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                             7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                             7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                             7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                             7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                             7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                             7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                             7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7};
  int sign, exponent, mantissa;
  unsigned char ulawbyte;

  /* Get the sample into sign-magnitude. */
  sign = (sample >> 8) & 0x80;          /* set aside the sign */
  if (sign != 0) sample = -sample;              /* get magnitude */
  if (sample > CLIP) sample = CLIP;             /* clip the magnitude */

  /* Convert from 16 bit linear to ulaw. */
  sample = sample + BIAS;
  exponent = exp_lut[(sample >> 7) & 0xFF];
  mantissa = (sample >> (exponent + 3)) & 0x0F;
  ulawbyte = ~(sign | (exponent << 4) | mantissa);
#ifdef ZEROTRAP
  if (ulawbyte == 0) ulawbyte = 0x02;   /* optional CCITT trap */
#endif

  return(ulawbyte);
}

/*
** This routine converts from ulaw to 16 bit linear.
**
** Craig Reese: IDA/Supercomputing Research Center
** 29 September 1989
**
** References:
** 1) CCITT Recommendation G.711  (very difficult to follow)
** 2) MIL-STD-188-113,"Interoperability and Performance Standards
**     for Analog-to_Digital Conversion Techniques,"
**     17 February 1987
**
** Input: 8 bit ulaw sample
** Output: signed 16 bit linear sample
*/

int
ulaw2linear(ulawbyte)
unsigned char ulawbyte;
{
  static int exp_lut[8] = {0,132,396,924,1980,4092,8316,16764};
  int sign, exponent, mantissa, sample;

  ulawbyte = ~ulawbyte;
  sign = (ulawbyte & 0x80);
  exponent = (ulawbyte >> 4) & 0x07;
  mantissa = ulawbyte & 0x0F;
  sample = exp_lut[exponent] + (mantissa << (exponent + 3));
  if (sign != 0) sample = -sample;

  return(sample);
}
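
   To see where the companding comes from, note that the eight entries of
   the decoding table in ulaw2linear follow directly from the bias, and that
   the spacing between adjacent decoded values doubles with each exponent.
   The short C sketch below recomputes both. The names ulaw_segment_base and
   ulaw_segment_step are hypothetical and are not part of the code above.

```c
#define ULAW_BIAS 0x84   /* same value as BIAS in the conversion code */

/* Decoded magnitude at the start of the segment with exponent e (0..7). */
int ulaw_segment_base(int e)
{
    return (ULAW_BIAS << e) - ULAW_BIAS;
}

/* Spacing between adjacent decoded values within segment e (0..7). */
int ulaw_segment_step(int e)
{
    return 1 << (e + 3);
}
```

   The bases come out as 0, 132, 396, 924, 1980, 4092, 8316 and 16764,
   matching the table, and the step grows from 8 for the quietest segment to
   1024 for the loudest. Small signals are therefore represented much more
   finely than large ones.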


___________________________________________________________________________

                   FAQ SECTION 3 - SPEECH CODING AND COMPRESSION

          * Q3.1: Speech compression techniques
          * Q3.2: References on coding/compression
          * Q3.3: Compression and Coding Software



                     Q3.1: SPEECH COMPRESSION TECHNIQUES

   Note: the comp.compression FAQ includes a few questions and answers on the
   compression of speech.

   The aim of speech compression is to produce a compact representation of
   speech sounds such that when reconstructed it is perceived to be close to
   the original. The two main measures of closeness are intelligibility and
   naturalness.

   The standard reference point is toll quality speech, which is what would
   be expected over a telephone line: for example, speech sampled at 8 kHz
   with 8 bit ulaw coding and a maximum frequency of about 3.3 kHz. This is
   a bit rate of 64 kbps, and as such is already a compressed form of (say)
   16 bit, 16 kHz speech (256 kbps), which is the standard in speech
   recognition work.
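
   The bit rates quoted above are simply sampling rate times bits per
   sample. A minimal sketch in C of that arithmetic (the function name
   pcm_bit_rate is invented for this illustration and does not come from any
   package listed in this FAQ):

```c
#include <assert.h>

/* Bit rate, in bits per second, of uncompressed single-channel PCM.
   Hypothetical helper written for this example only. */
long pcm_bit_rate(long sample_rate_hz, int bits_per_sample)
{
    assert(sample_rate_hz > 0 && bits_per_sample > 0);
    return sample_rate_hz * bits_per_sample;
}
```

   For example, pcm_bit_rate(8000L, 8) gives 64000 (toll quality ulaw) and
   pcm_bit_rate(16000L, 16) gives 256000, a ratio of 4:1.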

   ulaw coding does not exploit the (normally large) sample to sample
   correlations found in speech. ADPCM is the next family of speech coding
   techniques, and does exploit this redundancy by using a simple linear
   filter to predict the next sample of speech. The resulting prediction error
   is typically quantised to 4 bits thus giving a bit rate of 32 kbps (see,
   for example, the software in Q3.3: 32 kbps ADPCM, G.711/721/723
   Compression, shorten). The advantages of ADPCM are that it is simple to
   implement and has very low delay.
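
   To make the idea concrete, here is a deliberately simplified sketch in C
   of a 4 bit adaptive differential coder. It is NOT G.721 or any other
   standard: the predictor is just the previous reconstructed sample, and the
   step size rule (double on large codes, halve on zero codes) is an
   assumption chosen for brevity. All names are invented for this example.
   At 8000 samples per second, 4 bits per sample gives the 32 kbps mentioned
   above.

```c
#include <assert.h>

#define MIN_STEP 2      /* assumed limits for this sketch only */
#define MAX_STEP 4096

typedef struct {
    int predicted;   /* last reconstructed sample = prediction of the next */
    int step;        /* current quantiser step size (a power of two)       */
} adpcm_state;

void adpcm_init(adpcm_state *s)
{
    s->predicted = 0;
    s->step = 16;
}

/* Rebuild one sample from a 4 bit code (sign bit + 3 bit magnitude) and
   update the state.  The encoder calls this too, so that both ends
   always hold identical state. */
int adpcm_decode(adpcm_state *s, int code)
{
    int magnitude, diff, sample;

    assert(code >= 0 && code <= 15);
    magnitude = code & 7;
    diff = (2 * magnitude + 1) * s->step / 2;   /* centre of quantiser cell */
    if (code & 8) diff = -diff;

    sample = s->predicted + diff;
    if (sample > 32767) sample = 32767;         /* keep within 16 bits */
    if (sample < -32768) sample = -32768;
    s->predicted = sample;

    /* Adapt the step size: grow after large errors, shrink after tiny ones. */
    if (magnitude >= 4 && s->step < MAX_STEP) s->step *= 2;
    else if (magnitude == 0 && s->step > MIN_STEP) s->step /= 2;

    return sample;
}

/* Quantise the prediction error of one 16 bit sample to a 4 bit code. */
int adpcm_encode(adpcm_state *s, int sample)
{
    int diff = sample - s->predicted;
    int code = 0;
    int magnitude;

    if (diff < 0) { code = 8; diff = -diff; }
    magnitude = diff / s->step;
    if (magnitude > 7) magnitude = 7;
    code |= magnitude;

    (void) adpcm_decode(s, code);   /* track what the decoder will see */
    return code;
}
```

   The encoder and decoder each keep their own adpcm_state, initialised with
   adpcm_init(); only the 4 bit codes need to be transmitted. Because there
   is no framing or look-ahead, the coding delay is a single sample.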

   To obtain more compression, specific properties of the speech signal must
   be modelled. The main assumption is known as the source filter model of
   speech production. This assumes that a source (voicing or fricative
   excitation) is passed through a filter (the vocal tract response) to
   produce the speech. The simplest implementation of this is known as an LPC
   synthesiser (e.g. LPC10e). At every frame the speech is analysed to compute
   the filter coefficients, the energy of the excitation, a voicing decision,
   and a pitch value if voiced. At the decoder a regular set of pulses for
   voiced speech or white noise for unvoiced speech is passed through the
   linear filter and multiplied by the gain to produce the speech. This is a
   very efficient system and typically produces speech coded at 1200-2400bps.
   With clever acoustic vector prediction this can be reduced to 300-600bps.
   The disadvantages are a loss of naturalness over most of the speech and
   occasionally a loss of intelligibility.
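
   The decoder side of such a synthesiser can be sketched in a few lines of
   C. This is an illustration of the source filter idea only, not the LPC-10
   algorithm: the filter order, the unit-height pulse train, the use of
   rand() as a noise source, and every name below are assumptions made for
   this example.

```c
#include <assert.h>
#include <stdlib.h>

#define LPC_ORDER 10   /* assumed filter order for this sketch */

typedef struct {
    double history[LPC_ORDER];  /* past outputs, history[0] is y[n-1] */
    int pulse_countdown;        /* samples until the next pitch pulse */
} lpc_decoder;

void lpc_decoder_init(lpc_decoder *d)
{
    int k;
    for (k = 0; k < LPC_ORDER; k++) d->history[k] = 0.0;
    d->pulse_countdown = 0;
}

/* Synthesise one frame of speech:
     y[n] = gain * e[n] + sum over k of coef[k] * y[n-1-k]
   pitch_period > 0 : voiced, e[n] is a pulse every pitch_period samples
   pitch_period == 0: unvoiced, e[n] is uniform noise in [-1, 1]        */
void lpc_synthesise_frame(lpc_decoder *d, const double coef[LPC_ORDER],
                          double gain, int pitch_period,
                          double *out, int frame_len)
{
    int n, k;

    assert(pitch_period >= 0 && frame_len >= 0);
    for (n = 0; n < frame_len; n++) {
        double excitation, y;

        if (pitch_period > 0) {
            if (d->pulse_countdown <= 0) {
                excitation = 1.0;
                d->pulse_countdown = pitch_period;
            } else {
                excitation = 0.0;
            }
            d->pulse_countdown--;
        } else {
            excitation = 2.0 * rand() / RAND_MAX - 1.0;
            d->pulse_countdown = 0;
        }

        /* All-pole filter: excitation plus weighted past outputs. */
        y = gain * excitation;
        for (k = 0; k < LPC_ORDER; k++) y += coef[k] * d->history[k];

        /* Shift the output history by one sample. */
        for (k = LPC_ORDER - 1; k > 0; k--) d->history[k] = d->history[k - 1];
        d->history[0] = y;
        out[n] = y;
    }
}
```

   Per frame, only the coefficients, the gain, the voicing decision and the
   pitch period have to be transmitted, which is why the bit rate is so low.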

   The CELP family of coders compensates for the lack of quality of the simple
   LPC model by using more information in the excitation. Each vector in a
   codebook of excitation vectors is tried and the index of the one that best
   matches the original speech is transmitted. This results in an increase in
   the bit rate to typically 4800-9600bps. Most speech coding research is
   currently directed towards CELP coders. (See, for example, CELP 3.2a, a TMS
   implementation, a G.728 LD-CELP vocoder, and the L&H implementation.)
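
   The codebook search can be sketched as follows. This is a simplification
   for illustration, not the search used in FS-1016 or G.728: a real CELP
   coder passes each candidate through the synthesis filter (usually with
   perceptual weighting) before comparing it with the target, whereas this
   sketch assumes the codebook entries are compared directly. The function
   name and the flat array layout are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Find the codebook vector that, after scaling by its best gain, is
   closest (least squared error) to the target vector.

   codebook holds codebook_size vectors of vec_len values, stored one
   after another.  Returns the index of the best entry, or -1 if every
   entry has zero energy; the matching gain is stored in *gain_out. */
int codebook_search(const double *target, const double *codebook,
                    int codebook_size, int vec_len, double *gain_out)
{
    int i, n, best = -1;
    double best_score = -1.0, best_gain = 0.0;

    assert(codebook_size > 0 && vec_len > 0);
    for (i = 0; i < codebook_size; i++) {
        const double *c = codebook + (size_t) i * vec_len;
        double cross = 0.0, energy = 0.0, score;

        for (n = 0; n < vec_len; n++) {
            cross  += target[n] * c[n];
            energy += c[n] * c[n];
        }
        if (energy <= 0.0) continue;

        /* With gain = cross/energy the squared error is
           |target|^2 - cross^2/energy, so maximise cross^2/energy. */
        score = cross * cross / energy;
        if (score > best_score) {
            best = i;
            best_score = score;
            best_gain = cross / energy;
        }
    }
    if (gain_out != NULL) *gain_out = best_gain;
    return best;
}
```

   Only the winning index and a quantised gain are sent for each excitation
   vector; the decoder holds the same codebook and looks the vector up.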



                    Q3.2: REFERENCES ON CODING/COMPRESSION

   Tony Robinson's lecture notes on Speech Analysis have some coverage of
   speech coding (http://svr-www.eng.cam.ac.uk/~ajr/SA95/node78.html).

   The following books cover speech coding/compression.
     * Douglas O'Shaughnessy, Speech Communication: Human and Machine, Addison
       Wesley series in Electrical Engineering: Digital Signal Processing,
       1987.
     * Bishnu Atal, in F. Fallside and W. Woods, eds., Computer Speech
       Processing. London: Prentice-Hall International, 1985.
     * Makhoul, J. "Linear Prediction: A Tutorial Review." Proc. of the IEEE
       63 (1975): 561 - 580.



                    Q3.3: COMPRESSION AND CODING SOFTWARE

   The following speech compression software is described in the FAQ.

          * 32 kbps ADPCM
          * CELP 3.2a & LPC
          * 8 Kbit/s CELP on the TMS320C5x family of DSP chips
          * File format conversion
          * G.711/721/723 Compression
          * G.728 LD-CELP vocoder
          * G.728 Compression
          * GSM 06.10 Compression
          * Lernout & Hauspie Speech Coding (5 products)
          * Lernout & Hauspie Speech Coding SDK
          * shorten - a lossless compressor for speech signals
          * TrueSpeech from DSP Group
          * U.S.F.S. 1016 CELP vocoder for DSP56001
          * ToolVox from Voxware



32 kbps ADPCM

     * Platform: SGI and Sun Sparcs
     * Description: 32 kbps ADPCM C-source code (G.721 compatibility is
       uncertain)
     * Contact: Jack Jansen
     * Availability: Anonymous ftp

                 ftp://ftp.cwi.nl/pub/adpcm.shar



CELP 3.2a & LPC

     * Platform: Sun (the makefiles & source can be modified for other
       platforms)
     * Description: CELP is a lossy compression technique. The U.S. DoD's
       Federal-Standard-1016 based 4800 bps code excited linear prediction
       voice coder version 3.2a (CELP 3.2a) Fortran and C simulation source
       codes. Available for worldwide distribution (on DOS diskettes, but
       configured to compile on Sun SPARC stations) from NTIS and DTIC.
       Example input and processed speech files are included. A Technical
       Information Bulletin (TIB), "Details to Assist in Implementation of
       Federal Standard 1016 CELP," and the official standard, "Federal
       Standard 1016, Telecommunications: Analog to Digital Conversion of
       Radio Voice by 4,800 bit/second Code Excited Linear Prediction (CELP),"
       are also available.
     * Availability 1: National Technical Information Service (NTIS)
       U.S. Department of Commerce
       5285 Port Royal Road, Springfield, VA 22161, USA

       The "AD" ordering number for the CELP software is AD M000 118 (US$
       90.00) and for the TIB it's AD A256 629 (US$ 17.50). The LPC-10
       standard, described below, is FIPS Pub 137 (US$ 12.50). There is a
       $3.00 shipping charge on all U.S. orders. The telephone number for
       their automated system is 703-487-4650, or 703-487-4600 if you'd prefer
       to talk with a real person.

       (U.S. DoD personnel and contractors can receive the package from the
       Defense Technical Information Center: DTIC, Building 5, Cameron
       Station, Alexandria, VA 22304-6145. Their telephone number is
       703-274-7633.)
     * Availability 2: By anonymous ftp from:

        From ftp.super.org (192.31.192.1)
                ftp://ftp.super.org/pub/celp_3.2a.tar.Z

        Or from the comp.speech ftp server
                ftp://svr-ftp.eng.cam.ac.uk/comp.speech/coding/celp_3.2a.tar.Z

                ftp://svr-ftp.eng.cam.ac.uk/comp.speech/coding/celp_3.2a.tar.gz

     * Misc: The following articles describe the Federal-Standard-1016
       4.8-kbps CELP coder (it's unnecessary to read more than one):
          + Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch,
            "The Federal Standard 1016 4800 bps CELP Voice Coder," Digital
            Signal Processing, Academic Press, 1991, Vol. 1, No. 3, p.
            145-155.
          + Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch,
            "The DoD 4.8 kbps Standard (Proposed Federal Standard 1016)," in
            Advances in Speech Coding, ed. Atal, Cuperman and Gersho, Kluwer
            Academic Publishers, 1991, Chapter 12, p. 121-133.
          + Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch,
            "The Proposed Federal Standard 1016 4800 bps Voice Coder: CELP,"
            Speech Technology Magazine, April/May 1990, p. 58-64.

       The U.S. DoD's Federal-Standard-1015/NATO-STANAG-4198 based 2400 bps
       linear prediction coder (LPC-10) was republished as a Federal
       Information Processing Standards Publication 137 (FIPS Pub 137). It is
       described in:
          + Thomas E. Tremain, "The Government Standard Linear Predictive
            Coding Algorithm: LPC-10," Speech Technology Magazine, April 1982,
            p. 40-49.

       There is also a section about FS-1015 in the book:
          + Panos E. Papamichalis, Practical Approaches to Speech Coding,
            Prentice-Hall, 1987.

       The voicing classifier used in the enhanced LPC-10 (LPC-10e) is
       described in:
          + Campbell, Joseph P., Jr. and T. E. Tremain, "Voiced/Unvoiced
            Classification of Speech with Applications to the U.S. Government
            LPC-10E Algorithm," Proceedings of the IEEE Intl. Conf. on
            Acoustics, Speech, and Signal Processing, 1986, p. 473-6.
       Copies of the official standard, "Federal Standard 1016,
       Telecommunications: Analog to Digital Conversion of Radio Voice by
       4,800 bit/second Code Excited Linear Prediction (CELP)" are available
       for US$ 5.00 each from:
          + GSA Federal Supply Service Bureau
            Specification Section, Suite 8100
            470 E. L'Enfant Place, S.W.
            Washington, DC 20407
            (202)755-0325
       Realtime DSP code for FS-1015 and FS-1016 is sold by:
          + John DellaMorte, DSP Software Engineering
            165 Middlesex Tpk, Suite 206, Bedford, MA 01730, USA
            Ph: 1-617-275-3733 Fax: 1-617-275-4323
            Email: dspse.bedford@channel1.com
       DSP Software Engineering's FS-1016 code can run on DSP Research's Tiger
       30 (a PC board with a TMS320C3x and analog interface suited to
       development work).
          + DSP Research
            1095 E. Duane Ave, Sunnyvale, CA 94086, USA
            Ph: (408)773-1042 Fax: (408)736-3451



8 Kbit/s CELP on the TMS320C5x family of DSP chips

     * Description: For low bandwidth transmission of voice, compact voice
       storage for archival purposes, low-cost digital answering machines and
       efficient storage for voice mail. Features :
          + near toll quality at 8 Kb/s.
          + Variable rate option with 1 Kb/s silence encoding.
          + Implemented on a fixed-point processor for lower system cost.
          + Attractive licensing scheme.
          + Future availability of 4 Kb/s.
          + Custom rates possible.
       Capacity :
          + Two half-duplex or one full duplex channels on the 20 MIPS 'C5x
            (at 95% and 55% CPU utilization respectively).
          + Two full duplex channels on the 28.6 MIPS 'C5x (at 77% CPU
            utilization).
          + Requires 9 K-words program memory and 3 K-words data memory.
          + Decoding in real-time on a 486 class CPU.
     * Contact:

    CVI Inc.
    443 Vienna Cres. North Vancouver, BC, Canada V7N 3B3
    Tel: (604) 987 1719 Fax: (604) 986 8139
    Email: cvi@extropia.wimsey.com



File format conversion

     * Platform: SUN OS?
     * Description: Conversion utility able to encode and decode between the
       following formats: G.723, G.721, A-law, u-law and linear.
     * Availability: By anonymous ftp from

                 ftp://ftp.cwi.nl/pub/audio/ccitt-adpcm.tar.Z



G.711/721/723 Compression

     * Description:
          + G.711 : CCITT u-law and A-law compression
          + G.721 : CCITT 32 kbps ADPCM coder
          + G.723 : CCITT 24 kbps and 40 kbps ADPCM coders
     * Availability: By email to itudoc@itu.ch, with
                GET ITU-3022
       as the *only* line in the body of the message.
       It is also available by anonymous ftp from:

                ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/G711_G721_G723.tar.Z



G.728 LD-CELP vocoder

     * Platform: Analog Devices ADSP-2171
     * Description: Real-time, full-duplex G.728 LD-CELP vocoder that runs on
       a single Analog Devices ADSP-2171. Source and object code available for
       a one-time license fee.
     * Contact:

    Cole Erskine
    Analogical Systems
    299 California Avenue, Suite 120
    Palo Alto, CA 94306, USA
    Tel:(415) 323-3232 FAX:(415) 323-4222
    email: cole@analogical.com



G.728 Compression

     * Description: G.728 low-delay CELP package written by Alex Zatsman of
       Analog Devices, Inc.
     * Availability: By anonymous ftp from

                 ftp://dspsun.eas.asu.edu/pub/speech/ldcelp.tgz



GSM 06.10 Compression

     * Platform: Unix; faster than real time on most Sun SPARCstations
     * Description: GSM 06.10 is a standardized lossy speech compression
       scheme employed by most European wireless telephones. It uses RPE/LTP
       (regular pulse excitation/long term prediction) coding to compress
       frames of 160 13-bit samples (8 kHz sampling rate, i.e. a frame rate of
       50 Hz) into 260 bits, i.e. a bit rate of 13 kbps.
     * Contact: GSM 06.10 support and implementation jutta@cs.tu-berlin.de,
       cabo@cs.tu-berlin.de
     * Availability: The following configurations are available by anonymous
       ftp:

                 gzip compression from Germany:
                ftp://ftp.cs.tu-berlin.de/pub/local/kbs/tubmik/gsm/gsm-1.0.7.tar.gz

                 MS-DOS compression from Germany:
                ftp://ftp.cs.tu-berlin.de/pub/local/kbs/tubmik/gsm/ddj/gsm-107.zip

                 MS-DOS compression from USA:
                ftp://ftp.mv.com/pub/ddj/1194.12/gsm-105.zip

     * Misc: The WWW site is

                http://www.cs.tu-berlin.de/~jutta/toast.html



Lernout & Hauspie Speech and Music Coding Product Range

     * Product name: L&H.smc650: 32kbps ADPCM Speech coding
          + Implementation of ADPCM 32 kbps based on CCITT G721 standard.
          + Estimated quality: 4.1 MOS (Mean Opinion Score)
          + Hardware Example: Analog Devices ADSP2101
          + Input / Output signal: A-Law or mu-Law PCM (64 kbps); Linear
            signal with up to 16 bits per sample; 8 kHz sampling rate
     * Product name: L&H.smc550: LD-CELP 16 kbps speech coding
          + Proprietary implementation of LD-CELP 16 kbps based on CCITT G728
            standard.
          + Estimated quality: 4.0 MOS (Mean Opinion Score)
          + Hardware Example: Motorola 5600X
          + Input / Output signal: A-Law or mu-Law PCM (64 kbps); Linear
            signal with up to 16 bits per sample; 8 kHz sampling rate
     * Product name: L&H.smc450: 16-17.5 kbps speech coding
          + Estimated Quality: 3.9 MOS (Mean Opinion Score)
          + Hardware Examples: Analog Devices ADSP2101, Intel 486 DX2/66 MHz
          + Input / Output Signal: A-Law or mu-Law PCM (64 kbps); Linear
            signal with up to 16 bits per sample; 8 kHz sampling rate.
     * Product name: L&H.smc350: 4.8-9.6 kbps speech coding
          + Proprietary CELP based software for compression rates of 4.8 kbps
            to 9.6 kbps
          + Estimated Quality: 3.5 MOS (Mean Opinion Score)
          + Hardware Examples: AT&T DSP32C
          + Input / Output signal: A-Law or mu-Law PCM (64 kbps); Linear
            signal with up to 16 bits per sample; 8 kHz or 11.025kHz sampling
            rate.
     * Product name: L&H.smc250: 2.4 kbps speech coding
          + Combination of multi band excitation and code book excited linear
            prediction.
          + Estimated Quality: 3.0 MOS (Mean Opinion Score).
          + Hardware Examples: Intel 486 DX2/66 MHz, Analog Devices ADSP2101
          + Input signal: A-Law or mu-Law PCM (64 kbps); Linear signal with
            12-15 bits per sample; 8 kHz sampling rate.
          + Output signal: A-Law or mu-Law PCM (64 kbps); Linear signal with
            12-15 bits per sample; 8 kHz sampling rate.
     * See also: L&H Speech Coding SDK
     * Cost: Unknown
     * Contact: 

    Lernout & Hauspie Speech Products
    800 West Cummings Park, Suite 3100
    Woburn, MA 01801, USA
    Tel: (617) 932 4118
    Fax: (617) 932 9209
    Email: sales@lhs.com



Lernout & Hauspie Speech Coding SDK

     * Description: Windows based software development kit for integrating
       speech coding technology with Windows based PC applications.
     * Requirements: IBM-compatible 486 DX/33 MHz + 2MB RAM + MS DOS 5.0 + MS
       Windows 3.1 (or higher) + Sound Blaster compatible sound board.
     * See also: L&H Speech Coding Products
     * Cost: Unknown
     * Contact: 

    Lernout & Hauspie Speech Products
    800 West Cummings Park, Suite 3100
    Woburn, MA 01801, USA
    Tel: (617) 932 4118
    Fax: (617) 932 9209
    Email: sales@lhs.com



shorten - a lossless compressor for speech signals

     * Platform: UNIX/DOS
     * Description: A fast waveform coder suitable for speech and music
       signals in a wide variety of file formats. The degree of compression is
       adjustable from lossless to three bits a sample. 16bit 16kHz speech
       generally attains 50% lossless compression and 16:3 compression of
       CDROM quality speech is obtainable with only minor audible
       degradation.
     * Availability: Anonymous ftp - UNIX and DOS versions

                ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/shorten.tar.gz

                ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/shorten.tar.Z

                ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/shorten.zip



TrueSpeech from DSP Group

     * Description: TrueSpeech is a family of speech compression and
       decompression algorithms and software. It is designed for personal
       computers and personal communications devices. With the high
       compression ratios ranging from 15:1 to 27:1, TrueSpeech improves the
       storage and communications transmission of digital voice information
       and can be used in the integration of personal computers and
       telephones. TrueSpeech can be utilized in many products and
       applications such as:
          + Multimedia PCs
          + Sound cards and modems
          + Computer/telephony and teleconferencing
          + Voice mail systems and PBX systems
          + Wireless/cellular applications
          + Personal digital assistants
          + Games, Education
          + Video/cable and on-line services
       The TrueSpeech encoder is available for free in the Sound System of Windows
       95 and Windows NT. The DSPG WWW pages have information on how to add
       TrueSpeech capability to your WWW pages.
     * Contact: DSP Group, Inc.
       3120 Scott Boulevard, Santa Clara, CA 95054-3317, USA
       Phone: (408) 986-4300 Fax: (408) 986-4323
       Email: Webster@dspg.com
       WWW: http://www.dspg.com/index.html



U.S.F.S. 1016 CELP vocoder for DSP56001

     * Platform: DSP56001
     * Description: Real-time U.S.F.S. 1016 CELP vocoder that runs on a single
       27MHz Motorola DSP56001. Free demo software available for PC-56 and
       PC-56D. Source and object code available for a one-time license fee.
     * Contact:

    Cole Erskine
    Analogical Systems
    299 California Avenue, Suite 120
    Palo Alto, CA 94306, USA
    Tel:(415) 323-3232 FAX:(415) 323-4222
    Email: cole@analogical.com



ToolVox from Voxware

     * Platform: Windows and soon available on Mac (in Beta now) and Unix
     * Description: ToolVox is a proprietary frequency domain speech coder. 11
       kHz speech is coded to an average rate of between 5,000 bits per second
       and 9,000 bps. Real-time compression algorithms are available for 2,400
       bps. 22 kHz playback, as well as an ultra low bit rate 8 kHz codec, are
       coming soon. On playback, the time scale can be changed by a 5x factor,
       pitch can be modified over a 3 octave range, and vocal personality can
       be modified using a transformation function called VoiceFonts(tm).
     * Misc 1: An SDK for Windows is available.
     * Misc 2: Demo software is available from the Voxware Inc WWW page:
       http://www.voxware.com/
     * Price: Basic toolkit is $895 US. OEM and mass distribution licenses are
       separate. Ordering information is provided on the Voxware WWW server.
     * Contact:

    Voxware, Inc.
    Ph: (609) 497-1212 Fax: (609) 497-2490
    Sales information: sales@voxware.com
    WWW: http://www.voxware.com/


___________________________________________________________________________

                    FAQ SECTION 4 - NATURAL LANGUAGE PROCESSING

   There is now a newsgroup specifically for Natural Language Processing;
   comp.ai.nat-lang. A FAQ posting is available for the group:

          ftp://rtfm.mit.edu/pub/usenet/comp.ai.nat-lang/Natural_Language_Processing_FAQ

   There is also a lot of useful information on Natural Language Processing in
   the comp.ai FAQ. That FAQ lists available software and useful references.
   It includes a substantial list of software, documentation and other info
   available by ftp.

   The FAQ has information on the following:

          * Q4.1: NLP References and Books
          * Q4.2: NLP Software



                        Q4.1: NLP REFERENCES AND BOOKS

   Take a look at the FAQ for the "comp.ai" newsgroup as it also includes some
   useful references.
     * James Allen: Natural Language Understanding, (Benjamin/Cummings Series
       in Computer Science) Menlo Park: Benjamin/Cummings Publishing Company,
       1987.
          + This book consists of four parts: syntactic processing, semantic
            interpretation, context and world knowledge, and response
            generation.
     * G. Gazdar and C. Mellish, Natural Language Processing in Prolog,
       Addison Wesley, 1989
     * G. Gazdar and C. Mellish, Natural Language Processing in Lisp, Addison
       Wesley, 1989
     * G. Gazdar and C. Mellish, Natural Language Processing in Pop11, Addison
       Wesley, 1989
          + Emphasis on parsing, especially unification-based parsing, lots of
            details on the lexicon, feature propagation, etc. Fair coverage of
            semantic interpretation, inference in natural language processing,
            and pragmatics; much less extensive than in Allen's book, but more
            formal. There are three versions, one for each programming
            language listed above, with complete code.
     * Shapiro, Stuart C.: Encyclopedia of Artificial Intelligence Vol.1 and
       2. New York: John Wiley & Sons, 1990.
          + There are articles on the different areas of natural language
            processing which also give additional references.
     * Paris, Cécile L.; Swartout, William R.; Mann, William C.: Natural
       Language Generation in Artificial Intelligence and Computational
       Linguistics. Boston: Kluwer Academic Publishers, 1991.
          + The book describes the most current research developments in
            natural language generation and all aspects of the generation
            process are discussed. The book is comprised of three sections:
            one on text planning, one on lexical choice, and one on grammar.
     * Readings in Natural Language Processing, ed by B. Grosz, K. Sparck
       Jones and B. Webber, Morgan Kaufmann, 1986
          + A collection of classic papers on Natural Language Processing.
            Fairly complete at the time the book came out (1986) but now
            seriously out of date. Still useful for ATN's, etc.
     * Klaus K. Obermeier, Natural Language Processing Technologies in
       Artificial Intelligence: The Science and Industry Perspective, Ellis
       Horwood Ltd, John Wiley & Sons, Chichester, England, 1989.

   The following are extensive bibliographies related to NLP:
     * Computational Parsing : Syntactic Analysis, Semantic Analysis, Semantic
       Interpretation, Parsing Algorithms, Parsing Strategies : BIBLIOGRAPHY,
       by Conrad F. Sabourin 1994, 2 volumes, 1029p, ISBN 2-921173-02-6,
       INFOLINGUA inc., P.O. Box 187 Snowdon, Montreal, H3X 3T4, Canada.
     * Computational Text Understanding : Natural Language Programming,
       Argument Analysis : BIBLIOGRAPHY, by Conrad F. Sabourin 1994, 657p,
       ISBN 2-921173-06-9, INFOLINGUA inc., P.O. Box 187 Snowdon, Montreal,
       H3X 3T4, Canada.
     * Computational Text Generation : Generation from data or Linguistic
       Structure, Text Planning, Sentence Generation, Explanation Generation
       : BIBLIOGRAPHY, by Conrad F. Sabourin with a survey article by Mark T.
       Maybury 1994, 649p, ISBN 2-921173-07-7, INFOLINGUA inc., P.O. Box 187
       Snowdon, Montreal, H3X 3T4, Canada.
     * Natural Language Processing : Interfaces to Databases, to Expert
       Systems, to Robots, to Operating Systems, and to Question-Answering
       Systems : BIBLIOGRAPHY, by Conrad F. Sabourin, 1994, 2 volumes, 847p,
       ISBN 2-921173-08-5 INFOLINGUA inc., P.O. Box 187 Snowdon, Montreal, H3X
       3T4, Canada

Journals

   The major journals of the field are
     * Computational Linguistics and Cognitive Science for the artificial
       intelligence aspects,
     * Cognition for the psychological aspects,
     * Language, Linguistics and Philosophy, and Linguistic Inquiry for the
       linguistic aspects.
     * Artificial Intelligence occasionally has papers on natural language
       processing.

Conferences

   The major conferences of the field are
     * ACL (held every year)
     * COLING (held every two years).

   Most AI conferences have an NLP track; AAAI, ECAI, IJCAI and the Cognitive
   Science Society conferences usually are the most interesting for NLP. CUNY
   is an important psycholinguistic conference. There are lots of linguistic
   conferences: the most important seem to be NELS, the conference of the
   Chicago Linguistic Society (CLS), WCCFL, LSA, the Amsterdam Colloquium,
   and SALT.



                              Q4.2: NLP SOFTWARE

Natural Language Software Registry (NLSR) - NLP Tools

     * The Natural Language Software Registry is available from the German
       Research Institute for Artificial Intelligence (DFKI) in Saarbruecken.
       Its purpose is to facilitate the exchange and evaluation of natural
       language processing software within the research community. To this
       end, the NLSR is cataloging natural language software projects, both
       commercial and non-commercial. The new updated and enlarged version
       contains more than 100 descriptions of natural language processing
       software. Registry listings include:
          + speech signal processors, such as the Computerized Speech Lab (Kay
            Elemetrics)
          + morphological analyzers, such as PC-KIMMO (Summer Institute for
            Linguistics)
          + parsers, such as Alveytools (University of Edinburgh)
          + semantic and pragmatic analyzers, such as NLL (University of the
            Saarland, Germany)
          + generation programs, such as FUF (Ben Gurion University of the
            Negev)
          + knowledge representation systems, such as Rhet (University of
            Rochester)
          + multicomponent systems, such as ELU (ISSCO), PENMAN (ISI), Pundit
            (UNISYS), SNePS (SUNY Buffalo)
          + NLP-Tools, such as GULP (University of Georgia) or Linguist
            (Kansai Research Laboratory)
          + applications programs (misc.)
     * If you have developed a piece of software for natural language
       processing that other researchers might find useful, you can include it
       by returning the questionnaire available from the sources below.
     *  ftp://ftp.dfki.uni-sb.de/pub/registry
     * e-mail: registry@dfki.uni-sb.de
     * post:

    Natural Language Software Registry
    Deutsches Forschungsinstitut fuer Kuenstliche Intelligenz (DFKI)
    Stuhlsatzenhausweg 3
    D-66123 Saarbruecken
    Germany
     * Other ftp sites are

         ftp://crlftp.nmsu.edu/pub/non-lexical/NL_Software_Registy

         ftp://dri.cornell.edu/pub/Natural_Language_Software_Registry 

Part of Speech Tagger

     * Description: A rule-based part of speech tagger developed by Eric
       Brill. For a detailed description of the tagger see chapter 6 of his
       thesis.
     * Availability: The tagger and description are available by anonymous ftp
       from

         ftp://lightning.lcs.mit.edu/pub/BRILL/Programs & Papers 


___________________________________________________________________________

   Copyright (c) 1995 by Andrew Hunt, all rights reserved.
   This FAQ may be posted to any USENET newsgroup, on-line service, or BBS as
   long as it is posted in its entirety and includes this copyright statement.

   This FAQ may not be distributed for financial gain.
   This FAQ may not be included in any collections or compilations
   without express permission from the author.



 ---

Andrew Hunt
ATR Interpreting Telecommunications Research Labs
Hikari-dai 2-2, Seika-cho, Kyoto, 619-02, Japan
Tel: +81-774-95 1390   Fax: +81-774-95 1308
Email: andrew@itl.atr.co.jp

----------------------------------------------------------------------

Path: news1.ucsd.edu!ihnp4.ucsd.edu!swrinde!newsfeed.internetmci.com!news.kei.com!bloom-beacon.mit.edu!senator-bedfellow.mit.edu!faqserv
From: andrew@itl.atr.co.jp (Andrew Hunt)
Newsgroups: comp.speech,comp.answers,news.answers
Subject: comp.speech Frequently Asked Questions - part 3/3
Supersedes: <comp-speech-faq/part3_817055504@rtfm.mit.edu>
Followup-To: comp.speech
Date: 22 Dec 1995 14:10:49 GMT
Organization: ATR International, Japan
Lines: 2799
Approved: news-answers-request@MIT.Edu
Expires: 2 Feb 1996 14:10:32 GMT
Message-ID: <comp-speech-faq/part3_819641432@rtfm.mit.edu>
References: <comp-speech-faq/part1_819641432@rtfm.mit.edu>
Reply-To: andrew@itl.atr.co.jp (Andrew Hunt)
NNTP-Posting-Host: bloom-picayune.mit.edu
Summary: Information on Speech Technology
X-Last-Updated: 1995/12/19
Originator: faqserv@bloom-picayune.MIT.EDU
Xref: news1.ucsd.edu comp.speech:6604 comp.answers:13226 news.answers:51628

Archive-name: comp-speech-faq/part3
Last-modified: 1995/12/19
URL: http://www.speech.su.oz.au/comp.speech/


                   COMP.SPEECH FAQ POSTING - PART 3/3



                          FAQ SECTION 5 - SPEECH SYNTHESIS

          * Q5.1: What is speech synthesis?
          * Q5.2: How can speech synthesis be performed?
          * Q5.3: References/Books on Synthesis
          * Q5.4: Speech Synthesis on the WWW
          * Q5.5: Speech Synthesis Software/Hardware



                          Q5.1: WHAT IS SPEECH SYNTHESIS?

   Speech synthesis is the task of transforming written input to spoken
   output. The input can either be provided in a graphemic/orthographic or a
   phonemic script, depending on its source.

   Could someone provide a more informative description?



                         Q5.2: PERFORMING SPEECH SYNTHESIS

   There are several algorithms. The choice depends on the task they're used
   for. The easiest way is to just record the voice of a person speaking the
   desired phrases. This is useful if only a restricted volume of phrases and
   sentences is used, e.g. messages in a train station, or schedule
   information via phone. The quality depends on the way recording is done.

   More sophisticated, but worse in quality, are algorithms which split the
   speech into smaller pieces. The smaller those units are, the fewer of them
   are needed, but the quality also decreases. An often used unit is the
   phoneme, the smallest linguistic unit. Depending on the language used there
   are about 35-50 phonemes in western European languages, i.e. there are
   35-50 single recordings. The problem is combining them, as fluent speech
   requires fluent transitions between the elements. The intelligibility is
   therefore lower, but the memory required is small.

   A solution to this dilemma is using diphones. Instead of splitting at the
   transitions, the cut is made at the center of the phonemes, leaving the
   transitions themselves intact. An inventory of N phonemes gives up to N*N
   diphones (about 400 elements for 20 phonemes, 20*20), and the quality
   increases.
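
   As a rough illustration of the unit arithmetic, the following Python
   sketch turns a phoneme sequence into the diphones that span it and gives
   the upper bound on the inventory size. The function names, the ASCII
   phoneme labels and the "_" silence marker are assumptions made for this
   example; they are not taken from any particular synthesiser.

```python
def to_diphones(phonemes, boundary="_"):
    """Turn a phoneme sequence into the diphone units that span it.

    Each diphone runs from the centre of one phoneme to the centre of the
    next, so N phonemes padded with silence on both sides give N + 1 units.
    """
    padded = [boundary] + list(phonemes) + [boundary]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


def max_inventory(num_phonemes):
    """Upper bound on distinct diphones: every ordered pair of phonemes."""
    return num_phonemes * num_phonemes


# "cat" in a hypothetical ASCII phoneme notation
units = to_diphones(["k", "ae", "t"])
```

   Here `units` holds four diphones (silence to k, k to ae, ae to t, t to
   silence), and `max_inventory(20)` gives the 20*20 figure quoted above. A
   real inventory is usually smaller than the bound, because not every
   phoneme pair occurs in a given language.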

   The longer the units become, the more elements there are, and both the
   quality and the memory required increase. Other widely used units are
   half-syllables, syllables, words, or combinations of them, e.g. word
   stems and inflectional endings.



                        Q5.3: REFERENCES/BOOKS ON SYNTHESIS

   The following are good introductory books/articles.
     * Douglas O'Shaughnessy, "Speech Communication: Human and Machine",
       Addison-Wesley Series in Electrical Engineering: Digital Signal
       Processing, 1987.
     * D. H. Klatt, "Review of Text-To-Speech Conversion for English", Jnl. of
       the Acoustical Society of America (JASA), v82, Sept. 1987, pp 737-793.
     * "Talking Machines: Theories, Models and Designs", Eds. G. Bailly & C.
       Benoit (Elsevier: North Holland).
     * I. H. Witten, "Principles of Computer Speech" (London: Academic Press,
       Inc., 1982).
     * John Allen, M. Sharon Hunnicutt and Dennis H. Klatt, "From Text to
       Speech: The MITalk System", Cambridge University Press, 1987.

   The following book is a comprehensive bibliography of speech processing.
     * Computational Speech Processing: Speech Analysis, Recognition,
       Understanding, Compression, Transmission, Coding, Synthesis ; Text to
       Speech Systems, Speech to Tactile Displays, Speaker Identification,
       Prosody Processing : BIBLIOGRAPHY, by Conrad F. Sabourin, 1994, 2
       volumes, 1187p, ISBN 2-921173-21-2, INFOLINGUA inc., P.O. Box 187
       Snowdon, Montreal, H3X 3T4, Canada.



                         Q5.4: SPEECH SYNTHESIS ON THE WWW

   There is a growing amount of information on speech synthesis available on
   the World Wide Web. Apart from the information in Q5.5, check out the
   following:

   Speech Synthesis "Museum"
          URL: http://www.cs.bham.ac.uk/~jpi/synth/museum.html
          Maintained by Jon Iles <j.p.iles@cs.bham.ac.uk> at the University of
          Birmingham.
          Information and speech samples for

          + YorkTalk
          + Loughborough Sound Images
          + University of Birmingham - FDFS
          + Eurovocs
          + DECtalk
          + AT&T Bell Labs Synthesiser
          + S.W.A.Ll.C. - Welsh Synthesis from CSTR
          + All-Prosodic Speech Synthesis - IPOX
          + Orator from Bellcore

   Say...
          http://wwwtios.cs.utwente.nl/say
          WWW demo of the rsynth speech synthesis software. The WWW capability
          was implemented by Axel Belinfante.

   AT&T Bell Laboratories Voices
          http://www.research.att.com/cgi-bin/voices.form/
          WWW interface to the AT&T Bell Laboratories text to speech (TTS)
          synthesizer

   Yahoo page on speech generation
          http://www.yahoo.com/Science/Computer_Science/Artificial_Intelligence/Natural_Language_Processing/Speech_Generation/



                      Q5.5: SPEECH SYNTHESIS SOFTWARE/HARDWARE

   Please email any updates, corrections or additions to the following list.
   The range of commercially available synthesis software is growing rapidly
   so any help in keeping up to date will be appreciated.

          * AsTeR
          * TheBigMouth
          * CSRE: Canadian Speech Research Environment
          * DECtalk
          * Eloquence
          * Emacspeak - A Speech Output Subsystem For Emacs
          * Infovox Product Range
          * JSRU
          * Klatt-style synthesiser
          * KPE80 - A Klatt Synthesiser and Parameter Editor
          * "learph": Trainable text-to-phoneme software by Antonio Lucca 
          * Lernout and Hauspie Text-To-Speech (3 products)
          * Lernout and Hauspie Text-To-Speech Windows SDK
          * Various Mac Speech Output Applications
          * MacinTalk
          * Monologue for Windows from First Byte
          * Narrator Translator Library
          * Narrator
          * TextToSpeech Kit (NeXT)
          * Orator from Bellcore
          * PAM - A Text-To-Speech Application
          * ProVerbe Speech Engine for Windows
          * ProVoice Developer's Speech Toolkit from First Byte
          * RC Systems V8600/V8601 Text to Speech synthesizers 
          * rsynth
          * SENSYN speech synthesizer
          * SGI Developers Toolbox Synthesiser
          * SIMTEL
          * Sound Bytes Developer's Kit
          * spchsyn.exe
          * Speak
          * Speech Manager and PlainTalk
          * Text to Phoneme Program 1
          * Text to phoneme program 2
          * Text to phoneme program 3
          * Tinytalk
          * TrueTalk
          * TruVoice from Centigram



AsTeR

     * Platform: UNIX
     * Description: TTS front-end program which encodes structural information
       about documents in speech synthesis. For more information check out:

                http://www.research.digital.com/CRL/personal/raman/aster/aster-toplevel.html

     * Operation requirements: Lisp: Lucid, clisp
     * Contact: T. V. Raman

    email: raman@crl.dec.com



TheBigMouth - a Text to Speech Program

     * Platform: NeXT
     * Description: Text to speech program based on concatenation of
       pre-recorded speech segments. NeXT equivalent of "Speak" for Suns.
     * Availability: try NeXT archive sites such as sonata.cc.purdue.edu.



CSRE: Canadian Speech Research Environment

     * Platform: PC
     * Cost: Distributed on a cost recovery basis.
     * Description: CSRE is a software system which includes, in addition to
       the Klatt speech synthesizer, a speech analysis system and an
       experiment control system. A paper about the whole package can be
       found in:
          + Jamieson D.G. et al, "CSRE: A Speech Research Environment", Proc.
            of the Second Intl. Conf. on Spoken Language Processing, Edmonton:
            University of Alberta, pp. 1127-1130.
     * Hardware: Can use a range of data acquisition/DSP hardware.
     * Availability: For more information contact

    AVAAZ Innovations Inc.
    P.O.Box 8040
    1225 Wonderland Rd. N
    London, Ontario, CANADA, N6G 2B0
    Tel : (519) 472-7944 Fax : (519) 472-7814
    Email: info@avaaz.com
     * Note: A more detailed description is given in Section 1.9 on speech
       environments.



DECtalk

     * Description: Speech synthesis hardware and software. Detailed
       information on DECtalk and other DEC products is available on a
       World-Wide Web site.
          + http://www.digital.com/info.html
   For specific information on DECtalk, check out this www url:
          + http://www.digital.com/archive/pub/Digital/info/Customer-Update/940620005.txt



Eloquence

     * Platform: Windows, Solaris, SunOS, SGI, RS/6000
     * Description: Software based text-to-speech package. Generates waveforms
       completely algorithmically instead of by concatenating waveforms, for
       maximum flexibility and naturalism. For instance, when the user
       requests a deeper voice, the software simulates a larger vocal tract,
       instead of simply pitch-shifting samples.

       Uses high-level linguistic parsing, which obviates the need for a huge
       dictionary. Handles numbers, acronyms, currency, etc. Includes a set of
       annotation symbols, for placing stress on particular words, expressing
       excitement/boredom, etc. Also allows phonetic input. Support for
       Windows DDL.

       Produces male and female voices for General American English. Dialects
       under development include Alabama, Brooklyn, and Boston.
     * Price: Flexible license agreements on application.
     * Availability:

    Eloquent Technology, Inc.
    2389 North Triphammer Road
    Ithaca, NY 14850
    Ph: (607) 266-7025 Fax: (607) 266-7030
    Email: eti@plab.dmll.cornell.edu



Emacspeak - A Speech Output Subsystem For Emacs

     * Platform: UNIX, Emacs
     * Description: Emacspeak is a speech output system that will allow
       someone who cannot see to work directly on a UNIX system. Emacspeak is
       built on top of Emacs. With emacspeak loaded, Emacs provides spoken
       feedback for everything you do. Emacspeak currently supports the new
       Dectalk Express speech synthesizer, as well as older versions of the
       Dectalk e.g. the MultiVoice. See the Emacspeak WWW page, the Emacspeak
       FAQ or the Emacspeak distribution for additional details.
     * Requirements: Requires GNU FSF Emacs 19 (version 19.23 or later) and
       TCLX 7.3B (Extended TCL) to run Emacspeak.
     * Availability: 

        Emacspeak WWW page
                http://www.research.digital.com/CRL/personal/raman/emacspeak/emacspeak.html

        Emacspeak source
                http://www.research.digital.com/CRL/personal/raman/emacspeak/emacspeak.tar.gz

     * Contact: T. V. Raman
       Email: raman@adobe.com
       Email: raman@cs.cornell.edu 



Infovox Product Range

     * Description: Multilingual Text-to-speech systems, languages available:
       American English, British English, German, French, Spanish, Italian,
       Swedish, Norwegian, Icelandic, Danish and Finnish.

     * Product name: INFOVOX 500, PC BOARD
          + Product description: Half length expansion board for IBM PC, XT,
            AT, PS/2 model 30 or compatible personal computers. The board can
            also be connected via the serial port. Language and control
            program for downloading into RAM or mounted on EPROMs
          + Platform: for IBM PC, XT, AT, PS/2 model 30 or compatible
          + Delivered standard interface: MS DOS I/O driver
     * Product name: INFOVOX 600, OEM BOARD
          + Product description: OEM board built with CMOS ICs. Language and
            control program are stored in on-board fixed memory.
          + Platform: any, Interface: 9-pole D-SUB (RS 232-C) 300-9600 Baud.
          + Delivered standard interfaces: MS DOS I/O driver and interface to
            Apple Speech manager.
     * Product name: INFOVOX 700, DESKTOP UNIT
          + Product description: Desktop unit with built in Infovox 600 to be
            connected to any computer or terminal via an RS 232-C serial
            interface. Built in loudspeaker and rechargeable battery for 4
            hours use, and control knobs for continuous control of speech
            volume and speed.
          + Platform: any
          + Delivered standard interfaces: MS DOS I/O driver and interface to
            Apple Speech manager
     * Product name: INFOVOX 650, OEM BOARD
          + Product description: OEM board built with CMOS ICs. Language and
            control program are stored in on-board memory.
          + Platform: any, Interface: 9 pole D-SUB (RS 232-C) 300-9600 Baud
          + Delivered standard interfaces: MS DOS I/O driver and interface to
            Apple Speech manager
     * Product name: INFOVOX 750, DESKTOP UNIT
          + Product description: Desktop unit with built in Infovox 650 to be
            connected to any computer or terminal via an RS 232-C serial
            interface. Built in loudspeaker and rechargeable battery for 5
            hours use, and a control knob for continuous control of speech
            volume.
          + Platform: any
          + Delivered standard interfaces: MS DOS I/O driver and interface to
            Apple Speech manager
     * Product name: Infovox 210, software for Apple Macintosh
          + Product description: Software based text-to-speech conversion.
            Produces 16 bit and 8 bit sound. Delivered on 3.5" diskettes with
            user lexicon and complete documentation.
          + Platform: Apple Macintosh with minimum 68030, 33 MHz
            microprocessor.
          + Delivered standard interfaces: Standard interface to Apple Speech
            manager
     * Product name: Infovox 220, software for Microsoft Windows.
          + Product description: Software based text-to-speech conversion.
            Produces 16 bit sound and conforms to Microsoft Windows multimedia
            standard MCI. Delivered on 3.5" diskettes with user lexicon and
            complete documentation.
          + Platform: IBM compatible PC with minimum 486, 25 MHz
            microprocessor.
          + Delivered standard interfaces: Standard interface to Microsoft
            Windows 3.1 and sound boards supporting Microsoft Windows
            multimedia driver for audio.
     * Contact: 

    Telia Promotor Infovox AB
    TTS Sales Division
    P.O. Box 2069
    S-171 02 Solna, Sweden
    Ph: +46 8 764 35 00 Fax: +46 8 735 78 76
    email: tts-sales@infovox.se



JSRU

     * Platform: UNIX and PC
     * Cost: 100 pounds sterling (for academic institutions and industry)
     * Description: A C version of the JSRU system, Version 2.3 is available.
       It's written in Turbo C but runs on most Unix systems with very little
       modification. A Form of Agreement must be signed to say that the
       software is required for research and development only.
     * Contact: Dr. E. Lewis (eric.lewis@bristol.ac.uk)



Klatt-style synthesiser

     * Platform: Unix
     * Cost: Free
     * Description: Software posted to comp.speech in late 1992.
     * Availability: By ftp from the comp.speech ftp site
          + ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/klatt-3.04.tar.gz
          + ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/klatt-3.04.tar.Z
     * See also: KPE80 - A Klatt Synthesiser and Parameter Editor.



KPE80 - A Klatt Synthesiser and Parameter Editor

     * Platform: Unix
     * Description: The KPE80 program provides a graphical interface for the
       implementation of the Klatt 1980 formant synthesiser written by Jon
       Iles and Nick Ing-Simmons. It was inspired by IGE, a piece of code
       written by Rob Fletcher (http://www.york.ac.uk/~rpf1/IGE.html).
     * Technical Desc.: It comprises an X-Window interface and version
       3.03 of the synthesiser code. The interface allows users to display and
       edit Klatt parameters using a graphical display which includes the
       time-amplitude waveform of both the original speech and its synthetic
       copy, and some signal analysis facilities. Most of the work in choosing
       the parameter values to produce the synthetic copy has to be done by
       the user. KPE will estimate the fundamental frequency contour from an
       original token; this estimate will need to be amended where errors
       occur. It is possible to specify the formant trajectories with some
       precision by overlaying the appropriate formant frequency parameter
       tracks on the spectrogram of the target waveform. A number of
       facilities exist to help in the refinement of parameter values:
       original and synthetic waveforms can be compared aurally, spectrally,
       and spectrographically using built-in speech analysis facilities.
     * File formats: KPE will read RIFF (.wav) files and SFS files. (SFS is a
       suite of speech-signal processing programs available free from
       Phonetics and Linguistics, UCL.)
     * Availability: 

        KPE for SunOS 4.1.3 (statically compiled libraries)
                ftp://pitch.phon.ucl.ac.uk/pub/kpe/kpe80.sun413.tar.Z

        KPE for Linux (statically compiled libraries)
                ftp://pitch.phon.ucl.ac.uk/pub/kpe/kpe80.linux.tar.Z

        The source code (needs gcc and SUIT to compile)
                ftp://pitch.phon.ucl.ac.uk/pub/kpe/kpe80.src.tar.Z

        A postscript overview of KPE
                ftp://pitch.phon.ucl.ac.uk/pub/kpe/OVERVIEW.ps

        The SFS distribution
                ftp://pitch.phon.ucl.ac.uk/pub/sfs

     * See also: Public domain Klatt-style speech synthesis code.
     * Contact: Andrew Simpson

    Department of Phonetics and Linguistics
    University College London
    Wolfson House, 4 Stephenson Way, London NW1 2HE
    email: a.simpson@ucl.ac.uk
    WWW: http://www.phon.ucl.ac.uk/home/andrew/home.html
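
   The basic building block of a Klatt-style formant synthesiser is a
   second-order digital resonator, one per formant, whose coefficients are
   set from the formant frequency and bandwidth parameters. The following
   Python sketch shows that single resonator only. It is not taken from the
   KPE80 or klatt-3.04 sources; the function names and example values are
   illustrative.

```python
import math


def resonator_coefficients(freq_hz, bandwidth_hz, sample_rate_hz):
    """Coefficients of the resonator y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate_hz
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = (2.0 * math.exp(-math.pi * bandwidth_hz * t)
         * math.cos(2.0 * math.pi * freq_hz * t))
    a = 1.0 - b - c  # normalises the gain at 0 Hz to 1
    return a, b, c


def resonate(samples, freq_hz, bandwidth_hz, sample_rate_hz):
    """Filter a list of samples through one formant resonator."""
    a, b, c = resonator_coefficients(freq_hz, bandwidth_hz, sample_rate_hz)
    y1 = y2 = 0.0  # the two previous output samples
    out = []
    for x in samples:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


# Impulse response of a 500 Hz formant, 60 Hz bandwidth, 10 kHz sampling
impulse = [1.0] + [0.0] * 199
ringing = resonate(impulse, 500.0, 60.0, 10000.0)
```

   A full synthesiser chains or sums several such resonators and drives
   them with a voicing or noise source; editing a formant track in a
   parameter editor amounts to changing freq_hz from frame to frame.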



"learph": Trainable text-to-phoneme software by Antonio Lucca

     * Platform: UNIX (unconfirmed)
     * Description: Experimental software which learns text to phoneme
       translation from examples.
     * Availability: Examples and source are available on the WWW:
       http://www.dsi.unimi.it/Users/Students/lucca/TTS/ttsdoc.html
     * Contact: Antonio Lucca: lucca@ghost.dsi.unimi.it 



Lernout & Hauspie Text-to-Speech (3 products)

   Lernout & Hauspie have three TTS products. The functionality of the
   products is similar; however, they differ in hardware implementation and
   in other details, as described below.
     * L&H tts2000/T: TTS for the Telephony and Telecommunications Market
     * L&H tts2000/M: TTS for the Computer and Multimedia Market
     * L&H tts3000/C: TTS for the Business and Consumer Electronics Market
     * Description: Text to Speech (TTS) software based on parameterized
       segment concatenation (diphones, triphones and tetraphones) algorithms.
       Available for US English, German, Dutch, French, Spanish (Castilian),
       Italian and Korean. General features include:
          + The control of volume, speech rate and speech pitch.
          + The use of control sequences to customize TTS output (adding
            pauses, using phonetic input, etc.).
          + Switching between languages at run time.
          + A personal vocabulary editor is available for building exception
            dictionaries.
          + Readout modes: letter by letter, word by word or sentence by
            sentence.
          + Input formats: orthographic input, phonetic input, phonetic input
            with prosodic information.
     * tts2000/T
          + Output formats: 8 bit mu-law PCM, 8 bit A-law PCM, 16 bit linear
            PCM.
          + Sampling Frequency: 8kHz
          + Single channel platform examples: SHARP SH7000, ARM6/ARM7, Intel
            i960, TI TMS320C31, AT&T DSP3210
          + Multi-channel platform examples: TI TMS320C31, AT&T DSP3210
     * tts2000/M
          + Output formats: 8/16 bit wave format, 8 bit mu-law PCM, 8 bit
            A-law PCM, 16 bit linear PCM.
          + Sampling Frequency: 8/10/11.025 kHz
          + Single processor platform examples: ARM6/ARM7, Intel
            386/486/Pentium, Motorola 68040
          + Two processor platform examples: {Intel 386/486/Pentium or
            Motorola 68030} and {ADI ADSP21XX or Motorola 5600X or TI
            TMS320C25/20C5X}
     * tts3000/C
          + Output formats: 8 bit mu-law PCM, 8 bit A-law PCM, 16 bit linear
            PCM.
          + Sampling Frequency: 10kHz
          + Single processor platform examples: SHARP SH7000, ARM6/ARM7, Intel
            i960, TI TMS320C31, AT&T DSP3210
          + Two processor platform examples: { SHARP SH7000 or ARM6/ARM7 or
            Intel 386EX or Motorola 683XX} and {ADI ADSP21XX or Motorola 5600X
            or TI TMS320C25/C5X or TI TSP50C10}
     * See also: L&H Windows TTS SDK
     * Price: Unknown
     * Contact: 

    Lernout & Hauspie Speech Products
    800 West Cummings Park, Suite 3100
    Woburn, MA 01801, USA
    Tel: (617) 932 4118
    Fax: (617) 932 9209
    Email: sales@lhs.com



Lernout & Hauspie Text-to-Speech Windows SDK

     * Platform: IBM-Compatible
     * Description: The L&H Text-to-Speech software developers kit lets you
       integrate text-to-speech technology into your own or existing PC
       applications under Microsoft Windows 3.1. The software converts
       written text into clear, human-sounding synthetic speech.
     * Requirements: IBM-compatible PC 386 DX/33 + 8Mb RAM + MS DOS 5.0 + MS
       Windows 3.1 (or higher) + SoundBlaster compatible sound board.
     * See also: L&H TTS Products
     * Price: Unknown
     * Contact: 

    Lernout & Hauspie Speech Products
    800 West Cummings Park, Suite 3100
    Woburn, MA 01801, USA
    Tel: (617) 932 4118
    Fax: (617) 932 9209
    Email: sales@lhs.com




Various Mac Speech Output Applications

Application         Source             Description
------------------- ------------------ -----------------------------------------------
AddressSpeech       info-mac           4D talking address book (from Speech Pack 2.0)
At Ease 2.0         MacWarehouse       Friendly desktop that speaks file names
At Ease 2.0 WG      MacWarehouse       Friendly desktop that speaks file names
Eliza 3.1           AOL                Talking Eliza (Rogerian psych therapist)
FB speech           Inside Basic Mag   FutureBasic demo (volume 3, no. 6)
FB Speech demo      Inside Basic Mag   FutureBasic demo (volume 3, no. 7)
Fortune 1.1         info-mac           Like a talking UNIX fortune command - slick
Homer 0.92d9        zaphod.ee.pitt.edu GUI IRC client, assign nicks voices - slick
MacMessage 1.0      FirstClassBBS      Share talking messages/customizable startup
Say                 info-mac           MPW Tool which converts standard input to speech
ScriptTools 1.2     info-mac           Write AppleScript scripts to say text messages
Siege Watch 1.01f   info-mac           Wryly political speaking clock
SoToSpeak 1.0.0b10  info-mac           Two voice conversation (also see Fortune's About)
Speak It!           info-mac           Type in a message and have it spoken
Speaker 1.11        info-mac           Simple text file editor, speaks on CR, macros
Speecher 1.2.1      info-mac           Customizable word pronunciation/substitution
SpeechManagerdemo   info-mac           Command line interface, C source, aka -explorer
Speech Pack 2.0     info-mac           4th Dimension external, add speech to database
speek-02b           info-mac           Speech XCMD for HyperCard
TalkingClockPro 2.0 info-mac           AppleScriptable talking clock extension (2.0b0)
TeachText 7.2       AV Mac             Apple's talking TeachText (simple editor w/QT)
Tex-Edit 1.9        AOL                Talking word processor, McSink like, modeming
VoiceDemo 1.0.1     info-mac           Bare bones phrase talker
Welcome! v1.3.1     info-mac           A talking Welcome to Macintosh startup
?                   ?                  Talking Plug-In-Module for MS Word 5,
                                       experimental, unsupported, buggy, beware!
Speech Rhythms      AOL                A cool text file for one of the above apps

     * Sources: 
          + AOL = America Online
          + info-mac = {ftp sumex-aim.stanford.edu, ftp wuarchive.wustl.edu,
            et al.}
          + MacWarehouse = (800) 255-6227
     * Misc: Apple's work in spoken language technologies and systems is
       described in:
          + Lee, Kai-Fu. "The Conversational Computer: An Apple Perspective."
            (Keynote Speech) In Proc. Eurospeech in Berlin, September, 1993.



MacinTalk

     * Platform: Macintosh
     * Cost: Free
     * Description: Formant based speech synthesis. There is also a program
       called "tex-edit" which apparently can pronounce English sentences
       reasonably using Macintalk.
     * Note: MacinTalk doesn't run reliably on Macintoshes with new sound
       hardware under the latest OS (System 7.1 w/HUD 2.0). More recent
       software is listed above.
     * Availability: By anonymous ftp from many archive sites (have a look on
       archie if you can). tex-edit is on many of the same sites. Try

                ftp://wuarchive.wustl.edu/mirrors2/info-mac/Old/card/macintalk.hqx

                ftp://wuarchive.wustl.edu/mirrors2/info-mac/Old/card/macintalk-stack.hqx

                ftp://wuarchive.wustl.edu/mirrors2/info-mac/app/tex-edit-15.hqx



Monologue for Windows from First Byte

     * Description: Monologue, a software program that reads text from the
       clipboard in Windows 16 or 32 bit applications, can be found as a
       bundled product with many sound cards and multimedia general purpose
       computer systems. It is not offered as a separate product at this time.

       Monologue can add the element of speech to virtually any text oriented
       application. Any pronounceable combination of letters and numbers will
       be spoken clearly. It can be applied to tasks such as eyes-free
       proofreading, data verification (e.g. spreadsheets), reading E-mail and
       more. User-changeable parameters provide control over the sound quality
       by allowing for changes in pitch, and the speed of speech. An exception
       dictionary saves preferred pronunciation of words and abbreviations.

       Monologue works with sound devices that comply with the Windows Sound
       API. Monologue male "SpeechFonts" are available for US English, British
       English, German, French, Latin American Spanish, Italian. A US English
       Female SpeechFont is also available.
     * Availability: Currently bundled with many sound cards and multimedia
       general purpose computer systems. Monologue will soon be available as a
       stand-alone product. Single user and site licenses as well as
       Distributor discounts will be offered.
     * WWW: For more detailed information and examples go to the First Byte
       WWW page: http://www.firstbyte.davd.com/
     * See also: ProVoice Developer's Speech Toolkit from First Byte
     * Contact: 

    First Byte
    19840 Pioneer Ave., Torrance, CA 90503
    Ph: 310-793-0610 Fax: 310-793-0611
    Email: info@firstbyte.davd.com
    WWW: http://www.firstbyte.davd.com/



Narrator Translator Library

     * Platform: Amiga
     * Description: A replacement for the Commodore-supplied
       "translator.library" which is a part of the Narrator speech synthesis
       package. It implements multi-lingual text-to-speech for an Amiga. The
       library allows the user to specify which language the text to be
       spoken should be treated as. This can be done by setting the default
       language or by including markup codes in the text in a similar way to
       LaTeX or HTML, e.g. "\french{Bonjour}". There is currently support for
       American English, British English, Swedish, Maori, Finnish, German,
       Icelandic, Klingon, Polish, Italian, and Welsh.
     * Availability: The library (but not source) is available by anonymous
       ftp from Aminet:

                ftp://ftp.doc.ic.ac.uk/pub/aminet/util/libs/translator42.lha

     * More information: available on the WWW.

                http://www.sans.vuw.ac.nz/~ffranc/translator/index.html 
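
   To illustrate the kind of markup described above, the following Python
   sketch splits a string into (language, fragment) pairs. It handles only
   the flat form shown in the example, a backslash, a language name and the
   text in braces. The library's real syntax (nesting, escaping, the names
   of the language codes) is not documented here, so the pattern, the
   function name and the "english" default are assumptions.

```python
import re

# A backslash, a language name, then the text in braces (no nesting)
MARKUP = re.compile(r"\\([A-Za-z]+)\{([^{}]*)\}")


def split_by_language(text, default="english"):
    """Split text into (language, fragment) pairs in reading order."""
    segments = []
    pos = 0
    for match in MARKUP.finditer(text):
        if match.start() > pos:
            # Unmarked text before this markup code uses the default language
            segments.append((default, text[pos:match.start()]))
        segments.append((match.group(1), match.group(2)))
        pos = match.end()
    if pos < len(text):
        segments.append((default, text[pos:]))
    return segments


segments = split_by_language(r"He said \french{Bonjour} to me.")
```

   Each fragment could then be handed to the text-to-phoneme rules of the
   language it is tagged with.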



Narrator

     * Platform: Amiga
     * Description: Formant based speech synthesis. Includes an
       English-to-phoneme translation library, and a SPEAK: pseudo-device for
       speech output.
     * Hardware: Standard Amiga hardware
     * Availability: Part of AmigaOS
     * See also: Narrator Translator Library



TextToSpeech Kit

     * Platform: NeXT Computers
     * Description: The TextToSpeech Kit does unrestricted conversion of
       English text to synthesized speech in real-time. The user has control
       over speaking rate, median pitch, stereo balance, volume, and
       intonation type. Text of any length can be spoken, and messages can be
       queued up, from multiple applications if desired. Real-time controls
       such as pause, continue, and erase are included. Pronunciations are
       derived primarily by dictionary look-up. The Main Dictionary has nearly
       100,000 hand-edited pronunciations which can be supplemented or
       overridden with the User and Application dictionaries. A number parser
       handles numbers in any form. A letter-to-sound knowledge base provides
       pronunciations for words not in the Main or customized dictionaries.
       Dictionary search order is under user control. Special modes of text
       input are available for spelling and emphasis of words or phrases. The
       actual conversion of text to speech is done by the TextToSpeech Server.
       The Server runs as an independent task in the background, and can
       handle up to 50 client connections.
     * Misc: The TextToSpeech Kit comes in two packages: the Developer Kit and
       the User Kit. The Developer Kit enables developers to build and test
       applications which incorporate text-to-speech. It includes the
       TextToSpeech Server, the TextToSpeech Object, the pronunciation editor
       PrEditor, several example applications, phonetic fonts, example source
       code, and developer documentation. The User Kit provides support for
       applications which incorporate text-to-speech. It is a subset of the
       Developer Kit.
     * Hardware: Uses standard NeXT Computer hardware.
     * Cost:
          + TextToSpeech User Kit: $175 CDN ($145 US)
          + TextToSpeech Developer Kit: $350 CDN ($290 US)
          + Upgrade from User to Developer Kit: $175 CDN ($145 US)
     * Availability: Trillium Sound Research

    1500, 112 - 4th Ave. S.W., Calgary, Alberta, Canada, T2P 0H3
    Tel: (403) 284-9278 Fax: (403) 282-6778
    Order Desk: 1-800-L-ORATOR (US and Canada only)
    Email: TTSInfo@trillium.ab.ca
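
   The look-up scheme described above (customisable dictionaries searched in
   a user-chosen order, with letter-to-sound rules as the fallback) can be
   sketched in Python as follows. This is not the TextToSpeech Kit API; the
   function names, the dictionary names, the phoneme strings and the toy
   spell_out fallback are all invented for illustration.

```python
def pronounce(word, dictionaries, search_order, letter_to_sound):
    """Return (pronunciation, source) for a word.

    The dictionaries are tried in the given order; the first hit wins, so
    an earlier dictionary overrides a later one. Words found nowhere fall
    through to the letter-to-sound function.
    """
    key = word.lower()
    for name in search_order:
        entry = dictionaries.get(name, {}).get(key)
        if entry is not None:
            return entry, name
    return letter_to_sound(key), "letter-to-sound"


def spell_out(word):
    """Toy stand-in for a letter-to-sound rule base: one symbol per letter."""
    return " ".join(word)


# Hypothetical dictionary contents
dictionaries = {
    "user": {"trillium": "t r ih l iy ah m"},
    "application": {},
    "main": {"speech": "s p iy ch", "trillium": "t r ih l y ah m"},
}
order = ["user", "application", "main"]

result = pronounce("Trillium", dictionaries, order, spell_out)
```

   With this order the user entry for "trillium" overrides the main one;
   changing `order` changes which dictionary takes precedence.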



Orator Text-to-Speech Synthesizer

     * Platform: SUN SPARC, Decstation 5000. Written in C, and therefore
       portable to other UNIX platforms. Some successful ports: HP, RS-6000,
       PC-Unix [Linux].
     * Description: Sophisticated speech synthesis package. Has text
       preprocessing (for abbreviations, numbers), acronym rules, and
       human-like spelling routines. Natural-sounding synthesis based on
       demisyllable concatenation. Has high accuracy for pronunciation of
       names of people, places and businesses in America; good accuracy for
       English text; rules for stress and intonation marking; various methods
       of user control and customization at most stages of processing.
       A new version of the ORATOR system is under development. Both ORATOR
       and this new "ORATOR II" system are capable of general text synthesis.
       The ORATOR II system has a more natural-sounding voice.
     * Hardware: Runs on common SPARC or Decstation workstations, using their
       internal audio output capability. Recommend at least 16M of memory.
     * WWW: More detailed information plus examples of ORATOR synthesis are
       available on the ORATOR WWW pages:
       http://www.bellcore.com/demotoo/ORATOR/index.html
     * Misc 1: A free demo cassette is available.
     * Misc 2: Examples of Orator are also available on the University of
       Birmingham Speech Synthesis "Museum" WWW site (see Q5.4).
     * Availability and Pricing: Contact Bellcore's Licensing Office
       Tel: 1-800-521-CORE (521-2673)
       Fax: 1-908-336-2559
       Email: Anthony Lindsey: alin1@panix.com
       WWW: http://www.bellcore.com/demotoo/ORATOR/index.html



PAM - A Text-To-Speech Application

     * Platform: Windows
     * Description: PAM is a talking personal assistant and text reader
       application. It uses the ProVoice TTS package. PAM will verbally advise
       about appointments and reminder messages at specified times during the
       day. It can read text files, clipboard text, and text sent in DDE
       messages. Using the full verbal interface, PAM can be used by visually
       challenged individuals. Shareware - thirty day free trial.
     * Requirements: Any Windows sound card, speakers or headphones. Min.
       memory - 4 megs, 8 megs recommended.
     * WWW: A more complete description is available on the JTS homepage:
       http://www.islandnet.com/~tslemko/homepage.html
     * Availability: The shareware can be downloaded by ftp from
       ftp://ftp.islandnet.com/jts/pam_en1e.zip. The file size is approx. 1
       MByte.
     * Price: $US40 for the registered version.
     * Contact: Tom Slemko: e-mail: tslemko@islandnet.com, or,

    JTS Micro Consulting Ltd
    10931 Lytton Road, RR#4, Ladysmith, B.C., Canada, V0R 2E0



ProVerbe Speech Engine for Windows (95 and NT)

     * Description: The ProVerbe Speech Engine produces natural sounding
       speech from any written text. A high level of naturalness is achieved
       by using the TD-PSOLA process from the CNET (France telecom's research
       lab.) which is based on the concatenation of elementary speech units
       (including diphones). Supported languages are British English, German,
       French and Spanish. For multi-channel applications Elan Informatique
       also provides hardware platforms.
     * Demo: Anonymous ftp from ftp://www.cict.fr/pub/elan/
     * Contact: Elan Informatique
       4 rue Jean Rodier, 31400 TOULOUSE FRANCE
       Contact person: Pierre Delrat
       Phone: +33 61 36 07 77 Fax: +33 61 36 07 70
       BBS: +33 61 36 07 88
       E-mail: 101346,465@compuserve.com



ProVoice Developer's Speech Toolkit from First Byte

     * Platform: ProVoice Developer's Toolkits are available for DOS, Windows
       3.1, Windows 95, Windows NT, OS/2, and Macintosh.
     * Description: ProVoice allows programmers to add synthesized speech to
       their applications. Your program passes text strings to the ProVoice
       speech engine, which translates text into audible speech. Male and/or
       female "SpeechFonts" are available for many languages: English, French,
       German, UK British English, Italian, and Spanish.

       ProVoice converts text to speech in two phases using a set of phonetic
       translation and pronunciation rules. First, the software analyzes and
       translates text into "sound descriptors", a phonetic language with
       pitch, duration, and amplitude codes which are needed to produce stress
       patterns in phrases and sentences. Rules are used to analyze words,
       numbers, and punctuation. The second phase converts the intermediate
       phonetic language into speech signals; algorithms drive distinct speech
       signals into smooth flowing, continuous, clear speech. Real time
       synchronization of mouth movement and word boundaries allows animation
       of a graphical talking character, or highlighting of displayed text as
       it is spoken.

       Necessary tools and examples are provided for programmers to manipulate
       the ProVoice speech technology; including installation instructions,
       extensive sample programs, and complete documentation. In addition,
       sample code is provided on disk to illustrate speech programming
       techniques.
     * Note 1: First Byte will perform custom work for embedded systems.
     * Note 2: ProVoice Windows will speak through any Windows-supported wave
       audio device.
     * Note 3: Distribution of ProVoice for commercial use is subject to
       execution of a Commercial Product Distribution License Agreement.
     * WWW: For more detailed information and examples go to the First Byte
       WWW page: http://www.firstbyte.davd.com/
     * See also: Monologue for Windows from First Byte
     * Price and Availability: Contact First Byte
     * Contact: 

    First Byte
    19840 Pioneer Ave., Torrance, CA 90503
    Ph: 310-793-0610 Fax: 310-793-0611
    Email: info@firstbyte.davd.com
    WWW: http://www.firstbyte.davd.com/



RC Systems V8600/V8601 Text to Speech synthesizers

     * Platform 1: IBM PC: ISA card.
     * Platform 2: Interface to PC/104 standard microcontrollers.
     * Platform 3: Standalone (or embedded) thru RS232 or parallel printer
       port or processor bus.
     * Description: Converts plain ASCII text to speech. Programmable voices,
       pitch, rate, volume, etc. Built-in DTMF and tone generators.
     * Price: $151-$299 US (qty 1)
     * Contact: RC Systems

    1609 England Avenue, Everett, WA 98203, USA
    Ph: (206) 355-3800 Fax: (206) 355-1098
    Europe: +44 181 539-0285



rsynth

     * Platform: Various (including Solaris2.3, SunOS4.1.3, HPUX, SGI Irix4.x,
       Linux)
     * Description: Public domain text-to-speech system assembled from a
       variety of sources. It supports CMU and BEEP format dictionaries (as
       described in Q1.10) and now utilises stress marks in the dictionary in
       synthesising intonation.
     * Price: Free
     * Misc: Axel Belinfante has implemented a WWW rsynth demo:
       http://wwwtios.cs.utwente.nl/say.
     * Availability: by anonymous ftp from

                ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/rsynth-2.0.tar.Z

                ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/rsynth-2.0.tar.gz



SENSYN speech synthesizer

     * Platform: PC, Mac, Sun, and NeXT
     * Rough Cost: $300
     * Description: This formant synthesizer produces speech waveform files
       based on the (Klatt) KLSYN88 synthesizer. It is intended for laboratory
       and research use. Note that this is NOT a text-to-speech synthesizer,
       but creates speech sounds based upon a large number of input variables
       (formant frequencies, bandwidths, glottal pulse characteristics, etc.)
       and would be used as part of a TTS system. Includes full source code.
     * Availability: Sensimetrics Corporation

    64 Sidney Street, Cambridge MA 02139.
    Fax: (617) 225-0470; Tel: (617) 225-2442.
    Email: sensimetrics@sens.com



SGI Developers Toolbox Synthesiser

     * Platform: SGI
     * Description: The SGI Developer Toolbox 4.0 CDROM contains a basic public
       domain text-to-speech program in the publics/speak directory. The
       directory includes man pages and source.
     * Availability: on the SGI Developer Toolbox 4.0 CDROM



SIMTEL

   A wide range of speech related software, sound-blaster software and signal
   processing software for PCs is available on SimTel and its mirror sites. It
   can be obtained by ftp from:

           ftp://oak.oakland.edu/SimTel/msdos/voice/

   and is now on the WWW:

          http://www.acs.oakland.edu/oak/SimTel/win3/sound.html

    Voicemaker

   The archives include the program Voicemaker which synthesises speech from
   phonemes using "concatenation" of phonemes recorded by the user. Voicemaker
   is a freeware program. It requires an IBM or compatible, 512KB RAM, sound
   blaster compatible sound card.

           ftp://oak.oakland.edu/SimTel/msdos/voice/vm110.zip



Sound Bytes Developer's Kit

     * Platform: Subroutine library for PC (MS-Windows, OS/2) and Macintosh
     * Hardware: Windows - 16 MHz 80386 (minimum) running Windows 3.1; 4 Mb
       RAM with at least 1.4 Mb RAM free. Disk space 1.4 Mb.
       OS/2 - 16 MHz 80386 (minimum) running OS/2 2.0 or above; 8 Mb RAM with
       at least 1.4 Mb RAM free.
       Mac - Any Mac with at least 2.5 Mb of RAM running 6.0.4 or higher.
       Telephone compatible. Compatible with commonly used sound cards.
     * Description: SBDK is a software-only sentence-level synthesizer that
       converts unrestricted English text (ASCII) into synthesized voice
       through diphone concatenation. SBDK utilizes parsing to incorporate the
       intonational and rhythmic patterns of normal speech. The developer's
       kit includes two voices, one female and one male. The product has a
       55,000-word built-in dictionary and a tool for creating customized user
       dictionaries. It converts numbers, dates, dollars, phone numbers and
       times to words, and has a SoundOut facility that provides a choice of
       pronouncing unknown words phonetically or spelling them out. Developers
       can vary voice pitch (130-220 Hz) and rate (65-200 wpm), synchronize
       speech to other events, have multiple channels of speech to the same or
       different boards, etc. Speech sampling options: 8-bit linear; 8-bit
       companded at 11 kHz (Windows); 8-bit mu-law PCM at 8 or 11 kHz; 16-bit
       linear at 11 kHz.
     * Cost: Sound Bytes may be licensed for internal use or resale. Site
       license fee= $3750. Resale or Internal runtime fees= 2% of net sales
       price per runtime sold, OR $150 per telephone port, OR per unit pricing
       for internal use determined case-by-case.
     * Misc: Demo disks are available for Windows and the Mac.
     * Availability: Natural Speech Technologies, Inc. - (619) 457-2526.



spchsyn.exe

     * Platform: PC?
     * Availability: By anonymous ftp as a self extracting DOS archive.

                ftp://evans.ee.adfa.oz.au/mirrors/tibbs/applications/spchsyn.exe

     * Requirements: May require special TI product(s), but all source is
       there.



"Speak" - a Text to Speech Program

     * Platform: Sun SPARC
     * Description: Text to speech program based on concatenation of
       pre-recorded speech segments. A function library can be used to
       integrate speech output into other code.
     * Hardware: SPARC audio I/O
     * Availability: by anonymous ftp

                 ftp://wilma.cs.brown.edu/pub/speak.tar.Z



Speech Manager and PlainTalk

     * Platform: Macintosh
     * Cost: Free
     * Description: Apple's text-to-speech system extensions that enable
       applications to perform text-to-speech conversion. The Speech Manager
       runs on most Macs, but PlainTalk (and the high quality voices) requires
       a 68020 Mac or better.
     * Availability: By anonymous ftp from:

                ftp://ftp.support.apple.com/pub/apple_sw_updates/US/Macintosh/system_sw/PlainTalk 1.4.1

   This directory contains subdirectories for recent versions of PlainTalk.
       The current release (PlainTalk 1.4.1) contains the English
       Text-To-Speech with about a dozen voices (English_Text-to-Speech.hqx:
       5.3 MByte), Mexican Spanish (Mexican_Spanish_TTS.hqx: 2.8 MByte), and
       the English Speech Recognition software
       (English_Speech_Recognition.hqx: 2.3 MByte).
     * WWW: The latest information is available from Apple's WWW page for
       speech recognition and synthesis:
       http://www.info.apple.com/apple.speech/
     * Note: Joshua Baer (shaddar+@cmu.edu) runs a mailing list for Plaintalk.
       To subscribe, send email to plaintalk@thelorax.mac.cc.cmu.edu with the
       word subscribe as the subject. There is also a WWW page with links to
       ftpable software.

                http://www.contrib.andrew.cmu.edu/usr/jbbt/plaintalk/plaintalk.html



Text to phoneme program (1)

     * Platform: unknown
     * Description: Text to phoneme program. Based on Naval Research Lab's set
       of text to phoneme rules.
     * Availability: by anonymous ftp

                 ftp://shark.cse.fau.edu/pub/src/phon.tar.Z



Text to phoneme program (2)

     * Platform: unknown
     * Description: Text to phoneme program.
     * Availability: by anonymous ftp

                ftp://wuarchive.wustl.edu/mirrors/unix-c/utils/phoneme.c



Text to phoneme program (3)

     * Description: A public domain version of the same Naval Research Lab
       text to phoneme rules.
     * Availability: By anonymous ftp

                ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/english2phoneme.shar



Tinytalk

     * Platform: PC
     * Description: Shareware package is a speech 'screen reader' which is
       used by many blind users.
     * Price: Tinytalk is now $150. There are package deals on Tinytalk with
       various speech synthesizers.
     * Availability: Tinytalk is available by anonymous ftp from the following
       site

        Files: ttdoc167.zip and ttdoc167.zip (executable and documentation)
                ftp://ftp.netcom.com/pub/eb/ebohlman/

   (Note: it is a busy ftp server.)
     * Contact: Eric Bohlman

    OMS Development
    610-B Forest Ave., Wilmette, IL 60091
    Ph: (800)831-0272 Fax: 708-251-5793
    Outside North America: (708)-251-5787
    Email: ebohlman@netcom.com



TrueTalk

     * Platform: Sun Sparcstation 1+/2/LX/5/10/20 with SunOS 4.1.3, or SGI
       Indy/Indigo/Indigo2 with IRIX 5.2. Other platforms in development.
     * Description: Personal TrueTalk, by Entropic Research Laboratory, Inc.,
       is an all-software Text-to-Speech (TTS) system designed to voice-enable
       UNIX X-Windows workstations. It combines a graphical interface with a
       powerful TTS engine based on technology developed by AT&T Bell
       Laboratories. Features include:
          + Intelligible, prosodically natural speech.
          + Text taken from file input, highlighted X selections, the
            interface scratch pad, other programs connected through a TCP/IP
            socket, or Tcl/Tk applications via the Tk "send" mechanism.
          + Stop, pause and resume while speech is in progress.
          + Visual indication of corresponding text position when paused.
          + Nine speaking voices, with Male and Female versions of each voice.
          + Adjustable speaking rate and volume.
          + Supports drop-in text filters; "email" and "lively" examples
            included.
          + Audio output through workstation headphones or speaker.
          + Complete on-line documentation, including mouse-activated help
            windows.
     * Misc: A more detailed description of TrueTalk is available on the
       Entropic WWW server: http://www.entropic.com/truetalk.com
     * Availability: You can obtain Personal TrueTalk through the Internet.
       For details, see

                 ftp://ftp.entropic.com/pub/truetalk/README.ptt

   Personal TrueTalk is available free of charge for evaluation purposes. You
       can fully-enable your evaluation copy at any time by purchasing a
       license key from Entropic.
     * Requirements: 12MB disk space, 8MB process size (24MB system RAM
       recommended).
     * Cost: US$495; US$395 academic
     * Contact: 

    Entropic Research Laboratory, Inc.,
    Washington, D.C.
    Voice: 1-800-ENTROPIC (North America), (202) 547 1420
    Fax: (202) 547-6648
    Email: truetalk@entropic.com
    WWW: http://www.entropic.com/



TruVoice from Centigram

     * Platform: Windows-NT, Windows 95, Windows 3.1 (limited release), OS/2,
       Sun Solaris 1&2
     * Description: TruVoice, an advanced text-to-speech converter, is
       available for multiple environments. TruVoice converts text into spoken
       language. TruVoice adds intelligible, natural-sounding speech to sound
       enabled platforms.
          + No vocabulary restrictions
          + User-definable pronunciation dictionary
          + Accurately pronounces surnames and place names
          + Preprocessor provides e-mail and spreadsheet reading capabilities
            and expands abbreviations.
          + Multiple languages available: American English, Latin American
            Spanish, German, French, Italian
          + Flexible pitch, volume and speech rate
          + Intonation support for punctuation
          + Supports navigational capabilities such as pause, resume and jump
            forward / jump back
   More detailed information is provided in the brochure page on the Centigram
       WWW pages.
       A demonstration of TruVoice is available on the Centigram WWW pages.
     * Cost: 
          + Windows versions are $295 for the SDK
          + Solaris versions are $995
          + Contact Centigram for other pricing.
     * Contact: Christine Hansen
       Centigram Communications Corporation
       91 East Tasman Drive, San Jose, CA 95134
       Tel: 408/944-0250 Fax: 408/428-3732
       Email: chris.hansen@centigram.com
       WWW: http://www.centigram.com/


___________________________________________________________________________

                         FAQ SECTION 6 - SPEECH RECOGNITION

          * Q6.1: What is speech recognition?
          * Q6.2: How is speech recognition performed?
          * Q6.3: How can I build a simple speech recogniser?
          * Q6.4: References & books on speech recognition
          * Q6.5: Speech Recognition Hardware/Software



                      Q6.1: WHAT IS SPEECH RECOGNITION?

Automatic Speech Recognition

   Automatic speech recognition is the process by which a computer maps an
   acoustic speech signal to text.

   Automatic speech understanding is the process by which a computer maps an
   acoustic speech signal to some form of abstract meaning of the speech.

What does speaker dependent / adaptive / independent mean?

   A speaker dependent system is developed to operate for a single speaker.
   These systems are usually easier to develop, cheaper to buy and more
   accurate, but not as flexible as speaker adaptive or speaker independent
   systems.

   A speaker independent system is developed to operate for any speaker of a
   particular type (e.g. American English). These systems are the most
   difficult to develop, most expensive and accuracy is lower than speaker
   dependent systems. However, they are more flexible.

   A speaker adaptive system is developed to adapt its operation to the
   characteristics of new speakers. Its difficulty lies somewhere between
   speaker independent and speaker dependent systems.

What does small/medium/large/very-large vocabulary mean?

   The size of vocabulary of a speech recognition system affects the
   complexity, processing requirements and the accuracy of the system. Some
   applications only require a few words (e.g. numbers only), others require
   very large dictionaries (e.g. dictation machines). There are no established
   definitions, however, try
     * small vocabulary - tens of words
     * medium vocabulary - hundreds of words
     * large vocabulary - thousands of words
     * very-large vocabulary - tens of thousands of words.

What does continuous speech or isolated-word mean?

   An isolated-word system operates on single words at a time - requiring a
   pause between saying each word. This is the simplest form of recognition to
   perform because the end points are easier to find and the pronunciation of
   a word tends not to affect others. Thus, because the occurrences of words are
   more consistent they are easier to recognise.

   A continuous speech system operates on speech in which words are connected
   together, i.e. not separated by pauses. Continuous speech is more difficult
   to handle because of a variety of effects. First, it is difficult to find
   the start and end points of words. Another problem is "coarticulation". The
   production of each phoneme is affected by the production of surrounding
   phonemes, and similarly the start and end of words are affected by the
   preceding and following words. The recognition of continuous speech is also
   affected by the rate of speech (fast speech tends to be harder).



                  Q6.2: HOW IS SPEECH RECOGNITION PERFORMED?

   A wide variety of techniques are used to perform speech recognition, and
   there are many types and levels of speech recognition, analysis and
   understanding.

   Typically speech recognition starts with the digital sampling of speech.
   The next stage is acoustic signal processing. Most techniques include
   spectral analysis; e.g. LPC analysis, MFCC, cochlea modelling and many,
   many more.
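
   As a rough illustration of the spectral analysis stage, the following
   Python sketch cuts a sampled signal into overlapping frames, applies a
   Hamming window and computes a power spectrum with a plain DFT. The
   function names, frame length and hop size are assumptions made for this
   example only. A practical front end would use an FFT and would usually go
   on to derive LPC or MFCC features from the spectrum.

```python
import cmath
import math

def split_frames(signal, frame_len, hop):
    """Cut a list of samples into overlapping frames of frame_len samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def power_spectrum(frame):
    """Return the power in DFT bins 0 .. n/2 of a Hamming-windowed frame."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        # Plain DFT: slow (O(n^2)) but easy to follow.
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum
```

   With a sampling rate of 8000 Hz and 64-sample frames, bin k corresponds
   to a frequency of k * 8000 / 64 Hz, so a 1000 Hz tone peaks in bin 8.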

   The next stage is recognition of phonemes, groups of phonemes and words.
   This stage can be achieved by many processes such as DTW (Dynamic Time
   Warping), HMM (hidden Markov modelling), NNs (Neural Networks), expert
   systems and combinations of techniques. HMM-based systems are currently the
   most commonly used and most successful approach.
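
   To give a flavour of the simplest of these techniques, here is a minimal
   Python sketch of Dynamic Time Warping. It aligns two sequences that may
   differ in speaking rate and returns the cost of the best alignment. The
   function name and the default scalar distance are assumptions for this
   example; a recogniser would compare feature vectors (one per frame) and
   pass a suitable vector distance as dist.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Return the cost of the best time-warped alignment of a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the best cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            # Allowed steps: advance in a only, in b only, or in both.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]
```

   An unknown word is compared against each stored template and the
   template with the lowest cost is chosen. Because a repeated element can
   be absorbed by the warping, dtw_distance([1, 2, 3], [1, 2, 2, 3]) is 0.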

   Most systems utilise some knowledge of the language to aid the recognition
   process.
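
   One common form of language knowledge is a statistical word grammar. The
   Python sketch below is an assumed, much simplified example: it counts word
   pairs (bigrams) in a few training sentences and scores candidate word
   sequences with add-one smoothing, so that a recogniser could prefer the
   more plausible of two acoustically similar candidates. The function names
   and the "<s>" start marker are choices made for this illustration.

```python
import math
from collections import Counter

def train_bigrams(sentences):
    """Count single words and adjacent word pairs in the training text."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams):
    """Log probability of a word sequence under the bigram model."""
    vocab = len(unigrams)
    words = ["<s>"] + sentence.split()
    # Add-one smoothing keeps unseen word pairs from scoring zero.
    return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
               for a, b in zip(words, words[1:]))
```

   Trained on "call the office", "call the bank" and "the office is open",
   this model scores "call the office" above "tall the office".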

   Some systems try to "understand" speech. That is, they try to convert the
   words into a representation of what the speaker intended to mean or achieve
   by what they said.



              Q6.3: HOW CAN I BUILD A SIMPLE SPEECH RECOGNISER?

   Doug Danforth provides a detailed account in article 253 in the comp.speech
   archives. A summary is provided below. It is also available by anonymous
   ftp

          ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/DIY_SpeechRecognition

    QUICKY RECOGNIZER sketch:

    Here is a simple recognizer that should give you 85%+ recognition
   accuracy. The accuracy is a function of the words you have in your
   vocabulary. Long distinct words are easy. Short similar words are hard. You
   can get 98+% on the digits with this recognizer.

   Overview:
     * Find the beginning and end of the utterance.
     * Filter the raw signal into frequency bands.
     * Cut the utterance into a fixed number of segments.
     * Average data for each band in each segment.
     * Store this pattern with its name.
     * Collect training set of about 3 repetitions of each pattern (word).
     * Recognize unknown by comparing its pattern against all patterns in the
       training set and returning the name of the pattern closest to the
       unknown.
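
   The steps above can be sketched in Python as follows. This is an
   illustration under stated assumptions, not Doug Danforth's code: the frame
   size, energy threshold, number of bands and segments, and all function
   names are invented for the example, and the band filtering is done by
   summing DFT power over groups of bins rather than with a filter bank.

```python
import cmath
import math

FRAME = 64  # samples per analysis frame (assumed value)

def find_utterance(signal, ratio=0.1):
    """Frame the signal and keep the span between the first and last loud frame."""
    frames = [signal[i:i + FRAME]
              for i in range(0, len(signal) - FRAME + 1, FRAME)]
    if not frames:
        raise ValueError("signal is shorter than one frame")
    energy = [sum(x * x for x in f) for f in frames]
    threshold = ratio * max(energy)
    loud = [i for i, e in enumerate(energy) if e > threshold]
    if not loud:  # all-silent input: nothing to trim
        return frames
    return frames[loud[0]:loud[-1] + 1]

def band_energies(frame, n_bands):
    """Log energy in n_bands equal-width frequency bands of one frame."""
    n = len(frame)
    power = [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                     for t in range(n))) ** 2
             for k in range(1, n // 2 + 1)]
    width = len(power) // n_bands  # any leftover high bins are ignored
    return [math.log(1.0 + sum(power[b * width:(b + 1) * width]))
            for b in range(n_bands)]

def make_pattern(signal, n_segments=8, n_bands=4):
    """Average the band energies over a fixed number of time segments."""
    feats = [band_energies(f, n_bands) for f in find_utterance(signal)]
    pattern = []
    for s in range(n_segments):
        lo = s * len(feats) // n_segments
        hi = max(lo + 1, (s + 1) * len(feats) // n_segments)
        seg = feats[lo:hi]
        pattern.extend(sum(f[b] for f in seg) / len(seg)
                       for b in range(n_bands))
    return pattern

def recognise(signal, training_set):
    """Return the name of the stored pattern closest to the unknown signal."""
    unknown = make_pattern(signal)

    def distance(pattern):
        return sum((u - p) ** 2 for u, p in zip(unknown, pattern))

    return min(training_set, key=lambda item: distance(item[1]))[0]
```

   The training set is a list of (name, pattern) pairs, built by calling
   make_pattern on each repetition of each word. Because every pattern has
   the same length (n_segments * n_bands values), words of different
   duration can be compared directly.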

   Many variations upon the theme can be made to improve the performance. Try
   different filtering of the raw signal and different processing methods.

   Q6.5 contains information on public domain speech recognition software:
   Lotec and Myers' Hidden Markov Model software.



                Q6.4: REFERENCES & BOOKS ON SPEECH RECOGNITION

  PRODUCT REVIEWS AND COMPARISONS

   Comparisons of speech recognition products (this article is already a year
   out of date).
     * "Talk Show", Wayne Rash Jr., PC Magazine (USA), Dec 20, 1994.
     * "Seybold Report on Desktop Publishing" published a nine-page,
       head-to-head comparison of Dragon's DOS software with IBM's OS/2
       software. March 7, 1994; Volume 8, Number 7; Pages 3-11;
       ISSN:0889-9762; Seybold Publications, P.O. Box 644, Media, PA 19063
       USA, phone (610) 565-2480.
     * McGraw-Hill Inc.'s "BYTE, the Magazine of Technology Integration,"
       published a two-page review of IBM's Personal Dictation System
       software. May 1994; Volume ?, Number ?; Pages 145-146; ISSN:0360-5280;
       Editorial, Executive, and Circulation address: One Phoenix Mill Lane,
       Peterborough, NH 03458 USA, phone ?

  TECHNOLOGY: GENERAL AND INTRODUCTORY

   Some general introduction books on speech recognition technology:
     * Fundamentals of Speech Recognition; Lawrence Rabiner & Biing-Hwang
       Juang Englewood Cliffs NJ: PTR Prentice Hall (Signal Processing
       Series), c1993, ISBN 0-13-015157-2
     * Speech recognition by machine; W.A. Ainsworth London: Peregrinus for
       the Institution of Electrical Engineers, c1988
     * Speech synthesis and recognition; J.N. Holmes Wokingham: Van Nostrand
       Reinhold, c1988
     * Speech Communication: Human and Machine, Douglas O'Shaughnessy; Addison
       Wesley series in Electrical Engineering: Digital Signal Processing,
       1987.
     * Electronic speech recognition: techniques, technology and applications,
       edited by Geoff Bristow, London: Collins, 1986
     * Readings in Speech Recognition; edited by Alex Waibel & Kai-Fu Lee. San
       Mateo: Morgan Kaufmann, c1990

  TECHNICAL
     * Hidden Markov models for speech recognition; X.D. Huang, Y. Ariki, M.A.
       Jack. Edinburgh: Edinburgh University Press, c1990
     * Speech Recognition: The Complete Practical Reference Guide; T. Schalk,
       P. J. Foster: Telecom Library Inc, New York; ISBN 0-9366648-39-2; 377
       pages; paperback only. Covers speech recognition in a telephony
       environment for readers who wish to use PC-based call processing
       hardware. It is written using Dialogic hardware as the example.
     * Automatic speech recognition: the development of the SPHINX system; by
       Kai-Fu Lee; Boston; London: Kluwer Academic, c1989
     * An Introduction to the Application of the Theory of Probabilistic
       Functions of a Markov Process to Automatic Speech Recognition, S. E.
       Levinson, L. R. Rabiner and M. M. Sondhi; in Bell Syst. Tech. Jnl.
       v62(4), pp1035--1074, April 1983
     * Review of Neural Networks for Speech Recognition, R. P. Lippmann; in
       Neural Computation, v1(1), pp 1-38, 1989.

  BIBLIOGRAPHY

   The following book is a comprehensive bibliography of speech processing.
     * Computational Speech Processing: Speech Analysis, Recognition,
       Understanding, Compression, Transmission, Coding, Synthesis ; Text to
       Speech Systems, Speech to Tactile Displays, Speaker Identification,
       Prosody Processing : BIBLIOGRAPHY, by Conrad F. Sabourin, 1994, 2
       volumes, 1187p, ISBN 2-921173-21-2, INFOLINGUA inc., P.O. Box 187
       Snowdon, Montreal, H3X 3T4, Canada.



                 Q6.5: SPEECH RECOGNITION HARDWARE & SOFTWARE

   The number of speech recognition packages and the information about them
   are changing rapidly. Any help with keeping this information up to date
   will be appreciated.

    Speech Recognition Processors (ICs)
          Jean-Pierre Lereboullet <Jean-Pierre.Lereboullet@paris.ensam.fr> has
          put together a detailed list of Voice Recognition Processors which
          covers about 15 ICs and pieces of related hardware (including D6106,
          HM2007, MSM6679, RSC-164, TC8860F/64F/65F, 5A128).

        The document is available on the comp.speech ftp server:

                ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/VoiceRecognitionProcessors

    Recognition Information on the WWW
          In addition to the entries on speech recognition in this FAQ, the
          following WWW sites provide information on speech recognition:

         Commercial Speech Recognition: Russ Wilcox of PureSpeech Inc.

                http://www.tiac.net/users/rwilcox/speech.html

         Yahoo pages on Speech Recognition
                http://www.yahoo.com/business/corporations/computers/software/voice_recognition/
                http://www.yahoo.com/Science/Computer_Science/Artificial_Intelligence/Natural_Language_Processing/Speech_Recognition/

  IN THE FAQ...

   The following speech recognition software/hardware is described in the
   comp.speech FAQ.

          * AbbotDemo
          * BBN Hark Telephony Recognizer
          * Corona Speech Recognition System
          * Custom Voice(TM) by A&G Graphics Interface
          * D6006 Voice Control Processor
          * DATAVOX - French
          * Digital Dreams Speech Recognition Plug-Ins
          * DragonDictate version 3.0
          * DragonDictate for Windows
          * DragonVoiceTools
          * DSP Semiconductor Recognition Chip
          * EARS: Single Word Recognition Package
          * HM2007 - Speech Recognition Chip
          * Hidden Markov Model Toolkit (HTK) from Entropic 
          * IBM VoiceType Dictation
          * ICSS system from IBM
          * IN3 Voice Command
          * IN3 Voice Command for Windows
          * Kurzweil Voice for Windows
          * Lernout & Hauspie ASR (3 products)
          * Lernout & Hauspie ASR SDK
          * Listen for Windows 2.0 - Verbex Voice Systems
          * Lotec Speech Recognition Package
          * Myers' Hidden Markov Model software
          * NCC Dictate
          * OKI VRP6679 - Speech Recognition Chip
          * Speech Systems Phonetic Engine 500 (PE500)
          * PowerSecretary
          * ProNotes Voice Tools (due late '95)
          * PureSpeech
          * recnet
          * SayIt
          * Simon Says - for NeXT
          * Speech Commander - Verbex Voice Systems
          * 'Speech Recognition Expert' Toolkit for Windows
          * Visual Voice from Stylus Innovation
          * Voice Command Line Interface
          * Voice Control Systems Recognition
          * Visus SpeechKit
          * VCS 2030 & 2060 Voice Dialer
          * Voice-Trek 2.0
          * Creative VoiceAssist
          * Voice Blaster Ver. 4.0
          * VoiceServer for Windows
          * Votan
          * Voice Processing Corporation Speech Recognition Product Line



AbbotDemo

     * Platform: SunOS4, IRIX, Linux, HP-UX
     * Description: Large vocabulary, speaker independent, continuous
       automatic speech recognition system. Uses recurrent neural networks and
       hidden Markov models with a 5,000 word vocabulary (upgradable) and a
       trigram word grammar. Includes a front end for waveform capture and
       display (including spectrogram) and a graphical display of the phoneme
       representation as well as a rewriting display of the best guess word
       sequence.
     * Requirements: UN*X, X, 8 Mbyte free RAM, 486DX or faster processor, 16
       bit soundcard, reasonable quality microphone and a copy of the Wall
       Street Journal newspaper.
     * Price: Free for non-commercial use
     * Availability: By anonymous ftp from

        ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/AbbotDemo

     * Note 1: This is not a complete system for dictation.
     * Note 2: At present there are no sources with this distribution. For
       sources for an earlier version see the recnet entry.
     * Note 3: Not supported.
     * Contact: AbbotDemo@compute.demon.co.uk
       Tony Robinson
       Cambridge University Engineering Department
       Trumpington Street, Cambridge, CB2 1PZ, UK
       Tel: +44-1223-332815 Fax: +44-1223-332662



BBN Hark Telephony Recognizer

     * Platform: Available for Unix-based workstation and PC hardware
       platforms including IBM RS6000/AIX and Pentium/SCO Unix.
     * Description: Large vocabulary (2,000+ words), speaker independent,
       continuous ASR software. Specifically designed for large scale
       telephony applications. Using a client/server architecture, all
       features and capabilities are integrated in one software product
       instead of on separate boards. Very memory efficient, the Hark
       Telephony Recognizer runs in as little as 2MB of physical memory.
       Multiple recognizers can be run on a single platform. Uses Hidden
       Markov Model and phoneme-based BBN recognition algorithms. An API is
       provided for integration with existing applications. A developer's
       toolkit is available.
     * Price and availability: Price varies depending on vocabulary size.
       Version 3.0 available immediately.
     * Misc: BBN Hark provides application design and human factors consulting
       services. Regular monthly training classes on developing speech-enabled
       applications are held at BBN Hark's Cambridge (Mass) headquarters.
     * WWW: For additional information, see BBN Hark's home page on the Web at
       http://www.bbn.com/bbn_hark/HarkHome.html.
     * Contact: 

    BBN Hark Systems
    70 Fawcett Street, Cambridge, MA 02138
    Tel: 617-873-4636 Fax: 617-873-2473
    WWW: http://www.bbn.com/bbn_hark/HarkHome.html



Corona Speech Recognition System

     * Platform: Unknown
     * Description: The Corona System is a UNIX-based, multi-channel
       recognition system designed for telephony-based applications. It
       features speaker-independent, continuous speech recognition over
       standard telephone lines and includes a natural language understanding
       capability. The natural language capability significantly enhances
       throughput performance of the application and makes life easier for the
       application developer.
     * Price and availability: Unknown
     * Contact: Corona Corp.
       Menlo Park, CA
       Tel: (415) 462 8200 Fax: (415) 462 8201



Custom Voice(TM) by A&G Graphics Interface

     * Description: Speech recognition custom control for Visual Basic, Visual
       C++, Borland C++, and other development platforms that support *.VBX.
       Provides a speech recognition development platform that is independent
       of any particular proprietary engine. Currently supports ICSS, but
       should soon support other platforms. Includes a grammar debugger and
       parser APIs to parse spoken speech into useful data types.
     * Requirements: Visual Basic or any development platform that supports
       VBX.
     * Price: $US495 or $695 bundled with ICSS.
     * Contact: 

    A&G Graphics Interface
    51 Gore Street, Cambridge, MA, 02139, USA
    (617) 492-0120



D6006 Voice Control Processor

     * Misc: Is this chip from the same manufacturer as the D6106 which is
       described in Jean-Pierre Lereboullet's document on Voice Recognition
       Processors?
     * Contact: DSP Telecommunications Inc.
       2855 Kifer Road, Suite 202, Santa Clara CA 95051, USA
       Tel:(408)986-4310
       Fax:(408)986-4324



DATAVOX - French

     * Platform: PC
     * Description: Continuous speech - speaker independent or dependent.
     * Rough Cost: ?
     * Requirements: 2 PC format boards (RdF1000 and TdS 96/25) and an A/D -
       D/A module (ASA116)
     * Misc: Application software may communicate with DATAVOX through two
       types of interface:
          + Keyboard overlay: The application software may be used with any PC
            compatible package. No specific adaptation is necessary, you only
            need to define your configuration with the application software.
          + C library: Allows a user-written program to drive the recognition
            system.
       DATAVOX is based on the AMADEUS speech recognition software developed at
       LIMSI. It provides
          + Continuous speech recognition with 500 words speaker dependent, 50
            words speaker independent (custom-made vocabulary).
          + Grammar of the application language (syntax acquisition,
            verification and simplification software).
          + Large vocabulary: DATAVOX can recognize vocabularies of several
            thousand words as long as there are no more than 500 words in the
            active vocabulary at any given node. It takes less than 1 second
            to change syntax and vocabulary.
          + Training controlled by the system (use of co-articulation models).
          + Response time less than 500 ms for any phrase length.
          + Synthesis (ADPCM) can be heard simultaneously while recognition is
            being carried out.
     * Contact:

    VECSYS
    Le Chene rond, 91570 Bievres, France
    Fax: 33 1 69 41 24 30
    Voice: 33 1 69 41 15 04



Digital Dreams Speech Recognition Plug-Ins

     * Platform: Apple Quadra AV or Power Macintosh
     * Description: A suite of speech plug-ins for the interactive multimedia
       market which enable developers to quickly incorporate speech
       recognition into their titles without having to resort to a low-level
       programming language, such as C. Speech plug-ins bridge the gap between
       a speech recognition API, such as Apple's PlainTalk Speech Recognition
       technology, and authoring/development environments, such as Macromedia
       Director or HyperCard. Digital Dreams currently offers Macintosh speech
       plug-ins for Macromedia Director and HyperCard. Support for other
       environments, including AppleScript, Apple Media Tool, Authorware, and
       Windows is being developed. Currently available for North American
       Adult English. More information is available on the Digital Dreams WWW
       site.
     * Requirements: Apple's PlainTalk Speech Recognition extension.
     * Cost: Single User License $200
     * Contact: Digital Dreams
       4308 Harbord Drive, Oakland, CA, 94618, USA
       Tel: (510) 547-6929 Fax: (510) 547-6799
       email: dreams@emf.net
       WWW: http://www.emf.net/~dreams/
       FTP: ftp://emf.net/users/dreams



DragonDictate version 3.0

     * Platform: PC / DOS
     * Description: Speaker-adaptive recognition system for discrete speech.
       Provides 110,000 word dictionary and also allows user to add words.
       Active vocabulary of 5,000, 30,000, or 60,000 words. Allows dictation
       into almost all DOS applications (word processors, spreadsheets, etc.)
       and hands-free operation of the PC. Specialized medical and legal
       vocabularies are available as add-on products. More information on the
       Dragon Systems WWW pages.
     * Cost: Prices including audio board and high-quality headset microphone:
          + 5,000 word Starter Edition: US$695
          + 30,000 word Classic Edition: US$995
          + 60,000 word Power Edition: US$1,995
          + Medical vocabulary add-on: US$495
          + Legal vocabulary add-on: US$495
     * See also: DragonDictate for Windows and DragonVoiceTools.
     * Requirements: Minimum of 33 MHz 486 with 8-16M memory and at least 29M
       disk space (depending on product), one 8-bit slot, DOS 5.0 and up (also
       runs in a DOS box under Windows or OS/2).
     * Contact: Dragon Systems, Inc.

    320 Nevada Street, Newton, MA 02160, USA
    Tel: 1-617-965-5200 or 1-800-TALK-TYP
    Fax: 1-617-527-0372
    Email: info@dragonsys.com
    WWW: http://www.dragonsys.com/
    CompuServe: GO DRAGON

   Note: Simon Crosby maintains an FAQ for DragonDictate:
   http://www.cl.cam.ac.uk/users/sac/dd-faq.html



DragonDictate for Windows

     * Platform: PC
     * Description: Speech-to-text dictation system. Discrete dictation;
       continuous command/control; speaker-adaptive. Also provides mouse
       movement for hands-free operation of Windows. Comes with a 120,000 word
       pronunciation dictionary; users can also add their own words or
       phrases. Dictate directly into any application. Available in US and UK
       English, French, Italian, German, Spanish, and Swedish. More
       information on the Dragon Systems WWW pages.
     * Requirements: 486/66, 7-10 MB dedicated RAM (depending on edition),
       Windows 3.1x or 95. Supported sound boards: Creative Labs Sound Blaster
       16, Microsoft Windows Sound System, IBM M-Audio Capture/Playback
       Adapter.
     * Rough Cost: Prices including software, documentation and microphone:
          + DragonDictate Personal Edition (10,000 words active) - $395
          + DragonDictate Classic Edition (30,000 words active) - $695
          + DragonDictate Power Edition (60,000 words active) - $1,695
     * See also: DragonDictate and DragonVoiceTools. Simon Crosby maintains an
       FAQ for DragonDictate:

                http://www.cl.cam.ac.uk/users/sac/dd-faq.html

     * Contact: Dragon Systems, Inc.
       320 Nevada Street, Newton, MA 02160, USA
       Tel: 1-617-965-5200 or 1-800-TALK-TYP
       Fax: 1-617-527-0372
       Email: info@dragonsys.com
       WWW: http://www.dragonsys.com/
       CompuServe: GO DRAGON



DragonVoiceTools

     * Platform: PC
     * Description: Programmer's toolkit for developing speech-aware DOS or
       Windows applications. Recognizes continuously spoken digits and
       discretely spoken words or phrases. Up to 1,000 words can be active at
       one time. Use words from 110,000 word dictionary (included) and/or
       develop your own word models. More information on the Dragon Systems
       WWW pages.
     * Requirements: Minimum of 20 MHz 386 (larger vocabulary requires faster
       processor) with at least 5M memory and at least 19M disk space
       (depending on vocabulary size), DOS 5.0 and up, Windows 3.1 and up,
       Borland C or C++ or Microsoft C or C++. DOS applications require IBM
       M-ACPA card available from IBM or Dragon Systems ($325). Windows
       applications can use industry-standard sound cards (supported are
       Creative Labs Sound Blaster 16 and Windows Sound System) or M-ACPA
       card.
     * Cost:
          + Developer's kit: US$495
          + End-user system: US$195
     * See also: DragonDictate and DragonDictate for Windows
     * Contact: Dragon Systems, Inc.

    320 Nevada Street, Newton, MA 02160, USA
    Tel: 1-617-965-5200 or 1-800-TALK-TYP
    Fax: 1-617-527-0372
    Email: info@dragonsys.com
    WWW: http://www.dragonsys.com/
    CompuServe: GO DRAGON

   Note: Simon Crosby maintains an FAQ for DragonDictate:
   http://www.cl.cam.ac.uk/users/sac/dd-faq.html



DSP Semiconductor Recognition Chip

     * Description: Vocabulary of up to 128 words; however, the recommended
       size is 16 words. Requires external memory, a codec and an audio
       amplifier.
       Speaker dependent recognition.
     * Cost: US$18 in quantities
     * Producer: 

    DSP Semiconductor (no contact details)



EARS: Single Word Recognition Package

     * Platform: UNIX
     * Description: Intended as a limited ready-to-use single word recognizer.
       However, its design aims at being a platform for various kinds of
       methods used in speech recognition (SR). EARS is designed to be a
       flexible environment for recognition system components; for example,
       take this feature extractor and that recognizing method, and this list
       of words. New methods for single word recognition can be integrated
       easily, as EARS uses C++ abstract base classes. You speak the words you
       want to be recognized later. Your utterances can be saved to RIFF WAV
       files so that you can inspect, change or delete them before they are
       further processed into the pattern files on which the recognizer is
       finally trained. As of version 0.15, the feature extractors are:
       Rasta-PLP, PLP, LPC and Mel-Cepstrum. The implemented recognizers are
       DTW and non-recurrent neural nets on fixed-size sound patterns.
     * Misc: The current version is an ALPHA release.
     * Requirements: AF audio server software (see Q1.11) and the OGI Speech
       Tools (see Q1.9)
     * Availability: by anonymous ftp

                ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/ears-0.15.tar.gz

                ftp://sunsite.unc.edu/pub/Linux/apps/sound/speech/ears-0.15.tar.gz

     * Contact: Ralf W. Stephan: ralf@ark.franken.de
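
   Note: For readers unfamiliar with DTW (dynamic time warping), one of the
   recognizer types listed for EARS above, the following is a minimal
   Python sketch of template matching with DTW. It is not taken from EARS,
   which is written in C++. The function names, the Euclidean frame
   distance and the toy one-dimensional features are all assumptions made
   for illustration only.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = lowest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


# Hypothetical one-dimensional "features"; real systems use vectors such as
# PLP or Mel-cepstrum coefficients, one vector per frame.
templates = {
    "yes": [(0.0,), (1.0,), (2.0,)],
    "no": [(5.0,), (5.0,), (4.0,)],
}
utterance = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,)]
print(recognize(utterance, templates))
```

   The warping lets a slowly spoken utterance match a shorter template of
   the same word, which is why DTW suits small-vocabulary, single word
   recognition.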



HM2007 - Speech Recognition Chip

     * Description: HM2007 is a 48-pin single chip CMOS voice recognition LSI
       circuit with on-chip analog front end, voice analysis, recognition
       process and system control functions. A 40 word isolated-word voice
       recognition system can be composed of an external microphone, keyboard,
       SRAM and a few other components. When combined with a microprocessor,
       an intelligent recognition system can be built. A demo board for this
       chip is being distributed by The Summa Group.
     * Cost: Approx US$16 for the HM2007 and US$160 for the demo board.
     * Misc: Jean-Pierre Lereboullet's document on Voice Recognition
       Processors provides additional information on the HM2007.
     * Note: Several people have reported problems in obtaining small numbers
       of this chip (say fewer than 10).
     * Producer: HUALON Microelectronic Corp. USA
       Tel: (415) 288 0390 Fax: (415) 288-0399
     * Distributor 1: Marywale Engineering Company
       Tel: (602) 247 4451 Fax: (602) 247 6167
       Email: meco@indirect.com
     * Distributor 2: The Summa Group Limited
       One California Street, Suite #1940,
       San Francisco, CA 94111
       Ph: (415) 288-0390



Entropic's HTK (HMM Toolkit)

     * Platform: Range of Unix platforms.
     * Description: HTK is a software toolkit for building continuous density
       HMM based speech recognisers. It consists of a number of library
       modules and a number of tools. Functions include speech analysis,
       training tools, recognition tools, results analysis, and an interactive
       tool for speech labelling. Many standard forms of continuous density
       HMM are possible. Can perform isolated word or connected word speech
       recognition. It can model whole words or sub-word units. Can perform
       speaker verification and other pattern recognition work using HMMs. HTK
       is now integrated with the ESPS/Waves speech research environment
       which is described in Section 1.9.
     * Misc 1: The availability of HTK changed in early 1993 when Entropic
       obtained exclusive marketing rights to HTK from the developers at
       Cambridge.
     * Misc 2: More detailed information on HTK is available from the Entropic
       WWW server: http://www.entropic.com/htk.html
     * Cost: On request.
     * Contact: 

    Entropic Research Laboratory,
    600 Pennsylvania Ave, S.E. Suite 202,
    Washington, D.C. 20003, USA
    Phone: (202) 547-1420.
    email - info@entropic.com
    WWW: http://www.entropic.com/



IBM VoiceType Dictation

     * Platform: Intel I486 with IBM OS/2, Windows or Windows 95
     * Description: Speaker-independent, discrete speech dictation with
       navigation. Navigation does not require setup; most applications are
       automatically speech enabled by dynamic control analysis. Dictation
       averages 70WPM with 95% accuracy and uses statistical trigram
       modelling. The base system is 22K words. Laptop support through PCMCIA
       DSP Card.
       Additional specialised vocabularies available.
          + US: Legal, Emergency Medicine, Radiology and Journalism
          + UK: Legal
          + IT: Radiology
     * Requirements: 486SX or above, 16MB RAM, 30MB File space, Dictation
       Adapter
     * Cost: Software $495 (includes mic) / Hardware $495
     * Misc 1: Based on IBM Tangora Technology
     * Misc 2: Available as Osborne Personal Dictation System in Australia
     * Availability: US English. Other languages (UK, FR, GR, IT, and ES)
       available 3Q94.
     * Contact: US Contact 1-800-TALK-2-ME or 1-914-766-9252.
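
   Note: The "statistical trigram modelling" mentioned above predicts each
   word from the two words before it. The following Python sketch shows
   the basic maximum-likelihood estimate from word counts. It is a generic
   illustration, not IBM's implementation; the function names and the toy
   training text are assumptions, and practical systems add smoothing for
   unseen word sequences.

```python
from collections import Counter


def train_trigrams(words):
    """Count trigrams and the two-word contexts that precede a third word."""
    triples = list(zip(words, words[1:], words[2:]))
    tri = Counter(triples)
    ctx = Counter((w1, w2) for w1, w2, _ in triples)
    return tri, ctx


def trigram_prob(tri, ctx, w1, w2, w3):
    """Maximum-likelihood estimate of P(w3 | w1, w2); 0.0 for unseen contexts."""
    seen = ctx[(w1, w2)]
    return tri[(w1, w2, w3)] / seen if seen else 0.0


# Hypothetical training text, far smaller than any real corpus.
words = "the cat sat on the cat mat".split()
tri, ctx = train_trigrams(words)
print(trigram_prob(tri, ctx, "the", "cat", "sat"))
```

   A recognizer uses such probabilities to prefer likely word sequences
   when the acoustic evidence alone is ambiguous.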



ICSS system from IBM

     * Description: A large vocabulary, speaker independent, continuous speech
       system which runs under Windows, OS/2, and AIX.
     * Requirements: Soundboard (e.g. Soundblaster)
     * Price: $US319
     * Contact: 

    A&G Graphics Interface
    ICSS Reseller
    51 Gore Street, Cambridge, MA, 02139, USA
    (617) 492-0120



IN3 Voice Command

     * Platform: Sun SPARCstation
     * Description: IN3 provides a secure, robust, word spotting, continuous
       speech recognition facility for the Sun OS or Solaris operating
       systems. The recognition system is a secure operating system facility
       capable of working with various interfaces, microphones, and devices.
       The operating system interface works with native UNIX outside of X
       Windows as well as provides enhanced X Windows facilities including
       named window support. The user interface provides a means to quickly
       create commands on the fly for replacing long strings and complex
       operations with voice macros. [Voice macros can reduce the strain of
       repetitive stress injuries (RSI) such as Carpal Tunnel Syndrome (CTS)
       by replacing heavy repetitive keyboard hammering with simple voice
       operations. ] The IN3 user interface works with generic X servers and
       window managers. A developer API is also available for creating
       voice-enabled applications, interfacing with other audio sources, and
       providing extensive application control over the recognition facility.
     * Availability: SunSite archive at SunSITE.unc.edu as well as on Catalyst
       CDware as both a runnable demo and unlockable software.
     * Hardware Required: Sun SPARCstation with audio input. Noise canceling
       microphone recommended but not required.
     * Software Required:
          + Sun OS 4.1.2 with OpenWindows 3.0
          + or, Sun OS 4.1.3
          + or, Solaris 2.1 or Solaris 2.2
     * Misc: An equivalent MS-Windows product is described below.
     * Price: $495 U.S.
     * Contact: 

    Brantley Kelly
    Email: cbk@gacc.atl.ga.us CIS: 75120,431
    FAX: 1-404-925-7924 Phone: 1-404-813-8030
    Command Corp. Inc, 3675 Crestwood Parkway, Duluth GA 30136, USA



IN3 Voice Command for Windows

     * Platform: PC with Windows 3.1
     * Description: IN3 is now available for MS-Windows. Users can call
       applications to the foreground with voice commands. Once the
       application is called, the user may enter commands and data with voice
       commands. Voice macros can reduce the strain of repetitive stress
       injuries (RSI) such as Carpal Tunnel Syndrome (CTS) by replacing heavy
       repetitive keyboard hammering with simple voice operations. Voice
       macros take complex operations and reduce them to simple verbal
       commands. Voice input can provide new facilities for tasks which could
       not easily have been otherwise performed without the multiple axis of
       input. IN3 is hardware-independent: users with any Windows-compatible
       audio board can add speech recognition to the desktop. IN3 works with
       either 8 bit or 16 bit Windows audio boards. IN3 is based on continuous
       word-spotting technology. A developer API is also available for
       creating voice-enabled applications.
     * Price: $179 U.S.
     * Requirements: PC with 80386 processor or better, Microsoft Windows 3.1,
       and Windows compatible audio system with microphone.
     * Misc: Fully functional demos are available on Compuserve in various
       Multimedia and CAD forums. Demos are also available from "America on
       Line", the comp.binaries.ms-windows archive sites, and various BBS
       systems. It is also available by anonymous ftp

                ftp://ftp.wustl.edu/usenet/comp.binaries.ms-windows/v3/in3demo.zip

                 ftp://ftp.uwasa.fi/mirror/ultrasound/demo/in3demo.zip

       An equivalent Sun product is described above.
     * Contact: 

    Brantley Kelly
    Email: cbk@gacc.atl.ga.us CIS: 75120,431
    FAX: 1-404-925-7924 Phone: 1-404-925-7950
    Command Corp. Inc, 3675 Crestwood Parkway, Duluth GA 30136, USA



Kurzweil Voice for Windows

     * Platform: MS Windows 3.1
     * Description: Kurzweil Voice for Windows is a dictation product enabling
       the user to create text and enter data by speaking to Windows-based
       applications. System is adaptive but requires no initial training.
       Users can choose either 30,000 or 60,000 word active vocabulary.
       Includes application command translation templates for popular Windows
       applications such as WordPerfect, 1-2-3, Organizer and Word.
     * Cost: US $995
     * Hardware: 486DX/33 or higher, 8 or 16 MB dedicated memory (depending on
       vocabulary), 30 MB dedicated disk space, VGA or higher,
       Kurzweil-supplied microphone and DSP board.
     * Contact:

    Phone: 1-800-380-1234
    Email: info@kurz-ai.com



Lernout & Hauspie ASR 1000/T and 1000/M

   [Note: L&H asr200/A is described below.]
     * L&H asr1000/T: ASR for the Telephony and Telecommunications Market
     * L&H asr1000/M: ASR for the Computer and Multimedia Market
     * Description: Automatic speech recognition software providing continuous
       speech recognition, isolated word recognition, keyword spotting or
       continuous digits recognition. The engine is speaker independent, and
       phoneme-based with optimization for commonly used words. General
       features include:
          + Languages available: US English, German, French, Spanish
            (Castilian), Dutch.
          + Available vocabulary: >100,000 words.
          + Line adaptation.
          + Rejection of out of vocabulary/grammar words.
          + N-best alternatives for isolated word recognition and keyword
            spotting.
          + Push to talk.
     * asr1000/T
          + Single channel platform examples: Motorola 56156, TI
            TMS320C2X/C3X/C5X
          + Multi-channel platform examples: TI TMS320C3X/C5X, AT&T
            DSP32C/3210, Motorola 96000
          + Input: 8 kHz telephone sampling
     * asr1000/M
          + Single processor platform examples: Intel 486/Pentium
          + Input: 8 kHz telephone or 11 kHz microphone sampling
     * See also: L&H ASR SDK for Windows
     * Cost: Unknown
     * Contact: 

    Lernout & Hauspie Speech Products
    800 West Cummings Park, Suite 3100
    Woburn, MA 01801, USA
    Tel: (617) 932 4118
    Fax: (617) 932 9209
    Email: sales@lhs.com

Lernout & Hauspie ASR 200/A for the Automotive and Industrial Market

     * Description: Automatic speech recognition software providing isolated
       word recognition, keyword spotting and alphabet recognition (optional).
       This engine is robust, speaker independent and word based. Other
       features:
          + Vocabulary: 100 words US English
          + Voice activation detection
          + Response time
          + Platform examples: Analog Devices ADSP2101/5
          + Input: 8 kHz telephone or microphone sampling
     * See also: L&H ASR SDK for Windows
     * Cost: Unknown
     * Contact: 

    Lernout & Hauspie Speech Products
    800 West Cummings Park, Suite 3100
    Woburn, MA 01801, USA
    Tel: (617) 932 4118
    Fax: (617) 932 9209
    Email: sales@lhs.com



Lernout & Hauspie ASR SDK

     * Description: Windows based Software Development Kits are available for
       integrating automatic speech recognition technology with Windows based
       PC applications.
     * Requirements: IBM-compatible 486 DX/33 MHz + 8 MB RAM + MS DOS 5.0 + MS
       Windows 3.1 (or higher) + Sound Blaster compatible sound board.
     * See also: L&H ASR Products
     * Cost: Unknown
     * Contact: 

    Lernout & Hauspie Speech Products
    800 West Cummings Park, Suite 3100
    Woburn, MA 01801, USA
    Tel: (617) 932 4118
    Fax: (617) 932 9209
    Email: sales@lhs.com



Listen for Windows 2.0 - Verbex Voice Systems

     * Platform: Windows
     * Description: Listen for Windows Version 2.0 is a Speaker Independent
       software product that provides continuous speech recognition for
       Windows applications. The product works with most industry standard
       sound cards and PCs with embedded audio chips. Listen for Windows comes
       with over 16,000 commands in speech interfaces for over 40 software
       applications, such as MS Office, Lotus SmartSuite, Quicken, etc. The
       Listen Command Editor allows a user to change or add commands to
       existing speech interfaces or create new speech interfaces for most
       Windows applications.
     * Requirements: 486/25SX PC or higher
     * Cost: $99 without a microphone or $139 with either a desktop microphone
       or headset
     * Contact: 

    Verbex Voice Systems
    1090 King Georges Post Rd., Bldg 107,
    Edison NJ 08837, USA
    Tel: 1-800-ASK-VRBX
    Tel:(908) 225-5225
    Fax:(908) 225-7764



Lotec Speech Recognition Package

     * Platform: Sun
     * Description: Public domain speech recognition software. Operates from
       input in Sun audio format (.au files) and outputs word hypotheses and
       time labelling data. The software includes programs to collect speech
       samples, a labeller, a "featurizer" which parameterises speech files, a
       word spotter and the recogniser. The software can perform real time
       recognition on a Sparc 10 for small vocabularies.
     * Requirements: Sun SPARC audio input and a "decent" microphone, Sun
       multimedia demo software (in /usr/demo/SOUND) and X.
     * Availability: By anonymous ftp

                ftp://ftp.sanpo.t.u-tokyo.ac.jp/pub/nigel/lotec/lotec.tar.Z 

     * Contact: Nigel Ward: nigel@sanpo.t.u-tokyo.ac.jp



Myers' Hidden Markov Model software

     * Description: Hidden Markov model software for automatic speech
       recognition. C++ code that implements a basic left-right hidden Markov
       model and corresponding Baum-Welch (ML) training algorithm. It is meant
       as an example of the HMM algorithms described by L.Rabiner and others.
       The code was built in order to learn how HMM systems work and we are
       now offering it to the net so that others can learn how to use HMMs for
       speech recognition. Keep in mind that ease of understanding was our
       primary concern, not efficiency. The code can be used to build
       experimental speech recognition systems using "train_hmm" and
       "test_hmm", and can be used in conjunction with written tutorials on
       HMMs to understand how they work.
     * Availability: By anonymous ftp from the comp.speech archive site. There
       are two files in the directory
          + ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/
       The files are
          + hmm.README
          + hmm-1.03.tar.gz
     * Contact: Richard Myers: email: rmyers@isx.edu
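
   Note: As a companion to the entry above, here is a minimal Python
   sketch of the forward algorithm, which computes the likelihood of an
   observation sequence under a left-right HMM. It is not taken from the
   Myers package, which is written in C++. It assumes discrete observation
   symbols and omits the scaling needed to avoid underflow on long
   sequences; the function name and all parameter values are hypothetical.

```python
def forward_likelihood(obs, start, trans, emit):
    """Return P(obs | model) for a discrete HMM using the forward algorithm.

    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability of state s emitting symbol o
    """
    n = len(start)
    # Initialisation: probability of being in each state after the first symbol.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    # Induction: sum over predecessor states, then emit the next symbol.
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
            for s in range(n)
        ]
    # Termination: sum over all states the model may end in.
    return sum(alpha)


# A two-state left-right model: state 0 may stay or move on to state 1,
# and state 1 can never return to state 0.
start = [1.0, 0.0]
trans = [[0.5, 0.5],
         [0.0, 1.0]]
emit = [[0.9, 0.1],   # state 0 mostly emits symbol 0
        [0.2, 0.8]]   # state 1 mostly emits symbol 1
print(forward_likelihood([0, 1], start, trans, emit))
```

   In an isolated word recognizer, one such model is trained per word
   (for example with Baum-Welch) and the word whose model gives the
   highest likelihood is chosen.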



NCC Dictate

     * Platform: Windows
     * Description: NCC Digital Dictate(TM) is an add-on, enhanced interface
       for use with IBM's VoiceType(TM) Dictation for Windows and various
       Windows 3.1 applications (e.g. MS Word, WordPerfect). Digital
       Dictate(TM) provides faster corrections and dictation rates and various
       other features. This version is not a stand alone product; it requires
       VoiceType(TM) Dictation to provide the speech recognition engine and
       the Windows application. Features include:
          + Direct dictation into Windows applications with access to all
            functions while dictating.
          + Versions for MS Word, WordPerfect, Ami Pro, and other Windows
            applications.
          + Speech enabled editing.
          + Capability to save speaker models and defer corrections.
          + Microphone "pause and restore" functions controlled with speech
            commands.
          + Add-on vocabularies for legal, medical, science and business.
          + SWITCH-IT(TM) foot pedal control or CardSwitch(TM) infrared
            wireless control available to switch between dictation and
            proofing/correction modes.
     * Requirements: IBM's VoiceType(TM) Dictation for Windows; a computer
       system meeting VoiceType(TM) Dictation for Windows requirements;
       VoiceType(TM) Dictation Adapter.
     * Availability: Through computer dealerships.
     * Price: $US295
     * Contact: 

    NCC Incorporated
    5808 E. Turquoise, Scottsdale, AZ 85253
    Ph: (602) 922-6236 Fax: (602) 596-9050



OKI VRP6679 - Voice Recognition Processor

     * Description: Speech recognition IC. 25 words max. Speaker independent
       recognition capability. Recognition rate quoted as 97% in a noisy
       environment (e.g. a car).
     * Misc: Alias MSM6679
     * Misc 2: More information is provided in Jean-Pierre Lereboullet's
       document on Voice Recognition Processors.
     * Cost: Approx US$20. Demo board $876
     * Availability: OKI Semiconductor and OKI Distributors
       Corporate Headquarters
       785 North Mary Avenue, Sunnyvale, CA 94086-2909
       Tel: (408) 720 1900
       Fax: (408) 720 1918



Speech Systems Phonetic Engine 500 (PE500)

     * Platform: PC
     * Description: Speaker independent, 40,000 word vocabulary, continuous
       speech recognition for MS Windows. Grammars with high perplexity
       possible. Includes noise rejection. Uses proprietary DSP board.
     * Cost: Prices in US$ - quantity one. The PE500 SDK is $995.00 including
       board, microphone, and runtime software. Runtime only is $595.00.
       SpeechWizard(r) adds speech input to existing Windows applications,
       $295.00. Two-day training: $295.00 with purchase, $595.00 without.
     * Misc: The user defines the grammar of allowed utterances and must write
       software to invoke the board driver functions that control recognition.
       The user must also write software to collect/parse/interpret the ASCII
       text strings returned when recognition succeeds.
     * Misc 2: SSI now offers speech application development services.
     * Contact: 

    Speech Systems, Inc.
    2945 Center Green Court South
    Boulder, CO 80301-2275, USA
    Tel: 303.938.1110 Fax: 303.938.1874
    http://www.speechsys.com



PowerSecretary

     * Platform: Centris 650, 660AV. Quadra 650, 660AV, 700, 800, 840AV, 900,
       950.
     * Description: Speaker dependent/adaptive system requiring words to be
       separated by short pauses. Detailed information is available from their
       WWW page.
     * Vocabulary: 30,000 at any one time, automatically selected from
       120,000-word dictionary.
     * Cost: US$2,495; non-AV machines need an audio board, which will cost
       about US$300.
     * Requirements: Minimum of 16M of RAM and System 7.0.
     * Contact:

    Articulate Systems
    600 W. Cummings Park, Suite 4500
    Woburn, MA 01801
    Ph: (617) 935-5656 Fax: (617) 935-0490
    WWW: http://www.artsys.com/



ProNotes Voice Tools (due late '95)

     * Platform: Windows
     * Description: ProNotes Voice Tools are designed to bring the speech
       recognition capabilities of the IBM VoiceType(TM) Dictation System for
       Windows into any program without the need for the programmer to
       directly interface with the speech engine at the API level. There are
       five tools, as described below, which are all available in three forms:
       Visual Basic(TM) Custom Controls (known as VBXs), 16-bit OLE Custom
       Controls, and 32-bit OLE Custom Controls. The tools are intended for
       use by Windows(TM) developers working with Windows 3.1(TM), Windows for
       Workgroups 3.11(TM), Windows NT 3.51 Workstation(TM), and Windows
       95(TM). The custom controls can be utilized with any application
       development environment which supports the use of such controls (e.g.
       Visual Basic and Visual C++).

        Playback and Record
                An object which allows developers to use the IBM Speech Engine
                to record and play back sound files. Can be used to add voice
                prompts and to allow end users to record and playback sound
                files.

        Voice Button
                An object having standard button properties and behavior,
                which can additionally be controlled by voice. The button can
                also be used as a label or a 3D panel.

        Dictation Window
                A text box that allows free dictation, voice macro
                utilization, and correction by voice. Each Dictation Window
                has access to global and context sensitive vocabularies for
                both command and dictation. There are three correction modes.

        Voice List Box
                Has standard list box properties and behavior, but can
                additionally be controlled by voice. A user can select items
                by pronouncing the entry's text or the entries can be numbered
                and selected accordingly.

        Voice Navigator
                Provides navigation by voice within an application developed
                with the Voice Tools, between voice-enabled objects described
                above, as well as some standard objects found within the
                application.

     * Availability: ProNotes Voice Tools is due for release before the end of
       '95.
     * Contact: 

    Pronotes, Inc.
    1546 Magee Avenue, Philadelphia, PA 19149
    Ph: (215)-533-8569
    proinfo@pronotes.com



PureSpeech 2.0 Recognition Engine

     * Platform: Windows 3.1, Windows 95, Unix, Dialogic Antares DSP
     * Description: Speaker-independent, continuous speech, large active
       vocabulary speech recognition engine for American English. Permits
       on-the-fly additions to the vocabulary using phonetic models and
       telephone or wideband microphone input. Flexible grammar, natural
       language processing, discourse models. Software only with a small
       RAM/CPU footprint. Can be used as a voice user interface (VUI) for
       PC software applications. Can also be used for high-volume call center
       telephony, especially in banks, finance and other specialized
       applications.
     * Availability: PureSpeech is not available as a stand-alone product. It
       is embedded in other Windows-based software.
     * Contact: 

    PureSpeech, Inc
    100 CambridgePark Drive, Cambridge, MA 02140, USA
    Ph: (617) 441-0000 Fax: (617) 441-0001



recnet

     * Platform: UNIX
     * Description: Speech recognition for the speaker independent TIMIT and
       Resource Management tasks. It uses recurrent networks to estimate phone
       probabilities and Markov models to find the most probable sequence of
       phones or words. The system is a snapshot of evolving research code.
       There is no documentation other than published research papers. The
       components are:
          + A preprocessor which implements many standard and many
            non-standard front end processing techniques.
          + A recurrent net recogniser and parameter files
          + Two Markov model based recognisers, one for phone recognition and
            one for word recognition
          + A dynamic programming scoring package.
       The complete system performs competitively.
     * Cost: Free
     * Requirements: TIMIT and Resource Management databases
     * Contact: Tony Robinson: ajr@eng.cam.ac.uk
     * Availability: by anonymous ftp

                ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/recnet-1.3.tar.Z
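
   The recnet description above pairs a network which estimates phone
   probabilities for each frame with a Markov model which finds the most
   probable phone sequence. The following is a minimal sketch of that second
   step (Viterbi decoding), not code from recnet itself. The phone set, the
   transition table and the per-frame probabilities are all made-up values
   chosen for illustration.

```python
import math

def viterbi(frame_probs, trans, init):
    """Return the most probable phone label for every frame.

    frame_probs: list of dicts, phone -> probability for that frame
                 (standing in for the output of a network)
    trans:       trans[a][b] = probability of moving from phone a to b
    init:        phone -> probability of starting in that phone
    All probabilities must be non-zero because we work in log space.
    """
    phones = list(init)
    score = {p: math.log(init[p]) + math.log(frame_probs[0][p])
             for p in phones}
    backpointers = []
    for probs in frame_probs[1:]:
        new_score, pointers = {}, {}
        for p in phones:
            # Best predecessor for phone p at this frame.
            prev = max(phones, key=lambda q: score[q] + math.log(trans[q][p]))
            new_score[p] = (score[prev] + math.log(trans[prev][p])
                            + math.log(probs[p]))
            pointers[p] = prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final phone.
    best = max(phones, key=lambda p: score[p])
    path = [best]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

def collapse(path):
    """Merge runs of identical frame labels into a phone sequence."""
    out = []
    for p in path:
        if not out or out[-1] != p:
            out.append(p)
    return out

# Hypothetical three-phone example (illustrative numbers only).
INIT = {"k": 0.5, "ae": 0.25, "t": 0.25}
TRANS = {a: {b: (0.6 if a == b else 0.2) for b in INIT} for a in INIT}
FRAMES = [
    {"k": 0.7, "ae": 0.2, "t": 0.1},
    {"k": 0.6, "ae": 0.3, "t": 0.1},
    {"k": 0.2, "ae": 0.7, "t": 0.1},
    {"k": 0.1, "ae": 0.6, "t": 0.3},
    {"k": 0.1, "ae": 0.2, "t": 0.7},
    {"k": 0.1, "ae": 0.1, "t": 0.8},
]

if __name__ == "__main__":
    print(collapse(viterbi(FRAMES, TRANS, INIT)))  # prints ['k', 'ae', 't']
```

   Working in log space avoids numerical underflow when many frame
   probabilities are multiplied together.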



SayIt

     * Platform: Sun SPARCstation - SunOS 4.1.x ONLY - SayIt uses NeWS which
       is no longer available on Solaris 2.x
     * Description: Voice recognition and macro building package for Suns in
       the OpenWindows 3.0 environment. Speaker dependent discrete speech
       recognition. Vocabularies can be associated to applications and the
       active vocabulary follows the application that has input focus. Macros
       can include mouse commands, keystrokes, Unix commands, sound,
       OpenWindows actions and more. An evaluation copy is available by email.
     * Hardware: Microphone required (SunMicrophone is fine).
     * Cost: $US295
     * Contact: 

    Phone: 1-800-245-UNIX or 1-415-572-0200
    Fax: 1-415-572-1300
    Email: info@qualix.com 
    WWW: http://www.qualix.com/ 
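
   The SayIt description says that vocabularies are associated with
   applications and that the active vocabulary follows the application with
   input focus. A rough sketch of that idea is given below. It is not based
   on SayIt's actual interface: the class, method names, application names
   and macros are all hypothetical.

```python
class FocusVocabularies:
    """Keep one spoken-word -> macro table per application."""

    def __init__(self):
        self._tables = {}   # application name -> {spoken word: macro}
        self._focus = None  # application that currently has input focus

    def register(self, app, word, macro):
        """Associate a spoken word with a macro for one application."""
        self._tables.setdefault(app, {})[word] = macro

    def set_focus(self, app):
        """Called when input focus moves to another application."""
        self._focus = app

    def active_words(self):
        """Words the recognizer should listen for right now."""
        return sorted(self._tables.get(self._focus, {}))

    def lookup(self, word):
        """Return the macro for a recognized word, or None if inactive."""
        return self._tables.get(self._focus, {}).get(word)


if __name__ == "__main__":
    vocab = FocusVocabularies()
    vocab.register("mailtool", "next", "keystroke:n")
    vocab.register("shelltool", "list", "unix:ls -l")
    vocab.set_focus("shelltool")
    print(vocab.active_words())
    print(vocab.lookup("list"))
```

   Restricting recognition to the words of the focused application keeps the
   active vocabulary small, which generally helps a speaker dependent
   discrete recognizer.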



Simon Says - for NeXT

     * Platform: NeXT
     * Description: Provides the ability to link commands to spoken phrases.
     * Cost: Unknown
     * Availability: By anonymous ftp

        Simon Says demo
                ftp://ftp.informatik.uni-muenchen.de/pub/comp/platforms/next/Audio/audio-apps/SimonSaysDemo.1.5.1.N.b.tar.gz

        Readme file
                ftp://ftp.informatik.uni-muenchen.de/pub/comp/platforms/next/Audio/audio-apps/SimonSaysDemo.1.5.1.README

     * Contact: Metrosoft

    710 13th Street, Suite 310 X, San Diego, California 92101
    Ph: 619.488.9411 Fax: 619.488.3045
    Email: info@metrosoft.com [NeXTmail welcome]



Speech Commander - Verbex Voice Systems

     * Platform: Various - Serial Port connection
     * Description: A hand-held (portable) device about the size of a
       paperback book which provides speaker-dependent continuous speech
       recognition. The device connects through a serial port, so it can be
       connected to a wide range of computers. It comes with a battery pack.
     * Misc: Could someone please provide more detailed information on vocab
       size, training etc?
     * Contact: 

    Verbex Voice Systems
    1090 King Georges Post Rd., Bldg 107,
    Edison NJ 08837, USA
    Tel:(908)225-5225
    Fax:(908)225-7764



'Speech Recognition Expert' Toolkit for Windows

     * Description: Provides an object-oriented development tool designed to
       rapidly build speech enabled applications without writing source code.
       Currently supports IBM's VoiceType Application Factory. Future versions
       to support other platforms. Includes BlackBox library and Custom
       Grammar Tools.
     * Requirements: Layout for Windows from Objects, Inc.
     * Price: $US349 + Shipping/Handling
     * Contact: Speech Technologies, Inc.
       P.O. Box 3905
       Naperville, IL 60567-3905
       CompuServe @102147,3521
       Ph: (708)983-7634



Visual Voice from Stylus Innovation

     * Platform: Microsoft Windows
     * Description: Visual Voice is a toolkit for building Windows-based voice
       processing and telephony applications including interactive voice
       response (e.g. touch-tone banking), fax-on-demand, and voice mail.
       Visual Voice can be used to add voice recognition to your telephony
       applications.
       Voice Recognition (VR) Support for Visual Voice is exposed as a
       standard VBX control and provides one or more voice recognition
       "resources" to your application. Applications can dynamically assign
       resources across several voice lines. Voice recognition is either
       "discrete" or "continuous". Discrete recognition is slightly more
       accurate and requires the speaker to pause briefly between words.
       Continuous recognition provides a natural way to enter information by
       speaking without pauses. Three configurations are supported:

        Software-Only Solution
                The software only solution uses Telaccount's SpeechEasy
                technology for discrete recognition using your PC's CPU. A
                vocabulary is included with digits, basic command words and
                more.

        Hardware-Assisted Solution with Dialogic AEB boards
                Discrete voice recognition in over 25 languages using Dialogic
                D/41D voice boards and the Dialogic VR/40 board. Vocabularies
                are included with digits, basic command words, voice mail
                vocabulary and more.

        Hardware-Assisted Solution with Dialogic PEB boards
                Use the VR control with any Dialogic PEB-based voice board,
                such as the D/12x or D/24x, to access voice recognition
                resources from your phone lines. This requires a Dialogic VRP
                board with either 1 to 4 VRM/40 modules (4 channel discrete
                voice recognition modules) and/or 1 to 4 VRM/2C modules (2
                channel continuous voice recognition modules). You can have up
                to 4 modules on each VRP: 4 VRM/40s for 16 channels of
                discrete voice recognition; 4 VRM/2Cs for 8 channels of
                continuous recognition; or a combination. Over 25 languages
                supported. Includes vocabularies as described above.

     * Pricing: Unknown
     * Availability: From Stylus Innovations Inc. or from the distributors
       listed on the Stylus WWW pages.
     * Misc: More detailed technical information and slide show demonstration
       software are available on the WWW

                http://www.stylus.com/stylus/

     * Contact: 

    Stylus Innovation Inc.
    One Kendall Square, Building 300, Cambridge, MA 02139
    Ph: (617) 621 9545 Fax: (617) 621 7862
    WWW: http://www.stylus.com/stylus/
    Compuserve forum: GO STYLUS
    Email: info@stylus.com
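
   The channel counts quoted for the Dialogic VRP board above follow from
   simple arithmetic: up to 4 modules per VRP, 4 discrete channels per VRM/40
   and 2 continuous channels per VRM/2C. The small sketch below just restates
   that arithmetic. The function name is hypothetical and this is not part of
   any Stylus or Dialogic software.

```python
MAX_MODULES_PER_VRP = 4            # from the description above
DISCRETE_CHANNELS_PER_VRM40 = 4    # "4 channel discrete" module
CONTINUOUS_CHANNELS_PER_VRM2C = 2  # "2 channel continuous" module

def vrp_channels(vrm40_modules, vrm2c_modules):
    """Return the recognition channels provided by one VRP board."""
    if vrm40_modules < 0 or vrm2c_modules < 0:
        raise ValueError("module counts cannot be negative")
    total = vrm40_modules + vrm2c_modules
    if not 1 <= total <= MAX_MODULES_PER_VRP:
        raise ValueError("a VRP holds between 1 and 4 modules")
    return {
        "discrete": vrm40_modules * DISCRETE_CHANNELS_PER_VRM40,
        "continuous": vrm2c_modules * CONTINUOUS_CHANNELS_PER_VRM2C,
    }

if __name__ == "__main__":
    print(vrp_channels(4, 0))  # {'discrete': 16, 'continuous': 0}
    print(vrp_channels(0, 4))  # {'discrete': 0, 'continuous': 8}
    print(vrp_channels(2, 2))  # {'discrete': 8, 'continuous': 4}
```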



Voice Command Line Interface

     * Platform: Amiga
     * Description: VCLI will execute CLI commands, ARexx commands, or ARexx
       scripts by voice command through your audio digitizer. VCLI allows you
       to launch multiple applications or control any program with an ARexx
       capability entirely by spoken voice command. VCLI is fully multitasking
       and will run in the background, continuously listening for your voice
       commands even while other programs are running. Documentation is
       provided in AmigaGuide format. VCLI 6.0 runs under either Amiga DOS 2.0
       or 3.0.
     * Cost: Free?
     * Requirements: Supports the DSS8, PerfectSound 3, Sound Master, Sound
       Magic, and Generic audio digitizers.
     * Availability: by ftp from wuarchive.wustl.edu in the file
       systems/amiga/incoming/audio/VCLI60.lha and from amiga.physik.unizh.ch
       as the file pub/aminet/util/misc/VCLI60.lha
     * Contact: Author's email is RHorne@cup.portal.com



Voice Control Systems Continuous Speech Recognition

     * Description: Voice Control Systems (VCS) continuous speech recognition
       is a proprietary phonetic recognizer based on technology developed at
       VCS over the last 17 years. It is robust for applications such as the
       "hands-free" automotive environment or telephone networks, both
       wireless and wireline. VCS speech recognition is used by many
       developers and manufacturers in telecommunications. VCS technology is a
       software-based capability which VCS has currently developed for a
       limited number of processing environments. VCS offers "off-the-shelf"
       capabilities for the TI-C3X and C4X DSPs with other hardware platform
       support planned for the future. As a benchmark, today's VCS continuous
       technology requires about 1/2 of a 33 MHz TMS320C31. VCS continuous
       technology is available in cellular and wireline based libraries for
       continuous digit input in approximately 15 languages. VCS continuous
       recognition is a modified HMM decision strategy built upon the
       foundation of VCS phonetic "front end".
     * Availability: VCS continuous technology is available today in software
       form from VCS or implemented in hardware or speech systems from VCS
       distributors including Dialogic Corporation, Brite Voice, Intervoice,
       Periphonics, and Syntellect.
     * Cost: Software royalties are volume based and range from per unit costs
       of $500 per recognizer to less than $5 in large quantities.
     * See also: the VCS Phonetic Dictionary Recognizer and VCS Isolated Word
       Speech Recognition below, and the VCS 2030 & 2060 Voice Dialers.
     * Contact: 

    Voice Control Systems, Inc.
    14140 Midway Rd., Dallas, Tx. 75244, USA
    Ph: +1-214-386-0300 Fax: +1-214-386-5555
    Email: sales@vcsi.com

Voice Control Systems Phonetic Dictionary Recognizer

     * Description: This recognizer is based upon a HMM type recognition
       strategy coupled with the VCS "front end" (feature extraction
       software). The HMM modeling is based upon the basic phonetic building
       blocks in each language. In American English this is approximately 43
       units. The recognition vocabulary is built up by combining these units
       into word models. By building the words in this way new recognition
       vocabularies may be constructed. The phonetic assembly can also be used
       for "word spotting" recognition libraries.
     * Platform: This VCS recognition software runs on the TI TMS320C30 DSP.
       Two recognizers can operate on a single 55 MHz C30. Currently the
       software may be purchased as an Enhanced Technology from VCS to run on
       the Dialogic VR/160p speech recognizer board. The hardware is purchased
       from Dialogic, with the "Enhanced" software purchased from VCS. Up to
       four phonetic recognizers can run on a single 160; one per VRM2C
       (C30-33 MHz DSP) daughtercard.
     * Note: This recognizer is in its late "beta" stage of development and is
       available for U.S. English vocabularies. Other languages are presently
       under development.
     * Price: VCS software is priced at $350 per recognizer for unit
       quantities with volume discounts available.
     * See also: VCS Continuous Recognition above, VCS Isolated Word Speech
       Recognition below, and the VCS 2030 & 2060 Voice Dialers.
     * Contact: 

    Voice Control Systems, Inc.
    14140 Midway Rd., Dallas, Tx. 75244, USA
    Ph: +1-214-386-0300 Fax: +1-214-386-5555
    Email: sales@vcsi.com
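
   The Phonetic Dictionary Recognizer entry above describes building a
   recognition vocabulary by combining phonetic units into word models, so
   that new vocabularies can be constructed without new word-level training.
   The sketch below illustrates the general idea only. It is not VCS code:
   the phone symbols, the small inventory (the real one is said to have about
   43 units) and the choice of three states per unit are all assumptions.

```python
STATES_PER_UNIT = 3  # assumption: a small left-to-right model per unit

def unit_states(phone):
    """Names of the model states for one phonetic unit."""
    return [f"{phone}_{i}" for i in range(STATES_PER_UNIT)]

def build_word_model(pronunciation):
    """Concatenate unit models into a single left-to-right word model."""
    states = []
    for phone in pronunciation:
        states.extend(unit_states(phone))
    return states

class PhoneticVocabulary:
    """A recognition vocabulary assembled from a fixed unit inventory."""

    def __init__(self, inventory):
        self.inventory = set(inventory)
        self.models = {}

    def add_word(self, word, pronunciation):
        """Add a word; no new acoustic training is needed, only a spelling
        of the word in terms of units that already exist."""
        unknown = [p for p in pronunciation if p not in self.inventory]
        if unknown:
            raise ValueError(f"units not in inventory: {unknown}")
        self.models[word] = build_word_model(pronunciation)

if __name__ == "__main__":
    # Hypothetical subset of an American English unit inventory.
    vocab = PhoneticVocabulary(["k", "ao", "l", "d", "ay"])
    vocab.add_word("call", ["k", "ao", "l"])
    vocab.add_word("dial", ["d", "ay", "l"])
    print(vocab.models["call"])
```

   Because every word shares the same unit models, adding a word costs only
   a dictionary entry. This is what makes on-the-fly vocabulary changes
   practical in this style of recognizer.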

Voice Control Systems Isolated Word Speech Recognition

     * Description: Voice Control Systems (VCS) isolated word recognition
       uses VCS phonetic recognizer technology. It is robust in demanding
       environments such as the "hands-free" automotive environment, telephone
       networks, wireless or wireline. Capabilities include
       speaker-independent, speaker-dependent and speaker-adaptive
       recognition. Libraries are available for 45+ languages and custom
       vocabulary development services are available. The technology is suited
       for many applications including:
          + Desktop computing: such as keyboard accelerators or interactive
            multimedia.
          + Network telephony: such as automating operator functions or voice
            dialing.
          + Computer telephony: such as remote access to personal computers.
          + Automotive accessory control: such as voice activated cellular
            phones or other automotive accessories.
          + Consumer electronics: such as voice controllers for video games or
            VCRs and televisions.
     * Platform: Platforms include Intel-X86, TI-C5X, C3X, C4X and C2X, OKI
       6679, and NEC-V20 and V30. The software can also operate on 16 bit
       microcontrollers. As a benchmark, 8 recognizers can run on an Intel
       486-33 DX.
     * Availability: The technology is available under software licenses
       direct from VCS or by purchasing hardware from an OEM. VCS OEMs
       include: Dialogic, Oki Semiconductor, Intervoice, Periphonics, etc.
     * Cost: VCS isolated word recognition software is available under a
       volume pricing license agreement. Small quantity royalties are in the
       $500.00 per recognizer range while large (millions) quantity royalties
       are less than $1.00 per recognizer.
     * See also: VCS Continuous Speech Recognition and VCS Phonetic Dictionary
       Recognizer above, and the VCS 2030 & 2060 Voice Dialers.
     * Contact: 

    Voice Control Systems, Inc.
    14140 Midway Rd., Dallas, Tx. 75244, USA
    Ph: +1-214-386-0300 Fax: +1-214-386-5555
    Email: sales@vcsi.com



Visus SpeechKit

     * Platform: NeXT
     * Description: SpeechKit is based on SPHINX, a speaker-independent, 1000
       word or so, continuous speech recognition system which allows you to
       incorporate speech recognition into your applications. You can design
       your vocabulary and grammars.
     * Contact: Visus - no address or phone provided. A possible contact is
       Robert Brennan at Carnegie Mellon University. email:
       Robert_Brennan@cmu.edu



VCS 2060 Voice Dialer

VCS 2030 Voice Dialer

     * Platform: Stand-alone unit, TMS320C5X based with VCS phonetic speech
       recognition and CELP speech compression.
     * Description: The VCS 2060 is a telephone dialing system which
       recognizes 50 names - and speed dials the associated telephone number.
       The VCS 2030 has 20 memories. Users use speaker-independent recognition
       to select the "call", "program", or "list" menu, then place a call,
       enroll a new memory, or listen to playback of entries in the phonebook.
       Enrollment is simple and includes a "name tag" enrollment pass so that
       when one selects an entry to call, the selection is confirmed by
       repeating the memory's associated name tag, e.g. "calling Pete". The
       system uses both speaker-independent and speaker-dependent technology
       from Voice Control Systems, Inc.
     * Installation: The VCS 2060 can be installed in series (RJ-11) with one
       phone for single phone operation or installed in parallel (RJ-31) to
       provide voice dialing from every phone in a house.
     * Cost: Standard retail prices:
          + VCS 2030 Voice Dialer - $269.00
          + VCS 2060 Voice Dialer - $299.00
     * Availability: From catalogs or direct from Voice Control Systems.

    Voice Control Systems
    14140 Midway Rd., Dallas, Tx. 75225, USA
    Ph: 800-VCS-7525 Fax: 214-386-5555
    Email: sales@vcsi.com



Voice-Trek 2.0

     * Platform: ?
     * Description: ?
     * Contact: 

    Tardis Technology Inc., Voice Recognition Div.
    10321 Los Alamitos Blvd., Los Alamitos CA 90720
    Tel:(310)799-3355 Fax:(310)799-3360



Creative VoiceAssist

     * Platform: PC (?)
     * Price: $US99.95
     * Contact:

    Creative Labs
    Ph: 1-800-998-5227



Voice Blaster Ver. 4.0

     * Platform: IBM AT or higher, DOS or Windows 3.1
     * Description: Uses a Sound Blaster or compatible board. Contains a
       microphone headset and a connector for LPT1:. A printer can still be
       used on LPT1:. Will recognize 1024 words that are trained by the
       operator. Each word activates a macro that can enter an ascii word on
       the screen or into a word processor or invoke a batch file. An optional
       footswitch may be installed. Software to run under DOS or Windows 3.1
       is included.
     * Cost: Unknown
     * Contact: Unknown (original supplier has been sold)



VoiceServer for Windows

     * Platform: PC
     * Description: Speaker dependent, each user with an independent
       directory. Isolated word. Up to 1000 words/user, 300 words/window. 1
       word occupies
       2Kb on hard disk. Can be used to control Windows applications by
       issuing voice commands instead of menu selection.
     * Rough Cost: 292 Pounds(UK)
     * Requirements: None
     * Misc: Price includes a half-sized AT voice card (including a DSP),
       software, documentation & a microphone (attachable to keyboard or
       speaker). A light-weight high-spec headset is an optional extra.
     * Contact: 

    Mark Redwood
    Applied Voice Technologies
    26 Danbury Street, Islington,
    London, UK, N1 8JU
    Ph: +44 71 454 1224 Fax: +44 71 454 1225



Votan

     * Platform: MS-DOS, SCO UNIX
     * Description: Isolated word and continuous speech modes, speaker
       dependent and (limited) speaker independent. Vocab size is 255 words or
       up to a fixed memory limit - but it is possible to dynamically load
       different words for effectively unlimited number of words.
     * Cost: Approx US $1,000-$1,500
     * Requirements: Cost includes one Votan Voice Recognition ISA-bus board
       for 386/486-based machines. A software development system is also
       available for DOS and Unix.
     * Misc: Up to 8 Votan boards may co-exist for 8 simultaneous voice users.
       A telephone interface is also available. There is also a 4GL and a
       software development system. Apparently there is more than one version
       - can anyone provide more detail?
     * Contact: 800-877-4756, 510-426-5600



Voice Processing Corporation Speech Recognition Product Line

     * Description: Voice Processing Corporation (VPC) supplies automated
       speech recognition systems. VPC's products are used in the
       telecommunications, cellular and personal computer markets to enable
       computers to understand human speech. The company's VPro product line
       is sold to original equipment manufacturers (OEMs), value added
       resellers (VARs), system integrators and application developers. VPC's
       speech recognition systems are currently used in applications such as
       voice mail, voice activated dialing, interactive voice response, and
       command and control of personal computers.

       The following are descriptions of the Voice Processing Corporation's
       VPro Product Line: VProContinuous, VPro/XD, VPro/RT, VProCel,
       VProSpeller, VProPRL, VPro hardware platforms, and the application
       Osprey.

       More information is available on these products at the VPC WWW site:
       http://www.vpro.com/
     * VProContinuous(TM) is a speaker-independent, continuous digit
       recognizer. It recognizes digit strings spoken in a continuous manner,
       by any caller, without unnatural beeps or pauses. VProContinuous uses
       out-of-vocabulary rejection and word spotting technologies to reject
       extraneous words and phrases often spoken by callers. The
       VProContinuous vocabulary consists of the words "zero" through "nine,"
       "yes," "no," and "oh." The product is language-independent. American
       English, Australian English, Brazilian Portuguese, Canadian French,
       Castilian Spanish, French, German, Italian, Mexican Spanish,
       Portuguese, Swiss German and U.K. English versions are available.
     * VPro/XD(TM) is a discrete or multiword speech recognizer for
       extra-demanding applications and/or vocabularies. This robust discrete
       product recognizes isolated discrete utterances (words or very short
       phrases). VPro/XD utilizes proprietary out-of-vocabulary rejection and
       word-spotting technologies. VPro/XD is speaker-independent and includes
       Talkover capability allowing speech-interrupt over prompts. Pre-trained
       vocabulary libraries are available in American English, Australian
       English, Brazilian Portuguese, Canadian French, Castilian Spanish,
       Central American Spanish, German, Italian, Mandarin Chinese, Mexican
       Spanish, Portuguese, Swiss German and UK English. Pre-trained
       vocabularies consisting of voice mail words, voice dialing words, call
       control words, banking, and emergency words are available in American
       English (both cellular and land-line).
     * VPro/RT(TM) is a discrete speech recognizer for rapid training of
       vocabularies in the field. This robust discrete product recognizes
       isolated discrete utterances. Application designers and end-users
       define the vocabulary of their choice and train the system in real-time
       either prior to system start-up, or adapting on-the-fly while the
       system is running live. Vocabularies can be subset, and applications
       involving thousands of words can be developed quickly. VPro/RT, which
       also supports Talkover, is suited to speaker-dependent recognition
       tasks, such as the personal directory of names in a voice-activated
       dialing application. VPro/RT is also good for applications that require
       speaker-independent vocabularies to be developed quickly in the field
       or those that require many vocabularies. VPro/RT can also be used as a
       tool for quick prototyping of applications.
     * VProCel consists of speaker-independent VProContinuous, VPro/XD and
       speaker-dependent VPro/RT specifically tuned for the cellular
       environment. The speaker-dependent discrete feature of VProCel allows
       for a user-defined 20-word personal directory, with a one-pass
       enrollment whereby users need only speak their chosen commands once. In
       addition, cellular-ready VPro/XD vocabularies consisting of
       voice-activated dialing command words are also available. VProCel is
       suited to voice-activated dialing applications using either digit
       strings or a listing of words in a personal directory.
     * VProSpeller is a recognizer that can determine which name or word is
       being spelled by a caller. Users may spell a string of letters (up to
       32 letters) in an uninterrupted manner (without prompts or beeps
       between each letter). VProSpeller can recognize confusable letters by
       conducting an automated search of a database of words maintained by the
       application for the best candidates to match.
     * VProPRL is a portable recognizer library of VProContinuous, VPro/XD
       and VPro/RT, designed for customers who wish to enable VPC speech
       recognition technologies on platforms other than those supported by
       VPro hardware. It can be embedded into a wide variety of hardware
       platforms and consists of a library of object modules which can be
       linked with a user application or task.
     * VPro Hardware Platforms: VPro-42, VPro-84, VPro-88 : The VPro platforms
       are ISA compliant PC/AT boards. Each supports four to eight Virtual
       Speech Processors (VSPs). Each VSP, depending on load factors, can
       handle multiple telephone lines. Application and host computers
       communicate with each of the VSPs as separate autonomous units. VPro
       platforms use Texas Instruments TMS320C31 microprocessors which provide
       up to 133 MFLOPS of compute power. The platforms can have up to 8
       megabytes of memory shared among all processors. In addition, each
       processor has 512K bytes of local memory. Both the PEB and MVIP PCM
       audio buses are supported by all VPro platforms.
     * Osprey is a call management software application that performs the
       kinds of telephone related activities typically done by a personal
       assistant, such as answering the phone, screening callers, routing
       calls, and taking and delivering messages. It is an automated phone
       attendant.
     * Price and availability: Contact Voice Processing Corporation
     * Contact: Kelli V. Smith

    Voice Processing Corporation
    1 Main Street, Cambridge, MA, 02142 USA
    Ph: (617)494-0100 Fax: (617)494-4970
    e-mail: KSmith@vpro.com
    WWW: http://www.vpro.com/
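
   The VProSpeller entry above says that confusable letters are resolved by
   searching a database of words for the best candidates. The sketch below
   shows one simple way such a search could work. It is not VPC's method:
   the confusable letter groups, the costs and the restriction to words of
   equal length are all assumptions made for illustration.

```python
# Assumed groups of letters which sound alike when spoken.
CONFUSABLE_GROUPS = [set("BCDEGPTVZ"), set("MN"), set("FS"), set("AJK")]

def letter_cost(heard, actual):
    """0 for a match, 0.5 for a confusable pair, 1 otherwise."""
    if heard == actual:
        return 0.0
    for group in CONFUSABLE_GROUPS:
        if heard in group and actual in group:
            return 0.5
    return 1.0

def spelling_cost(heard, word):
    """Total mismatch cost between a heard string and a candidate word."""
    return sum(letter_cost(h, a) for h, a in zip(heard, word))

def best_match(heard, database):
    """Return the database word which best explains the heard letters.

    Simplification: only words with the same number of letters are
    considered, so inserted or dropped letters are not handled.
    """
    heard = heard.upper()
    candidates = [w.upper() for w in database if len(w) == len(heard)]
    if not candidates:
        return None
    return min(candidates, key=lambda w: spelling_cost(heard, w))

if __name__ == "__main__":
    names = ["SMITH", "SMYTH", "JONES", "BOND"]
    print(best_match("FNITH", names))  # prints SMITH
    print(best_match("POND", names))   # prints BOND
```

   Constraining the answer to words in the database is what lets a speller
   recover from letter confusions which the acoustics alone cannot resolve.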


___________________________________________________________________________

   Copyright (c) 1995 by Andrew Hunt, all rights reserved.
   This FAQ may be posted to any USENET newsgroup, on-line service, or BBS as
   long as it is posted in its entirety and includes this copyright statement.

   This FAQ may not be distributed for financial gain.
   This FAQ may not be included in any collections or compilations
   without express permission from the author.



 ---

Andrew Hunt
ATR Interpreting Telecommunications Research Labs
Hikari-dai 2-2, Seika-cho, Kyoto, 619-02, Japan
Tel: +81-774-95 1390   Fax: +81-774-95 1308
Email: andrew@itl.atr.co.jp
