Taxonomy of applications
The three speech technologies mentioned above are enabling technologies: they are meant to be used in applications and services. Using one or more of them should result in cheaper services (24/7 availability, or partly or fully automated operator services), more user-friendly services (speech technology must offer ease of use), more secure services (offering more protection than a PIN alone), or a combination of these. There are several ways to distinguish between types of applications and services; we mention the most important ones here.

First, there is the distinction between uni-modal and multi-modal applications. In the first case only one modality is used to perform a task; for example, speech is used for both input and output in a train timetable information system that is accessed by telephone. In a multi-modal version of this service, speech can be used for input while text (e.g. on the display of a screen phone) is used for output.

Second, there is the distinction between desktop applications and remote-access applications (via the telephone). The best-known desktop applications are the dictation packages now on the market. With these tools one can speak commands (e.g. "File", "Open", "Edit") in the most widely used Microsoft Office programs, and one can even dictate text. The speech recognition program runs locally on the PC. After installation, the user must tune the phoneme models to his or her own voice; the more speech is used for this training, the better. Remote-access applications, on the other hand, are used over the phone. Here speech recognition must be speaker-independent, because the application has few, if any, means of determining who is speaking. Perhaps in the (near) future speaker-adaptive recognition will become feasible for a limited category of these applications.
For example, if Calling Line Identification (CLI) is combined with speaker identification, the CLI information can be used to narrow the set of potential speakers to a small number, after which speaker identification determines which of them is calling. However, other classes of speech-driven telephony services (like the train timetable information system) will always require speaker-independent recognition, because it is not possible to build speaker models for everyone in the country who might call.

A third and final distinction that makes a big difference is the communication metaphor. Some services, such as a speech-driven unified messaging system that lets users determine how e-mail, voice mail and fax mail are handled, are best modelled as "command & control" interfaces. If the user is giving commands to a system with well-known, fixed functionality, she can easily learn to use specific expressions. For instance, the speech recogniser might understand commands like "read my e-mails", "read the e-mails from my boss" or "play the voice mails from my boyfriend", while an expression like "could you please tell me the contents of the last e-mail of my boyfriend and then play the last voice mail from my boss" is beyond its capabilities. Other services, however, are more appropriately modelled after an intelligent personal assistant. When communicating with an intelligent agent, one would expect to be able to use free, spontaneous speech, including hesitations, false starts, etc. For the time being, speech recognition technology is capable of supporting command & control applications. Its performance is also good enough for somewhat more complex services, like the train timetable information service.
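The difference between a command & control interface and free, spontaneous speech can be illustrated with a small fixed-grammar matcher. The sketch below assumes the recogniser has already produced a text transcript and simply checks whether the whole utterance matches one of a few fixed command patterns. The action names, the contact list and the function `parse_command` are hypothetical, chosen only to mirror the example commands above. In a deployed system the grammar would more typically constrain the recogniser itself rather than filter its text output as done here.

```python
import re
from typing import Optional, Tuple

# Hypothetical contact names the user may refer to (illustration only).
CONTACTS = ("my boss", "my boyfriend")
_CONTACT = "|".join(re.escape(c) for c in CONTACTS)

# Hypothetical fixed command grammar: each pattern must match the WHOLE
# utterance, so anything outside these phrasings is rejected.
COMMANDS = [
    ("read_email", re.compile(r"read my e-mails")),
    ("read_email_from", re.compile(rf"read the e-mails from (?P<who>{_CONTACT})")),
    ("play_voicemail_from", re.compile(rf"play the voice mails from (?P<who>{_CONTACT})")),
]


def parse_command(utterance: str) -> Optional[Tuple[str, Optional[str]]]:
    """Map a recognised utterance to (action, contact), or None if it is
    not covered by the fixed command grammar."""
    text = " ".join(utterance.lower().split())  # normalise case and spacing
    for action, pattern in COMMANDS:
        match = pattern.fullmatch(text)
        if match:
            # Commands without a contact slot have no "who" group.
            return action, match.groupdict().get("who")
    return None


if __name__ == "__main__":
    print(parse_command("Read the e-mails from my boss"))
    print(parse_command(
        "could you please tell me the contents of the last e-mail of my "
        "boyfriend and then play the last voice mail from my boss"))
```

The first call maps to a `read_email_from` action for "my boss"; the second, free-form request returns `None`, because it matches none of the fixed patterns. Because users of such a service can learn the small set of accepted phrasings, this rigidity is acceptable for command & control applications but not for an intelligent-assistant style of interaction.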