Automatic speaker recognition






 

Speaker recognition is a cover term for two quite different tasks, although both have much in common, such as the need to maintain a database of identity codes and voice patterns for all persons known to the system. In speaker identification the machine must use a speech utterance to determine which of these persons has spoken (or whether the speaker is an unknown person, and therefore probably an intruder). Thus, speaker identification is a 1-out-of-N+1 selection (if the database contains N subjects). In speaker verification a speaker claims to be one of the N persons in the database, and the task is to decide whether or not that claim can be substantiated. Thus, verification is basically a 1-out-of-2 selection. In actual practice, the verification result can be a probability score for the unknown speaker being the person whose identity was claimed, leaving it to the application to decide how to act upon this information.
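The difference between the two selections can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the similarity scores, the speaker names, and the functions `identify` and `verify` are hypothetical and do not come from any particular system.

```python
from typing import Dict, Optional


def identify(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Open-set identification: 1-out-of-N+1 selection.

    Returns the best-matching known speaker, or None when even the best
    match is too weak (the "unknown person" outcome).
    """
    best = max(scores, key=lambda name: scores[name])
    return best if scores[best] >= threshold else None


def verify(scores: Dict[str, float], claimed: str, threshold: float) -> bool:
    """Verification: 1-out-of-2 selection for a single claimed identity."""
    return scores[claimed] >= threshold


# Hypothetical similarity scores between one utterance and each
# enrolled voice pattern (higher means more similar).
scores = {"alice": 0.91, "bob": 0.40, "carol": 0.15}

print(identify(scores, threshold=0.5))       # best match among N speakers, or None
print(verify(scores, "bob", threshold=0.5))  # accept or reject the claim "I am bob"
```

Identification has to compare the utterance against every enrolled pattern, whereas verification only needs the pattern of the claimed identity. An application that prefers a score over a yes/no decision could simply use `scores[claimed]` directly.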

In general, three different methods are known to verify someone's identity: by means of something that the person possesses (e.g., a passport or a badge); by means of something that the person knows (e.g., a password or a secret code); or by some personal characteristic (e.g., fingerprint, retina pattern, or voice characteristics). The use of personal characteristics for identity verification is called biometric authentication. Speaker verification is a biometric authentication method.

Biometric methods come in two different types. Characteristics like fingerprints and retina patterns are anatomical, and therefore cannot be manipulated at will by the person (or, for that matter, by an intruder who wants to impersonate the true owner). Other characteristics, including the voice, are behavioural in nature, and thus can be manipulated at least to some extent. What is more, behavioural biometrics display an inherent variability: it is impossible to produce exactly the same signal when speaking the same password under different conditions or through different microphones. Therefore, behavioural biometric patterns must necessarily consist of means and their attendant variances. As a consequence, behavioural biometrics can never provide 100% secure authentication.
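One simple way to picture a pattern made of "means and their attendant variances" is a diagonal Gaussian model: one mean and one variance per acoustic feature, estimated from several repetitions by the same speaker. The sketch below is an assumption for illustration, not the method of any specific system; the feature values and the names `enrol` and `log_likelihood` are hypothetical.

```python
import math
from statistics import mean, pvariance
from typing import List, Sequence, Tuple

# One (mean, variance) pair per feature dimension.
Model = List[Tuple[float, float]]


def enrol(samples: Sequence[Sequence[float]], var_floor: float = 1e-3) -> Model:
    """Estimate a per-feature mean and variance from enrolment utterances.

    The variance floor keeps a feature that happened to be nearly constant
    during enrolment from dominating the score later on.
    """
    columns = zip(*samples)  # regroup the values feature by feature
    return [(mean(col), max(pvariance(col), var_floor)) for col in columns]


def log_likelihood(model: Model, features: Sequence[float]) -> float:
    """Log-likelihood of one feature vector under the diagonal Gaussian model."""
    total = 0.0
    for (mu, var), x in zip(model, features):
        total += -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
    return total


# Hypothetical two-dimensional features from three repetitions of a password.
enrolment = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2]]
model = enrol(enrolment)

near = log_likelihood(model, [1.0, 2.0])  # close to the enrolled means
far = log_likelihood(model, [3.0, 0.0])   # far from the enrolled means
print(near > far)
```

Because the model tolerates a spread around each mean, a sufficiently similar voice (or a recording) can also fall inside that spread, which is why such a score can never amount to a guarantee.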

Speaker verification is used to protect services (e.g., voice mail systems, calling cards, etc.) or physical premises (like buildings, special rooms in a building, workstations, etc.). Depending on the type of output of the verification system (a yes/no decision or a probability), the system can act as a gatekeeper or as an alarm bell. In the gatekeeper metaphor one must optimise the trade-off between the rate of false accepts (i.e., the proportion of break-in attempts that succeed) and the rate of false rejects (the proportion of attempts in which the system fails to recognise a true customer). When the system is used as an alarm bell, it is left to the application to decide what a user is allowed to do: as long as no actions are performed which may incur losses (of money or privacy), it is not necessary to refuse access.
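The gatekeeper trade-off can be made concrete by counting both kinds of error at different decision thresholds. The following is a minimal sketch with made-up scores; the function name `error_rates` and the numbers are assumptions chosen only to show the effect.

```python
from typing import Sequence, Tuple


def error_rates(
    genuine: Sequence[float], impostor: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (false accept rate, false reject rate) at a given threshold.

    A trial is accepted when its score is at or above the threshold.
    """
    false_accepts = sum(1 for s in impostor if s >= threshold)
    false_rejects = sum(1 for s in genuine if s < threshold)
    return false_accepts / len(impostor), false_rejects / len(genuine)


# Hypothetical verification scores: true customers versus break-in attempts.
genuine = [0.9, 0.8, 0.75, 0.6]
impostor = [0.7, 0.4, 0.3, 0.1]

for threshold in (0.5, 0.65, 0.8):
    far, frr = error_rates(genuine, impostor, threshold)
    print(f"threshold={threshold:.2f}  false accepts={far:.2f}  false rejects={frr:.2f}")
```

Raising the threshold lowers the false accept rate at the cost of more false rejects, and vice versa. A gatekeeper has to fix one operating point, while an alarm-bell application can keep the raw score and react differently depending on what the user tries to do.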

Speaker verification comes in several different forms. In text-independent verification techniques the speaker patterns are based on characteristics of the voice which are present in almost all speech sounds. Of course, a recording of the true person's voice would suffice to fool a text-independent speaker verification system that operates in gatekeeper mode. If all the speech produced during an interaction with an application is used to monitor the identity of the customer (with the verification system operating in alarm bell mode), arbitrary recordings are no longer useful, since they cannot be used to instruct the system, e.g., to transfer money to the bank account of the burglar.

In text-dependent speaker verification the user must speak specific words or phrases in order to have her identity verified. To prevent the use of recorded passwords, text-prompted verification can be used. In this mode the speaker verification system plays a randomly selected utterance to the customer and asks her to repeat that utterance verbatim. If the number of possible utterances is large enough (and certainly if the set is kept secret), there is no practical way in which one might fool a text-prompted system.
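How a text-prompted system might choose its utterance, and why the number of possible utterances matters, can be sketched as follows. The digit vocabulary, the prompt length, and the function names are illustrative assumptions; the check of the spoken words against the prompt would be done by a speech recogniser, which is only stubbed here as a comparison of word lists.

```python
import secrets
from typing import List, Sequence

# Hypothetical prompt vocabulary: spoken digits.
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def make_prompt(vocabulary: Sequence[str], length: int = 5) -> List[str]:
    """Draw an unpredictable word sequence for the customer to repeat."""
    return [secrets.choice(vocabulary) for _ in range(length)]


def prompt_space_size(vocabulary_size: int, length: int) -> int:
    """Number of distinct prompts an attacker would need recordings for."""
    return vocabulary_size ** length


def text_matches(prompt: Sequence[str], recognised: Sequence[str]) -> bool:
    """Stub for the text check: the recognised words must equal the prompt."""
    return [w.lower() for w in prompt] == [w.lower() for w in recognised]


prompt = make_prompt(DIGITS, length=5)
print("Please repeat:", " ".join(prompt))
print("Possible prompts:", prompt_space_size(len(DIGITS), 5))
```

In this sketch a claim would be accepted only if the text check succeeds and the voice also matches the claimed speaker's pattern. A pre-recorded password fails the first test unless it happens to contain exactly the prompted sequence.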

The security that can be obtained with text-dependent speaker verification is considerably higher than that obtained with text-independent techniques. That is because speaker models based on known sounds can be made much more specific than models which must cover all possible speech sounds.

 

