SPEECH RECOGNITION


INTRODUCTION:

Speech is a natural and convenient way for humans to communicate, but today we are no longer limited to communicating with each other; we also communicate with the many machines in our lives, the most important of which is the computer. Speech can therefore serve as a communication channel between humans and computers. This interaction takes place through interfaces, an area known as Human-Computer Interaction (HCI). Automatic Speech Recognition (ASR), an important domain of artificial intelligence, is what makes this spoken interaction possible.

Speech recognition is the ability of devices to respond to spoken commands. It enables hands-free control of various devices and equipment (a particular boon to many disabled persons), provides input for automatic translation, and creates print-ready dictation. Among the earliest applications of speech recognition were automated telephone systems and medical dictation software. It is frequently used for dictation, for querying databases, and for giving commands to computer-based systems, especially in professions that rely on specialized vocabularies. It also enables personal assistants in vehicles and smartphones, such as Apple's Siri.
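As a concrete illustration of the dictation use case, here is a minimal Python sketch that transcribes a short recorded memo with the third-party SpeechRecognition package; the choice of package, the file name memo.wav, and the use of the free Google Web Speech endpoint are assumptions made for this example rather than details from the article.

    # Minimal dictation sketch using the third-party SpeechRecognition package
    # (pip install SpeechRecognition). The file name "memo.wav" is a hypothetical
    # example; any short WAV recording would do.
    import speech_recognition as sr

    recognizer = sr.Recognizer()

    # Load the recorded speech from disk and capture it as audio data.
    with sr.AudioFile("memo.wav") as source:
        audio = recognizer.record(source)

    try:
        # Send the audio to Google's free Web Speech API and print the transcript.
        print(recognizer.recognize_google(audio))
    except sr.UnknownValueError:
        print("The speech could not be understood.")
    except sr.RequestError as error:
        print(f"The recognition service could not be reached: {error}")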

WORKING:

Before any machine can interpret speech, a microphone must translate the vibrations of a person's voice into a wave-like electrical signal. This signal in turn is converted by the system's hardware (for instance, a computer's sound card) into a digital signal. It is the digital signal that a speech recognition program analyzes in order to recognize separate phonemes, the basic building blocks of speech. The phonemes are then recombined into words. However, many words sound alike, and, in order to select the appropriate word, the program must rely on context. Many programs establish context through trigram analysis, a method based on a database of frequent three-word clusters in which probabilities are assigned that any two words will be followed by a given third word. For example, if a speaker says "who am," the next word will be recognized as the pronoun "I" rather than the similar-sounding but less likely "eye." Nevertheless, human intervention is sometimes needed to correct errors.
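To make the trigram idea concrete, the following toy Python sketch counts three-word clusters in a tiny made-up corpus and uses those counts to pick the most probable word after "who am"; the corpus and the pick_next_word helper are illustrative assumptions, not part of any real recognizer.

    # Toy trigram analysis: count three-word clusters in a small corpus and use
    # the counts to choose the most likely word following a given two-word context.
    # The corpus and helper name are illustrative only.
    from collections import Counter, defaultdict

    corpus = "who am i to say who am i talking to".split()

    # Count how often each third word follows each pair of preceding words.
    trigram_counts = defaultdict(Counter)
    for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]):
        trigram_counts[(w1, w2)][w3] += 1

    def pick_next_word(w1, w2):
        """Return the most probable third word after (w1, w2), with its probability."""
        counts = trigram_counts[(w1, w2)]
        total = sum(counts.values())
        word, count = counts.most_common(1)[0]
        return word, count / total

    print(pick_next_word("who", "am"))   # ('i', 1.0) for this toy corpus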

Programs for recognizing a few isolated words, such as telephone voice navigation systems, work for almost every user. On the other hand, continuous speech programs, such as dictation programs, must be trained to recognize an individual's speech patterns; training involves the user reading aloud samples of text. Today, with the growing power of personal computers and mobile devices, the accuracy of speech recognition has improved markedly. Error rates have been reduced to about 5 percent in vocabularies containing tens of thousands of words. Even greater accuracy is reached in limited vocabularies for specialized applications such as dictation of radiological diagnoses.
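The error rates quoted above are normally measured as word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the recognizer's transcript into a reference transcript, divided by the length of the reference. The short Python sketch below computes it for a made-up pair of sentences.

    # Word error rate (WER): substitutions + insertions + deletions needed to turn
    # the recognized transcript into the reference, divided by reference length.
    # The example sentences are made up for illustration.
    def word_error_rate(reference, hypothesis):
        ref, hyp = reference.split(), hypothesis.split()
        # Standard dynamic-programming edit distance over words.
        dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            dist[i][0] = i
        for j in range(len(hyp) + 1):
            dist[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                                 dist[i][j - 1] + 1,        # insertion
                                 dist[i - 1][j - 1] + cost) # substitution
        return dist[len(ref)][len(hyp)] / len(ref)

    print(word_error_rate("please dictate the radiology report now",
                          "please dictate a radiology report now"))  # 1/6 ≈ 0.167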

ADVANTAGES:

    1. Ease of communication – no more illegible handwriting
    2. Quick document turnaround
    3. Flexibility to work in or out of the office
    4. Time saved with increased efficiency and less paperwork
    5. Tedious jobs can be streamlined and simplified
    6. Speech recognition software can produce documents in less than half the time it takes to type
    7. Multitasking – dictation on the go
    8. Flexibility to share files across devices
    9. Fewer errors – provides an accurate and reliable method of documentation
    10. Secure pathways for information transmission
    11. Accessible from your iPhone, Android or tablet
    12. Workflow visibility – enabling easier management of priorities and turnarounds

TYPES:

If you’re developing a speech product like a voice assistant or speech recognition software, at some point you’ll find yourself in need of speech data to train your machine learning algorithms. This data falls along a spectrum from tightly controlled to completely natural, which allows us to bin speech recognition data into three broad categories (a small labeling sketch follows the list):

  1. Controlled : Scripted speech data
  2. Semi-controlled : Scenario-based speech data
  3. Natural : Unscripted or conversational speech data
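As a rough illustration of how these categories might be used in practice, the Python sketch below tags hypothetical dataset entries with one of the three collection styles so that a training set can be balanced across them; the manifest fields, file names, and prompts are invented for the example.

    # Hypothetical manifest entries tagging each recording with its collection style,
    # mirroring the three categories above. File names and prompts are made up.
    from collections import Counter
    from enum import Enum

    class SpeechDataType(Enum):
        CONTROLLED = "scripted"           # read from a fixed script
        SEMI_CONTROLLED = "scenario"      # prompted by a scenario, wording left free
        NATURAL = "conversational"        # unscripted conversation

    manifest = [
        {"audio": "utt_0001.wav", "prompt": "Call the front desk.",
         "type": SpeechDataType.CONTROLLED},
        {"audio": "utt_0002.wav", "prompt": "Ask the assistant to book a table for two.",
         "type": SpeechDataType.SEMI_CONTROLLED},
        {"audio": "utt_0003.wav", "prompt": None,
         "type": SpeechDataType.NATURAL},
    ]

    # Count how much of each category the training set contains.
    print(Counter(entry["type"] for entry in manifest))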

CONCLUSION:

Today, voice and natural language processing are at the forefront of human-machine interaction, and in the future the technology will continue to advance toward its true potential.


Written By - Ritesh Pandita ©
