Robustness In Automatic Speech Recognition



Discuss The Robustness In Automatic Speech Recognition.



This report examines the general and ethical issues that arise when conducting research in the field of speech recognition. Its main objective is to develop a deep understanding of research ethics in this field. This requires collecting and analyzing information related to the economics of the field and gaining a comprehensive understanding of the ethics policies that apply in the researcher's nation, state, and institute. The sub-topics discussed include the research background, the pros and cons of the research, ethics issues, integrity and safety risks, and a response plan for those issues and risks.

Research Background

The chosen research is about speech recognition. Speech recognition can be defined as an interdisciplinary sub-field of computational linguistics that develops technologies and methodologies which enable computers to recognize spoken language and translate it into text. This research is based on the ability of a program or machine to identify phrases and words in spoken language and then convert them to a format that is readable by machines. The research incorporates knowledge from electrical engineering, computer science, and linguistics (APAIS, 2007).

Speech recognition works through algorithms that combine acoustic modelling and language modelling. Acoustic modelling represents the relationship between the audio signal and the linguistic units of speech, while language modelling matches sounds with sequences of words and helps to distinguish between words that sound similar. The expected outcome of this research on speech recognition is a system that can be applied in areas such as aircraft (where it is usually known as direct voice input), the preparation of structured documents such as radiology reports, simple data entry such as entering a credit card number, call routing, and voice user interfaces such as voice dialing (Acton, 2012).
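
As a minimal sketch of how the two models can work together, the Python example below chooses between two similar-sounding words. The word pair, all probabilities, and the function name best_word are hypothetical values chosen for illustration; they are not taken from this research or from any real recognizer.

```python
import math

# Hypothetical acoustic scores P(audio | word): the two words sound alike,
# so the acoustic model alone cannot separate them.
acoustic_p = {"write": 0.5, "right": 0.5}

# Hypothetical bigram language model P(word | previous word).
bigram_p = {
    ("to", "write"): 0.09,
    ("to", "right"): 0.01,
    ("turn", "right"): 0.20,
    ("turn", "write"): 0.001,
}

def best_word(previous_word, candidates):
    """Pick the candidate maximising log P(audio|word) + log P(word|previous)."""
    def score(word):
        return (math.log(acoustic_p[word])
                + math.log(bigram_p[(previous_word, word)]))
    return max(candidates, key=score)

choice_after_to = best_word("to", ["write", "right"])
choice_after_turn = best_word("turn", ["write", "right"])
```

Because the acoustic scores are equal in this toy setup, the decision rests entirely on the language model, which is the situation the paragraph above describes for words that sound similar.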

This research requires an evaluation of the general and ethical issues that arise while conducting research in the field of speech recognition. Ethics is used in research to evaluate both the behaviour of the researcher and the results of the research. There are numerous safety issues and risks associated with research on speech recognition, affecting both the researchers and the end users (Ayuso, 2012).

Pros And Cons Of Research

There are numerous advantages and disadvantages associated with the research. This section discusses the pros and cons that are likely to be encountered in voice recognition research. The following are some of the pros of speech recognition:


One of the well-known advantages of voice recognition is that the correct spelling of a spoken word is generally captured even if the speaker does not actually know how the word is spelled. Individuals who struggle with spelling can simply dictate to the voice recognition system, which is expected to spell the word correctly automatically (DeMori, 2012).


Voice recognition enables greater speed in the production of typed work, which is why many people are enamoured with the concept. Instead of having to learn to type a certain number of correct words in a given time or to use the keyboard effectively, the user can simply speak and the spoken words will be recognized (Haton, 2012).


It has been argued that the major reason for the development of voice recognition is that it enables individuals with disabilities to operate computers and produce typed text. For example, a person who cannot write by hand or operate a computer for any reason can simply dictate what he wishes to write and the system will respond accordingly (Haton, 2012).

Some Of The Cons Associated With Voice Recognition Are Explained Below:

Voice recognition recognizes the ordinary vocabulary of the English language. Although this is adequate for writing most of the time, it is often not enough. In numerous cases, financial terms or unusual names are not part of the system's vocabulary. The system can also fail to spell some names correctly. For example, it may be unable to tell whether it has heard “Jaymz” or “James”. Slang can also be a serious issue when using voice recognition. Spelling mistakes will therefore occur, and the user may be forced to go back and fix them (Israel, 2014).


Voice recognition suffers from occasional mishaps even though the system is designed to make work easier. There is usually a delay before the system registers the words that have been spoken. This may be frustrating for users who are accustomed to being limited only by the speed of their fingers on the keyboard and who do not want their flow of thought interrupted (Jones, 2014).


The voice recognition system must first be trained to recognize the voices of its users. It takes some time for the system to learn the users' voices as well as their way of speaking, so a lot of effort and patience must be invested up front. Even after the system has been trained, it is still liable to make mistakes, such as failing to recognize unusual names or struggling to distinguish between words that sound similar (Schwartz, 2013).

Ethics Issues, Integrity And Safety, And Risks

Research on speech recognition faces numerous ethical and legal issues related to the development of the system. Legal questions have been raised about the accidental recording of unrelated individuals in the background of the person of interest, which is considered entirely possible during the recording process. There are also no established guidelines on the groups of individuals who take part in this kind of research. This may make it difficult for researchers in the field of speech recognition to know the exact number of participants needed to carry out the research (Yakovenko, 2015).

Integrity And Safety

It is also common for researchers to record audio and video in public places and to share the recordings online. The legal aspects of voice recording need to be explored, since some places are considered private and it is illegal to make or use recordings there. It is common to see signs indicating that any form of recording is illegal in places such as government offices, or in private places like hospitals where patient privacy is important. These legal aspects are published in the latest edition of the Journal of Law and Medicine in Australia (Schwartz, 2013).

It has been noted that potential liabilities during research in voice recognition could arise from the numerous surveillance and listening device laws of Australia, since voice recognition could record a private conversation without the approval of third parties. The converted text is then subjected to further language processing; however, it is not clear whether those words count as being listened to or heard within the current legal definitions of a private conversation. Researchers in the field of speech recognition need to be well versed in the listening and surveillance laws of Australia so that they do not find themselves on the wrong side of the law (Porter, 2012).

Speech recognition combined with other technologies such as geo-location and data retention is potentially privacy-invasive and can result in problems related to human rights and freedoms, including the right to privacy. There are also numerous risks associated with speech recognition: stored data can be stolen, the data could be misused for other purposes such as criminal investigation, the process may reveal information about a person's race, ethnicity, or health, and the identification process itself is subject to mistakes (Patrick, 2009).

Response Plan To The Issues And Risks

The researcher needs to respect the privacy of third parties who may be captured during the recording process. This can be ensured by carrying out the recording in a private place where there is no background noise that may be captured by the system. The researcher should also abide by the surveillance and listening device laws of Australia, which restrict the use of listening devices in places that are considered private. The number of researchers required to take part in this research is not legally restricted, so the researcher may confidently work with any number that suits the research (Paliwal, 2012).

It is unethical to use the speech recognition system in public places, so the researcher should restrict its use to a private place away from background speech that may accidentally be recorded. The most serious threat to the integrity of both the research and the system arises when the system accidentally records private information from a third party and the researcher then shares the recordings online. The research should not violate the human rights and freedoms, including the right to privacy, of whoever may be using the system or of those in the background. It is the responsibility of the researcher to ensure that the rights and freedoms of users and third parties are respected (Murray, 2017).

System Architecture

The Alpha system is a 64-bit reduced instruction set computing (RISC) instruction set architecture developed by Digital Equipment Corporation. The Alpha architecture is supported by operating systems such as Tru64 UNIX and OpenVMS. The architecture was intended to be a high-performance design that achieves its performance through features such as a high clock rate. Recognition accuracy has steadily improved in the current system, which reduces the integrity issues associated with inaccurate recordings. The Alpha system provides continuous, large-vocabulary, speaker-independent speech recognition with accuracies of 71-96%, and its numerous features make it more advanced than other systems (Morgan, 2011).

The automatic conversion of the spoken word into a text representation requires numerous stages, as shown in the figure above. The first step is the conversion of acoustic vibration into an analogue signal using a microphone or a cellphone. This analogue signal is then filtered to remove the high-frequency components that lie outside the frequency range the human ear can hear. The filtered signal is then digitized through sampling and quantization. The digitized waveform is then blocked into frames, which are compressed using one of the numerous encoding schemes (Jones, 2014).
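
The digitizing and framing steps described above can be sketched in Python as follows. This is an illustrative sketch only: the 16 kHz sample rate, 16-bit depth, 25 ms frame length, 10 ms hop, and all function names are assumptions rather than parameters of the Alpha system, a pure tone stands in for the filtered microphone signal, and the filtering and frame-encoding stages are not implemented.

```python
import math

def sample_signal(freq_hz, duration_s, sample_rate_hz):
    """Sample a pure tone (a stand-in for the filtered analogue signal)."""
    n = int(duration_s * sample_rate_hz)
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate_hz)
            for t in range(n)]

def quantize(samples, bits=16):
    """Map amplitudes in [-1.0, 1.0] to signed integers of the given depth."""
    max_level = 2 ** (bits - 1) - 1
    return [round(max(-1.0, min(1.0, s)) * max_level) for s in samples]

def block_into_frames(samples, frame_len, hop_len):
    """Split the digitized waveform into fixed-length, overlapping frames."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len]
            for i in range(0, last_start + 1, hop_len)]

SAMPLE_RATE = 16000                                  # assumed: 16 kHz
signal = sample_signal(440.0, 0.1, SAMPLE_RATE)      # 0.1 s of a 440 Hz tone
digital = quantize(signal)                           # 16-bit integer samples
frames = block_into_frames(digital, frame_len=400, hop_len=160)
```

With these assumed settings, each frame holds 400 samples (25 ms) and consecutive frames start 160 samples (10 ms) apart, so neighbouring frames overlap.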

At this point the preprocessing is complete, and the recognition technique can be applied to this representation of the audio input. Speech recognition is the process of converting frames to phones, phones to words, and words to sentences. A phone may be considered approximately equal to a single consonant or vowel sound, and numerous frames form a phone. A typical 15-word sentence comprises about 65 phones. The Alpha system architecture recognizes a sentence by performing a beam search through an a priori state network for the phone sequence that best matches the sequence of input frames (Ayuso, 2012).

The Alpha system architecture libraries employ discrete hidden Markov models (HMMs), which are currently the predominant method of speech recognition. They are characterized by a set of states and an a priori set of transition probabilities between the words. The beam search iteratively creates a tree of candidate paths through the hidden Markov model (McAuliffe, 2014).
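
The beam search over a network of states and transition probabilities can be illustrated with the short Python sketch below. It assumes a discrete hidden Markov model whose states are phones and whose observations are frame codes; the three-phone model, every probability in it, and the function name beam_search are hypothetical and are not taken from the Alpha system libraries.

```python
import math

def beam_search(observations, states, start_p, trans_p, emit_p, beam_width=2):
    """Return (log_prob, path) for the best state path found while keeping
    only the `beam_width` most probable partial paths after each frame."""
    beam = []
    for s in states:
        p = start_p.get(s, 0.0) * emit_p[s].get(observations[0], 0.0)
        if p > 0.0:
            beam.append((math.log(p), [s]))
    beam = sorted(beam, key=lambda c: c[0], reverse=True)[:beam_width]

    for obs in observations[1:]:
        candidates = []
        for score, path in beam:
            prev = path[-1]
            for s in states:
                p = trans_p[prev].get(s, 0.0) * emit_p[s].get(obs, 0.0)
                if p > 0.0:  # extend the path only along allowed transitions
                    candidates.append((score + math.log(p), path + [s]))
        # Prune: keep the best few branches of the candidate tree.
        beam = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return beam[0]

# Hypothetical left-to-right model for the phones of the word "cat".
states = ["k", "ae", "t"]
start_p = {"k": 1.0}
trans_p = {"k": {"k": 0.5, "ae": 0.5},
           "ae": {"ae": 0.5, "t": 0.5},
           "t": {"t": 1.0}}
emit_p = {"k": {"A": 0.8, "B": 0.1, "C": 0.1},
          "ae": {"A": 0.1, "B": 0.8, "C": 0.1},
          "t": {"A": 0.1, "B": 0.1, "C": 0.8}}

frames = ["A", "A", "B", "B", "C"]   # one discrete code per input frame
log_prob, phone_path = beam_search(frames, states, start_p, trans_p, emit_p)
```

For this toy input the search assigns the five frames to the phone path k, k, ae, ae, t. A wider beam explores more of the candidate tree at higher cost, while a narrow beam is faster but can discard the path that would have turned out best.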


Conclusion

Speech recognition can be defined as the process of converting frames to phones, phones to words, and words to sentences. This paper has examined the general and ethical issues that arise when conducting research in the field of speech recognition. The advantages of speech recognition discussed in this paper include correct spelling, greater speed, and accessibility for people with disabilities. The disadvantages discussed include the need for training, delays in recognition, and a limited vocabulary. It has been noted that potential liabilities during research in voice recognition could arise from the numerous surveillance and listening device laws of Australia, since voice recognition could record a private conversation without the approval of third parties.


References

Acton, A., 2012. Issues in Accounting, Administration, and Corporate Governance. Australia: Scholarly Editions.

APAIS, 2007. Australian public affairs information service. Australia: National Library Australia.

Ayuso, R., 2012. Speech Recognition and Coding. Colorado: Springer Science & Business Media.

DeMori, R., 2012. New Systems and Architectures for Automatic Speech Recognition and Synthesis. Toledo: Springer Science & Business Media.

Haton, J., 2012. Robustness in Automatic Speech Recognition. California: Springer Science & Business Media.

Israel, M., 2014. Research Ethics and Integrity for Social Scientists. London: SAGE.

Jones, B. H., Chin, A. & Aiken, P., 2014. Risky business: Students and smartphones. TechTrends, 58(6), pp. 73-83.

Laface, P., 2012. Speech Recognition and Understanding. Melbourne: Springer Science & Business Media.

Lanphier, E. & Urnov, F., 2015. Don’t edit the human germline. Nature, 519(7544), p. 410.

McAuliffe, D., 2014. Interprofessional Ethics. Sydney: Cambridge University Press.

Morgan, N., 2011. Speech and Audio Signal Processing: Processing and Perception of Speech and Music. New York: John Wiley & Sons.

Murray, E., 2017. Nursing Leadership and Management: For Patient Safety and Quality Care. Australia: F.A. Davis.

Paliwal, K., 2012. Automatic Speech and Speaker Recognition. Berlin: Springer Science & Business Media.

Patrick, W., 2009. Advances In Pattern Recognition Systems Using Neural Network Technologies. Perth: World Scientific.

Porter, L., 2012. Integrity Management in Australia. Australia: CRC Press.

Schwartz, M., 2013. Ethics, Values and Civil Society. Australia: Emerald Group Publishing.

Yakovenko, I., 2015. The efficacy of motivational interviewing for disordered gambling: systematic review and meta-analysis. Addictive Behaviors, Volume 43, pp.
