Friday, December 16, 2011

A nifty way to test Speech-To-Text uncertainties with ITU's Difficulty Percentage measure

In these experiments the LIRNEasia researchers used the Freedom Fone Interactive Voice Response (IVR) system. First they conducted a survey in which the subjects picked their answers from known values and submitted them through the IVR. Since the values were known to the human quality testers, this part of the experiment stood in for a trained speech-to-text system (also called a speaker-dependent or voice-recognition-type system). In the second part the subjects submitted data that was not based on preset values; they were free to answer the questions as they pleased. This was regarded as an untrained, or speaker-independent, system.
Emulating Speech-To-Text Reliability with ITU Difficulty Scores

"The results show that with a speaker-dependent system 95% of the information could be clearly deciphered, as opposed to a speaker-independent system that was only 70% clear (blue areas in Figure 1 and Figure 2). This is not surprising; the outcomes are intuitive. In our study reliability had two components: one was efficiency and the other was voice quality. The voice quality also took into consideration the Mean Opinion Score and the Comparison Categorical Rating. The researchers wish to acknowledge that there may be disagreements over the sample sizes and the number of Evaluators. These results are not ideal for drawing a ‘for-all’ kind of conclusion. However, at this early stage of the research it provides a quick and easy method to draw initial conclusions."
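
To make the arithmetic behind a figure like "95% clear" concrete, here is a minimal Python sketch. It assumes each evaluator gives a simple yes/no answer on whether a recording was difficult to decipher, and that the difficulty percentage is the share of "yes" answers. The function name `difficulty_percentage` and the sample ratings are hypothetical, chosen for illustration only; they are not the researchers' code or data.

```python
def difficulty_percentage(ratings):
    """Return the percentage of ratings that reported difficulty.

    `ratings` is a list of booleans (an assumed encoding): True means the
    evaluator had difficulty deciphering the recording, False means it
    was clear.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    # True counts as 1 and False as 0, so sum() counts the "difficult" votes.
    return 100.0 * sum(ratings) / len(ratings)


# Illustrative ratings only, not data from the study.
sample = [False, False, True, False]
difficulty = difficulty_percentage(sample)
clarity = 100.0 - difficulty
print(f"difficulty: {difficulty:.1f}%, clear: {clarity:.1f}%")
```

Under this reading, the "clear" share quoted in the excerpt is simply 100 minus the difficulty percentage.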