TY - JOUR
T1 - Development of modified analytical model for investigating acceptable delay of TCP-based speech recognition
AU - Jarin, Asril
AU - Fahmi, Husni
AU - Suryadi
AU - Ramli, Kalamullah
N1 - Publisher Copyright: © 2017 American Scientific Publishers. All rights reserved.
PY - 2017
Y1 - 2017
AB - Many studies have proposed solutions to the degradation of network speech recognition (NSR) caused by packet loss and jitter. The most popular cloud-based speech recognition systems, such as Google speech recognition and Apple Siri, currently employ TCP in conjunction with HTTP. As a reliable transport protocol, TCP delivers all speech data to the server but may introduce delays under unexpected network conditions. To achieve satisfactory NSR performance despite TCP delay, an acceptable delay must be met. In this paper, a TCP-based NSR scheme with a client-side speech segmenter is proposed, and an analytical model for investigating the acceptable delay is developed on the basis of a study of stored streaming via TCP employing a discrete-time Markov model. The speech segmenter allows TCP to send the speech signal sentence by sentence, so that the resulting texts can be recognized using a language model. The acceptable delay is defined as the specified length of time required for the server to receive all of the data of a sentence. A negative number of early packets within the acceptable delay bound indicates that the sentence streaming is slow. Our model is validated via ns-3 simulations. Moreover, the model is verified for a distribution of 2500 Indonesian sentences using TANGRAM-II to assess the real-time factor (RTF) of TCP-based speech recognition and to identify its working region. The model indicates that the RTF is not achieved when the loss rate is 0.014 and the round-trip time (RTT) is 100 ms. Streaming over TCP yields satisfactory performance within an acceptable delay of eight seconds when the loss rate is smaller than 0.05 and the RTT is 100 ms. When the RTT is doubled, the streaming works within an acceptable delay of 17 seconds.
KW - Acceptable Delay
KW - Network Speech Recognition
KW - Satisfactory Performance
KW - Streaming Over TCP
UR - http://www.scopus.com/inward/record.url?scp=85021088774&partnerID=8YFLogxK
U2 - 10.1166/asl.2017.9029
DO - 10.1166/asl.2017.9029
M3 - Article
AN - SCOPUS:85021088774
SN - 1936-6612
VL - 23
SP - 3654
EP - 3659
JO - Advanced Science Letters
JF - Advanced Science Letters
IS - 4
ER -