Development of modified analytical model for investigating acceptable delay of TCP-based speech recognition

Asril Jarin, Husni Fahmi, Suryadi, Kalamullah Ramli

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)


Many studies have proposed solutions to overcome the degradation of network speech recognition (NSR) caused by packet loss and jitter. The most popular cloud-based speech recognition systems, such as Google speech recognition and Apple Siri, currently employ TCP in cooperation with HTTP. As a reliable transport protocol, TCP delivers all speech data to the server, but it may incur delays under unexpected network conditions. To achieve satisfactory NSR performance despite TCP delay, an acceptable delay must be met. In this paper, a TCP-based NSR scheme with a speech segmenter at the client side is proposed, and an analytical model for investigating the acceptable delay is developed on the basis of a study of stored streaming via TCP that employs a discrete-time Markov model. The speech segmenter allows TCP to send the speech signal sentence by sentence, so that the resulting texts can be recognized using a language model. The acceptable delay is defined as the length of time required for the server to receive the entire data of a sentence. A negative number of early packets within the acceptable delay bound indicates that the sentence streaming is slow. Our model is validated via ns-3 simulations. Moreover, the model is verified for a distribution of 2500 Indonesian sentences using TANGRAM-II to evaluate the real-time factor (RTF) of TCP-based speech recognition and to identify its working region. The model indicates that real-time operation, as measured by the RTF, is not achieved when the loss rate is 0.014 and the round-trip time (RTT) is 100 ms. Streaming over TCP yields satisfactory performance within an acceptable delay of eight seconds when the loss rate is smaller than 0.05 and the RTT is 100 ms. When the RTT is doubled, streaming works within an acceptable delay of 17 seconds.
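The acceptable-delay and early-packet ideas can be illustrated with a small Monte Carlo sketch. It does not reproduce the paper's discrete-time Markov model; it swaps in a simplified round-based TCP Reno model, assuming independent per-packet losses, one RTT per round, no timeouts, and a hypothetical 64-packet cap on the congestion window. The functions `packets_delivered_by`, `early_packets`, and `prob_slow` are hypothetical names, and the early-packet count is interpreted here (an assumption) as the packets TCP could deliver by the deadline minus the packets in the sentence, so a negative value means the sentence has not fully arrived within the bound.

```python
import math
import random


def packets_delivered_by(deadline, loss_rate, rtt, rng, max_cwnd=64):
    """Count packets delivered by `deadline` seconds under a toy TCP Reno model.

    Assumptions (not from the paper): each round lasts one RTT and sends
    floor(cwnd) packets; each packet is lost independently with probability
    `loss_rate` and is re-sent in a later round; timeouts are ignored.
    """
    rounds = math.floor(deadline / rtt + 1e-9)  # whole RTT rounds within the deadline
    delivered, cwnd, ssthresh = 0, 1.0, float(max_cwnd)
    for _round in range(rounds):
        sent = int(cwnd)
        lost = sum(1 for _ in range(sent) if rng.random() < loss_rate)
        delivered += sent - lost
        if lost:
            ssthresh = max(cwnd / 2, 1.0)  # multiplicative decrease on loss
            cwnd = ssthresh
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start, capped at ssthresh
        else:
            cwnd = min(cwnd + 1, float(max_cwnd))  # congestion avoidance
    return delivered


def early_packets(sentence_packets, deadline, loss_rate, rtt, rng):
    """Deliverable packets by the deadline minus sentence size; negative means late."""
    return packets_delivered_by(deadline, loss_rate, rtt, rng) - sentence_packets


def prob_slow(sentence_packets, deadline, loss_rate, rtt, runs=1000, seed=1):
    """Fraction of simulated runs in which the early-packet count is negative."""
    rng = random.Random(seed)
    slow = sum(
        early_packets(sentence_packets, deadline, loss_rate, rtt, rng) < 0
        for _ in range(runs)
    )
    return slow / runs
```

For instance, `prob_slow(200, 8.0, 0.014, 0.1)` estimates how often a hypothetical 200-packet sentence misses an eight-second bound at the loss rate and RTT quoted above; any number it returns comes from this toy model, not from the paper's analysis or its ns-3 and TANGRAM-II results.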

Original language: English
Pages (from-to): 3654-3659
Number of pages: 6
Journal: Advanced Science Letters
Issue number: 4
Publication status: Published - 2017


  • Acceptable Delay
  • Network Speech Recognition
  • Satisfactory Performance
  • Streaming Over TCP

