ISCA Archive Interspeech 2013

Endpoint detection using weighted finite state transducer

Hoon Chung, SungJoo Lee, YunKeun Lee

In this paper, we discuss the possibility of applying a weighted finite state transducer (WFST) as a unified framework for solving the endpoint detection problem. In general, endpoint detection is composed of two cascaded decision processes. The first is voice activity detection (VAD), which classifies each frame as speech or non-speech. The second is utterance-level detection, which makes the final decision using state transition control and heuristic knowledge. Recently, statistical model-based approaches have become common for VAD, while rule-based logic remains dominant for utterance-level detection. However, this combination can cause problems. First, it requires expert knowledge to define the rules, and a sophisticated implementation to avoid conflicts among them. Second, it can yield suboptimal performance because each process has to be handled independently. Therefore, to address these problems by integrating the two processes, we propose a WFST-based endpoint detection framework. The experimental results show that the endpoint detection problem can be solved in a straightforward way under the proposed framework.
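As a rough illustration of how the two processes described in the abstract might be integrated, the sketch below runs a shortest-path (Viterbi) search over a small weighted state machine whose arcs consume frame-level speech/non-speech scores. It is not the authors' transducer: the three states, the arc costs, and the names `ARCS` and `detect_endpoints` are hypothetical, and the machine is a simplified weighted acceptor using negative-log costs rather than a full WFST composition.

```python
import math

# Hypothetical utterance-level states (an assumption, not from the paper).
SIL_BEFORE, SPEECH, SIL_AFTER = 0, 1, 2

# Arcs: (source, destination, frame label consumed, transition cost).
# Costs are negative-log weights; the values are illustrative only.
ARCS = [
    (SIL_BEFORE, SIL_BEFORE, "nonspeech", 0.1),
    (SIL_BEFORE, SPEECH, "speech", 3.0),
    (SPEECH, SPEECH, "speech", 0.1),
    (SPEECH, SIL_AFTER, "nonspeech", 3.0),
    (SIL_AFTER, SIL_AFTER, "nonspeech", 0.1),
]


def detect_endpoints(speech_probs):
    """Return (start_frame, end_frame) of the decoded speech segment, or None.

    speech_probs holds one frame-level speech probability per frame,
    as a statistical VAD might produce.
    """
    inf = math.inf
    cost = [0.0, inf, inf]  # decoding starts in SIL_BEFORE
    back = []               # back[t][state] = best predecessor state
    for p in speech_probs:
        p = min(max(p, 1e-6), 1.0 - 1e-6)  # avoid log(0)
        frame_cost = {"speech": -math.log(p), "nonspeech": -math.log(1.0 - p)}
        new_cost = [inf, inf, inf]
        ptr = [None, None, None]
        for src, dst, label, weight in ARCS:
            c = cost[src] + weight + frame_cost[label]
            if c < new_cost[dst]:
                new_cost[dst] = c
                ptr[dst] = src
        back.append(ptr)
        cost = new_cost

    # Every state is treated as final; pick the cheapest one and backtrace.
    state = min(range(3), key=lambda s: cost[s])
    path = [None] * len(speech_probs)
    for t in range(len(speech_probs) - 1, -1, -1):
        path[t] = state
        state = back[t][state]

    speech_frames = [t for t, s in enumerate(path) if s == SPEECH]
    if not speech_frames:
        return None
    return speech_frames[0], speech_frames[-1]
```

In this sketch the transition costs play the role that hand-written rules would otherwise play: a single high-scoring frame surrounded by silence is cheaper to explain as noise than as a one-frame utterance, so the search skips it without an explicit minimum-duration rule.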

doi: 10.21437/Interspeech.2013-197

Cite as: Chung, H., Lee, S., Lee, Y. (2013) Endpoint detection using weighted finite state transducer. Proc. Interspeech 2013, 700-703, doi: 10.21437/Interspeech.2013-197

  author={Hoon Chung and SungJoo Lee and YunKeun Lee},
  title={{Endpoint detection using weighted finite state transducer}},
  booktitle={Proc. Interspeech 2013},