In this paper, we discuss the possibility of applying a weighted finite state transducer (WFST) as a unified framework for solving the endpoint detection problem. In general, endpoint detection consists of two cascaded decision processes. The first is voice activity detection (VAD), which classifies each frame as speech or non-speech. The second is utterance-level detection, which makes the final decision using state transition control and heuristic knowledge. Recently, statistical model-based approaches have become common for VAD, whereas rule-based logic remains dominant for utterance-level detection. However, this combination causes two problems. First, defining the rules requires expert knowledge, and implementing them requires care to avoid conflicts among them. Second, performance can be suboptimal because the two processes are handled independently. To address these problems by integrating the two processes, we propose a WFST-based endpoint detection framework. Experimental results show that the endpoint detection problem can be solved in a straightforward way under the proposed framework.
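To make the idea of integrating the two processes concrete, the following is a minimal sketch and not the authors' transducer. It assumes that frame-level VAD scores are available as speech probabilities and that the utterance-level logic can be approximated by a two-state weighted automaton (silence, speech) searched for the lowest-cost path in the tropical semiring. The function name `detect_endpoints` and the parameter `switch_cost` are hypothetical and chosen only for illustration.

```python
import math

SIL, SPEECH = 0, 1


def detect_endpoints(speech_probs, switch_cost=2.0):
    """Return (first, last) speech frame indices on the best path, or None.

    speech_probs: per-frame speech probabilities from some VAD (assumed input).
    switch_cost:  hypothetical penalty for changing state, standing in for
                  hand-written utterance-level rules.
    """
    if not speech_probs:
        return None

    # Frame-level costs: negative log probability of each label.
    eps = 1e-6
    frame_costs = []
    for p in speech_probs:
        p = min(max(p, eps), 1.0 - eps)
        frame_costs.append((-math.log(1.0 - p), -math.log(p)))

    # Arc weights between states; staying in a state is free.
    trans = [[0.0, switch_cost], [switch_cost, 0.0]]

    # Viterbi search: costs add along a path, and the minimum is kept.
    best = [frame_costs[0][SIL], frame_costs[0][SPEECH]]
    backptrs = []
    for costs in frame_costs[1:]:
        new_best, ptr = [], []
        for s in (SIL, SPEECH):
            prev = min((SIL, SPEECH), key=lambda q: best[q] + trans[q][s])
            new_best.append(best[prev] + trans[prev][s] + costs[s])
            ptr.append(prev)
        best = new_best
        backptrs.append(ptr)

    # Trace the lowest-cost state sequence back from the final frame.
    state = SIL if best[SIL] <= best[SPEECH] else SPEECH
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()

    speech_frames = [i for i, s in enumerate(path) if s == SPEECH]
    if not speech_frames:
        return None
    return speech_frames[0], speech_frames[-1]
```

In this sketch the frame-level scores and the state transition penalties are combined in a single search, so a short dip in the VAD score inside an utterance does not split it. A full WFST formulation would encode richer constraints, such as minimum durations, in the transducer topology.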
Bibliographic reference. Chung, Hoon / Lee, SungJoo / Lee, YunKeun (2013): "Endpoint detection using weighted finite state transducer", In INTERSPEECH-2013, 700-703.