Efficient Low-Latency Speech Enhancement with Mobile Audio Streaming Networks

Michal Romaniuk, Piotr Masztalski, Karol Piaskowski, Mateusz Matuszewski


We propose Mobile Audio Streaming Networks (MASnet) for efficient low-latency speech enhancement, which is particularly suitable for mobile devices and other applications where computational capacity is a limitation. MASnet processes linear-scale spectrograms, transforming successive noisy frames into complex-valued ratio masks which are then applied to the respective noisy frames. MASnet can operate in a low-latency incremental inference mode which matches the complexity of layer-by-layer batch mode. Compared to a similar fully-convolutional architecture, MASnet incorporates depthwise and pointwise convolutions for a large reduction in fused multiply-accumulate operations per second (FMA/s), at the cost of some reduction in SNR.
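
As a rough illustration of two ideas named in the abstract, the sketch below applies a complex-valued ratio mask to one noisy spectrogram frame by element-wise multiplication, and compares fused multiply-accumulate counts for a standard convolution against a depthwise plus pointwise pair. It is not the authors' implementation: the function names, the 1-D convolution shape, and the layer sizes (64 channels, kernel 3, 100 output positions) are assumptions chosen for illustration only.

```python
def apply_complex_mask(noisy_frame, mask):
    """Multiply each noisy STFT bin by the matching complex mask value."""
    if len(noisy_frame) != len(mask):
        raise ValueError("frame and mask must have the same number of bins")
    return [m * x for m, x in zip(mask, noisy_frame)]


def standard_conv_fma(in_ch, out_ch, kernel, positions):
    """FMAs for a standard 1-D convolution over `positions` output positions."""
    return in_ch * out_ch * kernel * positions


def separable_conv_fma(in_ch, out_ch, kernel, positions):
    """FMAs for a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = in_ch * kernel * positions      # one filter per input channel
    pointwise = in_ch * out_ch * positions      # 1x1 mixing across channels
    return depthwise + pointwise


# Hypothetical values, not taken from the paper.
noisy = [1 + 1j, 2 - 1j, 0.5j]       # three complex bins of one noisy frame
mask = [0.5 + 0j, 1j, 1 + 0j]        # complex ratio mask for the same bins
enhanced = apply_complex_mask(noisy, mask)

std = standard_conv_fma(64, 64, 3, 100)
sep = separable_conv_fma(64, 64, 3, 100)
print(enhanced)
print(std, sep, round(std / sep, 2))
```

Because the mask is complex, it can change both the magnitude and the phase of each bin. For the assumed sizes, the separable layer needs roughly a third of the FMAs of the standard layer; the actual savings in MASnet depend on its real layer configuration, which is not given here.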


DOI: 10.21437/Interspeech.2020-2443

Cite as: Romaniuk, M., Masztalski, P., Piaskowski, K., Matuszewski, M. (2020) Efficient Low-Latency Speech Enhancement with Mobile Audio Streaming Networks. Proc. Interspeech 2020, 3296-3300, DOI: 10.21437/Interspeech.2020-2443.


@inproceedings{Romaniuk2020,
  author={Michal Romaniuk and Piotr Masztalski and Karol Piaskowski and Mateusz Matuszewski},
  title={{Efficient Low-Latency Speech Enhancement with Mobile Audio Streaming Networks}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={3296--3300},
  doi={10.21437/Interspeech.2020-2443},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2443}
}