We propose a method for estimating user activities by analyzing long-term acoustic signals (longer than several seconds) represented as acoustic event temporal sequences. The method is based on a probabilistic generative model of an acoustic event temporal sequence that is associated with user activities (e.g., "cooking") and with subordinate categories of user activities (e.g., "fry ingredients" or "plate food"). In this model, each user activity is represented as a probability distribution over subordinate categories learned without supervision, called activity-topics, and each activity-topic is represented as a probability distribution over acoustic events. The model can therefore express user activities that comprise more than one subordinate category, which a model that considers only user activities cannot express adequately. User activity estimation with this model is a two-step process: frame-by-frame acoustic event estimation, which outputs an acoustic event temporal sequence, followed by user activity estimation with the proposed probabilistic generative model. Activity estimation experiments with real-life sounds indicated that the proposed method improved the accuracy and stability of user activity estimation for "unseen" acoustic event temporal sequences. The experiments also showed that the proposed method could extract correct subordinate categories of user activities.
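The two-level structure described above (activity, then activity-topic, then acoustic event) can be illustrated with a minimal sketch. It is not the authors' implementation. The event vocabulary, the activity "washing_up", all probability values, and the function names are hypothetical. The sketch also assumes that frames are independent given the activity and that both probability tables are already known, whereas the abstract states that the activity-topics are learned without supervision.

```python
import math
import random

# Hypothetical toy vocabulary of acoustic events (not taken from the paper).
EVENTS = ["sizzle", "chopping", "dish_clatter", "water_running", "keyboard"]

# p(activity-topic | activity); each row sums to 1. Values are illustrative.
TOPICS_GIVEN_ACTIVITY = {
    "cooking": [0.6, 0.4, 0.0],
    "washing_up": [0.0, 0.3, 0.7],
}

# p(acoustic event | activity-topic); each row sums to 1. Values are illustrative.
EVENTS_GIVEN_TOPIC = [
    [0.55, 0.35, 0.05, 0.04, 0.01],  # topic 0, loosely "fry ingredients"
    [0.05, 0.05, 0.70, 0.15, 0.05],  # topic 1, loosely "plate food"
    [0.01, 0.01, 0.28, 0.69, 0.01],  # topic 2
]


def event_distribution(activity):
    """Marginal p(event | activity) = sum_z p(z | activity) * p(event | z)."""
    weights = TOPICS_GIVEN_ACTIVITY[activity]
    return [
        sum(w * topic[e] for w, topic in zip(weights, EVENTS_GIVEN_TOPIC))
        for e in range(len(EVENTS))
    ]


def sample_sequence(activity, length, rng):
    """Generate events: draw a topic per frame, then an event from that topic."""
    topic_ids = range(len(EVENTS_GIVEN_TOPIC))
    sequence = []
    for _ in range(length):
        z = rng.choices(topic_ids, weights=TOPICS_GIVEN_ACTIVITY[activity])[0]
        e = rng.choices(range(len(EVENTS)), weights=EVENTS_GIVEN_TOPIC[z])[0]
        sequence.append(EVENTS[e])
    return sequence


def log_likelihood(sequence, activity):
    """Log-probability of an event sequence under one activity."""
    dist = event_distribution(activity)
    return sum(math.log(dist[EVENTS.index(event)]) for event in sequence)


def estimate_activity(sequence):
    """Second step of the pipeline: pick the most likely activity."""
    return max(TOPICS_GIVEN_ACTIVITY, key=lambda a: log_likelihood(sequence, a))


if __name__ == "__main__":
    rng = random.Random(0)
    observed = sample_sequence("cooking", 20, rng)
    print(observed)
    print(estimate_activity(observed))
```

Here the input to `estimate_activity` stands in for the output of the first step, frame-by-frame acoustic event estimation, which is not modeled. Because "cooking" mixes two activity-topics, its event distribution covers both frying-like and plating-like sounds, which is the behavior a single distribution per activity would struggle to capture.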
Bibliographic reference. Imoto, Keisuke / Shimauchi, Suehiro / Uematsu, Hisashi / Ohmuro, Hitoshi (2013): "User activity estimation method based on probabilistic generative model of acoustic event sequence with user activity and its subordinate categories", In INTERSPEECH-2013, 2609-2613.