Sound event classification and detection with weakly labeled data

Fayek, H, Tourbabin, V and Adavanne, S 2019, 'Sound event classification and detection with weakly labeled data', in Mandel, Michael; Salamon, Justin; Ellis, Daniel P.W. (eds.) Proceedings of the 4th Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE 2019), New York, United States, 25-26 October 2019, pp. 15-19.


Document type: Conference Paper
Collection: Conference Papers

Title: Sound event classification and detection with weakly labeled data
Author(s): Fayek, H; Tourbabin, V; Adavanne, S
Year: 2019
Conference name: DCASE 2019
Conference location: New York, United States
Conference dates: 25-26 October 2019
Proceedings title: Proceedings of the 4th Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE 2019)
Editor(s): Mandel, Michael; Salamon, Justin; Ellis, Daniel P.W.
Publisher: Detection and Classification of Acoustic Scenes and Events
Place of publication: United States
Start page: 15
End page: 19
Total pages: 5
Abstract: The Sound Event Classification (SEC) task involves recognizing the set of active sound events in an audio recording. The Sound Event Detection (SED) task involves, in addition to SEC, detecting the temporal onset and offset of every sound event in an audio recording. Generally, SEC and SED are treated as supervised classification tasks that require labeled datasets. SEC only requires weak labels, i.e., annotations of the active sound events without temporal information, whereas SED requires strong labels, i.e., annotations of the onset and offset times of every sound event, which makes annotation for SED more tedious than for SEC. In this paper, we propose two methods for joint SEC and SED using weakly labeled data: a Fully Convolutional Network (FCN) and a novel method that combines a Convolutional Neural Network with an attention layer (CNNatt). Unlike most prior work, the proposed methods do not assume that weakly labeled sound events are active throughout the entire recording, and they can scale to large datasets. We report state-of-the-art SEC results obtained with the largest weakly labeled dataset, AudioSet.
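To illustrate how joint SEC and SED can be trained from weak labels alone, the following is a minimal sketch of generic attention pooling as commonly used in weakly supervised sound event detection. It is not the authors' FCN or CNNatt implementation. It assumes a hypothetical CNN that emits, for every frame t and class c, a frame-level probability and an unnormalized attention score. All function names (`attention_pool`, `weak_bce`, `detect_segments`, `weak_from_strong`), the 0.5 threshold and the 1-second frame hop are illustrative assumptions.

```python
import math
from typing import List, Tuple

# Hypothetical helper: strong labels (class, onset_s, offset_s) collapse to a weak
# multi-hot clip label, which is all that SEC training needs.
def weak_from_strong(events: List[Tuple[int, float, float]], n_classes: int) -> List[int]:
    labels = [0] * n_classes
    for c, _onset, _offset in events:
        labels[c] = 1  # timing is discarded; only presence is kept
    return labels

def softmax_over_time(logits: List[float]) -> List[float]:
    """Numerically stable softmax across frames for a single class."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(frame_probs: List[List[float]],
                   att_logits: List[List[float]]) -> List[float]:
    """Clip-level (SEC) probability per class: attention-weighted average of
    frame-level probabilities. Indexing is [frame][class]."""
    n_classes = len(frame_probs[0])
    clip = []
    for c in range(n_classes):
        weights = softmax_over_time([row[c] for row in att_logits])
        clip.append(sum(w * row[c] for w, row in zip(weights, frame_probs)))
    return clip

def weak_bce(clip_probs: List[float], weak_labels: List[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy against weak (clip-level) labels only."""
    loss = 0.0
    for p, y in zip(clip_probs, weak_labels):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return loss / len(clip_probs)

def detect_segments(frame_probs: List[List[float]], c: int,
                    threshold: float = 0.5, hop_s: float = 1.0) -> List[Tuple[float, float]]:
    """SED step: threshold frame-level probabilities of class c into
    (onset, offset) pairs in seconds."""
    segments = []
    start = None
    for t, row in enumerate(frame_probs):
        active = row[c] >= threshold
        if active and start is None:
            start = t
        elif not active and start is not None:
            segments.append((start * hop_s, t * hop_s))
            start = None
    if start is not None:  # event still active at the end of the clip
        segments.append((start * hop_s, len(frame_probs) * hop_s))
    return segments

# Usage: 4 frames, 2 classes, uniform attention (clip probability = frame mean).
frame_probs = [[0.1, 0.9], [0.8, 0.9], [0.9, 0.2], [0.2, 0.1]]
uniform = [[0.0, 0.0] for _ in range(4)]
print(attention_pool(frame_probs, uniform))  # roughly [0.5, 0.525]
print(detect_segments(frame_probs, 0))       # [(1.0, 3.0)]
```

In this sketch the loss touches only the pooled clip-level output, so training needs no onset or offset times. Because the attention weights can concentrate on the frames where a class is present, the sketch does not assume that a weakly labeled event spans the whole recording. The frame-level probabilities it learns can then be thresholded for detection.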
Subjects: Computer Perception, Memory and Attention; Neural, Evolutionary and Fuzzy Computation; Knowledge Representation and Machine Learning
Keyword(s): Convolutional neural network; sound classification; sound event detection; weakly supervised learning
DOI: 10.33682/fx8n-cm43
Copyright notice: © 2020 GitHub Inc. All rights reserved.
ISBN: 9780578595962