Video temporal segmentation using applause sound and end-of-act detection for a circus performance video archive

Iwan, L 2015, Video temporal segmentation using applause sound and end-of-act detection for a circus performance video archive, Doctor of Philosophy (PhD), Computer Science and Information Technology, RMIT University.

Document type: Thesis
Collection: Theses

Attached Files
Name Iwan.pdf
Description Thesis
MIMEType application/pdf
Size 6.84MB
Title Video temporal segmentation using applause sound and end-of-act detection for a circus performance video archive
Author(s) Iwan, L
Year 2015
Abstract Typically, archival performance videos are filmed in a single shot, are lengthy, are affected by camera operation, and originate in various video formats. To be useful, a video of a whole performance needs to be segmented into discrete acts, each representing an individual clip within the total performance; however, this is not a simple task given the characteristics of the video content.

The Circus Oz video collection is an existing performance video archive that comprises over 1,074 videos totaling over 1,000 hours of viewing. To deliver this collection to users, a prototype of the Circus Oz performance video archive system has been developed, including a system architecture and database schema.

For the purpose of video segmentation, we identify the specific clues that indicate where a performance video is likely to be segmented: that is, where applause is detected in combination with one or more other clues, such as black frames and image changes.

An applause detection technique for multiple applause classes has been proposed. To evaluate the performance of the proposed technique, an audio data set with applause ground-truth data was developed from a sample of the Circus Oz performance videos. This applause data set contains three applause classes: less clap, more clap, and pure clap.

The proposed applause detection technique uses both characteristic-based and classification-based approaches. Our experiments show that minimum applause strength and minimum duration are the two essential threshold values for improving the precision of applause detection in the classification-based approach, for which we also identified an optimal combination of several audio features. In our applause classification experiment, we achieved 83%, 94%, and 100% correct classification for quaternary, ternary, and binary classification respectively.
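The thresholding idea above can be sketched as a simple post-filter on candidate applause segments. This is a minimal illustration, not the thesis's actual method: the segment representation, field names, and the threshold values `min_strength` and `min_duration` are all assumptions chosen for the example.

```python
# Illustrative sketch: keep only candidate applause segments whose
# detected strength and duration both exceed minimum thresholds.
# Field names and threshold values are hypothetical.

def filter_applause(segments, min_strength=0.5, min_duration=2.0):
    """Discard weak or very short detections, trading recall for
    precision, as the abstract describes for threshold tuning."""
    return [
        s for s in segments
        if s["strength"] >= min_strength
        and (s["end"] - s["start"]) >= min_duration
    ]

candidates = [
    {"start": 10.0, "end": 10.8, "strength": 0.9},    # too short
    {"start": 55.0, "end": 60.0, "strength": 0.3},    # too weak
    {"start": 120.0, "end": 126.5, "strength": 0.8},  # kept
]
print(filter_applause(candidates))
```

Raising either threshold removes more false positives at the cost of missing brief or quiet applause, which is the precision/recall trade-off the experiments explore.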

Using the clues we identified, we proposed a method for detecting the end of an act that combines applause detection, black-frame detection, and image comparison. Our experiment shows that the precision and recall of the end-of-act detection method are 49% and 92% respectively, making the task of manual annotators much more productive.
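The combination-of-clues rule can be sketched as follows. This is only an illustration of the idea that applause must co-occur with at least one visual clue to mark a boundary; the event representation, clue names, and timeline values are hypothetical, not taken from the thesis.

```python
# Illustrative sketch: an end-of-act boundary is declared where
# applause coincides with at least one supporting visual clue
# (black frames or a large image change). Clue labels are hypothetical.

def detect_end_of_act(events):
    """Return timestamps where applause co-occurs with a visual clue."""
    boundaries = []
    for t, clues in events:
        if "applause" in clues and ({"black_frames", "image_change"} & clues):
            boundaries.append(t)
    return boundaries

timeline = [
    (30.0, {"applause"}),                   # applause alone: no boundary
    (95.0, {"applause", "black_frames"}),   # boundary
    (150.0, {"image_change"}),              # visual clue alone: no boundary
    (210.0, {"applause", "image_change"}),  # boundary
]
print(detect_end_of_act(timeline))  # [95.0, 210.0]
```

Requiring agreement between audio and visual clues is what suppresses mid-act applause (which lacks a visual cut) while still catching genuine act boundaries.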
Degree Doctor of Philosophy (PhD)
Institution RMIT University
School, Department or Centre Computer Science and Information Technology
Subjects Pattern Recognition and Data Mining
Image Processing
Computer System Architecture
Keyword(s) Video temporal segmentation
Audio classification
Sound detection
Image sequence analysis
Machine learning
Performing arts
Image comparison