Neural network based image representation for small scale object recognition

Bui, H 2018, Neural network based image representation for small scale object recognition, Doctor of Philosophy (PhD), Engineering, RMIT University.

Document type: Thesis
Collection: Theses

Attached Files
Name Description MIMEType Size
Bui.pdf Thesis application/pdf 2.64MB
Title Neural network based image representation for small scale object recognition
Author(s) Bui, H
Year 2018
Abstract Object recognition can be viewed abstractly as a two-stage process. The feature learning stage selects key information that can represent the input image in a compact, robust, and discriminative manner in some feature space. The classification stage then learns the rules to differentiate object classes based on the representations of their images in that feature space. Consequently, if the first stage can produce a highly separable feature set, simple and cost-effective classifiers can be used, making the recognition system more applicable in practice.

Features, or representations, used to be engineered manually under various assumptions about the data population in order to keep complexity within a manageable range. As more practical problems are tackled, those assumptions no longer hold, and neither do the representations built on them. These new challenges involve more parameters and test cases, which makes manual engineering too complicated. Machine learning approaches ease those difficulties by allowing computers to learn to identify the appropriate representation automatically. As the number of parameters increases with the diversity of the data, it is always beneficial to eliminate irrelevant information from the input data to reduce the complexity of learning. Chapter 3 of the thesis reports a case study in which the removal of colour leads to an improvement in recognition accuracy.

Deep learning appears to be a very strong representation learner, with new achievements reported on a monthly basis. While the training phase of deep structures requires a huge amount of data, tremendous computation, and careful calibration, the inference phase is affordable and straightforward. Utilizing the knowledge in trained deep networks is therefore promising for efficient feature extraction in smaller systems. Many approaches have been proposed under the name of “transfer learning”, aiming to take advantage of that “deep knowledge”. However, the results achieved so far leave room for improvement. Chapter 4 presents a new method that uses a trained deep convolutional structure as a feature extractor and achieves state-of-the-art accuracy on the Washington RGBD dataset.

Despite some good results, the potential of transfer learning has barely been exploited. On the one hand, dimensionality reduction can be used to make the deep neural network representation even more computationally efficient and to allow a wider range of use cases. Inspired by the structure of the network itself, a new random orthogonal projection method for dimensionality reduction is presented in the first half of Chapter 5. A t-SNE mimicking neural network for low-dimensional embedding is also discussed in this part, with promising results.
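The abstract does not describe how the random orthogonal projection is constructed, so the following is only a minimal sketch of the general idea, not the method of Chapter 5. It assumes the projection rows are obtained by orthonormalizing random Gaussian vectors with Gram-Schmidt; the function names `random_orthogonal_projection` and `project` are hypothetical.

```python
import random

def random_orthogonal_projection(in_dim, out_dim, seed=0):
    """Return out_dim orthonormal rows of length in_dim (assumed construction)."""
    if out_dim > in_dim:
        raise ValueError("out_dim must not exceed in_dim")
    rng = random.Random(seed)
    rows = []
    while len(rows) < out_dim:
        # Draw a random Gaussian direction.
        v = [rng.gauss(0.0, 1.0) for _ in range(in_dim)]
        # Gram-Schmidt: remove the components along the rows accepted so far.
        for r in rows:
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-10:  # skip the (very unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return rows

def project(features, rows):
    """Map a feature vector to the lower-dimensional space spanned by rows."""
    return [sum(a * b for a, b in zip(features, r)) for r in rows]

# Usage: reduce an 8-dimensional feature vector to 3 dimensions.
P = random_orthogonal_projection(in_dim=8, out_dim=3)
reduced = project([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], P)
```

Because the rows are orthonormal, the projection never increases the Euclidean norm of a feature vector, which is one reason such projections are a cheap alternative to learned reductions.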

On the other hand, feature encoding can be used to improve deep neural network features for classification applications. Thanks to their spatially organized structure, deep neural network features can be considered local image descriptors, and thus traditional feature encoding approaches such as the Fisher vector can be applied to improve them. This method combines the advantages of both discriminative learning and generative learning to boost feature performance in difficult scenarios, such as when data is noisy or incomplete. The problem of high dimensionality in deep neural network features is alleviated by using the Fisher vector based on sparse coding, in which an infinite number of Gaussian mixtures is used to model the feature space. In the second half of Chapter 5, the regularized Fisher encoding is shown to be effective in improving classification results on difficult classes. In addition, low-cost incremental k-means learning is shown to be a potential dictionary learning approach that can replace the slow and computationally expensive sparse coding method.
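The abstract does not specify which incremental k-means variant the thesis uses. As a rough illustration of why such a method is cheap, the sketch below assumes a standard online update: each incoming descriptor moves its nearest centroid by a step of 1/count, so every sample is visited only once. The function name `incremental_kmeans` and the choice to seed the centroids with the first k samples are assumptions.

```python
def incremental_kmeans(samples, k):
    """Online k-means sketch: one pass, running-mean update per centroid."""
    centroids, counts = [], []
    for x in samples:
        x = [float(v) for v in x]
        if len(centroids) < k:
            # Assumption: the first k samples seed the dictionary.
            centroids.append(x)
            counts.append(1)
            continue
        # Find the nearest centroid by squared Euclidean distance.
        j = min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centroids[i])),
        )
        counts[j] += 1
        rate = 1.0 / counts[j]
        # Move the winning centroid toward the sample (running mean).
        centroids[j] = [c + rate * (a - c) for a, c in zip(x, centroids[j])]
    return centroids, counts

# Usage: learn a two-atom dictionary from six 2-D descriptors.
data = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
centroids, counts = incremental_kmeans(data, k=2)
```

Each update costs one nearest-centroid search and one vector addition, with no iterative optimization over the whole data set.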
Degree Doctor of Philosophy (PhD)
Institution RMIT University
School, Department or Centre Engineering
Subjects Image Processing
Pattern Recognition and Data Mining
Signal Processing
Keyword(s) Object recognition
Neural network
Machine learning
Deep learning
Transfer learning
Image processing
Access Statistics: 73 Abstract Views, 48 File Downloads
Created: Thu, 17 Jan 2019, 12:31:35 EST by Keely Chapman