Integration of heterogeneous multidimensional data marts

Riazati, D 2012, Integration of heterogeneous multidimensional data marts, Doctor of Philosophy (PhD), Computer Science and Information Technology, RMIT University.


Document type: Thesis
Collection: Theses

Title Integration of heterogeneous multidimensional data marts
Author(s) Riazati, D
Year 2012
Abstract: Data analysts often require access to integrated multidimensional data from local and external data warehouses.

The integration process is often undertaken by expert database practitioners, who must analyze the structure of the data and match both schemas and data before creating an integrated view for visualization and analysis.

Such a manual process may be acceptable for databases used in transaction processing applications, but it does not help decision makers who need quick and cost-effective access to information in a constantly changing environment.

This thesis addresses several challenges in automating the integration of data warehouses based on a dimensional model known as the Star schema.

We recognize that the structure of multidimensional data, namely dimension hierarchies, is critical to the accuracy of the integration but is not always available or accessible.

To address this problem, we infer dimension hierarchies from their instances, and demonstrate that they are sufficient to ensure the accuracy of the integration even though they may vary from the intended hierarchies.
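As a rough illustration of what inferring a hierarchy from instances can look like, the sketch below treats an attribute as rolling up to another when every value of the first maps to exactly one value of the second. This is an assumption made for illustration, not the thesis's algorithm; the function names and the sample rows are hypothetical.

```python
from itertools import permutations


def rolls_up(rows, child, parent):
    """Return True if each value of `child` maps to exactly one value of `parent`."""
    seen = {}
    for row in rows:
        c, p = row[child], row[parent]
        # setdefault stores the first parent seen for this child value;
        # a different parent later means the mapping is not functional.
        if seen.setdefault(c, p) != p:
            return False
    return True


def infer_rollups(rows, attributes):
    """List every ordered pair (child, parent) supported by the instances."""
    return [(c, p) for c, p in permutations(attributes, 2) if rolls_up(rows, c, p)]


# Hypothetical instances of a location dimension.
rows = [
    {"city": "Melbourne", "state": "VIC", "country": "AU"},
    {"city": "Geelong", "state": "VIC", "country": "AU"},
    {"city": "Sydney", "state": "NSW", "country": "AU"},
    {"city": "Auckland", "state": "AKL", "country": "NZ"},
]

rollups = infer_rollups(rows, ["city", "state", "country"])
print(rollups)
```

Because the result depends only on the instances at hand, a small or skewed sample can yield roll-ups that the designer never intended, which is one way an inferred hierarchy may differ from the intended one.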

To improve the accuracy of matching Star schemas, we propose a more precise representation of Star schemas and demonstrate its effectiveness by comparing it against existing approaches that treat Star schemas as relational models.

To match instances of dimensions, we demonstrate that a graph matching algorithm is effective and performs with a high level of accuracy.

We propose algorithms that enforce the tree structure of the integrated data, which is necessary for correct aggregation, and that reduce the false positives occurring during instance matching.
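To see why a tree structure matters for aggregation, consider a member that ends up with two parents after merging: its facts would be counted under both parents when rolling up. The sketch below only detects such members. It is a minimal illustration under the assumption that the merged dimension is given as (child, parent) pairs; the function name and data are hypothetical and do not reproduce the thesis's algorithms.

```python
from collections import defaultdict


def multi_parent_members(edges):
    """Return members that have more than one distinct parent, with those parents."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
    # In a tree, every member has exactly one parent; anything else is a violation.
    return {child: ps for child, ps in parents.items() if len(ps) > 1}


# Hypothetical merged edges: "VIC" and "Victoria" were not recognised as the
# same member, so "Melbourne" appears under two parents.
edges = [
    ("Melbourne", "VIC"),
    ("Geelong", "VIC"),
    ("Melbourne", "Victoria"),
    ("Sydney", "NSW"),
]

violations = multi_parent_members(edges)
print(violations)
```

A violation like this one points at a likely matching error one level up, so resolving it usually means revisiting the parent-level match rather than simply dropping an edge.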

The effectiveness of our algorithms is shown through experiments with real-life data.

Despite perfectly matching schemas and hierarchies, there are often dimensions with mismatching data which restrict the scope of the integration.

We propose to relax the requirement for dimension compatibility, and introduce measures that quantify the loss of data resulting from the less strict requirement.

These measures enable data analysts to identify lossless fragments of data, and thereby, extend the scope of the integrated data.
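The abstract does not define these measures, so the following is only a hypothetical example of the general idea: the share of fact rows in one source whose dimension member has no counterpart in the other source, and would therefore be dropped from an integrated view. The name `loss_ratio` and the sample data are assumptions made for illustration.

```python
def loss_ratio(fact_rows, key, shared_members):
    """Fraction of fact rows whose `key` member is absent from `shared_members`."""
    if not fact_rows:
        return 0.0
    lost = sum(1 for row in fact_rows if row[key] not in shared_members)
    return lost / len(fact_rows)


# Hypothetical facts from source A and dimension members known to source B.
sales_a = [
    {"city": "Melbourne", "amount": 10},
    {"city": "Sydney", "amount": 5},
    {"city": "Perth", "amount": 7},
    {"city": "Geelong", "amount": 3},
]
members_b = {"Melbourne", "Sydney", "Auckland"}

# Members present in both sources.
shared = {row["city"] for row in sales_a} & members_b
ratio = loss_ratio(sales_a, "city", shared)
print(ratio)
```

A ratio of zero for some fragment of the data, for example a single state or year, would mark that fragment as lossless in the sense used here.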

To provide a more comprehensive view of data for analysis, we link the integrated data with the data exclusive to each source by extending the navigation operation for multidimensional data.

These contributions help shift the integration problem away from expert database practitioners and empower data analysts to combine multidimensional data from multiple sources in real time and in a cost-effective manner.

Degree Doctor of Philosophy (PhD)
Institution RMIT University
School, Department or Centre Computer Science and Information Technology
Keyword(s) Data Mart
Data Warehouse
Integration
Star Schema
Multidimensional Data
Schema Matching
Data Matching
Hierarchy
Drill-Across
Data Visualization
Created: Tue, 30 Apr 2013, 15:53:37 EST by Brett Fenton