Identifying significant behaviour in complex bipartite networks

Liebig, J 2016, Identifying significant behaviour in complex bipartite networks, Doctor of Philosophy (PhD), Mathematical and Geospatial Sciences, RMIT University.


Document type: Thesis
Collection: Theses

Attached Files
Name Description MIMEType Size
Liebig.pdf Thesis application/pdf 10.96MB
Title Identifying significant behaviour in complex bipartite networks
Author(s) Liebig, J
Year 2016
Abstract The study of complex networks has received much attention over the past few decades, presenting a simple, yet efficient means of modelling and understanding complex systems. Networks are employed in various different areas, for instance, in the modelling of disease spread in human and animal contact networks. Networks also find applications in marketing, where various measures are used to recommend items to customers of, for instance, online shopping portals. Many other real world phenomena can be described and analysed using complex networks.

Most scientific literature focuses on the analysis of, so called, one-mode networks. However, many systems are best represented as bipartite networks. A network is bipartite if its vertices can be partitioned into two disjoint sets, where interaction takes place solely between vertices belonging to different sets. For instance, the network of scientists and papers, resulting from collaborations, is bipartite, with connections only existing between authors and papers. Similarly, the network of actors and the movies in which they appear is bipartite.

This thesis is motivated by the lack of network measures designed particularly for the analysis of bipartite networks. Since many one-mode network measures are not applicable to bipartite structures, often the only available path to analysing bipartite data is the examination of its projections. A projection converts a bipartite network into an ordinary one-mode network, causing loss of valuable information amongst other problems.

We are interested in both the theoretical aspects of bipartite networks and the applications to real world data. Throughout this thesis we analyse several real world networks with the aim of uncovering significant behaviour. We take two different approaches to gain a better understanding of complex bipartite networks. First, we deal with the problems that arise from the projection of bipartite networks, with the aim of overcoming these. Second, we develop network measures that are designed especially for bipartite networks.

Despite the many problems that arise from converting a bipartite network into a one-mode network, the study of projections is ubiquitous throughout the network science literature and projections are often preferred above the direct analysis of bipartite networks. The one-mode projection of a bipartite network is constructed by dropping one of its node sets and connecting two nodes of the remaining set if they share at least one neighbour in the bipartite network, leading to an inflation of edges in the projection. Furthermore, the indirect inference of edges between nodes in the one-mode projection leads to noise, that is, many edges with insignificant meaning are introduced. We develop a novel technique of identifying the significant connections that form the backbone of one-mode projections by considering the degree distributions of the bipartite network. We show that this identification of significant edges cannot be achieved by trivial methods such as an application of a threshold to the edge weights. Furthermore, we show that the weights of one-mode projections of real world bipartite networks follow a Poisson binomial distribution.

Real world one-mode projections often have well hidden community structures. These structures can be uncovered by dropping insignificant connections, as identified by our technique. In addition, our technique allows a ranking of edges by importance. We apply this backbone technique to three different real world networks, and show that our method is a very efficient way of identifying communities within diverse networks, such as the political parties in a Facebook network of posts by candidates and user likes.

The development of new network measures that can be applied directly to bipartite networks is a crucial step towards a better understanding of these structures. One of the most important and widely used network measures is the clustering coefficient. Due to the particular structure of bipartite networks, the clustering coefficient cannot be directly applied to them. Although several definitions for the bipartite clustering coefficient have been presented in the literature, they are inconsistent and hence we explore this topic in great depth. We identify different types of bipartite networks based on their development over time, consequently requiring different definitions of the bipartite clustering coefficient. We precisely define the different types of networks before providing new definitions of the clustering coefficients for each type of bipartite network.

We apply our clustering coefficients to discover the most influential nodes in real world bipartite networks by introducing the notion of the driving score. The driving score indicates the extent to which each individual node contributes to the overall clustering behaviour of the network. Another application of our clustering coefficient is the prediction of the future popularity of new items in rating networks. We are able to considerably improve existing predictions.

Crime networks form a very interesting group of bipartite networks. Knowledge about their dynamics is especially important for the implementation of efficient crime prevention measures. We present two case studies of crime networks revealing many interesting insights, by using a combination of both the approaches outlined above. For instance, our analysis reveals significant co-occurrences of illegal activity and identifies areas that exhibit similar crime dynamics.

The calculation of many network measures, including the ones we introduce in this thesis, require the enumeration of sub-graphs. In the last chapter of this thesis, we investigate several efficient ways of enumerating sub-graphs in bipartite networks, by studying, combining and modifying existing algorithms. We also present preliminary work on the theoretical problem of path enumeration.
Degree Doctor of Philosophy (PhD)
Institution RMIT University
School, Department or Centre Mathematical and Geospatial Sciences
Subjects Pattern Recognition and Data Mining
Combinatorics and Discrete Mathematics (excl. Physical Combinatorics)
Keyword(s) Bipartite networks
Significant behaviour
One-mode projection
Backbone extraction
Clustering coefficient
Versions
Version Filter Type
Access Statistics: 195 Abstract Views, 308 File Downloads  -  Detailed Statistics
Created: Fri, 03 Feb 2017, 08:35:36 EST by Adam Rivett
© 2014 RMIT Research Repository • Powered by Fez SoftwareContact us