Chapter 2: Background

Graph Statistics and Kernel Methods

Node degree (equation)
Eigenvector centrality: objective
Per-node scalar proprotional to the average centrality of its neighbours
Eigenvector centrality:
Eigenvector centrality: description of each node's ec-value
The corresponding element of the adjacency matrix's (positive) eigenvector
Eigenvector centrality: how can we guarantee positive centrality values?
By taking the eigenvector corresponding to 's largest eigenvalue
(see Perron-Frobenius theorem)
Clustering coefficient: objective
Per-node scalar indicating the fraction of neighbours that are themselves connected
Clustering coefficient: equation
Clustering coefficient: geometric interpretation
Measures the proportion of closed triangles (vs theoretical max) in a node's neighbourhood.
Ego graph
The subgraph containing a node, its neighbours, and all the edges between its neighbours.
Motif / graphlet
Patterns / subgraphs within a graph
Idea behind generalised clustering coefficient
Measure the proportion of arbitrary motifs within a node's ego graph.
Three basic statistical features for nodes in a graph (Hamilton book)
Degree, eigenvector centrality, clustering coefficient

Graph-level features and graph kernels

Bag-of-nodes approach to graph-level features
Aggregate node-level statistics (e.g. histograms of degrees, centralities, etc.)
What does it mean for two graphs to be isomorphic?
There is a bijection (one-to-one mapping) between nodes of the graphs where adjacency is preserved
Term for one-to-one node mapping between graphs that preserves adjacency.
What is the canonical algorithm for determining isomorphism
The Weisfeiler-Lehman (WL) algorithm
Weisfeiler-Lehman (WL) algorithm
  1. Assign initial labels to each node (typically degree)
  1. Until the groups of nodes with similar labels doesn't change (or set limit):
    1. For each node:
      1. Assign a new label to each node based on the multi-set of (previous) labels in its neighbourhood
How is the WL algorithm adapted to generate the WL Kernel
Running the WL algorithm for a set number of iterations (or to completion?) and computing summary statistics (e.g. histogram) over the labels
Graphlet kernel
Count the number of times a given graphlet occurs in the full graph (hard combinatorial problem)
Random walk kernel (graphs)
Run random walks and count the occurrence of different degree sequences
What is a kernel in the context of graphs?
An algorithm for computing statistics about the entire graph
Give four examples of graph kernels
  1. Bag-of-nodes kernel
  1. WL kernel
  1. Graphlet kernel
  1. Random walk kernel

Neighbourhood overlap detection

What do we term a matrix containing node neighbourhood overlap statistics?
A similarity matrix
What do similarity matrices tell us that graph kernel methods don't
Information about relationships between nodes - i.e. edges
What kind of method is the sorensen index?
A neighbourhood overlap / node similarity method
Sorensen index: equation
Sorensen index: description
Normalizes the count of common neighbors by the sum of the node degrees
What kind of method is the Resource Allocation (RA) index?
A neighbourhood overlap / node similarity method
Resource Allocation (RA) index: equation
Resource Allocation (RA) index: description
Sums the inverse degrees of the common neighbors