Chapter 2: Background

Graph Statistics and Kernel Methods

Node degree (equation)

Eigenvector centrality: objective

Per-node scalar proprotional to the average centrality of its neighbours

Eigenvector centrality:

Eigenvector centrality: description of each node's ec-value

The corresponding element of the adjacency matrix's (positive) eigenvector

Eigenvector centrality: how can we guarantee positive centrality values?

By taking the eigenvector corresponding to 's largest eigenvalue

(see Perron-Frobenius theorem)

Clustering coefficient: objective

Per-node scalar indicating the fraction of neighbours that are themselves connected

Clustering coefficient: equation

Clustering coefficient: geometric interpretation

Measures the proportion of closed triangles (vs theoretical max) in a node's neighbourhood.

Ego graph

The subgraph containing a node, its neighbours, and all the edges between its neighbours.

Motif / graphlet

Patterns / subgraphs within a graph

Idea behind generalised clustering coefficient

Measure the proportion of arbitrary motifs within a node's ego graph.

Three basic statistical features for nodes in a graph (Hamilton book)

Degree, eigenvector centrality, clustering coefficient

Graph-level features and graph kernels

Bag-of-nodes approach to graph-level features

Aggregate node-level statistics (e.g. histograms of degrees, centralities, etc.)

What does it mean for two graphs to be isomorphic?

There is a bijection (one-to-one mapping) between nodes of the graphs where adjacency is preserved

Term for one-to-one node mapping between graphs that preserves adjacency.

Isomorphism

What is the canonical algorithm for determining isomorphism

The Weisfeiler-Lehman (WL) algorithm

Weisfeiler-Lehman (WL) algorithm

Assign initial labels to each node (typically degree)

Until the groups of nodes with similar labels doesn't change (or set limit):

For each node:

Assign a new label to each node based on the multi-set of (previous) labels in its neighbourhood

How is the WL algorithm adapted to generate the WL Kernel

Running the WL algorithm for a set number of iterations (or to completion?) and computing summary statistics (e.g. histogram) over the labels

Graphlet kernel

Count the number of times a given graphlet occurs in the full graph (hard combinatorial problem)

Random walk kernel (graphs)

Run random walks and count the occurrence of different degree sequences

What is a kernel in the context of graphs?

An algorithm for computing statistics about the entire graph

Give four examples of graph kernels

Bag-of-nodes kernel

WL kernel

Graphlet kernel

Random walk kernel

Neighbourhood overlap detection

What do we term a matrix containing node neighbourhood overlap statistics?

A similarity matrix

What do similarity matrices tell us that graph kernel methods don't

Information about relationships between nodes - i.e. edges

What kind of method is the sorensen index?

A neighbourhood overlap / node similarity method

Sorensen index: equation

Sorensen index: description

Normalizes the count of common neighbors by the sum of the node degrees

What kind of method is the Resource Allocation (RA) index?

A neighbourhood overlap / node similarity method

Resource Allocation (RA) index: equation

Resource Allocation (RA) index: description

Sums the inverse degrees of the common neighbors