Speaker: Nicholas Lane (Cambridge)
Huge progress made over recent years in model efficiency/compression - but GNNs virtually absent from this discussion!
Improved performance in this area critical to future success of GNNs.
Key domains where GNN improvements will yield benefits:
- Recommender systems
- Chip design
- Simulating physics/chemistry
- Drug discovery
- Scene generation
Great progress has been made in creating new GNN models that solve a wide range of problems; but we lag badly behind in efficiency.
ML Efficiency is hard, for GNNs even harder - why?
- GNNs lack the regularity we see in, say, CNNs - e.g. no canonical node ordering and no fixed neighbourhood size
- This regularity is often the root of optimisation (see the sketch below)
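To make the irregularity point concrete, here is a minimal illustrative sketch (mine, not from the talk) in PyTorch: a convolution does identical work per pixel, whereas a graph's per-node work depends on its degree.

```python
import torch

# CNN: every output position sees the same fixed 3x3 neighbourhood,
# so the work per pixel is identical and easy to optimise.
image = torch.randn(1, 3, 32, 32)
conv = torch.nn.Conv2d(3, 16, kernel_size=3, padding=1)
features = conv(image)

# GNN: neighbourhood size varies per node, so the work is irregular.
edge_index = torch.tensor([[0, 1, 1, 2, 3, 3, 3],
                           [1, 0, 2, 1, 0, 1, 2]])   # (src, dst) pairs
deg = torch.bincount(edge_index[1], minlength=4)     # in-degree per node
print(deg)                                           # tensor([2, 3, 2, 0])
```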
GNN overheads depend on type of input data
Observe that these different datasets require massively different amounts of compute.
For most non-GNN architectures, overheads are not so closely tied to the type of input data - CNNs are shown as dotted lines in the plot.
Degree-Quant
What is quantisation?
Use fewer bits to represent parameters - particularly for performing inference (sketch below).
Benefits: smaller models, less data movement, lower latency
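As a concrete illustration, a minimal symmetric INT8 quantisation of a weight tensor might look like the following; the max-abs scale is an assumption chosen for simplicity, not necessarily what Degree-Quant uses.

```python
import torch

def quantize_int8(x: torch.Tensor):
    # Symmetric quantisation: map [-max|x|, +max|x|] onto the INT8 range.
    scale = x.abs().max() / 127.0
    q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Recover an approximation of the original FP32 values.
    return q.to(torch.float32) * scale

w = torch.randn(64, 64)            # e.g. a GNN layer's weight matrix
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
print((w - w_hat).abs().max())     # error bounded by roughly scale / 2
```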
GNN-specific quantisation challenges
- Large variance in tensor values after aggregation
- High (in-)degree nodes disproportionately affect gradient error
Method
Build on top of existing stochastic quantisation-aware training.
- Introduce two important ideas, recognising that some nodes are highly sensitive to quantisation:
  - Stochastic protective masking applied across nodes - protected nodes perform their forward pass at full precision, with the protection probability based on where they sit in the topology (sketch below)
  - A scheme to carefully set the boundaries (ranges) of quantisation
In CNNs, where you set these boundaries doesn't make much of a difference - it turns out the same is not true for graphs!
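A rough sketch of how the degree-based protection could look (my reading of the idea; the probability mapping and fake-quantisation below are illustrative, not the authors' exact implementation): during quantisation-aware training, high in-degree nodes are kept at full precision with higher probability.

```python
import torch

def protection_mask(edge_index, num_nodes, p_min=0.0, p_max=0.2):
    # In-degree per node (edge_index[1] holds destination nodes).
    deg = torch.bincount(edge_index[1], minlength=num_nodes).float()
    # Illustrative mapping: higher in-degree -> higher protection probability.
    p = p_min + (p_max - p_min) * deg / deg.max().clamp(min=1.0)
    return torch.bernoulli(p).bool()          # True = keep at full precision

def fake_quant(x, num_bits=8):
    # Simulated quantisation for QAT (straight-through gradient omitted).
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max() / qmax
    return torch.round(x / scale).clamp(-qmax - 1, qmax) * scale

# Forward pass: protected nodes bypass quantisation.
x = torch.randn(5, 16)                                    # node features
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])   # toy graph
mask = protection_mask(edge_index, num_nodes=5)
x_q = torch.where(mask.unsqueeze(-1), x, fake_quant(x))
```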
Results
Applying this quantisation scheme at INT8 gives accuracy competitive with FP32.
Means that production GNN models can be served much faster with minimal accuracy loss!
Measured speedup of up to 4.7x on CPU (INT8)
Efficient Graph Convolutions
Key contributions:
- Less memory than alternatives
- Lower latency than alternatives
- Higher accuracy
(wow!)
Where have GNN improvements originated?
Key idea: anisotropic GNNs - treat neighbours differently
→ computational challenges:
- all edges must now be materialised
- compute becomes a lot less uniform - varies in different parts of the topology
- hardware challenges result - existing SpMM operations only really support GCN-style models well; poor support for "arbitrary message passing" (see the sketch below)
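An illustrative contrast (my sketch, not from the talk): an isotropic GCN-style layer reduces to a single sparse-dense matmul (SpMM), which libraries and hardware handle well, whereas an anisotropic model must materialise one message per edge before the reduction.

```python
import torch

N, F, E = 6, 8, 10
x = torch.randn(N, F)                                # node features
edge_index = torch.randint(0, N, (2, E))             # (src, dst) pairs
src, dst = edge_index

# Isotropic (GCN-style): aggregation is a single SpMM - no per-edge tensors.
adj = torch.sparse_coo_tensor(torch.stack([dst, src]),
                              torch.ones(E), (N, N)).coalesce()
out_gcn = torch.sparse.mm(adj, x)                    # well-supported kernel

# Anisotropic (e.g. attention-style): each edge carries its own weight,
# so E x F messages must be materialised before the scatter-reduce.
edge_weight = torch.rand(E, 1)                       # e.g. attention scores
messages = edge_weight * x[src]                      # materialised per edge
out_aniso = torch.zeros(N, F).index_add_(0, dst, messages)
```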