Efficient GNNs: How Can Graphs Go From Last To Fast?

Speaker: Nicholas Lane (Cambridge)

Huge progress made over recent years in model efficiency/compression - but GNNs virtually absent from this discussion!
Improved performance in this area critical to future success of GNNs.
Key domains where GNN improvements will yield benefits:
  • Recommender systems
  • Chip design
  • Simulating physics/chemistry
  • Drug discovery
  • Scene generation
Great progress has been made in creating new GNN models that can solve various problems, but we lag behind badly in efficiency.

ML Efficiency is hard, for GNNs even harder - why?

  • GNNs lack the regularity we see in, say, CNNs - e.g. there is no fixed node ordering or fixed neighbourhood size
  • This regularity is often the root of optimisation

GNN overheads depend on type of input data

Observe that these different datasets require massively different amounts of compute.
For most non-GNN architectures, overheads are not so closely tied to the type of input data - see the CNNs shown as dotted lines.
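As a hedged back-of-envelope sketch (my own illustration, not from the talk) of why GNN cost is data-dependent: a GCN-style layer's work grows with both node and edge counts, while a CNN layer's work is fixed by its input resolution. The node/edge counts below are made-up, illustrative figures.

```python
# Illustrative cost model (an assumption, not from the talk): one GCN-style
# layer costs roughly |V|*d_in*d_out MACs for the feature transform (X @ W)
# plus |E|*d_out MACs for the sparse neighbourhood aggregation.
def gcn_layer_macs(num_nodes: int, num_edges: int, d_in: int, d_out: int) -> int:
    transform = num_nodes * d_in * d_out   # dense feature transform
    aggregate = num_edges * d_out          # one MAC per edge per channel
    return transform + aggregate

# Made-up counts for a small citation graph vs. a large, dense social graph.
small = gcn_layer_macs(num_nodes=3_000, num_edges=10_000, d_in=128, d_out=128)
large = gcn_layer_macs(num_nodes=230_000, num_edges=114_000_000, d_in=128, d_out=128)
print(f"~{large / small:.0f}x more compute per layer on the larger, denser graph")
```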

Degree-Quant

What is quantisation?

Use fewer bits to represent parameters - particularly for performing inference.
Benefits: smaller models, less data movement, lower latency
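A minimal sketch of the idea (symmetric per-tensor quantisation with an assumed scale rule, not necessarily the exact scheme from the talk):

```python
import torch

def quantize_int8(x: torch.Tensor):
    """Symmetric per-tensor quantisation: one scale shared by the whole tensor."""
    scale = x.abs().max() / 127.0
    q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(256, 256)            # pretend these are layer weights
q, s = quantize_int8(w)
print("bytes: fp32 =", w.numel() * 4, " int8 =", q.numel())
print("max abs error:", (dequantize(q, s) - w).abs().max().item())
```

The 4x reduction in bytes is where the "smaller models, less data movement" benefits come from.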

GNN-specific quantisation challenges

  1. Large variance in tensor values after aggregation
  2. High (in-)degree nodes disproportionately affect gradient error
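A small illustration of challenge 1 (assumed setup, not the paper's experiment): with sum aggregation, post-aggregation magnitudes grow with in-degree, so a single quantisation range has to cover very different scales.

```python
import torch

torch.manual_seed(0)
feat = torch.randn(10_000, 64)       # unit-variance node features

def sum_aggregate(feat: torch.Tensor, in_degree: int) -> torch.Tensor:
    """Sum the features of `in_degree` randomly chosen neighbours."""
    neighbours = torch.randint(0, feat.size(0), (in_degree,))
    return feat[neighbours].sum(dim=0)

low = sum_aggregate(feat, in_degree=2)
high = sum_aggregate(feat, in_degree=1_000)
print("max |value|, degree 2:   ", low.abs().max().item())
print("max |value|, degree 1000:", high.abs().max().item())
```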

Method

It builds on top of existing stochastic quantisation-aware training and introduces two important ideas:
  1. Stochastic protective masking. Recognising that some nodes are highly sensitive, a stochastic mask is applied across all nodes; masked nodes are protected by performing their forward pass at full precision.
      • Which nodes are protected (and how often) is based on where they sit in the topology.
  2. A scheme to carefully set the boundaries of quantisation.
In CNNs, where you set these boundaries doesn't make much of a difference - it turns out the same is not true for graphs!
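A hedged sketch of the protective-masking idea during quantisation-aware training (the probability mapping and fake-quant details below are my own illustrative assumptions, not the exact Degree-Quant recipe):

```python
import torch

def fake_quant(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Quantise then immediately dequantise (no straight-through tricks here)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max() / qmax + 1e-12
    return torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale

def protected_forward(h: torch.Tensor, in_degree: torch.Tensor,
                      p_min: float = 0.0, p_max: float = 0.2) -> torch.Tensor:
    # Higher in-degree -> higher chance of being protected at this step.
    p = p_min + (p_max - p_min) * in_degree / in_degree.max()
    protect = torch.bernoulli(p).bool()                     # [num_nodes]
    # Protected nodes keep full-precision activations; the rest are fake-quantised.
    return torch.where(protect.unsqueeze(-1), h, fake_quant(h))

h = torch.randn(5, 16)                                      # node embeddings
deg = torch.tensor([1.0, 2.0, 4.0, 50.0, 300.0])            # in-degrees
out = protected_forward(h, deg)
```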

Results

Applying this quantisation scheme at INT8 gives levels of accuracy competitive with FP32.
This means serving production GNN models can be done much faster with minimal accuracy loss!
Measured speedups of up to 4.7x on CPU (INT8).
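As an aside on deployment, and purely as a hedged illustration (PyTorch's stock dynamic quantisation, not the Degree-Quant pipeline from the talk), this is roughly what switching the dense layers of a model to INT8 for CPU serving looks like:

```python
import torch

# Toy MLP standing in for the dense parts of a GNN layer stack.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 64),
)
model_int8 = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
x = torch.randn(1024, 128)
print(model_int8(x).shape)   # INT8 matmuls on CPU under the hood
```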

Efficient Graph Convolutions

Key contributions:
  1. Less memory than alternatives
  2. Lower latency than alternatives
  3. Higher accuracy
(wow!)

Where have GNN improvements originated?

Key idea: anisotropic GNNs - treat neighbours differently
→ computational challenges:
  • all edges must now be materialised
  • compute becomes a lot less uniform - varies in different parts of the topology
  • hardware challenges result - SpMM operations only really support GCN-type models well, and there is poor support for "arbitrary message passing" (see the sketch after this list)
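A hedged contrast of the two regimes (illustrative shapes and random data, not the talk's code): an isotropic GCN-style aggregation collapses into a single sparse-dense matmul, whereas per-edge (e.g. attention) weights force an explicit [num_edges, d] message tensor to be materialised before the reduction.

```python
import torch

num_nodes, num_edges, d = 1_000, 10_000, 64
x = torch.randn(num_nodes, d)
edge_index = torch.randint(0, num_nodes, (2, num_edges))    # [2, E]: (src, dst)
src, dst = edge_index

# Isotropic (GCN-style): the whole aggregation is one SpMM;
# no per-edge tensors are ever materialised.
adj = torch.sparse_coo_tensor(torch.stack([dst, src]),
                              torch.ones(num_edges),
                              (num_nodes, num_nodes))
out_isotropic = torch.sparse.mm(adj, x)

# Anisotropic (e.g. attention-style): per-edge weights require an explicit
# [num_edges, d] message tensor before reducing back to nodes.
alpha = torch.rand(num_edges, 1)                            # per-edge coefficients
messages = alpha * x[src]                                   # materialise every edge
out_anisotropic = torch.zeros(num_nodes, d).index_add_(0, dst, messages)
```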

Method: The ECG Building Block

 
 
The difference here is that each node has its own set of weights? Is this not just the kind of thing we usually see with message-passing?