Speaker: Nicholas Lane (Cambridge)
Huge progress made over recent years in model efficiency/compression - but GNNs virtually absent from this discussion!
Improved performance in this area critical to future success of GNNs.
Key domains where GNN improvements will yield benefits:
- Recommender systems
- Chip design
- Simulating physics/chemistry
- Drug discovery
- Scene generation
Great progress has been made in creating new GNN models that can solve various problems; but we lag badly behind in efficiency.
- GNNs lack the regularity we see in, say, CNNs - e.g. no node ordering or fixed neighbourhood size
- This regularity is often the root of optimisation opportunities
Observe that different graph datasets require massively different amounts of compute.
For most non-GNN architectures, overheads are not closely tied to the type of data - CNNs shown on dotted lines.
Use fewer bits to represent params - particularly for performing inference.
Benefits: smaller models, less data movement, lower latency
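As a concrete illustration of the benefit, here is a minimal sketch of symmetric INT8 quantisation with a max-abs scale (the simplest possible calibration - not the exact scheme used in the talk):

```python
import numpy as np

def quantize_int8(x, scale):
    """Map floats to int8: round x / scale and clamp to the int8 range."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def dequantize(q, scale):
    """Approximately reconstruct the original floats."""
    return q.astype(np.float32) * scale

# Scale chosen from the max absolute value of the tensor (toy calibration).
w = np.array([0.5, -1.2, 0.03, 0.9], dtype=np.float32)
scale = np.abs(w).max() / 127.0
q = quantize_int8(w, scale)       # 1 byte per value instead of 4
w_hat = dequantize(q, scale)      # reconstruction error bounded by ~scale/2
```

Each parameter now occupies 1 byte instead of 4, which is where the smaller models and reduced data movement come from.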
- Large variance in tensor values after aggregation
- High (in-)degree nodes disproportionately affect gradient error
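The variance problem above can be seen with a toy experiment: sum-aggregating i.i.d. unit-variance messages makes the output variance grow with degree, so one quantisation range must cover wildly different value spreads (my illustration, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

def sum_aggregate_variance(degree, n_trials=10000):
    """Empirical variance of a sum-aggregation over `degree` neighbours,
    each sending an i.i.d. unit-variance message."""
    msgs = rng.normal(0.0, 1.0, size=(n_trials, degree))
    return msgs.sum(axis=1).var()

low = sum_aggregate_variance(4)     # low-degree node
high = sum_aggregate_variance(256)  # high in-degree node
# Variance scales roughly linearly with degree (~64x larger here).
```

This is why high in-degree nodes both blow up the tensor ranges and dominate quantisation error.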
Build on top of existing stochastic quantisation-aware training.
- Introduce two important ideas. Recognising that some nodes are highly sensitive to quantisation, stochastically protect nodes:
	- Stochastic masking applied to all nodes - protected nodes perform the forward pass at full precision
	- Protection probability based on where a node sits in the topology
- Build a scheme to carefully set the boundaries of quantisation
In CNNs, where you set these boundaries doesn't make much of a difference - it turns out the same is not true for graphs!
Applying this quantisation scheme at INT8 gives accuracy competitive with FP32.
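The stochastic protection idea can be sketched as follows. This is a minimal illustration assuming protection probability increases with in-degree, using simulated ("fake") quantisation as in standard quantisation-aware training; the exact schedule and probabilities in the talk may differ:

```python
import numpy as np

rng = np.random.default_rng(0)

def protective_mask(in_degrees, p_min=0.0, p_max=0.2):
    """Sample a per-node mask: True = keep that node at full precision
    this step. Higher in-degree -> higher protection probability
    (an assumed linear schedule for illustration)."""
    d = np.asarray(in_degrees, dtype=np.float64)
    spread = max(d.max() - d.min(), 1e-12)
    p = p_min + (p_max - p_min) * (d - d.min()) / spread
    return rng.random(d.shape) < p

def fake_quantize(x, scale):
    """Simulate INT8 quantisation in float, as done during QAT."""
    return np.clip(np.round(x / scale), -128, 127) * scale

def forward_features(h, in_degrees, scale):
    """Quantise features of unprotected nodes; protected rows stay FP32."""
    mask = protective_mask(in_degrees)
    out = fake_quantize(h, scale)
    out[mask] = h[mask]
    return out
```

Protected nodes contribute full-precision activations and gradients, shielding the sensitive high-degree nodes from quantisation noise during training.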
Means that serving production GNN models can be done much faster with minimal accuracy loss!
Measured speedup of up to 4.7x on CPU (INT8)
- Less memory than alternatives
- Lower latency than alternatives
- Higher accuracy
Key idea: anisotropic GNNs - treat neighbours differently
→ computational challenges:
- all edges must now be materialised
- compute becomes a lot less uniform - it varies across different parts of the topology
- hardware challenges result - existing SpMM operations only really support GCN-style models well, with poor support for "arbitrary message passing"
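The contrast can be made concrete with a toy graph. Isotropic (GCN-style) aggregation reduces to one sparse-dense matmul, while anisotropic message passing must gather per-edge messages, compute an edge function (a made-up one below, for illustration), and scatter the results - exactly the irregular pattern hardware supports poorly:

```python
import numpy as np

# Toy graph: edge list (src -> dst) and node features h.
src = np.array([0, 1, 2, 2])
dst = np.array([1, 2, 0, 1])
h = np.arange(6, dtype=np.float32).reshape(3, 2)  # 3 nodes, 2 features

# Isotropic (GCN-style): aggregation is a single (sparse) matmul A @ h.
A = np.zeros((3, 3), dtype=np.float32)
A[dst, src] = 1.0  # unweighted adjacency for simplicity
gcn_out = A @ h

# Anisotropic: every edge message must be materialised explicitly
# (toy edge function mixing source and destination features),
# then scatter-added back to the destination nodes.
msgs = h[src] * np.tanh(h[dst])      # shape: (num_edges, features)
aniso_out = np.zeros_like(h)
np.add.at(aniso_out, dst, msgs)      # irregular scatter, edge by edge
```

With an identity edge function the scatter form reproduces the SpMM result, but the general case needs per-edge storage and data-dependent compute.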