Machine Learning on Dynamic Graphs: Temporal Graph Networks

Speaker: Emanuele Rossi (Imperial/Twitter)

Problem: Many Graphs are Dynamic

E.g. social/interaction networks

How do we make use of timing information to make better representations of nodes?

Can we predict when and how the graph will change in the future?

This talk focuses on the first question.

From static to dynamic graphs

CTDGs: many types of events

Node/edge - creation/deletion/feature change

Why is learning on dynamic graphs different?

Model must:

Handle different kinds of events

Use time information

Efficiently incorporate the graph changing - don't recompute everything

Different tasks: predict when something will happen

Model

Encoding a temporal graph

Simplification: assume the graph consists entirely/only of edge creation events

First idea: Process events in order using an RNN with a different hidden state for each node

Pros: Built in bias of sequentiality

Cons:

Not using the graph of interactions directly

Memory staleness: nodes only updated when involved in interaction

Note: typo - second line should start with

RNN for a node takes previous state, plus message

Second idea: Use a GNN with attention and use timestamps as edge features

Pros:

More efficient - no need for sequential processing

Uses the graph explicitly, mitigates staleness because neighbours will still be updated

Cons

Can only handle edge addition events

Not suitable for online updates

Temporal Graph Networks

Attempts to combine best of both worlds

Defines general framework which can handle all types of events

Each type of event generates a message used to update node representations

Uses GNN directly on graph of interactions

Module 1: Memory

One per node

Analogous to RNN hidden state

Contains a vector for each node the model has seen so far—a compressed representation of all past interactions of a node

This is not a parameter - it is also updated at test time

Module 2: Message Function

Each event creates a message

Messages used to update memory

Given an interaction , messages are computed

This is done for source and destination

Module 3: Memory Updater

Memory updated using newly computed messages

This is an RNN (GRU used)

Module 4: Graph Embedding

Adding a GNN over everything from previous modules

Essentially, a GNN is run over the current graph, using the memory as additional features

Uses graph attention over node features and memory

Link Prediction with TGN

Data split is chronologically

All previous events used in prediction of next one

Training this efficiently is not trivial - see paper for details!

Scalability

Memory is a concern for scalability as it is not a parameter → we can just think of it as an additional feature vector for each node

So we only ever need memory for nodes involved in a batch

Model is scalable as GraphSage → can scale to very large graphs!

Experiments

Link prediction:

It's not clear to me how they've wired up the non-temporal methods here!

Other experiments also presented. Performance similarly strong.

Ablation study also included. Memory and embeddings both very important.

Q & A

Timestamp representation is a difficult problem - some work has been done on this. Raw timestamps are bad, embeddings a bit better. More work needed though!