Speaker: Emanuele Rossi (Imperial/Twitter)
E.g. social/interaction networks
- How do we make use of timing information to make better representations of nodes?
- Can we predict when and how the graph will change in the future?
This talk focuses on the first question.
Event types: node/edge creation, deletion, or feature change
- Handle different kinds of events
- Use time information
- Efficiently incorporate the graph changing - don't recompute everything
- Different tasks: predict when something will happen
Simplification: assume the graph consists entirely/only of edge creation events
First idea: Process events in order using an RNN with a different hidden state for each node
Pros: built-in bias of sequentiality
Cons:
- Does not use the graph of interactions directly
- Memory staleness: a node's state is only updated when that node is involved in an interaction
RNN for a node takes previous state, plus message
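The per-node RNN idea above can be sketched as follows. This is a minimal sketch with assumed names and toy maths: the "RNN cell" is a fixed linear blend rather than a learned network, and element-wise sums stand in for concatenation.

```python
def rnn_cell(prev_state, message, decay=0.5):
    """Toy RNN cell: blend the previous state with the incoming message."""
    return [decay * p + (1 - decay) * m for p, m in zip(prev_state, message)]

def process_events(events, dim=2):
    """events: (src, dst, timestamp, edge_features), processed in time order."""
    state = {}  # per-node hidden state, created lazily on first interaction
    for src, dst, t, feats in sorted(events, key=lambda e: e[2]):
        # Snapshot both endpoint states so each message sees pre-update values.
        src_mem = state.get(src, [0.0] * dim)
        dst_mem = state.get(dst, [0.0] * dim)
        # Message = other endpoint's state combined with the event features.
        state[src] = rnn_cell(src_mem, [a + b for a, b in zip(dst_mem, feats)])
        state[dst] = rnn_cell(dst_mem, [a + b for a, b in zip(src_mem, feats)])
    return state

states = process_events([(0, 1, 10.0, [1.0, 0.0]), (1, 2, 20.0, [0.0, 1.0])])
```

Note the staleness problem is visible here: node 0 is never updated after its only event, no matter how much time passes.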
Second idea: Use a GNN with attention and use timestamps as edge features
Pros:
- More efficient: no need for sequential processing
- Uses the graph explicitly; mitigates staleness because a node's neighbours will still be updated
Cons:
- Can only handle edge-addition events
- Not suitable for online updates
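The second idea can be sketched as attention over a node's past interactions, with the time delta entering as an edge feature. This is a hedged sketch: the scoring function is a toy dot product with a recency penalty, whereas a real model learns both the time encoding and the attention weights.

```python
import math

def attend(node_feat, neighbours, query_time):
    """neighbours: list of (feature_vector, interaction_time) pairs."""
    scores = []
    for feat, t in neighbours:
        dt = query_time - t  # timestamp enters as an edge feature
        sim = sum(a * b for a, b in zip(node_feat, feat))
        scores.append(sim - 0.1 * dt)  # toy score: similarity minus staleness
    mx = max(scores)  # softmax over neighbours (max-shifted for stability)
    weights = [math.exp(s - mx) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Embedding = attention-weighted sum of neighbour features.
    dim = len(node_feat)
    return [sum(w * feat[i] for w, (feat, _) in zip(weights, neighbours))
            for i in range(dim)]

emb = attend([1.0, 0.0], [([1.0, 0.0], 5.0), ([0.0, 1.0], 9.0)], query_time=10.0)
```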
Proposed approach: attempts to combine the best of both worlds
- Defines general framework which can handle all types of events
- Each type of event generates a message used to update node representations
- Uses GNN directly on graph of interactions
Memory:
- One vector per node, analogous to an RNN hidden state
- A compressed representation of all past interactions of each node the model has seen so far
- Not a parameter: it is also updated at test time
Messages:
- Each event creates a message
- Messages are used to update the memory
- Given an interaction, a message is computed for each of the source and destination nodes
- Memory is updated using the newly computed messages, via an RNN (a GRU in practice)
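The message/memory pipeline above can be sketched as below. All names and maths are illustrative: messages here are element-wise sums of both memories, the elapsed time, and the event features (standing in for concatenation), and a fixed-gate blend stands in for the learned GRU.

```python
class ToyMemory:
    """Toy memory module: one vector per node, updated by messages."""

    def __init__(self, dim=2):
        self.dim = dim
        self.state = {}      # node -> memory vector (not a parameter:
        self.last_seen = {}  # it is also updated at test time)

    def get(self, node):
        return self.state.get(node, [0.0] * self.dim)

    def on_interaction(self, src, dst, t, feats):
        src_mem, dst_mem = self.get(src), self.get(dst)  # pre-update snapshot
        for node, own, other in ((src, src_mem, dst_mem),
                                 (dst, dst_mem, src_mem)):
            dt = t - self.last_seen.get(node, t)
            # One message per endpoint, from both memories + time delta + features.
            msg = [m + o + f + 0.01 * dt
                   for m, o, f in zip(own, other, feats)]
            # Fixed-gate blend stands in for the GRU memory updater.
            self.state[node] = [0.5 * p + 0.5 * m for p, m in zip(own, msg)]
            self.last_seen[node] = t

mem = ToyMemory()
mem.on_interaction(0, 1, t=10.0, feats=[1.0, 0.0])
mem.on_interaction(1, 2, t=20.0, feats=[0.0, 1.0])
```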
Embedding:
- A GNN added on top of the previous modules
- Essentially, a GNN is run over the current graph of interactions, using the memory as additional node features
- Uses graph attention over node features and memory
- Data is split chronologically
- All previous events used in prediction of next one
- Training this efficiently is not trivial - see paper for details!
- Memory is a concern for scalability, but since it is not a parameter it can be treated as just an additional feature vector for each node
- So we only ever need memory for nodes involved in a batch
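The batching point can be shown in a few lines (illustrative names): because memory is per-node state rather than a model parameter, a batch only needs the memory rows of the nodes it actually touches.

```python
def memory_for_batch(memory, batch_edges, dim=2):
    """Gather memory vectors only for nodes appearing in this batch."""
    nodes = {n for src, dst in batch_edges for n in (src, dst)}
    return {n: memory.get(n, [0.0] * dim) for n in nodes}

full_memory = {0: [0.5, 0.0], 1: [0.25, 0.5], 7: [0.1, 0.1]}
batch = memory_for_batch(full_memory, [(0, 1), (1, 3)])  # node 7 untouched
```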
- Model is as scalable as GraphSAGE → can scale to very large graphs!
It's not clear to me how they've wired up the non-temporal methods here!
Other experiments also presented. Performance similarly strong.
Ablation study also included. Memory and embeddings both very important.
- Timestamp representation is a difficult problem - some work has been done on this. Raw timestamps are bad, embeddings a bit better. More work needed though!
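One common family of timestamp representations is a functional (sinusoidal) encoding, in the spirit of Time2Vec-style encodings used in temporal GNNs. A hedged sketch: real models learn the frequencies, whereas the geometric series here is arbitrary.

```python
import math

def time_encoding(t, dim=4):
    """Map a scalar timestamp to a dim-dimensional vector of sinusoids."""
    freqs = [1.0 / (10.0 ** i) for i in range(dim)]  # arbitrary fixed frequencies
    return [math.cos(t * w) for w in freqs]

vec = time_encoding(42.0)
```

Multiple frequencies let the model compare events at several timescales, which is one reason such embeddings beat feeding in raw timestamps.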