Graph Neural Networks For Learning About Never Before Seen Phenomena

 
Speaker: Marinka Zitnik (Harvard)

What makes graph representation techniques well suited for the analysis of high-dimensional interconnected medical data?
→ Biological systems are interconnected at different scales:
  • e.g. RNA-proteins-compounds-disease
  • Patient networks
  • Hierarchies of cell systems
  • Disease pathways
  • Biomedical knowledge graphs
  • Gene interaction networks
  • Cell-cell similarity networks

Meta learning for graphs

Never-before-seen disease → we want to repurpose existing drugs, since getting new drugs approved takes a long time.
Why is finding treatments for a new disease challenging?
  • Generalising to new phenomena is hard
  • Prevailing GNN methods require abundant label information
  • However, labeled examples are scarce
The question is then: how can we design powerful meta-learners that transfer what is learned from one labeled example to others? How can we make predictions on a new graph when we only have a handful of labels?
 
Key idea: local subgraphs - treat the distribution over subgraphs as the distribution over tasks, from which a global set of parameters is learned.
Use this strategy to do link prediction.
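A minimal sketch of what "local subgraph" could mean in practice, assuming it is the set of nodes within a fixed number of hops of a centre node plus the edges among them. The function name `local_subgraph`, the hop limit and the toy graph are illustrative assumptions, not taken from the talk.

```python
from collections import deque

def local_subgraph(adj, center, hops):
    """Return the nodes within `hops` steps of `center` and the induced edges."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if dist[node] == hops:
            continue  # do not expand beyond the hop limit
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    nodes = set(dist)
    # Keep only edges whose two endpoints both fall inside the neighbourhood.
    edges = {
        tuple(sorted((u, v)))
        for u in nodes
        for v in adj.get(u, ())
        if v in nodes
    }
    return nodes, edges

# Hypothetical undirected toy graph: a path a-b-c-d plus an edge b-e.
adj = {
    "a": ["b"],
    "b": ["a", "c", "e"],
    "c": ["b", "d"],
    "d": ["c"],
    "e": ["b"],
}
nodes, edges = local_subgraph(adj, "a", hops=2)
print(sorted(nodes), sorted(edges))
```

Each extracted subgraph can then be treated as one example, so a model only needs the neighbourhood rather than the whole graph.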
Why are subgraphs useful?
When labels are scarce, label propagation is not sufficient → structural similarity is more useful here.
G-Meta learns a metric to classify query subgraphs using the closest point from the support set.
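A rough sketch of metric-based classification against a support set, assuming subgraph embeddings are already computed (in G-Meta they would come from a GNN, which is omitted here). Averaging each class into a single prototype and using Euclidean distance are assumptions in the style of prototypical networks; the names `class_prototypes` and `classify` are hypothetical.

```python
import math
from collections import defaultdict

def class_prototypes(support):
    """Average the support embeddings of each label into one prototype."""
    grouped = defaultdict(list)
    for embedding, label in support:
        grouped[label].append(embedding)
    return {
        label: [sum(dim) / len(vectors) for dim in zip(*vectors)]
        for label, vectors in grouped.items()
    }

def classify(query, prototypes):
    """Return the label whose prototype is closest in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))

# Hypothetical 2-D embeddings of support subgraphs with their labels.
support = [
    ([0.0, 0.0], "A"),
    ([0.0, 2.0], "A"),
    ([4.0, 4.0], "B"),
    ([6.0, 4.0], "B"),
]
prototypes = class_prototypes(support)
predicted = classify([1.0, 1.5], prototypes)
print(predicted)
```

With only a few labeled subgraphs per class, the classifier itself has no extra parameters to fit; everything rides on the quality of the learned embedding.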

COVID-19 Drug Repurposing

COVID-19 Repurposing Dataset
Which human proteins does the virus bind to? Interactions between human and viral proteins.
How to represent COVID-19? As the neighbourhood of the human protein-protein interaction (PPI) network targeted by the virus.
One of these diseases is COVID-19. The closest drugs are displayed.
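A small sketch of how "closest drugs" could be read off an embedding space, assuming diseases and drugs are embedded in the same space and that closeness means Euclidean distance. The embeddings, drug names and the function `rank_drugs` are made up for illustration; the notes do not say which distance was used.

```python
import math

def rank_drugs(disease_embedding, drug_embeddings, top_k=3):
    """Return the names of the `top_k` drugs nearest to the disease embedding."""
    ranked = sorted(
        drug_embeddings,
        key=lambda name: math.dist(disease_embedding, drug_embeddings[name]),
    )
    return ranked[:top_k]

# Hypothetical 2-D embeddings; real ones would come from a trained model.
disease = [0.2, 0.9]
drugs = {
    "drug_a": [0.1, 1.0],
    "drug_b": [0.9, 0.1],
    "drug_c": [0.3, 0.7],
    "drug_d": [-0.8, -0.5],
}
closest = rank_drugs(disease, drugs, top_k=2)
print(closest)
```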
TL;DR: AI-based methods are really good here!
Results:
 
Interesting Finding: 76/77 drugs that successfully reduced viral infections do not directly bind proteins targeted by COVID.
These drugs rely on network-based actions that cannot be identified by traditional docking-based strategies.
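To make the distinction between direct binding and network-based action concrete, here is a toy sketch that measures how many interaction steps separate a drug's targets from the virus-targeted proteins. This is only an illustration of the idea, not the method from the talk; the function `hops_to_set`, the protein names and the toy PPI network are hypothetical.

```python
from collections import deque

def hops_to_set(adj, sources, targets):
    """Fewest edges from any source node to any target node (None if unreachable)."""
    targets = set(targets)
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        if node in targets:
            return dist[node]  # BFS pops nodes in order of distance
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return None

# Hypothetical PPI chain p1-p2-p3-p4; the virus targets p1.
ppi = {
    "p1": ["p2"],
    "p2": ["p1", "p3"],
    "p3": ["p2", "p4"],
    "p4": ["p3"],
}
virus_targets = ["p1"]
direct = hops_to_set(ppi, ["p1"], virus_targets)    # drug binds a viral target
indirect = hops_to_set(ppi, ["p3"], virus_targets)  # drug acts via the network
print(direct, indirect)
```

A distance of zero corresponds to a drug that binds a virus-targeted protein directly; a positive distance means any effect would have to travel through the network.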

Key ML Lessons

  • Domain scientists without AI expertise still need a way to interact with AI systems and to feed their knowledge back into the ML loop.
  • Zero-shot learning: generalising to new graphs is hard.

Between Organisms

Additional idea: how can we leverage this different type of graph transfer learning? Can we learn about humans from what we know about other organisms? G-Meta can also be used here.
 
Key difference here: trained across many different graphs.
Here G-Meta improves on previous work by 29.9% and scales to large graphs (a 100x increase in graph size).

Therapeutics Data Commons

First unified framework to systematically evaluate ML across a range of therapeutics. Creates new opportunities for applying graph learning.