Cortex: A Compiler for Recursive Deep Learning Models

Speaker: Pratik Fegade

Recursive models such as:
  • DAG-RNN for image segmentation
  • Semantic text classification using TreeLSTM

Motivation

Consider a parse tree data structure where leaf nodes hold word embeddings, and each internal node applies a linear transform to the sum of its children's embeddings.
def treeFC(n):
    if isleaf(n):
        return Emb[words[n]]    # leaf: word-embedding lookup
    else:
        lh = treeFC(n.left)
        rh = treeFC(n.right)
        add = lh + rh           # sum of the two child embeddings
        return W * add          # linear transform of the sum
Currently, only the embedding lookup and the linear transform are offloaded to the GPU (e.g. as cuDNN/cuBLAS calls).
This means constant back-and-forth between slow off-chip global memory and the individual GPU kernels: each op reads its inputs from and writes its result back to global memory.
What we want: "aggressive kernel fusion" → do it all as one kernel call.

Challenges

  • Executing recursive control flow efficiently on accelerators.
  • Exploiting data reuse.
  • Vendor libraries can no longer be used: fusing everything into one kernel rules out per-op cuDNN/cuBLAS calls.

Cortex compiler

For inference of recursive models
Note: in the above example we have complete knowledge of the control flow from the tree structure alone (no need to inspect the data values).
In such cases → data structure linearisation: compute the traversal order of the nodes ahead of time.
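A minimal sketch of what linearisation could look like, using a toy Node type and isleaf() helper (illustrative names, not Cortex's actual API): because the tree shape is fully known before execution, one ahead-of-time pass can record a post-order visit sequence in which every node appears after its children.

class Node:
    def __init__(self, left=None, right=None, word=None):
        self.left, self.right, self.word = left, right, word

def isleaf(n):
    return n.left is None and n.right is None

def linearise(root):
    # Post-order: children precede parents, so a single forward pass
    # over the returned list respects all data dependences.
    order = []
    stack = [(root, False)]
    while stack:
        n, expanded = stack.pop()
        if expanded or isleaf(n):
            order.append(n)
        else:
            stack.append((n, True))
            stack.append((n.right, False))
            stack.append((n.left, False))
    return order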
Then recursive lowering (see the sketch after this list):
  1. Lower the recursion to a loop over the linearised order.
  2. The loop can then be unrolled, etc.
  3. Conditional check specialisation: use the linearised data structure to remove conditional statements, generating different code for the different cases (e.g. leaf vs. internal node).
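A rough Python sketch of the lowered form, reusing the toy Node/isleaf/linearise definitions above (numpy stands in for the fused GPU code, so this only illustrates the control-flow transformation, not what Cortex actually generates): the recursion becomes plain loops over the linearised order, and partitioning nodes by kind ahead of time removes the per-node isleaf() branch from the hot loops.

import numpy as np

def tree_fc_lowered(order, W, Emb):
    # The partition is computed once, at linearisation time, so neither
    # loop below carries an isleaf() check (conditional specialisation).
    leaves = [n for n in order if isleaf(n)]
    internal = [n for n in order if not isleaf(n)]  # still post-order
    out = {}
    for n in leaves:        # leaf-specialised path: embedding lookups only
        out[id(n)] = Emb[n.word]
    for n in internal:      # internal-node path: add + matrix multiply only
        out[id(n)] = W @ (out[id(n.left)] + out[id(n.right)])
    return out[id(order[-1])]   # root is last in post-order

# Example: a two-leaf tree with 4-dimensional embeddings.
root = Node(Node(word=0), Node(word=1))
result = tree_fc_lowered(linearise(root), np.eye(4), np.random.rand(5, 4))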

Evaluation

Compared against PyTorch, DyNet, and Cavs; the latter two are frameworks designed specifically for developing dynamic NNs.
Cortex does much better than all three, especially PyTorch.

Why?

Main benefit: low scheduling overheads
  • Because of single GPU kernel
  • Also enables end-to-end optimisation