Cortex: A Compiler for Recursive Deep Learning Models

Speaker: Pratik Fegade

Recursive models such as:
  • DAG-RNN for image segmentation
  • Semantic text classification using TreeLSTM

Motivation

Consider a parse tree data structure where leaf nodes hold word embeddings, and each internal node applies a linear transform to the sum of its children's embeddings.
def treeFC(n):
    if isleaf(n):
        return Emb[words[n]]    # leaf: word-embedding lookup
    else:
        lh = treeFC(n.left)
        rh = treeFC(n.right)
        add = lh + rh           # sum of the two child embeddings
        return W * add          # linear transform of the sum
Currently, only the embedding lookup and the linear transform are offloaded to the GPU (e.g. as cuDNN/cuBLAS calls).
This means constant back-and-forth between slow off-chip global memory and the individual GPU kernels: each op reads its inputs from and writes its result back to global memory.
What we want: "aggressive kernel fusion" → do it all as one kernel call.

Challenges

  • Executing recursive control flow efficiently on accelerators.
  • Exploiting data reuse.
  • Vendor libraries can no longer be used: fusing everything into one kernel rules out per-op cuDNN/cuBLAS calls.

Cortex compiler

For inference of recursive models
Note: in the above example we have complete knowledge of the control flow from the tree structure alone (no need to inspect the data values).
In such cases → data structure linearisation: compute the traversal order of the nodes ahead of time.
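A minimal sketch of what linearisation could look like, using a toy Node type and isleaf() helper (illustrative names, not Cortex's actual API): because the tree shape is fully known before execution, one ahead-of-time pass can record a post-order visit sequence in which every node appears after its children.

class Node:
    def __init__(self, left=None, right=None, word=None):
        self.left, self.right, self.word = left, right, word

def isleaf(n):
    return n.left is None and n.right is None

def linearise(root):
    # Post-order: children precede parents, so a single forward pass
    # over the returned list respects all data dependences.
    order = []
    stack = [(root, False)]
    while stack:
        n, expanded = stack.pop()
        if expanded or isleaf(n):
            order.append(n)
        else:
            stack.append((n, True))
            stack.append((n.right, False))
            stack.append((n.left, False))
    return order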
Then recursive lowering (see the sketch after this list):
  1. Lower the recursion to a loop over the linearised order.
  2. The loop can then be unrolled, etc.
  3. Conditional check specialisation: use the linearised data structure to remove conditional statements, generating different code for the different cases (e.g. leaf vs. internal node).
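A rough Python sketch of the lowered form, reusing the toy Node/isleaf/linearise definitions above (numpy stands in for the fused GPU code, so this only illustrates the control-flow transformation, not what Cortex actually generates): the recursion becomes plain loops over the linearised order, and partitioning nodes by kind ahead of time removes the per-node isleaf() branch from the hot loops.

import numpy as np

def tree_fc_lowered(order, W, Emb):
    # The partition is computed once, at linearisation time, so neither
    # loop below carries an isleaf() check (conditional specialisation).
    leaves = [n for n in order if isleaf(n)]
    internal = [n for n in order if not isleaf(n)]  # still post-order
    out = {}
    for n in leaves:        # leaf-specialised path: embedding lookups only
        out[id(n)] = Emb[n.word]
    for n in internal:      # internal-node path: add + matrix multiply only
        out[id(n)] = W @ (out[id(n.left)] + out[id(n.right)])
    return out[id(order[-1])]   # root is last in post-order

# Example: a two-leaf tree with 4-dimensional embeddings.
root = Node(Node(word=0), Node(word=1))
result = tree_fc_lowered(linearise(root), np.eye(4), np.random.rand(5, 4))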

Evaluation

Compared against PyTorch, DyNet, and Cavs; the latter two are frameworks designed specifically for developing dynamic NNs.
Cortex does much better than all three, especially PyTorch.

Why?

Main benefit: low scheduling overheads
  • Because of single GPU kernel
  • Also enables end-to-end optimisation