Speaker: Pratik Fegade
Recursive models such as:
- DAG-RNN for image segmentation
- Semantic text classification using TreeLSTM
Motivation
Consider parse tree data structure, where leaf nodes have word embeddings and we recursively do linear transforms of child embedding sums.
```python
def treeFC(n):
    if isleaf(n):
        return Emb[words[n]]
    else:
        lh = treeFC(n.left)
        rh = treeFC(n.right)
        add = lh + rh
        return W * add
```
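A minimal runnable sketch of that recursion, assuming NumPy; the `Node` class, `is_leaf` helper, and the sizes are illustrative, not from the talk:

```python
import numpy as np

class Node:
    """Binary parse-tree node; leaves carry a word index."""
    def __init__(self, word=None, left=None, right=None):
        self.word, self.left, self.right = word, left, right

def is_leaf(n):
    return n.word is not None

H = 4                             # embedding size (illustrative)
Emb = np.random.rand(10, H)       # word embedding table
W = np.random.rand(H, H)          # shared linear transform

def treeFC(n):
    if is_leaf(n):
        return Emb[n.word]
    lh = treeFC(n.left)
    rh = treeFC(n.right)
    return W @ (lh + rh)          # linear transform of the child sum

root = Node(left=Node(word=0),
            right=Node(left=Node(word=1), right=Node(word=2)))
h = treeFC(root)                  # one H-dim vector for the whole tree
```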
Currently, only the embedding lookup and the linear transform are offloaded to the GPU (e.g. as cuDNN/cuBLAS calls).
Each call involves a round trip through slow off-chip global memory between GPU ops.
What we want: "aggressive kernel fusion" → do it all as one kernel call.
Challenges
- Executing recursive control flow efficiently on accelerators.
- Exploiting data reuse.
- Vendor libraries (cuDNN/cuBLAS) can no longer be used once everything is fused into one kernel.
Cortex compiler
For inference of recursive models
Note: in the above example we have complete knowledge of the control flow from the tree structure alone (no need to inspect the data values).
In such cases → data structure linearisation: compute the execution order of the nodes beforehand.
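One way to linearise the tree is a post-order walk, so every node appears after its children. A sketch, using an illustrative `Node` class with leaves holding a word index:

```python
class Node:
    def __init__(self, word=None, left=None, right=None):
        self.word, self.left, self.right = word, left, right

def linearise(root):
    """Post-order list of nodes: children always precede parents."""
    order = []
    def walk(n):
        if n is None:
            return
        walk(n.left)
        walk(n.right)
        order.append(n)
    walk(root)
    return order

root = Node(left=Node(word=0),
            right=Node(left=Node(word=1), right=Node(word=2)))
order = linearise(root)           # children precede parents; root is last
```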
Then recursive lowering:
- Lower the recursion to a loop over the linearised nodes.
- The loop can then be unrolled, etc.
- Conditional check specialisation: use the linearised data structure to remove conditional checks, generating different code for the different cases.
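The lowered form can be sketched as two branch-free loops: the leaf/internal split is resolved from the linearised structure ahead of time, so neither loop body contains a conditional. The per-node arrays, names, and NumPy implementation are illustrative, not Cortex's actual output:

```python
import numpy as np

H = 4
Emb = np.random.rand(10, H)       # word embedding table
W = np.random.rand(H, H)          # shared linear transform

def treeFC_lowered(leaves, internals, words, left, right):
    """leaves/internals: node ids split ahead of time; internals are
    in post-order, so children are computed before their parents."""
    h = {}
    for n in leaves:              # specialised leaf loop, no branch
        h[n] = Emb[words[n]]
    for n in internals:           # specialised internal loop, no branch
        h[n] = W @ (h[left[n]] + h[right[n]])
    return h[internals[-1]]       # result at the root

# Toy tree: node 3 = f(0, 1), node 4 = f(3, 2); ids are illustrative
words = {0: 0, 1: 1, 2: 2}
left  = {3: 0, 4: 3}
right = {3: 1, 4: 2}
res = treeFC_lowered([0, 1, 2], [3, 4], words, left, right)
```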
Evaluation
Compared against PyTorch, DyNet & Cavs; the latter two are designed for developing dynamic NNs.
Cortex is much faster than all three, especially PyTorch.
Why?
Main benefit: low scheduling overheads
- Because of single GPU kernel
- Also enables end-to-end optimisation