Speaker: Andrei Ivanov
Existing Transformer implementations attain far less than peak GPU FLOP/s
On BERT-large they demonstrate the following performance improvements:
- 30% over PyTorch
- 20% over TensorFlow + XLA
- 8% over DeepSpeed
Data movement is the bottleneck
Group operators into 3 classes:
Tensor contractions = matmuls
![notion image](https://www.notion.so/image/https%3A%2F%2Fs3.us-west-2.amazonaws.com%2Fsecure.notion-static.com%2F4947e2e2-fb80-4bb3-8545-bb67239e1e66%2FUntitled.png%3FX-Amz-Algorithm%3DAWS4-HMAC-SHA256%26X-Amz-Content-Sha256%3DUNSIGNED-PAYLOAD%26X-Amz-Credential%3DAKIAT73L2G45EIPT3X45%252F20221016%252Fus-west-2%252Fs3%252Faws4_request%26X-Amz-Date%3D20221016T192358Z%26X-Amz-Expires%3D86400%26X-Amz-Signature%3D4b49f7ac3fcaa303c2052e1c31b19ea866b19ea17a3051aeec2cdc151cbc5704%26X-Amz-SignedHeaders%3Dhost%26x-id%3DGetObject?table=block&id=f5cb5b8f-02f8-440d-8dba-7986839e7b76&cache=v2)
Norm = softmax / layernorm
Element-wise = encoder biases, ReLU-style activations, residual connections
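A rough sketch of why these classes behave so differently: estimate the arithmetic intensity (flops per byte moved to or from memory) of each. The shapes, the fp16 element size, and the per-element flop counts below are illustrative assumptions, not numbers from the talk.

```python
# Back-of-the-envelope arithmetic intensity (flops per byte of memory traffic).
# All sizes and flop counts here are assumptions chosen for illustration.

def matmul_intensity(m, n, k, bytes_per_elem=2):
    """(m x k) @ (k x n): 2*m*n*k flops; read both inputs, write the output once."""
    flops = 2 * m * n * k
    bytes_moved = (m * k + k * n + m * n) * bytes_per_elem
    return flops / bytes_moved

def per_element_intensity(flops_per_elem, inputs=1, bytes_per_elem=2):
    """Operator that reads `inputs` operands and writes one result per element."""
    return flops_per_elem / ((inputs + 1) * bytes_per_elem)

# Hypothetical shapes: 4096 tokens, hidden size 1024, fp16 storage.
contraction = matmul_intensity(4096, 1024, 1024)
# Assume roughly 5 flops per element for a softmax/layernorm-style operator.
normalization = per_element_intensity(flops_per_elem=5)
# A bias add: one flop per element.
elementwise = per_element_intensity(flops_per_elem=1)

print(f"tensor contraction: {contraction:.1f} flop/byte")
print(f"normalization:      {normalization:.2f} flop/byte")
print(f"element-wise:       {elementwise:.2f} flop/byte")
```

Under these assumptions the contraction performs hundreds of flops per byte, while the other two classes perform about one or fewer, so their runtime is set by memory bandwidth rather than compute.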
Dataflow graph: multi-head attention
![notion image](https://www.notion.so/image/https%3A%2F%2Fs3.us-west-2.amazonaws.com%2Fsecure.notion-static.com%2F9a998458-aaa8-41fc-b839-dd30433dcba4%2FUntitled.png%3FX-Amz-Algorithm%3DAWS4-HMAC-SHA256%26X-Amz-Content-Sha256%3DUNSIGNED-PAYLOAD%26X-Amz-Credential%3DAKIAT73L2G45EIPT3X45%252F20221016%252Fus-west-2%252Fs3%252Faws4_request%26X-Amz-Date%3D20221016T192358Z%26X-Amz-Expires%3D86400%26X-Amz-Signature%3D74a87cc3b3f934c94a1686ea6b53c595623f9106a5b499570c68982819d92ef0%26X-Amz-SignedHeaders%3Dhost%26x-id%3DGetObject?table=block&id=2c073bcd-73f5-4115-ad02-e139a298b607&cache=v2)
Operator fusion opportunities
![notion image](https://www.notion.so/image/https%3A%2F%2Fs3.us-west-2.amazonaws.com%2Fsecure.notion-static.com%2F5ee0dccd-722c-4ae6-93b2-35ad561b1672%2FUntitled.png%3FX-Amz-Algorithm%3DAWS4-HMAC-SHA256%26X-Amz-Content-Sha256%3DUNSIGNED-PAYLOAD%26X-Amz-Credential%3DAKIAT73L2G45EIPT3X45%252F20221016%252Fus-west-2%252Fs3%252Faws4_request%26X-Amz-Date%3D20221016T192358Z%26X-Amz-Expires%3D86400%26X-Amz-Signature%3D75b80a998cd0a23fba0f466099e97263c9b9af2b0f921af2ae55c141ea188d37%26X-Amz-SignedHeaders%3Dhost%26x-id%3DGetObject?table=block&id=05691779-0a5d-4f8a-b167-08e824fb6158&cache=v2)
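A minimal sketch of what fusion buys, using a hypothetical bias + ReLU + residual chain (chosen for illustration, not necessarily a fusion shown in the figure above). It counts element reads and writes as a stand-in for memory traffic; the bias is given the same length as the input to keep the example short, whereas a real bias would be broadcast.

```python
# Compare memory traffic (counted in element reads + writes) for an unfused
# and a fused version of: out = relu(x + bias) + residual.
# The operator chain and the traffic model are illustrative assumptions.

def unfused(x, bias, residual):
    n = len(x)
    traffic = 0
    t1 = [xi + bi for xi, bi in zip(x, bias)]        # read x, bias; write t1
    traffic += 3 * n
    t2 = [max(v, 0.0) for v in t1]                   # read t1; write t2
    traffic += 2 * n
    out = [v + ri for v, ri in zip(t2, residual)]    # read t2, residual; write out
    traffic += 3 * n
    return out, traffic

def fused(x, bias, residual):
    n = len(x)
    # One pass: intermediates stay in registers instead of going through memory.
    out = [max(xi + bi, 0.0) + ri for xi, bi, ri in zip(x, bias, residual)]
    traffic = 4 * n                                  # read x, bias, residual; write out
    return out, traffic

x = [-1.0, 0.5, 2.0, -3.0]
bias = [0.5, 0.5, 0.5, 0.5]
residual = [1.0, 1.0, 1.0, 1.0]

out_unfused, traffic_unfused = unfused(x, bias, residual)
out_fused, traffic_fused = fused(x, bias, residual)
print(out_unfused == out_fused, traffic_unfused, traffic_fused)
```

Both versions compute the same result, but the fused one moves half as many elements in this model because the two intermediates are never written out and read back.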