IPUs

Overview

IPU Design Philosophy
Designed to satisfy:
  1. Irregular fine-grained computation → true MIMD
  1. Irregular data accesses
  1. High bandwidth, low latency memory access
IPU key design decisions:
  1. No penalty for different instructions
  1. No penalty for irregular memory access
  1. No shared memory, just very fast local scratchpad SRAM
Name of the 2nd gen basic IPU machine
IPU-M2000
Name of the 2nd gen smaller IPU pod machine
IPU-POD16
Name of the 2nd gen larger IPU pod machine
IPU-POD64
Name of the 2nd gen IPU processor
Colossus MK2 GC200 IPU
Name of the smallest IPU pod machine that just turns M2000 into a complete system
IPU-POD4 DA
IPU Systems: what is exchange memory?
Combination of:
In-Processor-Memory: on-chip memory
Streaming Memory: external but dedicated to the IPU
Host Memory: memory on the host server
IPU: what is Colossus?
The current architecture of the Graphcore IPU (GC2 & GC200)

Colossus™ MK2 GC200 IPU

Colossus MK2 GC200 IPU: how many IPU-Tiles
1472
Colossus MK2 GC200 IPU: number of IPU-Cores
1472
Colossus MK2 GC200 IPU: number of IPU-Cores per IPU-Tile
1
Colossus MK2 GC200 IPU: name of on-tile memory
In-Processor-Memory
Colossus MK2 GC200 IPU: threads per core
6
Colossus MK2 GC200 IPU: amount of In-Processor-Memory per IPU
900MB
Colossus MK2 GC200 IPU: amount of In-Processor-Memory per Tile (1sf)
600KB
Colossus MK2 GC200 IPU: In-Processor-Memory memory bandwidth per IPU
47.5 TB/s
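The per-IPU figures above can be cross-checked with a little arithmetic. The sketch below is only a sanity check on the numbers in these notes; it assumes decimal units (1 MB = 1000 KB, 1 TB = 1000 GB), and the variable names are made up for illustration.

```python
# Figures quoted in these notes for one Colossus MK2 GC200 IPU.
TILES = 1472
THREADS_PER_CORE = 6
IN_PROCESSOR_MEMORY_MB = 900
MEMORY_BANDWIDTH_TBPS = 47.5

# Assumption: decimal units (1 MB = 1000 KB, 1 TB = 1000 GB).
total_threads = TILES * THREADS_PER_CORE                      # one core per tile
memory_per_tile_kb = IN_PROCESSOR_MEMORY_MB * 1000 / TILES    # rounds to 600KB at 1sf
bandwidth_per_tile_gbps = MEMORY_BANDWIDTH_TBPS * 1000 / TILES

print(f"hardware threads per IPU: {total_threads}")
print(f"In-Processor-Memory per tile: {memory_per_tile_kb:.0f} KB")
print(f"memory bandwidth per tile: {bandwidth_per_tile_gbps:.1f} GB/s")
```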
Colossus MK2 GC200 IPU: name of communication fabric
IPU-Exchange
Colossus MK2 GC200 IPU: IPU-Exchange bandwidth
8 TB/s using any communication pattern
Colossus MK2 GC200 IPU: host interface port type
PCIe Gen4
Colossus MK2 GC200 IPU: how many PCIe lanes
16
Colossus MK2 GC200 IPU: PCIe bandwidth
64 GB/s
Colossus MK2 GC200 IPU: name of chip-to-chip technology
IPU-Links
Colossus MK2 GC200 IPU: number of IPU-links
10
Colossus MK2 GC200 IPU: IPU-links bandwidth
320 GB/s
Colossus MK2 GC200 IPU: FLOPS (FP16)
250 TFLOPS
Steps of Bulk Synchronous Parallelism
  1. all tiles do local computation
  1. tiles exchange data across the IPU-Exchange
  1. barrier sync = wait for all tiles to finish
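The three steps above can be sketched as a toy simulation. This is not Graphcore code: Python threads stand in for tiles, the tile count, the doubling "computation" and the send-to-next-neighbour pattern are hypothetical, and two alternating inbox buffers are an assumption added so that one barrier per superstep is enough.

```python
import threading

NUM_TILES = 4    # hypothetical; a real GC200 has 1472 tiles
SUPERSTEPS = 3   # hypothetical number of compute/exchange/sync rounds


def run_bsp(num_tiles=NUM_TILES, supersteps=SUPERSTEPS):
    # Each tile owns one private value (its "local memory").
    local = [tile + 1 for tile in range(num_tiles)]
    # Two inbox buffers alternate between supersteps, so a fast tile
    # cannot overwrite a message that a slow tile has not read yet.
    inboxes = [[0] * num_tiles for _ in range(2)]
    barrier = threading.Barrier(num_tiles)

    def tile_program(tile):
        for step in range(supersteps):
            inbox = inboxes[step % 2]
            # 1. Local computation, using tile-local data only.
            result = local[tile] * 2
            # 2. Exchange: send the result to the next tile.
            inbox[(tile + 1) % num_tiles] = result
            # 3. Barrier sync: wait until every tile has finished.
            barrier.wait()
            # The received value becomes local state for the next superstep.
            local[tile] = inbox[tile]

    threads = [threading.Thread(target=tile_program, args=(t,))
               for t in range(num_tiles)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return local


final_values = run_bsp()
print(final_values)
```

Because no tile reads another tile's data until after the barrier, the result is deterministic regardless of thread scheduling.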

IPU-M2000

IPU-M2000: how many GC200s?
4
IPU-M2000: how many FLOPS (FP16)
1 PFLOPS
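The IPU-M2000 compute figure follows from the per-chip numbers earlier in these notes. A quick check, assuming decimal units and using illustrative variable names:

```python
# Per-chip figures from the GC200 notes.
GC200_PER_M2000 = 4
GC200_FP16_TFLOPS = 250
GC200_IN_PROCESSOR_MEMORY_MB = 900

# Assumption: decimal units (1 PFLOPS = 1000 TFLOPS, 1 GB = 1000 MB).
m2000_fp16_pflops = GC200_PER_M2000 * GC200_FP16_TFLOPS / 1000
m2000_in_processor_memory_gb = GC200_PER_M2000 * GC200_IN_PROCESSOR_MEMORY_MB / 1000

print(f"IPU-M2000 FP16 compute: {m2000_fp16_pflops} PFLOPS")
print(f"IPU-M2000 In-Processor-Memory: {m2000_in_processor_memory_gb} GB")
```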
IPU-M2000: how much exchange memory is supported?
Up to 450GB
IPU-M2000: system for managing IPU links + streaming memory
IPU Gateway
IPU-M2000: type of DRAM
DDR4 DIMM (2x)