Contents:
- My Summary
- Introduction
  - Context
  - Problem
  - Current Approach
  - Proposed Approach
  - Generalisation
  - Contribution
- Related Work
  - Categories of previous approaches
  - Difference of proposed approach
- Method
  - Problem statement
  - Overview
  - Cost function
  - Architecture
    - Supervised embedding learning
    - Policy Network
- Results
My Summary
The high-level problem here is that the chip-placement problem is currently done with a lot of manual work and evaluating our true objective is a very slow and expensive process. This method gives a much faster approach using a neat approximation.
More broadly, the chip placement problem is an optimisation problem—specifically, a non-differentiable discrete constrained multi-objective one. That it is non-differentiable means that we can't just use standard gradient-based methods to optimise the objective(s).
The authors' solution is to turn this into an RL problem - each discrete step (placing the next macro, taken in decreasing order of size) becomes an action, the state is the current chip layout, and the reward is an approximation of our objectives.
That this appears to work well suggests similar problems could also have RL solutions. This is the most interesting aspect of the paper, and could make RL very real-world-useful for many non-differentiable optimisation problems!
To train the model, the authors use two steps:
- Train a value network on a generated dataset of chip placements, using supervised learning with each placement's (proxy) reward as the label. This generates embeddings used in ⬇️
- Train an RL policy to output macro placements. The reward is an approximation of the true reward.
The architecture diagram in the paper is very helpful in understanding how this all works.
In terms of the architecture, a (single layer?) MPNN is used to learn the graph embeddings, and deconv nets are used to output the action.
The authors claim that the GNN architecture is novel, but it seems extremely standard to me.
The second step (RL policy training) is done on training tasks; the resulting policy can then be zero-shot transferred or fine-tuned on a subsequent chip/task.
I don't know enough about the domain to compare well with respect to the baselines, but what can be said is that a) zero-shot performance appears to be really good - we can generalise really well to unseen architectures (thanks, GNNs!), and b) fine-tuning can give fairly substantial boosts.
In terms of real-world use, this method can take 6 hours with fine-tuning on a new task to generate its best chip placement, relative to weeks for standard human-in-the-loop methods. This appears pretty promising. The authors suggest their approach is competitive with existing non-ML methods. I'm a little sceptical, and Yann LeCun has posted ⬇️, so I'm reserving judgement for now.
Introduction
Context
- End of Moore's law / Dennard scaling means specialist AI hardware v. valuable
- AI chips currently take years to design ➡️ we have to try and anticipate requirements of models years in advance
- Aim = shortening development cycle
- If we can use AI for chip design, will create virtuous circle
Problem
Task: place a netlist graph (connectivity of the circuit) of macros (e.g. SRAMs) and standard cells (logic gates) onto a chip canvas (bounded 2D space)
Objective: optimize PPA (power, performance, area), while adhering to placement constraints
Current Approach
- Human experts still required, many weeks of iteration
- Huge netlist graphs (millions to billions of nodes)
- ⬆️ cost of computing true target metrics (hours to days for single design)
Proposed Approach
Pose chip placement as RL problem
Key motivation for using RL: our (multi-) objective is non differentiable
Iterations of training ➡️ for each one, all of the macros are placed by the agent, then the standard cells are placed by a secondary (non-learned) method
Fast approximate reward used
Generalisation
First placement approach which can generalise to new unseen netlists
As agent is exposed to more different chips, becomes better and better on new ones
Contribution
- Placement method that learns from experience
- Superior to baselines on real AI chips, and comparable to human experts
Related Work
Categories of previous approaches
- partitioning-based methods (divide-and-conquer)
- stochastic/hill-climbing (e.g. simulated annealing)
- analytic solvers (e.g. non-linear optimisers), inc. the SOTA RePlAce
Difference of proposed approach
Like analytic solvers, it uses gradient updates to optimise the objective
However, optimisation is no longer from scratch - it leverages knowledge from prior placements on different chips (domain adaptation)
Unlike other approaches, it optimises the target metrics directly without having to make convex approximations
Method
Problem statement
Map the nodes of a netlist (connectivity of the circuit) onto a chip canvas (bounded 2D space), such that the PPA (power, performance, area) is optimised.
Overview
MDP:
States: possible partial placements of the netlist on the canvas. A concatenation of features including:
- graph embedding of the netlist (inc. placed & unplaced nodes)
- node embedding of the current macro
- netlist metadata
- mask representing the feasibility of placing the current node onto each cell of the grid
Actions: (given the current macro to place) the set of locations in the canvas space (discrete grid cells) the macro can be placed, without violating any hard density constraints
State transition: just the obvious deterministic outcome?
Reward:
True reward: output from slow commercial tool (EDA)
Proxy reward: 0 for all actions except the last, where it equals the negative weighted sum of proxy wirelength and congestion (see the episode sketch below)
T = number of macros in netlist
Algorithm: PPO
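To make the MDP concrete, here is a minimal toy episode: a random policy picks feasible grid cells for T macros, intermediate rewards are zero, and only the final step receives the negative weighted sum of stand-in proxy costs. Everything here (grid size, cost functions, the random policy) is my own illustration, not the paper's implementation.

```python
import numpy as np

GRID = 8                      # canvas discretised into GRID x GRID cells
rng = np.random.default_rng(0)

def feasibility_mask(placed):
    # 1.0 where the current macro may legally go (here: any still-empty cell);
    # the real mask also encodes density/blockage constraints.
    return (placed == 0).astype(float)

def proxy_wirelength(locations):
    # Stand-in: spread of the placed macros, not the paper's wirelength estimate.
    xs, ys = zip(*locations)
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def proxy_congestion(locations):
    # Stand-in: crude density measure.
    return len(locations) / (GRID * GRID)

def run_episode(num_macros=5, congestion_weight=0.5):
    placed = np.zeros((GRID, GRID))
    locations, total_reward = [], 0.0
    for t in range(num_macros):                      # T = number of macros
        mask = feasibility_mask(placed)
        probs = mask.flatten() / mask.sum()          # random policy over feasible cells
        cell = rng.choice(GRID * GRID, p=probs)      # action: pick a grid cell
        x, y = divmod(cell, GRID)
        placed[x, y] = 1
        locations.append((x, y))
        reward = 0.0                                 # zero reward on intermediate steps
        if t == num_macros - 1:                      # final step: proxy reward
            reward = -(proxy_wirelength(locations)
                       + congestion_weight * proxy_congestion(locations))
        total_reward += reward
    return total_reward

print(run_episode())
```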
Cost function
Average of returns over all netlists in the dataset, given the current policy:

$$J(\theta, G) = \frac{1}{K} \sum_{g \in G} \mathbb{E}_{g,\, p \sim \pi_\theta}\!\left[R_{p,g}\right]$$

where $G$ is the dataset of netlists, of size $K$, and $R_{p,g}$ is the episode reward for placement $p$ of netlist $g$.
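As a sanity check on the formula, a tiny Monte Carlo reading of it: estimate the inner expectation by averaging episode returns within each netlist, then average across the K netlists. `simulate_return` is a made-up stand-in for running and scoring one placement episode; it is not the paper's code.

```python
import random

def simulate_return(netlist_id):
    # Pretend episode return: a noisy negative cost that differs per netlist.
    return -random.random() - 0.01 * netlist_id

def estimate_objective(netlist_ids, episodes_per_netlist=16):
    per_netlist_means = []
    for g in netlist_ids:
        returns = [simulate_return(g) for _ in range(episodes_per_netlist)]
        per_netlist_means.append(sum(returns) / len(returns))   # inner expectation E[R_{p,g}]
    return sum(per_netlist_means) / len(per_netlist_means)      # average over the K netlists

print(estimate_objective(netlist_ids=range(5)))
```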
Architecture
Supervised embedding learning
Why? We can train this easily across many different netlists/placements, giving strong generalisation.
The ValueNet is trained (supervised) first, which enables the learning of the embeddings:
Dataset = 10,000 chip placements; input = standard state, label = that placement's reward (the cheap proxy wirelength/congestion cost, so labels are fast to compute) ← built by picking 5 accelerator netlists and then generating 2,000 placements for each using a basic policy network.
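A minimal sketch of this supervised step, assuming each placement state has already been flattened into a fixed-size feature vector (in the paper these features come from the graph embeddings) and using random stand-in data; layer sizes and names are my own choices:

```python
import torch
from torch import nn

STATE_DIM = 64

# Small MLP that predicts the reward label for a placement state.
value_net = nn.Sequential(
    nn.Linear(STATE_DIM, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1),
)

# Stand-in for the ~10,000 (state, reward-label) pairs described above.
states = torch.randn(10_000, STATE_DIM)
labels = torch.randn(10_000, 1)

opt = torch.optim.Adam(value_net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(3):
    for i in range(0, len(states), 256):
        batch_s, batch_y = states[i:i + 256], labels[i:i + 256]
        opt.zero_grad()
        loss = loss_fn(value_net(batch_s), batch_y)   # regress predicted reward onto label
        loss.backward()
        opt.step()
```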
GraphConv: standard MPNN (sketched in code below):
embeddings = 32-dim
edge embeddings: $e_{ij} = fc_1\!\left(\mathrm{concat}\big(fc_0(v_i) \,|\, fc_0(v_j) \,|\, w^{e}_{ij}\big)\right)$
node embeddings: $v_i = \mathrm{mean}_{e_{ij} \in N(v_i)}\,(e_{ij})$
$fc_0$, $fc_1$ = feed-forward networks
$w^{e}_{ij}$ = learnable edge weights
Single GNN layer??
outputs = node & edge embeddings
For a specific task we can fine-tune these weights further.
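A minimal code sketch of one GraphConv step matching the equations above, assuming PyTorch and a toy graph; the dimensions, the way edge weights are passed in, and all names are my own choices, not the authors' implementation:

```python
import torch
from torch import nn

EMB = 32

class GraphConv(nn.Module):
    def __init__(self, node_feat_dim):
        super().__init__()
        self.fc0 = nn.Linear(node_feat_dim, EMB)   # per-node transform
        self.fc1 = nn.Linear(2 * EMB + 1, EMB)     # edge update over concat + edge weight

    def forward(self, node_feats, edge_index, edge_weight):
        # edge_index: (2, E) tensor of [source, target] node ids
        # edge_weight: (E, 1) per-edge weights w^e_ij (learnable in the paper; passed in here)
        h = self.fc0(node_feats)
        src, dst = edge_index
        e = self.fc1(torch.cat([h[src], h[dst], edge_weight], dim=-1))  # edge embeddings
        # node embedding = mean of embeddings of the edges incident to each node
        v = torch.zeros(node_feats.size(0), EMB)
        counts = torch.zeros(node_feats.size(0), 1)
        for node_ids in (src, dst):
            v = v.index_add(0, node_ids, e)
            counts = counts.index_add(0, node_ids, torch.ones(len(node_ids), 1))
        return e, v / counts.clamp(min=1)           # clamp guards isolated nodes

# Toy usage: 4 nodes with 8 raw features each, 3 edges.
conv = GraphConv(node_feat_dim=8)
feats = torch.randn(4, 8)
edges = torch.tensor([[0, 1, 2], [1, 2, 3]])
weights = torch.randn(3, 1)
edge_emb, node_emb = conv(feats, edges, weights)
print(edge_emb.shape, node_emb.shape)   # (3, 32), (4, 32)
```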
Policy Network
Deconvolution layers (like convolution, but with the input "spaced out": see link) are used to increase the spatial dimensionality up to the placement grid.
Batch norm used
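A minimal sketch of a deconvolution-based policy head of this kind, assuming a 32x32 placement grid and made-up layer sizes; it also shows the feasibility mask from the MDP section being applied before sampling an action. This is my own illustration, not the authors' implementation.

```python
import torch
from torch import nn

GRID = 32

# Upsample a small spatial feature map to the full grid with deconvolutions + batch norm.
policy_head = nn.Sequential(
    nn.ConvTranspose2d(16, 8, kernel_size=4, stride=2, padding=1),  # 8x8 -> 16x16
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.ConvTranspose2d(8, 1, kernel_size=4, stride=2, padding=1),   # 16x16 -> 32x32
)

state_features = torch.randn(1, 16, 8, 8)        # e.g. embeddings reshaped into a map
logits = policy_head(state_features).view(-1)    # one logit per grid cell

mask = torch.ones(GRID * GRID)                   # 1 = feasible, 0 = violates density constraint
mask[:100] = 0                                   # pretend some cells are blocked
logits = logits.masked_fill(mask == 0, float('-inf'))

probs = torch.softmax(logits, dim=0)             # distribution over placement actions
action = torch.multinomial(probs, 1).item()      # sampled grid cell for the current macro
print(action)
```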
Results
Ultimately the method takes about 6 hours to generate strong placements, whereas human-in-the-loop methods take several weeks.