Applying Machine Learning
to Circuit Design
David Hettlinger, Amy Kerr, Todd Neller
Channel Routing Problems (CRPs)
Chip Design
Each silicon wafer contains hundreds of chips.
Chip components, specifically transistors, are etched into the silicon.
Often, groups of components need to be wired together.
A Simple CRP Instance
Top pins:    3 4 1 0 2 4
Bottom pins: 4 2 0 3 1 2
[Figure: the silicon channel between the two pin rows, with net number 4 labeled; a 0 marks a column with no pin.]
One Possible Solution
[Figure: the instance routed with horizontal tracks in the channel.
Top pins:    3 4 1 0 2 4
Bottom pins: 4 2 0 3 1 2]
Goal: decrease the number of horizontal tracks.
CRPs: Constraints
Horizontal: Horizontal segments cannot overlap.
Vertical: If subnet x has a pin at the top of column a, and subnet y has a pin at the bottom of column a, then subnet x must be placed above subnet y.
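The vertical constraints above can be derived mechanically from the pin lists. A minimal sketch, assuming the representation used in the earlier instance: two lists give the net number at the top and bottom of each column, and 0 marks a column with no pin (the function name and representation are illustrative, not from the slides):

```python
def vertical_constraints(top, bottom):
    """Return the set of (x, y) pairs meaning net x must be routed
    above net y, derived from columns where both rows have pins."""
    constraints = set()
    for t, b in zip(top, bottom):
        if t != 0 and b != 0 and t != b:
            constraints.add((t, b))   # net t's segment must lie above net b's
    return constraints

# The instance from the slides: top pins 3 4 1 0 2 4, bottom pins 4 2 0 3 1 2.
print(vertical_constraints([3, 4, 1, 0, 2, 4], [4, 2, 0, 3, 1, 2]))
```

Columns 0, 1, and 4 produce the constraints: net 3 above net 4, net 4 above net 2, and net 2 above net 1.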
Simulated Annealing (SA): Background
“Annealing” comes from a process blacksmiths use:
metal is heated, then cooled slowly, to make it as malleable as possible.
Statistical physicists developed SA.
SA Problem Components
Definable states: The current configuration (state) of a problem instance must be describable.
New state generation: A way to generate new states.
Energy function: A formula for the relative desirability of a given state.
Temperature and annealing schedule: A temperature value and a function for changing it over time.
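The four components above fit together in a short generic loop. A minimal sketch, assuming the standard Metropolis acceptance rule; the geometric cooling schedule and the parameter values here are illustrative assumptions, not the learned schedule discussed later:

```python
import math
import random

def simulated_anneal(state, energy, neighbor, temp0=1.0, cooling=0.95, steps=1000):
    """Generic SA loop over the four components: a state, an energy
    function, a new-state generator, and a temperature schedule."""
    t = temp0
    best = current = state
    for _ in range(steps):
        candidate = neighbor(current)             # new state generation
        delta = energy(candidate) - energy(current)
        # Metropolis rule: always accept improvements; accept uphill
        # moves with probability exp(-delta / t), which shrinks as t drops.
        if delta <= 0 or random.random() < math.exp(-delta / t):
            current = candidate
        if energy(current) < energy(best):
            best = current
        t *= cooling   # annealing schedule: lower the temperature over time
    return best

# Toy usage: minimize (x - 3)^2 over the integers.
result = simulated_anneal(0, lambda x: (x - 3) ** 2,
                          lambda x: x + random.choice([-1, 1]))
```

At high temperature the loop accepts almost anything; as the temperature falls it behaves more and more greedily.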
A Visual Example
[Figure: an energy landscape, with a local minimum and the global minimum labeled.]
Applying Simulated Annealing to CRPs
Definable states: The partitioning of subnets into groups.
New state generation: Change the grouping of the subnets.
Energy function: The number of horizontal tracks needed to implement a given partition.
Temperature and annealing schedule: Start the temperature just high enough to accept any new configuration. As for the annealing schedule, reinforcement learning can help find that.
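The energy function above can be sketched concretely. This is an assumed variant, not the slides' exact formulation: each group in the partition is treated as one candidate track, each subnet as a horizontal segment spanning its leftmost to rightmost pin column, and horizontal-constraint violations are charged a large assumed penalty so SA is driven toward legal partitions:

```python
def crp_energy(groups, span):
    """Energy of a partition: number of tracks, plus a heavy penalty
    for every pair of segments in the same track whose column ranges
    overlap (a horizontal-constraint violation)."""
    energy = len(groups)
    for group in groups:
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                (l1, r1), (l2, r2) = span[a], span[b]
                if l1 <= r2 and l2 <= r1:   # the two intervals overlap
                    energy += 100           # assumed penalty weight
    return energy

# Spans derived from the earlier instance (net -> leftmost/rightmost column).
spans = {1: (2, 4), 2: (1, 5), 3: (0, 3), 4: (0, 5)}
print(crp_energy([[1], [2], [3], [4]], spans))   # one net per track
print(crp_energy([[1, 3], [2], [4]], spans))     # nets 1 and 3 overlap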
A Simple CRP Example
Start State of a CRP Instance Partition Graph of this State
A Simple CRP ExampleStates 1 and 2
Partition Graphs of these States
A Simple CRP ExampleStates 1, 2 and 3 Partition Graphs of these States
A Simple CRP ExampleStarting through Ending States Partition Graphs of these States
A Generated CRP InstanceStart State A Solution
15 Horizontal Tracks
12 Horizontal Tracks
The Drunken Topographer
Imagine an extremely hilly landscape with many hills and valleys, high and low.
Goal: find the lowest spot.
Means: airlift a drunk!
Starts at a random spot.
Staggers randomly.
The more tired, the more uphill steps rejected.
Super-Drunks, Dead-Drunks, and Those In-Between
The Super-Drunk never tires:
never rejects uphill steps. How well will the Super-Drunk search?
The Dead-Drunk is absolutely tired:
always rejects uphill steps. How well will the Dead-Drunk search?
Now imagine a drunk that starts in fine condition and very gradually tires.
Traveling Salesman Problem
Have to travel a circuit around n cities (n = 400).
Different costs to travel between different cities (assume cost = distance).
State: an ordering of the cities (> 8×10^865 orderings for 400 cities).
Energy: the total cost of the travel.
Step: select a portion of the circuit and reverse the ordering.
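The step and energy just described can be sketched directly. A minimal illustration (function names are mine): the step is the classic reverse-a-segment move, and the energy sums the distances around the closed circuit:

```python
import random

def tsp_step(tour):
    """One SA move: pick a portion of the circuit and reverse its
    ordering. Returns a new tour; the original is left unchanged."""
    i, j = sorted(random.sample(range(len(tour)), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

def tour_cost(tour, dist):
    """Energy: total travel cost around the closed circuit."""
    return sum(dist[tour[k]][tour[(k + 1) % len(tour)]]
               for k in range(len(tour)))
```

Because only the two edges at the ends of the reversed segment change, this move makes small, local changes to the energy, which is exactly what SA needs.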
Determining the Annealing Schedule
The schedule of the “cooling” is critical
Determining this schedule by hand takes days
Takes a computer mere hours to compute!
Reinforcement Learning Example
Goal: Ace a class
Trial & error: study for various amounts of time.
Short-term rewards: exam grades, amount of free time.
Long-term rewards:
Grade affects future opportunities, i.e. whether we can slack off later.
Our semester grade (the goal is to maximize this!).
Need to learn how long we need to study to get an A.
Reinforcement Learning (RL)
Learns completely by trial & error
Receives rewards for each action
Goal: maximize long-term numerical reward
1. Immediate reward (numerical)
2. Delayed reward: actions affect future situations & opportunities for future rewards
No preprogrammed knowledge
No human supervision/mentorship
RL: The Details
Agent = the learner (i.e. the student)
Environment = everything the agent cannot completely control.
Includes reward functions (i.e. the grade scale)
Description of the current state (i.e. the current average)
Call this description a “sensation”
[Diagram: the agent sends an action to the environment; the environment returns a sensation and a reward.]
RL: Value Functions
Use immediate & delayed rewards to evaluate the desirability of actions and to learn the task.
Value function of a state-action pair, Q(s,a): the expected reward for taking action a from state s.
Includes immediate & delayed reward.
Strategy: most of the time, choose the action with the maximal Q(s,a) value for the state.
Remember, we must explore sometimes!
RL: Q-Functions
We must learn Q(s,a).
To start, set Q(s,a) = 0 for all s, a.
The agent tries various actions.
Each time the agent experiences action a from state s, it updates its estimate of Q(s,a) toward the actual reward experienced.
If we usually pick the action a’ with the maximal Q-value for the state, we maximize total reward, i.e. optimal performance.
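The two ingredients above (usually act greedily, and nudge Q(s,a) toward the experienced reward) can be sketched in a few lines. This is a standard Q-learning sketch, not code from the slides; the step size `alpha`, discount `gamma`, and exploration rate `epsilon` are assumed parameters:

```python
import random
from collections import defaultdict

def choose_action(Q, state, actions, epsilon=0.1):
    """Epsilon-greedy: usually pick the action with maximal Q(s,a),
    but explore sometimes."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(Q, s, a, reward, s_next, actions, alpha=0.5, gamma=0.9):
    """Move the estimate of Q(s,a) toward the experienced reward plus
    the discounted value of the best next action (the delayed reward)."""
    target = reward + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

Q = defaultdict(float)   # "to start, set Q(s,a) = 0 for all s, a"
```

Repeating `choose_action` and `q_update` over many trials is the whole trial-and-error loop: no preprogrammed knowledge, just rewards.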
Example: Grid World
Can always move up, down, right, left.
Board wraps around
Goal
Start
Get to the goal in as few steps as possible.
Reward = ?
Meaning of Q?
What are the optimal paths?
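One way to answer those questions is to just run the learner. A minimal sketch of Q-learning on a wrap-around grid; the grid size, start/goal squares, and learning constants are all assumptions for illustration. Reward = -1 per move, so maximizing total reward means reaching the goal in as few steps as possible, and -Q(s,a) comes to estimate how many steps the goal is after taking a from s:

```python
import random
from collections import defaultdict

SIZE = 4
START, GOAL = (3, 0), (0, 3)
MOVES = {'up': (-1, 0), 'down': (1, 0), 'left': (0, -1), 'right': (0, 1)}

def step(state, action):
    r, c = state
    dr, dc = MOVES[action]
    return ((r + dr) % SIZE, (c + dc) % SIZE)   # the board wraps around

Q = defaultdict(float)            # Q(s,a) = 0 for all s, a to start
random.seed(1)
for _ in range(2000):             # episodes of pure trial & error
    s = START
    while s != GOAL:
        if random.random() < 0.2:                 # explore sometimes
            a = random.choice(list(MOVES))
        else:                                     # usually act greedily
            a = max(MOVES, key=lambda m: Q[(s, m)])
        s2 = step(s, a)
        best_next = 0.0 if s2 == GOAL else max(Q[(s2, m)] for m in MOVES)
        Q[(s, a)] += 0.5 * (-1 + best_next - Q[(s, a)])
        s = s2
```

Because the board wraps around, the learned values reveal that the optimal paths from this start go off the edges: the goal is only two steps away.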
Applying RL to CRP
A computer can learn an approximately optimal annealing schedule using RL.
Reward function:
Rewarded for reaching a better configuration.
Penalized for the amount of time used to find that better configuration.
The computer learns to find an approximately optimal annealing schedule in a time-efficient manner.
The program self-terminates!
Conclusions
CRP is an interesting but complex problem.
Simulated annealing helps us solve CRPs.
Simulated annealing requires an annealing schedule (how to change the temperature over time).
Reinforcement learning, which is just learning through trial & error, lets a computer learn an annealing schedule in hours instead of days.
Any Questions?