Posted on 21-Dec-2015
EaseCAM: An Energy and Storage Efficient TCAM-based IP-Lookup Architecture
Rabi Mahapatra
Texas A&M University
Overview
- Introduction
- Research goal
- Proposed approach
- Results
- Conclusion & future work
Introduction
[Figure: IP lookup in a router. Labels: Packet Queue, Header Processing, IP Lookup, DRAM Routing Table, packets as Hdr/Data, IP Address, Next Hop.]
Introduction: HW and SW solutions for IP lookup
- Software solutions are unable to match link speeds.
- Hardware solutions can keep up with today's link speeds.
- TCAMs are the most popular hardware device, but they consume up to 15 W per chip (4-8 chips), which increases cooling costs and leads to fewer ports.
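To make the IP-lookup step concrete, here is a minimal software longest-prefix-match sketch (our illustration, not the authors' code): one hash table per prefix length, probed from the longest length to the shortest. The routes and next-hop labels are hypothetical. A lookup may probe every distinct prefix length, whereas a TCAM compares all stored entries in parallel.

```python
import ipaddress

def build_table(routes):
    """Map {'a.b.c.d/len': next_hop} to {prefix_len: {network_as_int: next_hop}}."""
    table = {}
    for prefix, hop in routes.items():
        net = ipaddress.IPv4Network(prefix)
        table.setdefault(net.prefixlen, {})[int(net.network_address)] = hop
    return table

def lpm_lookup(table, addr):
    """Return the next hop of the longest matching prefix, or None."""
    a = int(ipaddress.IPv4Address(addr))
    for length in sorted(table, reverse=True):  # try the longest prefixes first
        mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
        hop = table[length].get(a & mask)
        if hop is not None:
            return hop
    return None

# Hypothetical routes for illustration only.
routes = {"0.0.0.0/0": "default", "128.194.0.0/16": "A", "128.194.1.0/24": "B"}
table = build_table(routes)
print(lpm_lookup(table, "128.194.1.7"))  # B
print(lpm_lookup(table, "8.8.8.8"))      # default
```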
Current approaches
Power reduction in TCAM
- Partitioning of the TCAM array [Infocom'03, Hot Interconnect'02]
- Compaction (minimization) [Micro'02]
Update techniques [Micro'02]
- Routing update
- TCAM updates
Bottleneck with existing approaches
Power reduction
- The number of entries enabled is not bounded.
- Storing redundant information is not avoided.
Update
- Minimization techniques are not incremental.
- Update time depends on the routing table size.
Motivation
- A solution for bounded and reduced power consumption
- Truly incremental routing and TCAM updates
Contributions
- A pipelined architecture for IP lookup
- New prefix properties (prefix aggregation and prefix expansion)
- An upper bound on the number of entries enabled (256 x 3)
- Novel page-filling, memory-management and incremental-update techniques
Solution: Prefix properties
Prefix aggregation
Prefixes: 128.194.1.1/32, 128.194.1.2/32, 128.194.1.8/30, 128.194.1.16/28
LCS: 128.194.1.0/24
- 128.194.1.0/24 is the LCS for the given set of prefixes (rounded to the nearest octet).
- Prefixes aggregated on the basis of their LCS mostly have the same next hop.
- This gives a bound (256) on the number of prefixes minimized together.
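A minimal sketch of computing an octet-rounded LCS for the example above. It assumes that "rounded to the nearest octet" means rounding the common length down to a multiple of 8, so the result still covers every prefix; `lcs_octet` is a hypothetical helper name.

```python
import ipaddress

def lcs_octet(prefixes):
    """Longest common leading bits of the prefixes, rounded down to an octet boundary."""
    nets = [ipaddress.IPv4Network(p) for p in prefixes]
    first = int(nets[0].network_address)
    common = min(n.prefixlen for n in nets)  # cannot exceed the shortest prefix
    for n in nets[1:]:
        diff = first ^ int(n.network_address)
        common = min(common, 32 - diff.bit_length())  # number of equal leading bits
    length = (common // 8) * 8  # assumption: round down so the LCS covers all prefixes
    return ipaddress.IPv4Network((first, length), strict=False)

print(lcs_octet(["128.194.1.1/32", "128.194.1.2/32",
                 "128.194.1.8/30", "128.194.1.16/28"]))  # 128.194.1.0/24
```

Here the four prefixes share 27 leading bits, which rounds down to /24.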
Solution: Prefix properties
Router | Total prefixes | Max prefix compaction | Prefix-aggregation-based compaction
ATT-Canada | 112412 | 54476 | 57837
BBN-planet | 124538 | 69646 | 71500
TABLE I. Comparison of prefix compaction using the prefix aggregation property and Espresso II for the attcanada and bbnplanet routers
Solution: Prefix properties
Prefix expansion
- Prefixes of the same length can be minimized together.
- To increase minimization, extend prefixes of different lengths to the nearest octet by adding don't-cares.
- Extending to the nearest octet is also useful for incremental update.
Example: 100101, 1011011, 1011111, 1011 expand to 100101XX, 1011011X, 1011111X, 1011XXXX
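A short sketch of the expansion step on the example above, writing X for a don't-care bit. `expand_to_octet` is a hypothetical helper; it only pads, and the subsequent minimization of same-length entries is not shown.

```python
def expand_to_octet(prefix: str) -> str:
    """Pad a bit-string prefix with don't-cares (X) up to the next multiple of 8 bits."""
    width = -(-len(prefix) // 8) * 8  # ceiling of len(prefix) to a multiple of 8
    return prefix + "X" * (width - len(prefix))

for p in ["100101", "1011011", "1011111", "1011"]:
    print(p, "->", expand_to_octet(p))
```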
Solution: Prefix properties
Overlapping prefixes
- Prefixes of length < 8 are not present in the routing table.
- Hence at most 25 prefixes can match a given IP address: at most one per prefix length from 8 to 32.
- This property is used to selectively enable a bounded number of entries in the TCAM (256 x 3).
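The bound of 25 follows because, for a given address, at most one prefix of each length can contain it. The sketch below builds a hypothetical worst-case table with one matching prefix per length from 8 to 32, plus two same-length prefixes for other networks, and counts the matches.

```python
import ipaddress

def matching_prefixes(prefixes, addr):
    """All prefixes in the table that contain the address."""
    a = ipaddress.IPv4Address(addr)
    return [p for p in prefixes if a in ipaddress.IPv4Network(p)]

addr = "128.194.1.7"
# Worst case: one prefix of every length from 8 to 32 that contains addr.
table = [str(ipaddress.IPv4Network(f"{addr}/{n}", strict=False)) for n in range(8, 33)]
table += ["128.195.0.0/16", "128.194.2.0/24"]  # same lengths, other networks: no match
print(len(matching_prefixes(table, addr)))  # 25
```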
Solution: Architecture
- 2-level architecture: w1 bits in the 1st level and 32 - w1 bits in the 2nd level.
- The size of the segment selected by the first w1 bits (w1 = 8 in the figure) is variable.
- Power is bounded by the segment size.
[Figure: 1st level with W1 = 8 bits and entries 1.x, 2.x, ..., 127.x, 128.x, ..., 254.x, 255.x; each selects a variable-sized segment of 24-bit entries in the 2nd level.]
Segmented architecture for routing lookup using TCAM.
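A software model of the segmented lookup, as a sketch under stated assumptions: every prefix is at least w1 bits long (shorter ones belong in the bucket), and the number of entries searched stands in for the number of TCAM entries enabled, which is what bounds power. `build_segments` and `segmented_lookup` are hypothetical names, and longest match is chosen by prefix length rather than by TCAM priority order.

```python
import ipaddress
from collections import defaultdict

W1 = 8  # 1st-level bit length, as in the figure

def build_segments(routes, w1=W1):
    """Group routes by their first w1 bits. Assumes every prefix length is >= w1."""
    segments = defaultdict(list)
    for prefix, hop in routes.items():
        net = ipaddress.IPv4Network(prefix)
        segments[int(net.network_address) >> (32 - w1)].append((net, hop))
    return segments

def segmented_lookup(segments, addr, w1=W1):
    """Search only the segment selected by the first w1 bits.
    Returns (next hop or None, number of entries searched)."""
    a = ipaddress.IPv4Address(addr)
    segment = segments.get(int(a) >> (32 - w1), [])  # .get does not create a segment
    best = None
    for net, hop in segment:
        if a in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return (best[1] if best else None), len(segment)

# Hypothetical routes for illustration only.
routes = {"10.0.0.0/8": "A", "10.1.0.0/16": "B", "10.1.2.0/24": "C", "192.168.0.0/16": "D"}
segs = build_segments(routes)
print(segmented_lookup(segs, "10.1.2.3"))  # ('C', 3)
```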
Solution: Architecture
Memory compaction
- Apply the prefix properties to remove redundancies.
- Apply pruning, prefix aggregation and minimization in succession.
- Put all prefixes shorter than w1 into a bucket (such prefixes occur rarely).
[Figure: Number of entries after compaction (0 to 90000) vs. 1st level bit length (w1), for bbnplanet and attcanada.]
Total number of entries after compaction
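A sketch of the first and last compaction steps, under our assumptions: pruning is taken to mean dropping a prefix whose closest less-specific prefix in the table has the same next hop (a common formulation; the slides do not define it), and prefixes shorter than w1 go to the bucket. Aggregation and minimization are omitted. The route set is hypothetical.

```python
import ipaddress

def prune(routes):
    """Drop a prefix whose closest less-specific prefix has the same next hop."""
    nets = {ipaddress.IPv4Network(p): hop for p, hop in routes.items()}
    kept = {}
    for net, hop in nets.items():
        parent = None
        for length in range(net.prefixlen - 1, -1, -1):  # walk toward shorter prefixes
            candidate = net.supernet(new_prefix=length)
            if candidate in nets:
                parent = candidate
                break
        if parent is None or nets[parent] != hop:
            kept[net] = hop
    return kept

def split_bucket(routes, w1):
    """Prefixes shorter than w1 go to the bucket; the rest go to the paged 2nd level."""
    bucket = {n: h for n, h in routes.items() if n.prefixlen < w1}
    paged = {n: h for n, h in routes.items() if n.prefixlen >= w1}
    return bucket, paged

# Hypothetical routes for illustration only.
routes = {"0.0.0.0/0": "Z", "10.0.0.0/8": "A", "10.1.0.0/16": "A",
          "10.2.0.0/16": "B", "10.2.3.0/24": "B"}
kept = prune(routes)          # 10.1.0.0/16 and 10.2.3.0/24 are redundant
bucket, paged = split_bucket(kept, w1=12)
```

Pruning this way preserves lookup results: an address that matched a removed prefix now falls through to its closest parent, which has the same next hop.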
Solution: Architecture
Paged TCAM architecture
- Group the prefixes of length > w1 based on their LCS.
- The LCS values (cubes) cover the prefixes.
- The cubes now serve as page ids.
- Prefixes covered by a cube are stored in the actual pages.
(Pages formed using the LCS as page id can result in under-utilization.)
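A sketch of grouping prefixes into pages keyed by their cube. The cube rule here is a hypothetical stand-in for the LCS: each prefix is assigned to the octet boundary just above it (for example /28 and /32 map to /24, and /24 maps to /16). With the example prefixes it yields one page of four entries and one page of a single entry, which illustrates the under-utilization noted above.

```python
import ipaddress
from collections import defaultdict

def cube_of(net):
    """Hypothetical cube rule: the octet boundary just above the prefix (e.g. /28 -> /24)."""
    return net.supernet(new_prefix=((net.prefixlen - 1) // 8) * 8)

def group_into_pages(prefixes):
    """Page id (cube) -> list of prefixes it covers. Assumes prefix lengths > w1 >= 8."""
    pages = defaultdict(list)
    for p in prefixes:
        net = ipaddress.IPv4Network(p)
        pages[cube_of(net)].append(net)
    return dict(pages)

pages = group_into_pages(["128.194.1.1/32", "128.194.1.2/32", "128.194.1.8/30",
                          "128.194.1.16/28", "10.1.0.0/24"])
for cube, members in pages.items():
    print(cube, len(members))
```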
Architecture block diagram
[Figure: Comparator (N*), Page Table 1 ... Page Table I ... Page Table N, Pages I, I+1, ..., I+Cmax, and a Bucket. The IP address is fed to the comparator, page tables and pages, and an enable line selects the pages searched. Widths shown: 32 bits and (32 - w1) bits.]
How to avoid under-utilization?
LCS aggregation
1. Aggregate prefixes that have different LCS values by modifying the cube.
2. Set the page size to an optimal value, avoiding pages that are too large or too small.
Observe: based on the property above, the maximum size of a page can be 256.
Example: cubes 101 and 100 are aggregated into 10*.
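The 101 + 100 → 10* example is the one-bit adjacency merge familiar from logic minimization. A minimal sketch using * for a don't-care position; `merge_cubes` and `covers` are hypothetical helper names, and this is a single merge step, not a full minimizer such as Espresso II.

```python
def merge_cubes(a: str, b: str):
    """Merge two equal-length cubes that differ in exactly one specified bit
    (e.g. 101 + 100 -> 10*). Returns None when they cannot be merged this way."""
    if len(a) != len(b):
        return None
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diffs) != 1 or "*" in (a[diffs[0]], b[diffs[0]]):
        return None
    i = diffs[0]
    return a[:i] + "*" + a[i + 1:]

def covers(cube: str, bits: str) -> bool:
    """True if every specified position of the cube matches the bit string."""
    return len(bits) >= len(cube) and all(c == "*" or c == b for c, b in zip(cube, bits))

print(merge_cubes("101", "100"))  # 10*
```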
Solution: Page filling algorithm
Page filling heuristics (2)
1. Generate cubes such that each covers the maximum number of prefixes while the page size stays < 256.
2. Aggregate the page IDs in the page tables and store them in comparators for a 0th-level lookup.
3. Find the total memory consumed (pages, page tables and comparator) for different values of w1.
4. Choose the optimal value of w1 and page size β for which the total memory is least.
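A sketch of step 1 only, under our own assumptions: prefixes are represented as distinct, equal-length bit strings, and cubes come from a simple top-down binary split whenever a group exceeds the cap (β, or β·γ). This greedy split is a stand-in, not necessarily the authors' heuristic, and it does not guarantee the minimum number of cubes.

```python
def common_prefix(strings):
    """Common leading bits of all strings (those of the lexicographic min and max)."""
    first, last = min(strings), max(strings)
    i = 0
    while i < min(len(first), len(last)) and first[i] == last[i]:
        i += 1
    return first[:i]

def fill_pages(keys, cap):
    """keys: distinct, equal-length bit strings; cap >= 1.
    Returns {cube: keys} with at most `cap` keys per page."""
    pages = {}

    def split(group):
        cube = common_prefix(group)
        if len(group) <= cap:
            pages[cube + "*"] = group
            return
        depth = len(cube)  # first bit where the group disagrees, so both halves are non-empty
        split([k for k in group if k[depth] == "0"])
        split([k for k in group if k[depth] == "1"])

    split(sorted(keys))
    return pages

print(fill_pages(["0000", "0001", "0010", "0011", "1100"], cap=4))
```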
Solution: Page filling
The page-filling heuristics ensure that:
- No page has more than β·γ entries, where γ is the page fill factor.
- The number of cubes that cover all the prefixes is minimum.
- Total memory consumption is the least for a specific value of w1 and β.
Architecture block diagram
[Figure: Power-enabled blocks in EaseCAM, marked on the architecture block diagram: Comparator (N*), Page Tables 1 ... N, Pages I ... I+Cmax, Bucket, with the enable line driven by the IP address.]
Solution: Architecture
Bucket
- Prefixes shorter than w1 are stored in the bucket.
- The word length of the bucket is 32 bits.
- During each lookup in the 2nd level, either the bucket or the pages are searched.
Solution: Architecture
Empirical model for memory
- α: fraction of the total entries in the bucket
- α_f: bucket fill factor
- γ: page fill factor
- C_max: number of page IDs in the page table
- N: number of entries
- Page_max: total number of pages
- β_w1: optimal page size for a given w1
Minimum memory requirement:
$$\text{Memory}_{\min} = \beta_{w_1}\cdot \mathit{Page}_{\max}\cdot\frac{32-w_1}{32} + \mathit{Page}_{\max} + \frac{\mathit{Page}_{\max}}{C_{\max}} + \frac{N\,\alpha}{\alpha_f}$$
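A direct transcription of the memory model, plus steps 3-4 of the page-filling heuristic as a minimum over candidate (w1, β) settings. The comments on each term are our reading of the model, the candidate values are hypothetical numbers rather than data from the paper, and α is held fixed across candidates for simplicity even though in practice it varies with w1.

```python
from dataclasses import dataclass

@dataclass
class Config:
    w1: int        # 1st-level bit length
    beta: int      # page size beta_{w1} (entries per page)
    page_max: int  # total number of pages

def memory_entries(cfg, c_max, n, alpha, alpha_f):
    """Evaluate the empirical memory model term by term."""
    pages = cfg.beta * cfg.page_max * (32 - cfg.w1) / 32  # (32 - w1)-bit page entries
    term2 = cfg.page_max                                  # Page_max
    term3 = cfg.page_max / c_max                          # Page_max / C_max
    bucket = n * alpha / alpha_f                          # N * alpha / alpha_f
    return pages + term2 + term3 + bucket

def best_config(configs, c_max, n, alpha, alpha_f):
    """Keep the (w1, beta) candidate with the least total memory."""
    return min(configs, key=lambda c: memory_entries(c, c_max, n, alpha, alpha_f))

# Hypothetical numbers for illustration only.
candidates = [Config(w1=16, beta=256, page_max=100), Config(w1=12, beta=128, page_max=250)]
print(memory_entries(candidates[0], c_max=10, n=2000, alpha=0.25, alpha_f=1.0))  # 13410.0
```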
Incremental updates
Update rates: 100s of updates/sec, and 10 updates/sec after routing flaps.
Insertion: if the length of the prefix is > w1,
1. Minimize the prefix and find the new cube.
2. The number of prefixes minimized is < 256.
3. Update the page table and comparator if required.
4. Update the TCAM with the changed entries.
5. TCAM insertion time and minimization time are bounded.
Solution: Incremental update
Deletion
- Delete the prefix from the TCAM.
- Update the page table entry and comparator if required.
- The total number of prefixes minimized is < 256.
- The TCAM update time is also bounded.
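A sketch of why insertion and deletion stay bounded: each update touches only the one page whose cube covers the prefix, so the work to re-minimize depends on that page's size (at most 256 here) and not on the routing table size. `PagedTable`, the octet-boundary cube rule and the overflow error are hypothetical; minimization itself is not performed, and the return value is simply the page size that would need re-minimizing.

```python
import ipaddress
from collections import defaultdict

PAGE_LIMIT = 256  # bound on prefixes minimized together, from the slides

class PagedTable:
    def __init__(self):
        self.pages = defaultdict(dict)  # cube -> {prefix: next hop}

    @staticmethod
    def cube_of(net):
        # Hypothetical cube rule: octet boundary just above the prefix (prefixlen >= 1).
        return net.supernet(new_prefix=((net.prefixlen - 1) // 8) * 8)

    def insert(self, prefix, hop):
        net = ipaddress.IPv4Network(prefix)
        page = self.pages[self.cube_of(net)]
        if net not in page and len(page) >= PAGE_LIMIT:
            raise OverflowError("page full: overflow handling needed")
        page[net] = hop
        return len(page)  # only this page needs re-minimizing

    def delete(self, prefix):
        net = ipaddress.IPv4Network(prefix)
        cube = self.cube_of(net)
        page = self.pages.get(cube)
        if page is None or net not in page:
            return 0
        del page[net]
        if not page:
            del self.pages[cube]  # page id leaves the page table / comparator
        return len(page)

t = PagedTable()
print(t.insert("128.194.1.1/32", "A"), t.insert("128.194.1.8/30", "B"))  # 1 2
```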
Solution: Incremental update
Router | Total prefixes | Micro'02 approach: size | Micro'02 approach: time (sec) | Proposed approach: size | Proposed approach: time (sec)
attcanada | 112412 | 15146 | 1098.47 | 223 | 0.005
bbnplanet | 124538 | 7580 | 63.04 | 256 | 0.006
Comparison of incremental update time
Solution: Memory management
Managing page overflow
- Reason: lower value of γ.
- Pages with the same cube are recomputed.
- Free pages available in the TCAM are used.
- Comparators are also updated when required.
Results
Power consumption per lookup
[Figure: Power consumption bound for the bbnplanet router (left) and the attcanada router (right): power (mW, 0 to 800) vs. 1st level bit length w1 (8 to 16), with bucket and pages shown separately.]
Results
Case study: memory requirements (γ = 1 and α = 1)
Router | Raw data (entries) | After compaction (entries) | Effect of architecture (entries)
attcanada | 112412 | 57837 | 50182
bbnplanet | 124538 | 71500 | 59883
Reduction in memory requirements
Results: Access time
Pre-estimation using Cacti 3.0 on a CAM structure
Router | Raw data (ns) | After compaction (ns) | Effect of architecture (ns)
attcanada | 240.53 | 81.87 | 4.39
bbnplanet | 265.32 | 117.9 | 4.43
Reduction in access time
Results: Power
Pre-estimation using Cacti 3.0 on a CAM structure
Router | Raw data (W) | After compaction (W) | Effect of architecture (W)
attcanada | 14.35 | 7.38 | 0.135
bbnplanet | 15.9 | 12.31 | 0.12
Reduction in power
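The relative gains can be read off the three result tables with a little arithmetic. The sketch below compares the raw-data column with the effect-of-architecture column; the derived percentages and ratios are our own calculation, not figures stated in the slides.

```python
# Values copied from the memory, access-time and power tables:
# (entries, access time in ns, power in W)
raw = {"attcanada": (112412, 240.53, 14.35), "bbnplanet": (124538, 265.32, 15.9)}
arch = {"attcanada": (50182, 4.39, 0.135), "bbnplanet": (59883, 4.43, 0.12)}

def summarize(router):
    """Raw data vs. effect of architecture for one router."""
    (m0, t0, p0), (m1, t1, p1) = raw[router], arch[router]
    return {
        "memory_reduction_pct": 100 * (m0 - m1) / m0,
        "access_time_ratio": t0 / t1,
        "power_ratio": p0 / p1,
    }

for r in raw:
    print(r, {k: round(v, 1) for k, v in summarize(r).items()})
```

Under this comparison, memory drops by roughly half while per-lookup access time and power drop by one to two orders of magnitude.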
Conclusion
- Significant reduction in memory consumption based on prefix compaction
- A pipelined architecture to store prefixes and achieve bounded power consumption
- Efficient memory management and incremental update techniques
Future work
- Apply the Cacti model to the TCAM structure
- Identify/design a low-power TCAM cell
- Consider classification together with IP lookup
- Fast on-chip logic minimization
- Explore parallel architectures and algorithms for IP processing
Thank You!
Questions?