![Page 1: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/1.jpg)
Sailing Basics
[Diagram: wind direction and buoy]
![Page 2: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/2.jpg)
My motto:
Sailing - wind shift
[Diagram: wind direction, buoy, angle a]
![Page 3: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/3.jpg)
Sailing competition: getting first
[Diagram: Boat 1 and Boat 2, wind direction, buoy]
- What is the strategy of Boat 1?
- What is the strategy of Boat 2?
My motto: Do not follow; invent.
![Page 4: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/4.jpg)
25th America's Cup – September 23rd 1983 (Trophy was held by NYYC from 1857)
Liberty I against Australia II
7-round race; Status: Score 3:0 to Liberty I, 4th round started
[Diagram: wind direction, buoy, start line]
![Page 5: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/5.jpg)
Uri Weiser
Technion
Haifa, Israel
Potential future research in computing
Heterogeneous systems’ optimization
Memory subsystems - Process-in-Storage when?
The talk covers research done by: Prof. Y. Etsion, Prof. I. Keidar, Prof. A. Kolodny, T. Morad, Prof. A. Mendelson, G. Shomron, Prof. U. Weiser
![Page 6: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/6.jpg)
The talk today
The environment changes
Slow down in Moore's law
Slow down in the ability to enhance single core performance
Heterogeneous systems
Big Data
Potential change in the way we handle data
new thinking about moving data?
Heterogeneous system optimization
Big Data --- data handling
Is it data movement?
Is it bandwidth?
Example
![Page 7: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/7.jpg)
Environment changes
Slow down in process technology
Slow down in single-thread core performance trend
Power limitation
![Page 8: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/8.jpg)
Environment changes
Slow down in process technology
Slow down in single-thread core performance trend
Power limitation
The Era of Heterogeneous systems (we are there already)
How to handle heterogeneity?
Heterogeneous vs. general purpose engine?
Size, power, energy, location of the accelerators?
Application phase specific
![Page 9: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/9.jpg)
New Architecture Avenues?
Heterogeneous computing
The Era of Heterogeneous systems
HW/SW to fit application
Dynamic tuning
Accelerators
Optimizations: performance, energy efficiency
Big Data = big
In general, non-repeated access to all the “Big Data”
What are the implications?
![Page 10: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/10.jpg)
Heterogeneous computing
Continue the performance trend via heterogeneous systems
[Figure: performance/power (y-axis) across the range of applications (x-axis), with accelerators]
![Page 11: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/11.jpg)
Heterogeneous Computing
[Figure: performance/power of a general-purpose engine vs. an accelerator]
![Page 12: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/12.jpg)
Heterogeneous Systems’ Environment
Environment with limited resources
Need to optimize the system’s targets within resource constraints
Resources may be:
- Power, energy, area, space, $
System's targets may be:
- Performance, power, energy, area, space, $
![Page 13: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/13.jpg)
Heterogeneous system
Heterogeneous system design under resource constraint:
how to divide resources (e.g. area, power, energy) to achieve maximum system output (e.g. performance, throughput, energy savings)
Accelerator target (an example): minimize execution time under an area constraint
Example:
$$A = \sum_{i=1}^{n} a_i$$
[Figure: total area $A$ divided among accelerators $a_1, a_2, \dots, a_n$; application timeline divided into sections $t_1, t_2, \dots, t_n$]
$t_i$ = execution time of an application’s section (run on a reference computing system)
![Page 14: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/14.jpg)
MultiAmdahl:
$$T = t_1 F_1(a_1) + t_2 F_2(a_2) + \dots + t_n F_n(a_n)$$
$$A = a_1 + a_2 + a_3 + \dots + a_n$$
[Figure: each application section $t_i$ runs on an accelerator of area $a_i$ with accelerator function $F_i(a_i)$]
Target: minimize $T$ under a constraint $A$
![Page 15: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/15.jpg)
MultiAmdahl: Optimization using Lagrange multipliers
Minimize execution time (T) under an area (A) constraint
[Figure: application sections $t_1, t_2, \dots, t_n$ with accelerator functions $F_1(a_1), F_2(a_2), \dots, F_n(a_n)$]
$$t_j F'_j(a_j) = t_i F'_i(a_i) \quad \text{for every pair } i, j$$
$F'$ = derivative of the accelerator function
$a_i$ = area of the i-th accelerator
$t_i$ = execution time on the reference computer
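A minimal numeric sketch of the Lagrange condition, under an assumption that is not from the talk: every accelerator function is taken to be F_i(a) = 1/sqrt(a). With that choice, requiring t_i F'_i(a_i) to be equal for all i gives a_i proportional to t_i^(2/3). The function names and the example numbers are hypothetical.

```python
import math

def optimal_areas(times, total_area):
    """Split total_area so that t_i * F'_i(a_i) is the same for every i.

    Assumes F_i(a) = 1 / sqrt(a) for every accelerator (an illustrative
    choice, not taken from the talk). Under that assumption the Lagrange
    condition yields a_i proportional to t_i ** (2/3).
    """
    weights = [t ** (2.0 / 3.0) for t in times]
    scale = total_area / sum(weights)
    return [w * scale for w in weights]

def total_time(times, areas):
    """T = sum_i t_i * F_i(a_i) with the assumed F_i(a) = 1 / sqrt(a)."""
    return sum(t / math.sqrt(a) for t, a in zip(times, areas))

def marginal_gains(times, areas):
    """t_i * F'_i(a_i), where F'(a) = -0.5 * a ** -1.5 for the assumed F."""
    return [-0.5 * t * a ** -1.5 for t, a in zip(times, areas)]

if __name__ == "__main__":
    times = [8.0, 1.0, 1.0]   # hypothetical section times on the reference system
    budget = 12.0             # hypothetical total area A
    best = optimal_areas(times, budget)
    even = [budget / len(times)] * len(times)
    print("optimal areas:", best)
    print("T, optimal split:", total_time(times, best))
    print("T, even split:", total_time(times, even))
    print("marginal gains:", marginal_gains(times, best))
```

With these example numbers the long section receives most of the area, and the marginal gains come out equal across accelerators, which is the condition stated on the slide.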
![Page 16: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/16.jpg)
MultiAmdahl Framework
Applying known techniques* to new environments
Can be used during the system’s definition and/or dynamically to tune the system
* Gossen’s second law (1854), Marginal utility, Marginal rate of substitution (Finance)
![Page 17: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/17.jpg)
Example: CPU vs. Accelerators
Future GP CPU size vs. transistor budget growth
Test case: 4 accelerators and GP (big) CPU
Applications: evenly distributed benchmarks mix w/ 10% sequential code
Heterogeneous Insight:
In an increased-transistor-budget environment, the importance of the General Purpose (big) CPU will grow
![Page 18: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/18.jpg)
Example: CPU vs. Accelerators
GP CPU size vs. power budget
Test case: 4 accelerators and GP (big) CPU
Applications: evenly distributed benchmarks mix w/ 10% sequential code
Heterogeneous Insight:
In a decreased-power-budget environment, the importance of accelerators will grow
![Page 19: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/19.jpg)
What is F(ai)?
You can look at it as the acceleration vs. area (or energy, power etc.), BUT there are more parameters that impact the function F(ai), e.g. LOCATION*
* example as part of the “Process In Storage”
![Page 20: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/20.jpg)
Big Data
90% of the data in the world has been created in the last 2 years
By 2020 the world’s data will grow 50 times from today
Change the way we handle Big Data processing
![Page 21: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/21.jpg)
Big Data Environment
Big Data = big
In most cases some of the data is irrelevant to the solution (Extract, Transform and Load (ETL)), or its relevancy is simple (e.g. wordcount)
In general there is non-repeated access to all the “Big Data”
What are the implications?
![Page 22: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/22.jpg)
New Architecture Avenues in Big Data Environment
Heterogeneous computing – “tuning” HW to respond to specific needs
example: Big Data memory access pattern
Potential savings
Data Movements
Bandwidth
![Page 23: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/23.jpg)
Heterogeneous computing: Application Specific Accelerators
Continue the performance trend with tuned architectures to bypass current technological hurdles
[Figure: performance/power (y-axis) vs. application range and behavior (x-axis); accelerators and tuned architectures]
![Page 24: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/24.jpg)
Reduction of system’s energy in Big Data environment
Understand where energy is wasted
Identify the energy-hungry parts and performance bottlenecks
Provide a TAILORED solution for Data Center* usage
* It would not be simple
![Page 25: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/25.jpg)
Big Data usage of DATA
Input: Unstructured data
Read Once: Non-Temporal Memory Access
Funnel: $\beta = BW_{out} / BW_{in}$
![Page 26: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/26.jpg)
Machine Learning
[Diagram: Input: unstructured data → Structuring (data structuring = ETL) → Structured data (aggregation) → ML model creation → Model usage @ client; stages labeled A, B, C, with C = model usage @ client]
![Page 27: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/27.jpg)
Does Big Data exhibit a special memory access pattern?
It probably should, since revisiting ALL Big Data items would cause huge/slow data transfers from the data sources
There are 2 access modes of memory operations:
Temporal Memory Access
Non-Temporal Memory Access
Many Big Data computations exhibit Non-Temporal Memory Accesses and/or Funnel operation
![Page 28: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/28.jpg)
Non-Temporal Memory Access
Initial analysis: Hadoop-grep Single Memory Access Pattern
~50% of Hadoop-grep unique memory references are single access
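One way such a number could be computed from a memory trace is sketched below. The trace format (a flat sequence of addresses) and the function name are assumptions made for illustration; the talk does not describe how the Hadoop-grep trace was collected or processed.

```python
from collections import Counter

def single_access_fraction(trace):
    """Fraction of unique addresses in `trace` that are referenced exactly once.

    `trace` is a hypothetical iterable of addresses (for example, cache-line
    addresses taken from a memory trace).
    """
    counts = Counter(trace)
    if not counts:
        return 0.0
    single = sum(1 for n in counts.values() if n == 1)
    return single / len(counts)

if __name__ == "__main__":
    # Made-up trace: 0x10 and 0x40 are revisited, 0x20 and 0x30 are read once.
    trace = [0x10, 0x20, 0x10, 0x30, 0x40, 0x40]
    print(single_access_fraction(trace))
```

A high fraction suggests that caching those lines brings little benefit, which is the non-temporal behavior discussed here.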
![Page 29: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/29.jpg)
Non-Temporal Memory Accesses: Preliminary Results
WordCount: Access to Storage: Non-temporal locality
Sort: Access to Storage: NO Non-temporal locality
[Figure: WordCount I/O utilization, access rate [KB/s] (0 to 80000) vs. time [s] (0 to 50)]
[Figure: Sort I/O, access rate [KB/s] (0 to 120000) vs. time [s] (0 to 1200)]
![Page 30: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/30.jpg)
Current systems
• Memory subsystem is tuned for “Temporal Memory Access”
DRAM – tuned for repeated page access
Cache – tuned for repeated cache block access
[Figure: memory hierarchy from Core and Registers through L1$, L2$ and LLC (“Caches”) to DRAM and NV Storage; bandwidth labels: 3 GB/sec, 25 GB/sec, 500 GB/sec, TB/sec]
However, many Big Data applications exhibit Non-Temporal Memory Accesses (NTMA)
![Page 31: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/31.jpg)
Where is energy wasted?
• DRAM
• Limited BW
![Page 32: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/32.jpg)
From: Mark Horowitz, Stanford “Computing’s Energy Problems”
From: Bill Dally (nVidia and Stanford), Efficiency and Parallelism, the challenges of future computing
![Page 33: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/33.jpg)
Energy: DRAM
![Page 34: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/34.jpg)
Memory Subsystem - copies
[Figure: memory hierarchy from Core and Registers through L1$, L2$ and LL Cache to DRAM and NV Storage; size labels: KBs, 10’s KBs, MBs, 10’s MBs, GBs, TBs; bandwidth labels: 3 GB/sec, 25 GB/sec, 500 GB/sec, TB/sec]
NV Storage - Source
Copy 1 (main memory)
Copy 2 (LL Cache)
Copy 3 (L2 Cache)
Copy 4 (L1 Cache)
Copy 5 (Registers) - Destination
![Page 35: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/35.jpg)
Memory Subsystem – DRAM bypass == DDIO
[Figure: memory hierarchy from Core and Registers through L1$, L2$ and LL Cache to DRAM and NV Storage; bandwidth labels: 3-20 GB/sec, 25 GB/sec, 500 GB/sec, TB/sec]
NV Storage - Source
Copy 1 (main memory)
Copy 2 (LL Cache)
Copy 3 (L2 Cache)
Copy 4 (L1 Cache)
Copy 5 (Registers) - Destination
Potential savings:
@ 0.5 nJ/B (DRAM)
10 – 20 GB/s NV BW
5 W – 10 W
Reference: “Optimizing Read-Once Data Flow in Big-Data Applications”, Morad, Shomron, Erez, Weiser, Kolodny, in Computer Architecture Letters Journal 2016
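The 5 W to 10 W range appears to follow from multiplying the per-byte DRAM energy by the storage bandwidth. A small sketch of that arithmetic, assuming 1 GB = 10^9 bytes and that every byte read from NV storage would otherwise pass through DRAM once (the function name is hypothetical):

```python
def dram_power_watts(energy_nj_per_byte, bandwidth_gb_per_s):
    """Power spent moving a data stream through DRAM.

    nJ/B * GB/s = (1e-9 J/B) * (1e9 B/s) = J/s = W, so the scale factors
    cancel and the product is already in watts.
    """
    return energy_nj_per_byte * bandwidth_gb_per_s

if __name__ == "__main__":
    for bw in (10, 20):  # NV storage bandwidth range from the slide, in GB/s
        print(bw, "GB/s ->", dram_power_watts(0.5, bw), "W")
```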
![Page 36: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/36.jpg)
Initial Experiment
Example program: read a file from disk and XOR all values
DDIO-aware code on a real system
- Small buffer (fits into 2 ways of the LLC)
- Low latency from write to read (avoid evictions)
- Zero-copy (O_DIRECT flag)
- Bypass OS page cache (O_DIRECT flag)
- Run code on the chip that is connected to the SSD (OS affinity)
Compare system with DDIO enabled and DDIO disabled
Measure runtime, power and energy
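A rough sketch of what such a test program might look like follows. It is not the code used in the experiment: the function name, the byte-wise XOR, the default buffer size and the optional CPU pinning are all assumptions. `os.O_DIRECT` and `os.sched_setaffinity` are Linux-only, so both are optional here.

```python
import functools
import mmap
import operator
import os

def xor_file(path, buf_size=1 << 20, direct=False, cpu=None):
    """Read `path` through one small reusable buffer and XOR all of its bytes.

    direct=True adds O_DIRECT (Linux only) to bypass the OS page cache; the
    buffer is an anonymous mmap, which is page-aligned as O_DIRECT requires,
    and buf_size should then be a multiple of the device block size.
    cpu, if given, pins the process to that CPU (Linux only).
    """
    if cpu is not None:
        os.sched_setaffinity(0, {cpu})
    flags = os.O_RDONLY
    if direct:
        flags |= os.O_DIRECT
    fd = os.open(path, flags)
    buf = mmap.mmap(-1, buf_size)  # the same buffer is reused for every read
    acc = 0
    try:
        with os.fdopen(fd, "rb", buffering=0) as f:
            while True:
                n = f.readinto(buf)
                if not n:
                    break
                # buf[:n] copies into a bytes object; a native-code version
                # would XOR in place to stay zero-copy.
                acc = functools.reduce(operator.xor, buf[:n], acc)
    finally:
        buf.close()
    return acc
```

Reusing one small buffer keeps the working set small, which matches the slide's goal of fitting the buffer into a few LLC ways.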
![Page 37: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/37.jpg)
Bandwidth
When should we use the Funnel at the data source?
![Page 38: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/38.jpg)
A: Bandwidth issue
Memory hierarchy is optimized for, and systems are built for, Temporal Locality
[Figure: memory hierarchy from Core and Registers (highest bandwidth) through L1$, L2$ and LLC to DRAM and NV Storage; size labels: KBs, 10’s KBs, MBs, 10’s MBs, GBs, TBs; bandwidth labels: 3-20 GB/sec, 25 GB/sec, 500 GB/sec, TB/sec; existing BW compared with the BW desired for NTMA]
![Page 39: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/39.jpg)
B: Memory accesses per operation impact BW
Hint: Memory access per operation
[Figure: Read Once – Non-Temporal Memory Accesses: bandwidth [MB/s] and CPU utilization [%] vs. # of cores]
[Figure: Temporal Memory Accesses: bandwidth [MB/s] and CPU utilization [%] vs. # of cores]
![Page 40: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/40.jpg)
Solution:
Flow of “Non-Temporal Data Accesses”
[Figure: Core, Registers, L1$, L2$, LLC, DRAM and NV Storage, with the Funnel in the flow of non-temporal data]
Use Funnel when Bandwidth bottleneck occurs
- “high” memory accesses per Instruction
- Limited BW
- Non temporal locality memory access
*private communication with: Moinuddin Qureshi
![Page 41: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/41.jpg)
“Funnel”ing “Read-Once” data in storage
[Figure: typical SSD architecture*]
*Kang, Yangwook, Yang-suk Kee, Ethan L. Miller, and Chanik Park. "Enabling cost-effective data processing with smart SSD." In Mass Storage Systems and Technologies (MSST), 2013 IEEE 29th Symposium on, pp. 1-12. IEEE, 2013.
**K. Eshghi and R. Micheloni. “SSD Architecture and PCI Express Interface”
![Page 42: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/42.jpg)
Analytical model of the Funnel
[Diagram: Bandwidth (BW) IN → Funnel → Bandwidth (BW) OUT → Post process]
$\beta = BW_{OUT} / BW_{IN}$
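To illustrate why the ratio β = BW_OUT / BW_IN matters, here is a simplified bottleneck sketch. It is not the analytical model from the talk: it assumes a streaming workload whose raw-data throughput is limited by the slowest of three stages (storage, link, CPU), and it ignores the Funnel processor's own compute limit and overhead. All names and numbers are hypothetical.

```python
def throughput(ssd_bw, link_bw, cpu_bw, beta=1.0):
    """Raw-data throughput of a storage -> link -> CPU pipeline.

    beta is the fraction of each input byte that leaves the funnel
    (beta = 1.0 means no funnel). Only beta bytes per raw byte cross the
    link and reach the CPU, so those two stages sustain link_bw / beta and
    cpu_bw / beta in raw-data terms.
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    return min(ssd_bw, link_bw / beta, cpu_bw / beta)

def funnel_speedup(ssd_bw, link_bw, cpu_bw, beta):
    """Throughput with a funnel of ratio beta, relative to no funnel."""
    return throughput(ssd_bw, link_bw, cpu_bw, beta) / throughput(ssd_bw, link_bw, cpu_bw)

if __name__ == "__main__":
    # Hypothetical bandwidths in GB/s: storage 4, link 2, CPU 3.
    for beta in (1.0, 0.5, 0.25):
        print(beta, funnel_speedup(4.0, 2.0, 3.0, beta))
```

In this sketch the gain saturates once the storage bandwidth becomes the limit, so shrinking β further brings no additional speedup.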
![Page 43: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/43.jpg)
Proposed Architecture
[Figure: Baseline Configuration: SSD Storage, PCIe, TLB, CPU; the CPU performs NTMA and TMA work; B = Bandwidth]
[Figure: Funnel Configurations: SSD Storage with Funnel, PCIe, TLB, CPU; the SSD performs NTMA work and the CPU performs TMA work; B = Bandwidth]
![Page 44: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/44.jpg)
Funnel Performance
[Figure: performance improvement of the Funnel configuration (SSD performs NTMA work, CPU performs TMA work) over the baseline (CPU performs NTMA and TMA work) as a function of β; annotations: 1/(PCIe BW), 1/(SSD BW), “CPU becomes bottleneck”]
![Page 45: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/45.jpg)
Funnel energy
[Figure: energy of the Funnel configuration (SSD performs NTMA work, CPU performs TMA work) compared with the baseline (CPU performs NTMA and TMA work) as a function of β; annotations: Funnel improvement, Funnel processor overhead, “CPU becomes the bottleneck”]
![Page 46: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/46.jpg)
Solution: ?
Non-Temporal Memory Accesses should be processed as close as possible to the data source
Data that exhibit Temporal Locality should use the current Memory Hierarchy
Use Machine Learning (context aware*) to distinguish between the two phases
Open questions:
- SW model
- Shared Data
- HW implementation
- Computational requirement at the “Funnel”
*Reference: “Semantic locality and Context based prefetching” Peled, Mannor, Weiser, Etsion in ISCA 2015
![Page 47: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/47.jpg)
Summary
Lots of potential for icebreaking research
- Hetero
- Big Data related
Memory access is a critical path in computing
Funnel should be used for:
- Reduction of Data movement
- Freeing up the system’s memory resources (re-Spark)
- Solving the system’s BW issues for “Read Once” cases
- Simple, energy-efficient engines at the front end
Issues
…
![Page 48: Moving the Needle - samos-conference.com · 25th American Cap –September 23rd 1983 (Trophy was held by NYYC from 1857) Liberty I against Australia II 7 rounds race; Status: Score](https://reader034.vdocuments.net/reader034/viewer/2022042408/5f23ae1055332626f715e143/html5/thumbnails/48.jpg)