
Making OpenVX Really “Real Time”

Ming Yang1, Tanya Amert1, Kecheng Yang1,2, Nathan Otterness1, James H. Anderson1, F. Donelson Smith1, and Shige Wang3

1The University of North Carolina at Chapel Hill 2Texas State University

3General Motors Research

[Title-slide graphic: 700 ms — the time an alert driver takes to react.]

A new approach for graph scheduling: shorter response times + less capacity loss.

1. State of the art
2. Our approach
3. Future work

Source: https://www.khronos.org/openvx/

[Figure: an example OpenVX graph — a chain of OpenVX nodes between Native Camera Control and Downstream Application Processing.]

Graph-based architecture

[Figure: applications running on diverse hardware — GPU, FPGA, DSP.]

Portability to diverse hardware

Does OpenVX really target “real-time” processing?

Source: https://www.khronos.org/openvx/

1. It lacks real-time concepts.
2. Entire graphs = monolithic schedulable entities.

[Figure: the example graph's nodes A, B, C, D are fused into a single monolithic schedulable entity.]

[Figure: monolithic scheduling — on the timeline, each graph invocation executes A, B, C, D back-to-back as one schedulable entity.]

Prior Work
• OpenVX nodes = schedulable entities [23, 51]

Coarse-grained scheduling

[Figure: under coarse-grained scheduling, each node of the graph (A, B, C, D) becomes its own task (Task A–Task D), scheduled over time.]

Remaining problems:
1. More parallelism could be exploited.
2. Suspension-oblivious analysis was applied, which causes capacity loss.

This Work: Fine-Grained Scheduling

1. Coarse-grained vs. fine-grained
2. Response-time bounds analysis
3. Case study

Coarse-Grained Scheduling

[Figure: Tasks A–D on a timeline; each task suspends while its GPU computation executes (suspension for GPU execution).]

Fine-Grained Scheduling

[Figure: the GPU computations are split off into their own tasks E, F, G (GPU execution), scheduled alongside the CPU tasks A, C, D.]
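The decomposition behind fine-grained scheduling can be sketched as follows; the Node representation and the cost fields are illustrative assumptions, not the paper's data structures. A node that would self-suspend for GPU work under coarse-grained scheduling is split into separate CPU and GPU sub-nodes, each its own schedulable entity.

```python
# A minimal sketch (assumed representation) of splitting a CPU+GPU node into
# separate CPU and GPU sub-nodes for fine-grained scheduling.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    cpu_cost_ms: float
    gpu_cost_ms: float = 0.0          # > 0 if the node launches GPU work
    successors: list = field(default_factory=list)

def split_for_fine_grained(node: Node) -> list:
    """Split a CPU+GPU node into a CPU sub-node followed by a GPU sub-node."""
    if node.gpu_cost_ms == 0.0:
        return [node]
    gpu_part = Node(f"{node.name}_gpu", cpu_cost_ms=0.0,
                    gpu_cost_ms=node.gpu_cost_ms, successors=node.successors)
    cpu_part = Node(f"{node.name}_cpu", cpu_cost_ms=node.cpu_cost_ms,
                    successors=[gpu_part])
    return [cpu_part, gpu_part]

# Example: a node B with both CPU and GPU portions becomes B_cpu -> B_gpu.
b = Node("B", cpu_cost_ms=2.0, gpu_cost_ms=5.0)
print([n.name for n in split_for_fine_grained(b)])  # ['B_cpu', 'B_gpu']
```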

1. Coarse-grained vs. fine-grained

2. Response-time bounds analysis

3. Case study

Deriving Response-Time Bounds for a DAG*

Step 1: Schedule the nodes as sporadic tasks

Step 2: Compute bounds for every node

Step 3: Sum the bounds of nodes on the critical path


* C. Liu and J. Anderson, “Supporting Soft Real-Time DAG-based Systems on Multiprocessors with No Utilization Loss,” in RTSS, 2013.
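As an illustration of these steps, here is a minimal sketch (not the paper's implementation) that sums assumed per-node bounds along the heaviest path of a small example DAG; the graph shape and the bound values are made up for illustration.

```python
# Step 3 in miniature: sum per-node response-time bounds along the critical
# (heaviest bound-weighted) path of a DAG.
from functools import lru_cache

# DAG as an adjacency list: edges point from a node to its successors.
edges = {"A": ["B"], "B": ["C", "E"], "C": ["D"], "E": ["F"], "D": [], "F": []}
# Assumed per-node response-time bounds (ms), e.g., the output of Steps 1-2.
bound = {"A": 10.0, "B": 25.0, "C": 15.0, "D": 5.0, "E": 30.0, "F": 8.0}

@lru_cache(maxsize=None)
def path_bound(node: str) -> float:
    """Bound of the heaviest path starting at `node`."""
    succs = edges[node]
    return bound[node] + (max(path_bound(s) for s in succs) if succs else 0.0)

# Source nodes have no incoming edges; the end-to-end bound is the heaviest
# path from any source.
sources = set(edges) - {v for succs in edges.values() for v in succs}
print(max(path_bound(s) for s in sources))  # 73.0, along A -> B -> E -> F
```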

Deriving Response-Time Bounds for a DAG

[Figure: an example DAG with nodes A, B, C, D, E, F, with each node assigned to either the CPU or the GPU.]

Need a response-time bound analysis for GPU tasks.

A System Model of GPU Tasks

τi = (Ci, Ti, Bi, Hi)
• Ci: per-block worst-case workload
• Ti: period
• Bi: number of blocks
• Hi: number of threads per block (or block size)

Example: τ1 = (3076, 6, 2, 1024), with H1 = 1024 threads per block.

[Figure: a schedule of τ1's blocks on SM0 and SM1 (each SM holding up to 2048 threads), with time axis ticks at 0, 3, and 6.]
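To make the notation concrete, here is a minimal sketch of this task model as a data structure; the 2048-thread SM capacity comes from the figure above, and the helper method is an illustrative assumption rather than a definition from the paper.

```python
# A minimal sketch of the GPU task model tau_i = (C_i, T_i, B_i, H_i).
from dataclasses import dataclass

SM_THREAD_CAPACITY = 2048  # resident threads per SM, as in the example figure

@dataclass
class GpuTask:
    C: float  # per-block worst-case workload
    T: float  # period
    B: int    # number of blocks
    H: int    # threads per block (block size)

    def blocks_per_sm(self) -> int:
        """Assumed helper: how many of this task's blocks fit on one SM at a time."""
        return SM_THREAD_CAPACITY // self.H

tau1 = GpuTask(C=3076, T=6, B=2, H=1024)
print(tau1.blocks_per_sm())  # 2 blocks of 1024 threads fit on one 2048-thread SM
```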

Response-Time Bounds Proof Sketch

1. We first show the necessity of a total utilization bound and of intra-task parallelism via counterexamples.
   [Figure: releases 1–5 of a task over time; without intra-task parallelism, successive jobs queue behind one another, whereas with intra-task parallelism they execute concurrently.]
2. We then bound the unfinished workload from jobs released at or before rk,j.
3. We prove that job τk,j finishes before rk,j + Rk.
   [Figure: job τk,j, released at rk,j, executes on SM0 and SM1 and completes within the response-time bound Rk.]
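The counterexample in step 1 can be made concrete with a small numeric sketch; the parameters below (period T = 2, per-job cost C = 3, ample processors) are assumptions chosen only to show the effect.

```python
# A minimal numeric sketch of the step-1 counterexample.
T, C, NUM_JOBS = 2.0, 3.0, 20

# Without intra-task parallelism: job j cannot start before job j-1 finishes.
finish_prev = 0.0
seq_resp = []
for j in range(NUM_JOBS):
    release = j * T
    start = max(release, finish_prev)   # queue behind the previous job
    finish_prev = start + C
    seq_resp.append(finish_prev - release)

# With intra-task parallelism: every job starts at its release and runs for C.
par_resp = [C] * NUM_JOBS

print("without intra-task parallelism:", seq_resp[:5], "...", seq_resp[-1])
print("with intra-task parallelism:   ", par_resp[:5], "...", par_resp[-1])
# Response times grow by C - T per job in the first case (unbounded as the
# job index grows) but stay at C in the second.
```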

1. Coarse-grained vs. fine-grained

2. Response-time bounds analysis

3. Case study

Case Study: Comparing Fine-Grained/Coarse-Grained/Monolithic Scheduling

[Figure: the HOG graph — Resize Image → Compute Gradients → Compute Orientation Histograms → Normalize Orientation Histograms — implemented with vxHOGCellsNode and vxHOGFeaturesNode, shown both as coarse-grained CPU+GPU execution and as fine-grained GPU execution.]

• Application: Histogram of Oriented Gradients (HOG)

• 6 instances

• 33 ms period

• 30,000 samples

• Platform: NVIDIA Titan V GPU + two eight-core Intel CPUs

• Schedulers: G-EDF, G-FL (fair-lateness)


Case Study: Comparing Fine-Grained/Coarse-Grained/Monolithic Scheduling

[Figure: response-time CDFs under the three approaches (x-axis: time; y-axis: % of samples); curves further to the left are better. Reading example: 50% of samples have response times less than 60 ms.]
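For reference, this is how such a CDF value is computed; the samples below are synthetic stand-ins (assuming NumPy), not the measured data from the study.

```python
# A minimal sketch of reading a response-time CDF at a given x value.
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for 30,000 measured response-time samples (ms).
samples_ms = rng.normal(loc=66, scale=20, size=30_000)

def cdf_at(samples, x_ms):
    """Fraction of samples with response time <= x_ms."""
    return np.count_nonzero(samples <= x_ms) / len(samples)

print(f"{100 * cdf_at(samples_ms, 60):.1f}% of samples finish within 60 ms")
```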

                              [1] Fine-grained (G-FL)   [2] Coarse-grained (G-EDF)   [3] Monolithic (G-EDF)
Average Response Time (ms)    65.99                     136.57                       84669.47
Maximum Response Time (ms)    125.66                    427.07                       170091.06
Analytical Bound (ms)         542.39                    N/A                          N/A

FL: fair-lateness

• Compared to coarse-grained scheduling [2], fine-grained scheduling [1] achieved half the average response time and one-third the maximum response time.
• For perspective, an alert driver takes 700 ms to react.
• The fair-lateness-based scheduler (G-FL) is beneficial: it reduced node response times by up to 9.9%.
• The overhead of supporting fine-grained scheduling was 14.15%.
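A quick arithmetic check of the "half" and "one-third" comparisons above, using the values from the table (a trivial sketch, not part of the study):

```python
# Ratios of fine-grained to coarse-grained response times from the table.
avg = {"fine": 65.99, "coarse": 136.57, "mono": 84669.47}
mx  = {"fine": 125.66, "coarse": 427.07, "mono": 170091.06}

print(f"average: fine/coarse = {avg['fine'] / avg['coarse']:.2f}")  # ~0.48 -> about half
print(f"maximum: fine/coarse = {mx['fine'] / mx['coarse']:.2f}")    # ~0.29 -> about one-third
```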

Conclusions

1. Fine-grained scheduling

2. Response-time bounds analysis for GPU tasks

3. Case study


Future Work

1. Cycles in the graph

2. Other resource constraints

3. Schedulability studies


Thanks!
