Measurement-based WCET Analysis for Multi-core Architectures

Hardik Shah, Andrew Coombes, Andreas Raabe, Kai Huang and Alois Knoll
Technische Universität München, Rapita Systems Ltd and fortiss GmbH
9/30/2024

Multi-cores in safety critical systems

Safety critical semiconductor industry
  o ~2% of the whole market
  o Customized multi-cores are an expensive solution
  o Even a trivial change (e.g. changing an arbiter) could be prohibitively expensive
  o Cycle-accurate simulators for extracting a “recorded trace” are unavailable or closed

Goal

Provide an inexpensive multi-core solution which is WCET analyzable
  o Use unmodified production chips
  o Use an unmodified measurement-based WCET analysis tool designed for single-core architectures
Preserve cost, performance and time-to-market benefits

Contribution

A technique to measure the WCET of applications on multi-core architectures using existing tools for single-core architectures
  o Demonstrated by measuring the WCET of multi-path applications executed on a quad-core NIOS processor using the RapiTime timing analyzer
Unmodified production chip preserves cost and performance benefits
Certifiable solution, in principle

Agenda

Related work
Background
Worst case interference aware WCET analysis
  o Optimizations
Test cases
Demo
Conclusion

Related work I (WCET analysis)

Static
  o Abstract architecture and application models
Hybrid measurement based [13 – Kirner et al.]
  o RapiTime (more in background section)
Measurements in the presence of stress patterns
  o Only valid under PD arbiter [25]

Related work II (WCET analysis)

Others
  o Real-time calculus [19 – Pellizzoni et al.]
  o Model checking [15 – Lv et al.]
Closest
  o [16 – Nowotsch et al.] Monitoring and suspension mechanism on shared resource usage
      Limited accesses in a unit time
[15, 16, 19] Holistic approaches

Related work III (Tailored architectures)

Time analyzable multi-cores
  o MERASA [29], parMERASA [28]
Repeatable time machines
  o PRET [14], CoMPSoC [11]
Probabilistic timing analysis
  o PROARTIS [7], PROXIMA [12, 20]

Background: Emulation devices

Test chips with enhanced debug facilities
Produced in low numbers and supplied to OEMs before the production chips are sold
Much cheaper to modify

Background: Hybrid measurement based WCET analysis

On target measurements
Complex architectures are analyzable
Intrusive
Analyzed by RapiTime

Background: Hybrid measurement based WCET analysis

Used in the RapiTime timing analyzer
Instrumentation points (iPoints)
  o Time stamp trace
  o Execution time (ET) profile
Critical path detection using the maximum observed execution times (MOETs) of basic blocks (BBs)
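
The critical path idea on this slide can be sketched in a few lines. This is a simplified illustration and not RapiTime's actual algorithm: the trace layout, the function names `moets_from_runs` and `worst_path`, the example timestamps and the acyclic control-flow graph are all assumptions made for the example.

```python
from collections import defaultdict


def moets_from_runs(runs):
    """Return the maximum observed execution time (MOET) of each basic block.

    Each run is a list of (ipoint_id, timestamp) pairs in execution order.
    The time between two consecutive iPoints is attributed to the block
    that starts at the first of the two.
    """
    moet = defaultdict(int)
    for run in runs:
        for (block, start), (_, end) in zip(run, run[1:]):
            moet[block] = max(moet[block], end - start)
    return dict(moet)


def worst_path(cfg, moet, entry):
    """Return (cost, path) of the most expensive path through an acyclic CFG."""
    memo = {}

    def longest(block):
        if block not in memo:
            tails = [longest(succ) for succ in cfg[block]]
            cost, path = max(tails, key=lambda t: t[0]) if tails else (0, [])
            memo[block] = (moet.get(block, 0) + cost, [block] + path)
        return memo[block]

    return longest(entry)


# Two hypothetical measurement runs; "END" marks the final iPoint.
runs = [
    [("A", 0), ("B", 10), ("D", 25), ("END", 30)],
    [("A", 0), ("C", 12), ("D", 20), ("END", 28)],
]
# Hypothetical control-flow graph: A branches to B or C, both join at D.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

moet = moets_from_runs(runs)
bound, path = worst_path(cfg, moet, "A")
print(moet)
print(bound, path)  # 35 ['A', 'B', 'D']
```

In this example the computed bound (35) is larger than either observed end-to-end time (30 and 28), because the worst observation of each block is combined along the most expensive path even though no single run exercised that combination.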

Background: Round robin arbiter

$WL_{rr} = N \times SS$, $BL_{rr} = SS$ (N: total number of masters)
Experienced latency $\in [BL, WL]$
RapiTime approach is invalid for multi-cores: a measurement may never observe WL
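
The two bounds can be made concrete with a small calculation. The sketch below assumes that SS stands for the arbiter's slot size in clock cycles, which the slide does not state. The function name `rr_latency_bounds`, the 8-cycle slot and the miss count are hypothetical values chosen for illustration.

```python
def rr_latency_bounds(n_masters, slot_size):
    """Best- and worst-case latency of one access under round robin.

    Best case: the requester is served immediately (one slot).
    Worst case: every master, including the requester, uses one slot.
    """
    best = slot_size
    worst = n_masters * slot_size
    return best, worst


# Hypothetical numbers: 4 masters (quad-core), 8-cycle slot.
BL, WL = rr_latency_bounds(n_masters=4, slot_size=8)
print(BL, WL)  # 8 32

# If every miss in a measured run happened to see only BL, the run would
# under-report each miss by WL - BL cycles compared with the worst case.
misses = 1000
max_underestimate = misses * (WL - BL)
print(max_underestimate)  # 24000
```

The gap between the two bounds grows with the number of masters, which is why a latency observed in one run says little about the latency another run may experience.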

Worst case interference augmented tracing

Adds a cache observer module to the emulation device of the production chip
Occurrence times of cache misses and their experienced latencies are saved in a trace

Offline trace manipulation

Artificially inflates the MOET of each BB by appending WL to each cache miss
Artificial delay in occurrence
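
One way to picture the offline manipulation is as a shift applied to the iPoint timestamps. The sketch below assumes that each cache miss is padded from its experienced latency up to WL and that all later iPoints are delayed by the accumulated padding. The slides do not give the exact rule, so the record layout, the function name `inflate_trace` and the sample numbers are assumptions.

```python
def inflate_trace(ipoints, misses, worst_latency):
    """Delay iPoint timestamps as if every cache miss had taken worst_latency.

    ipoints: list of (ipoint_id, timestamp), sorted by timestamp.
    misses:  list of (timestamp, experienced_latency), sorted by timestamp.
    Assumes experienced_latency <= worst_latency for every miss.
    """
    inflated = []
    shift = 0      # accumulated artificial delay so far
    next_miss = 0  # index of the first miss not yet accounted for
    for ipoint_id, timestamp in ipoints:
        # Account for every miss that occurred before this iPoint.
        while next_miss < len(misses) and misses[next_miss][0] < timestamp:
            shift += worst_latency - misses[next_miss][1]
            next_miss += 1
        inflated.append((ipoint_id, timestamp + shift))
    return inflated


# Hypothetical traces: three iPoints and three cache misses, WL = 32 cycles.
ipoints = [("A", 0), ("B", 40), ("C", 90)]
misses = [(10, 8), (50, 16), (60, 8)]

inflated = inflate_trace(ipoints, misses, worst_latency=32)
print(inflated)  # [('A', 0), ('B', 64), ('C', 154)]
```

With these numbers the observed time of block A grows from 40 to 64 cycles and that of block B from 50 to 90 cycles. The inflated trace has the same format as the original one, so it could be passed to a single-core analysis tool unchanged.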

WCET calculation from manipulated trace

Single core worst case path
Input to RapiTime

Worst case interference augmented tracing

Benefits
  o Does not alter production chips
      Cost and performance benefits are preserved
  o Unmodified single-core tools
  o WCET of applications under complex arbiters, e.g. CCSP [4], PBS [27], can be measured
  o Analysis in isolation (incremental certification)

Worst case interference augmented tracing

Drawbacks
  o Additional master interface: lower operating frequency
  o Trace size

Optimized solution

No master interface
Only iPoint trace
  o Same as single core measurements

Optimized solution

Benefits
  o Simple architecture and ultra low area footprint
  o No capacitive loading
      Emulation device can run at the same frequency as the production device
Drawbacks
  o WCET under complex arbiters, e.g. CCSP, PBS, is high due to the assumption of WL for each cache miss

Overestimation of our approach

Intrusive (same as single core technique)
  o Measured WCET includes the impact of iPoint() executions
  o iPoint() modifies cache state
      Deducting the iPoint() execution time is not enough!
  o Impacts history based branch predictors
Assumption of WL under RR is not pessimistic
  o Highly interference vulnerable applications [26]

Area overhead

Test architecture
  o NIOS II-F quad-core processor, 512 B (4 KB) I$ and D$
  o Optimized cache observer
  o On-chip shared SRAM
  o @ 125 MHz, Cyclone III FPGA
  o 5% increase in area of emulation device (basic)

Test results – 512 B I$ and D$

Multi-path applications from the Mälardalen benchmark suite
[Figure labels: WCET multi-core, WCET single-core, cost of porting from single-core to multi-core, instrumentation overhead]

Test results – 4 KB I$ and D$

$\mathrm{avg}(WCET_{ni} / WCET_{nis})$ decreases as cache size increases, because there are fewer cache misses

Demo: www6.in.tum.de/Main/Shah

Future extension

Partitioned L2 cache
  o Observe the dedicated partition as well as the shared partition
  o May not be considered a COTS design

Conclusion

A novel technique to measure WCET of applications on multi-core architectures
Existing single-core analysis tools
Unmodified production chips
  o Preserves cost and performance benefits
  o Trivial addition to emulation chips is required
Incremental certification

Thank you. Questions?

This research focuses on providing an inexpensive multi-core solution for safety-critical systems by utilizing unmodified production chips and measurement-based WCET analysis tools. The goal is to enable WCET analysis on multi-core setups while preserving cost, performance, and time-to-market benefits. Various related works and optimizations are discussed to achieve accurate worst-case interference-aware WCET analysis. The approach aims to address the challenges of costly customized multi-cores in safety-critical applications.

  • WCET analysis
  • Multi-core architectures
  • Safety-critical systems
  • Measurement-based analysis
  • Worst-case interference

Uploaded on Sep 30, 2024




Area overhead in logic elements (LEs):

  Architecture                  LEs
  With cache observer           13555
  Without the cache observer    14272

