Persistent Memory Use with WHISPER by University of Wisconsin-Madison & Hewlett-Packard Labs

 
Slide Note

Good morning, I am Sanketh, and today we'll see how applications use persistent memory.

This is joint work with Swapnil, my advisors Mike and Mark, and HP Enterprise.


Uploaded on Sep 20, 2024



Presentation Transcript


  1. An Analysis of Persistent Memory Use with WHISPER
     Sanketh Nalli, Swapnil Haria, Michael M. Swift, Mark D. Hill, Haris Volos*, Kimberly Keeton*
     University of Wisconsin-Madison & *Hewlett-Packard Labs (HPL)

  2. WHISPER: Wisconsin-HP Labs Suite for Persistence
     Goal: facilitate better system support for persistent memory
     Discovered behaviors:
       - 4% of accesses go to PM, 96% to DRAM
       - 5-50 epochs per transaction, contributed by memory allocation and logging
       - Re-referencing PM cachelines: rare across threads, common within a thread
     WHISPER: research.cs.wisc.edu/multifacet/whisper

  3. Outline
       - WHISPER: Wisconsin-HP Labs Suite for Persistence
       - Analysis
       - Results

  4. Persistent Memory is coming soon
     Persistent Memory (PM) = non-volatile memory (NVM) attached to the CPU on the memory bus
       - Offers low-latency reads and persistent writes
       - Allows user-level, byte-addressable loads and stores

  5. What guarantees do applications need?
     Durability = content is available to the user after a failure
     Consistency = content is recoverable and usable after a failure
     Example [figure: a CPU cache holding data and pointer updates, above PM]:
       1. Data update followed by pointer update in cache
       2. Pointer is evicted from cache to PM
       3. Data is lost on failure; the dangling pointer persists
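The failure described above can be replayed in a toy software model. This is a minimal sketch under stated assumptions. The `machine` struct and the `store`, `evict` and `crash` helpers are hypothetical. Each word sits in its own cacheline. The code forces the pointer's line to be evicted first; on real hardware the cache chooses evictions on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not real hardware: two PM words ("data" and "ptr"), each in
 * its own cacheline. Stores land in the volatile cache; only lines that
 * are written back reach PM. All names are hypothetical. */
enum { DATA = 0, PTR = 1, NLINES = 2 };
#define UNSET UINT64_MAX  /* marker for "pointer not yet set" */

typedef struct {
    uint64_t pm[NLINES];     /* durable contents */
    uint64_t cache[NLINES];  /* volatile contents */
    bool dirty[NLINES];      /* modified in cache, not yet in PM */
} machine;

static void machine_init(machine *m) {
    m->pm[DATA] = m->cache[DATA] = 0;
    m->pm[PTR] = m->cache[PTR] = UNSET;
    m->dirty[DATA] = m->dirty[PTR] = false;
}

static void store(machine *m, int line, uint64_t value) {
    assert(line >= 0 && line < NLINES);
    m->cache[line] = value;
    m->dirty[line] = true;
}

/* The cache, not the program, decides when a dirty line is evicted. */
static void evict(machine *m, int line) {
    assert(line >= 0 && line < NLINES);
    if (m->dirty[line]) {
        m->pm[line] = m->cache[line];
        m->dirty[line] = false;
    }
}

/* A failure loses everything that exists only in the cache. */
static void crash(machine *m) {
    for (int i = 0; i < NLINES; i++) {
        m->cache[i] = m->pm[i];
        m->dirty[i] = false;
    }
}

/* Replays the three steps; returns true if PM ends up holding a pointer
 * to data that never became durable. */
static bool dangling_after_crash(void) {
    machine m;
    machine_init(&m);
    store(&m, DATA, 42);   /* 1. data update in cache ...          */
    store(&m, PTR, DATA);  /*    ... followed by pointer update     */
    evict(&m, PTR);        /* 2. only the pointer is evicted to PM  */
    crash(&m);             /* 3. failure: the data update is lost   */
    return m.pm[PTR] == DATA && m.pm[DATA] != 42;
}
```

Calling `dangling_after_crash()` returns true. After the crash, PM holds a pointer to the data slot, but the slot still has its old value.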

  6. Achieving consistency
     Ordering = a useful building block of consistency mechanisms
     Epoch = a set of writes to PM guaranteed to be durable before ANY writes to PM in following epochs become durable
     Ordering primitives: sfence and mfence on x86-64
     Example [figure: data and pointer updates moving from cache to PM]:
       1. Store data update in cache
       2. Flush data update to PM
       3. Store pointer update in cache
       4. Flush pointer update to PM
     Steps 1-2 and steps 3-4 form two epochs, separated by an ordering primitive.
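Here is a minimal x86-64 sketch of the four steps. It assumes GCC or Clang and the `_mm_clflush`/`_mm_sfence` intrinsics from `<immintrin.h>`. `struct record`, `flush_line`, `fence` and `publish` are hypothetical names. The record sits in ordinary memory here, so the code shows instruction order rather than actual durability; a real program would place it in a PM mapping. On x86, CLFLUSH is already ordered with respect to later stores, but CLFLUSHOPT and CLWB are not. Ending each epoch with SFENCE therefore keeps the pattern valid for all three.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__)
#include <immintrin.h>  /* _mm_clflush, _mm_sfence */
#endif

/* Hypothetical persistent record. Each field gets its own 64-byte
 * cacheline so the two flushes touch different lines. Here it lives in
 * ordinary memory, so the code only illustrates instruction order. */
struct record {
    _Alignas(64) uint64_t data;
    _Alignas(64) const uint64_t *ptr;
};

static void flush_line(const void *addr) {
#if defined(__x86_64__)
    _mm_clflush(addr);                        /* write the line back */
#else
    (void)addr;                               /* no flush in this fallback */
#endif
}

static void fence(void) {
#if defined(__x86_64__)
    _mm_sfence();                             /* order flushes and stores */
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST);  /* ordering only, not durability */
#endif
}

/* Two epochs: the data must be durable before the pointer can be. */
static void publish(struct record *r, uint64_t value) {
    assert(r != NULL);
    r->data = value;        /* 1. store data update in cache */
    flush_line(&r->data);   /* 2. flush data update to PM    */
    fence();                /*    epoch 1 ends               */
    r->ptr = &r->data;      /* 3. store pointer update       */
    flush_line(&r->ptr);    /* 4. flush pointer update to PM */
    fence();                /*    epoch 2 ends               */
}
```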

  7. PM systems for consistency
     [Figure: three ways an application reaches persistent memory (PM)]
       - Native: load/store directly to PM; allows application-specific optimizations
       - Persistent heaps (NVML, Mnemosyne): transactions (TX) on top of load/store; atomic allocations, type safety
       - PM-aware filesystems (ext4-DAX, PMFS): read/write through the VFS; POSIX interface

  8. What's the problem?
       - Lack of standard workloads slows research
       - Micro-benchmarks may not be representative
       - Partial understanding of how applications use PM

  9. WHISPER (* = adapted to PM)
       Benchmark    Type        Brief description
       N-store*     Database    H-Store-like DB; undo logs for consistency
       Echo*        KV store    Scalable, multi-version key-value store
       Memcached*   Mnemosyne   Distributed key-value store
       Vacation*    Mnemosyne   Online travel reservation system
       Redis        NVML        REmote DIctionary Server
       C-tree       NVML        Microbenchmark for simulations
       Hashmap      NVML        Microbenchmark for simulations
       NFS          PMFS        Linux server/client for remote file access
       Exim         PMFS        Mail server; stores mail in per-user files
       MySQL        PMFS        Widely used RDBMS for OLTP

  10. Outline
        - WHISPER: Wisconsin-HP Labs Suite for Persistence
        - Analysis
        - Results

  11. Identify writes to PM
      [Figure: analysis pipeline IDENTIFY -> INSTRUMENT -> EXECUTE -> ANALYZE, taking a PM application through PIN/mmiotrace, PM writes, the PM runtime and a trace to statistics]
      PIN for userspace, mmiotrace for the kernel
      On average, 101 lines in applications and 67 lines in the kernel update PM

  12. Instrument writes to PM
      C macros capture all modes of updating PM, as well as PM transaction start/end, cacheline flushes and fences
      Example: update and persist the size of a filesystem journal
        Original code              Instrumented code
        log.size = size;           PM_SET(log.size, size);
        flush_buffer(log.size);    PM_FLUSH(log.size, 8);
        asm("sfence");             PM_FENCE();
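The slide does not show how the macros are defined. One plausible shape is sketched below; it is hypothetical. Each macro stands in for the original operation and appends an event to an in-memory trace. `pm_event`, `pm_trace`, `pm_record` and `struct journal` are invented for illustration. In this sketch `PM_SET` performs the store, while flushes and fences are only recorded, not executed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trace format; the slide does not show WHISPER's own. */
typedef enum { EV_STORE, EV_FLUSH, EV_FENCE } pm_event_kind;

typedef struct {
    pm_event_kind kind;
    uintptr_t addr;  /* first byte touched (0 for fences) */
    size_t len;      /* bytes stored or flushed (0 for fences) */
} pm_event;

#define PM_TRACE_MAX 1024
static pm_event pm_trace[PM_TRACE_MAX];
static size_t pm_trace_len;

static void pm_record(pm_event_kind kind, const void *addr, size_t len) {
    assert(pm_trace_len < PM_TRACE_MAX);
    pm_trace[pm_trace_len++] = (pm_event){ kind, (uintptr_t)addr, len };
}

/* Performs the store, then logs it. */
#define PM_SET(lval, val) \
    do { (lval) = (val); pm_record(EV_STORE, &(lval), sizeof(lval)); } while (0)

/* Logs the flush only; a real build would also issue clflush/clwb here. */
#define PM_FLUSH(lval, len) pm_record(EV_FLUSH, &(lval), (len))

/* Logs the fence only; a real build would also execute sfence here. */
#define PM_FENCE() pm_record(EV_FENCE, NULL, 0)

/* The journal-size update from the slide, written with the macros. */
struct journal { uint64_t size; };

static void set_journal_size(struct journal *jnl, uint64_t size) {
    PM_SET(jnl->size, size);
    PM_FLUSH(jnl->size, 8);
    PM_FENCE();
}
```

After one call to `set_journal_size`, the trace holds a store, a flush and a fence, in that order. A trace of this kind is what the analysis step consumes.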

  13. Execute and Analyze
      A Python analyzer and dependency-checker analyze the trace for several statistics:
        - Number of epochs per transaction
        - Epoch dependencies
        - Epoch sizes
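The analyzer on the slide is a Python script. The sketch below shows, in C and under an assumed trace format, how one statistic (epochs per transaction) could be counted. The event names `TX_BEGIN`, `PM_STORE`, `PM_FENCE_EV` and `TX_END` are hypothetical. Each fence closes the epoch made of the PM stores before it, and fences with no stores before them do not count as epochs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical trace events; the real analyzer's input format differs. */
typedef enum { TX_BEGIN, TX_END, PM_STORE, PM_FENCE_EV } ev_kind;

/* Counts epochs per transaction. An epoch is the run of PM stores that a
 * fence closes; stores still open at TX_END count as one more epoch.
 * Writes one count per transaction into out[] and returns how many
 * transactions were seen. */
static size_t epochs_per_tx(const ev_kind *trace, size_t n,
                            size_t *out, size_t out_cap) {
    size_t ntx = 0, epochs = 0;
    int in_tx = 0, open_epoch = 0;  /* open_epoch: stores since last fence */
    for (size_t i = 0; i < n; i++) {
        switch (trace[i]) {
        case TX_BEGIN:
            in_tx = 1;
            epochs = 0;
            open_epoch = 0;
            break;
        case PM_STORE:
            open_epoch = 1;
            break;
        case PM_FENCE_EV:
            if (open_epoch) { epochs++; open_epoch = 0; }
            break;
        case TX_END:
            if (in_tx) {
                if (open_epoch) epochs++;
                assert(ntx < out_cap);
                out[ntx++] = epochs;
                in_tx = 0;
            }
            break;
        }
    }
    return ntx;
}
```

Averaging `out[0..ntx-1]` gives an epochs-per-transaction figure of the kind reported in the results.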

  14. Outline
        - WHISPER: Wisconsin-HP Labs Suite for Persistence
        - Analysis
        - Results

  15. How many accesses to PM?
      Of the total accesses in a WHISPER application, 4% go to PM and 96% go to DRAM
      Suggestion: do not impede volatile accesses

  16. How many epochs per transaction?
      Enforcing durability after every epoch impedes execution
      Assumption: 3 epochs/TX = log + data + commit
      Reality: 5 to 50 epochs/TX
      Highest rate of epochs: native applications and TM libraries
      Suggestion: enforce durability only at the end of a transaction

  17. How large are epochs typically?
      Epoch size, in 64B cachelines, determines the amount of state buffered per epoch
      [Figure: fraction of epochs (0-100%) by epoch size: 1, 2, 3, 4, 5, 6-63, >=64 cachelines]
      Small epochs are abundant: 75% update a single cacheline
      Large epochs appear in PMFS
      Suggestion: consider optimizing for small epochs
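Here, epoch size is the number of distinct 64-byte cachelines written between two fences. A minimal sketch follows: it computes that number from an epoch's writes and maps it to the figure's buckets. `pm_write`, `epoch_size_lines`, `size_bucket` and the `MAX_LINES` cap are assumptions of this sketch. The linear duplicate check keeps the code short rather than fast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64u
#define MAX_LINES 1024  /* hypothetical cap on distinct lines per epoch */

/* One PM write inside an epoch: start address and length in bytes. */
typedef struct { uintptr_t addr; size_t len; } pm_write;

/* Epoch size = number of distinct 64B cachelines touched by its writes. */
static size_t epoch_size_lines(const pm_write *w, size_t n) {
    uintptr_t lines[MAX_LINES];
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (w[i].len == 0) continue;
        uintptr_t first = w[i].addr / CACHELINE;
        uintptr_t last = (w[i].addr + w[i].len - 1) / CACHELINE;
        for (uintptr_t l = first; l <= last; l++) {
            size_t j = 0;
            while (j < count && lines[j] != l) j++;  /* already counted? */
            if (j == count) {
                assert(count < MAX_LINES);
                lines[count++] = l;
            }
        }
    }
    return count;
}

/* Buckets matching the figure's x-axis: 1, 2, 3, 4, 5, 6-63, >=64. */
static int size_bucket(size_t lines) {
    if (lines == 0) return -1;              /* empty epoch: not bucketed */
    if (lines <= 5) return (int)lines - 1;  /* buckets 0..4 */
    if (lines < 64) return 5;               /* 6-63 */
    return 6;                               /* >=64 */
}
```

Two 8-byte writes to the same line count as one cacheline. A single write that straddles a line boundary counts as two.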

  18. What contributes to epochs?
      Log entries
        - Undo log: alternating epochs of log and data
        - Redo log: 1 log epoch + 1 data epoch
      Persistent memory allocation: 1 to 5 epochs
      Suggestion: use redo logs and reduce epochs from the memory allocator
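To see why redo logs produce fewer epochs, the sketch below counts fences in a toy model of a transaction that updates n words. It is not any runtime's actual logging code. The names are hypothetical, and flushes and the commit record are omitted. The undo version follows the slide's description of alternating log and data epochs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: a fence ends an epoch, so counting fences counts epochs.
 * Flushes and the commit record are left out; names are hypothetical. */
typedef struct { size_t fences; } pm_stats;
typedef struct { uint64_t *addr; uint64_t val; } log_entry;

static void pm_fence(pm_stats *s) {
    assert(s != NULL);
    s->fences++;
}

/* Undo log: each old value must be durable before its in-place update,
 * giving alternating log and data epochs (two per update). */
static void undo_tx(pm_stats *s, uint64_t *const *addrs,
                    const uint64_t *vals, size_t n, log_entry *entries) {
    for (size_t i = 0; i < n; i++) {
        entries[i] = (log_entry){ addrs[i], *addrs[i] };  /* old value */
        pm_fence(s);                                       /* log epoch */
        *addrs[i] = vals[i];                               /* update in place */
        pm_fence(s);                                       /* data epoch */
    }
}

/* Redo log: log every new value, then apply them all: two epochs total. */
static void redo_tx(pm_stats *s, uint64_t *const *addrs,
                    const uint64_t *vals, size_t n, log_entry *entries) {
    for (size_t i = 0; i < n; i++)
        entries[i] = (log_entry){ addrs[i], vals[i] };     /* new value */
    pm_fence(s);                                           /* log epoch */
    for (size_t i = 0; i < n; i++)
        *entries[i].addr = entries[i].val;                 /* apply */
    pm_fence(s);                                           /* data epoch */
}
```

With n = 2 the undo version uses 4 epochs and the redo version 2. In this model undo logging costs 2n epochs, while redo logging stays at 2 regardless of n.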

  19. What are epoch dependencies?
      [Figure: Thread 1 runs epochs A, B, C, D; Thread 2 runs epochs 1, 2, 3]
      Self-dependency: B -> D (within a thread)
      Cross-dependency: 2 -> C (across threads)
      Why do they matter? A dependency can stall execution
      Dependencies were measured within a 50 µs window
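Here is a minimal sketch of classifying dependencies. It assumes that an epoch depends on an earlier epoch when both write the same 64-byte cacheline. Only earlier epochs that ended within the 50 µs window are considered. The `epoch` record, `classify` and the tie-breaking rule are hypothetical. The test below encodes the slide's example, giving B and D a shared line and 2 and C a shared line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WINDOW_NS 50000u   /* 50 us window from the slide */
#define MAX_EPOCH_LINES 8  /* hypothetical cap for this sketch */

/* Hypothetical epoch record derived from a trace. */
typedef struct {
    int thread;
    uint64_t end_ns;                 /* when the epoch's fence executed */
    size_t nlines;
    uint64_t lines[MAX_EPOCH_LINES]; /* 64B cacheline numbers written */
} epoch;

typedef enum { DEP_NONE, DEP_SELF, DEP_CROSS } dep_kind;

static int share_line(const epoch *a, const epoch *b) {
    assert(a->nlines <= MAX_EPOCH_LINES && b->nlines <= MAX_EPOCH_LINES);
    for (size_t i = 0; i < a->nlines; i++)
        for (size_t j = 0; j < b->nlines; j++)
            if (a->lines[i] == b->lines[j]) return 1;
    return 0;
}

/* Classifies epochs[k] against earlier epochs (array sorted by end_ns)
 * that ended within the window. A cross-thread conflict is reported in
 * preference to a same-thread one (an arbitrary choice for this sketch). */
static dep_kind classify(const epoch *epochs, size_t k) {
    dep_kind result = DEP_NONE;
    for (size_t i = k; i-- > 0; ) {
        if (epochs[k].end_ns - epochs[i].end_ns > WINDOW_NS) break;
        if (!share_line(&epochs[k], &epochs[i])) continue;
        if (epochs[i].thread != epochs[k].thread) return DEP_CROSS;
        result = DEP_SELF;
    }
    return result;
}
```

Dividing the number of epochs classified as `DEP_SELF` or `DEP_CROSS` by the total number of epochs gives percentages of the kind shown in the next table.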

  20. How common are dependencies?
      Epoch dependencies as a percentage of total epochs:
        Benchmark     % cross-dep   % self-dep
        echo          0.01          54.5
        nstore-ycsb   0.003         40.2
        nstore-tpcc   0.03          27.18
        redis         0             82.5
        ctree         0             79
        hashmap       0             81
        vacation      0.01          40
        memcached     0.2           63.5
        nfs           5             55
        exim          1.16          45.27
        mysql         0.04          17.89
      Suggestion: design multi-versioned caches OR avoid updating the same cacheline across epochs

  21. Summary
      WHISPER: Wisconsin-HP Labs Suite for Persistence
        - 4% of accesses go to PM, 96% to DRAM
        - 5-50 epochs/TX, primarily small in size
        - Memory allocation and logging introduce extra epochs
        - Cross-dependencies are rare, self-dependencies are common
      More results in the ASPLOS'17 paper; code at research.cs.wisc.edu/multifacet/whisper/

  22. Extra

  23. A simple transaction using epochs
      Transaction code:
        TM_BEGIN();
        pobj.data = 42;
        pobj.init = True;
        TM_END();
      Equivalent steps, grouped into epochs:
        transaction_begin:
          log[pobj.init] ← True        Epoch 1: log entries stored & persisted
          log[pobj.data] ← 42
          write_back(log)
          wait_for_write_back()
          pobj.init ← True             Epoch 2: variables stored & persisted
          pobj.data ← 42
          write_back(pobj)
          wait_for_write_back()
        transaction_end;
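The two-epoch transaction can also be written as runnable C. This sketch assumes x86-64 with GCC or Clang. `write_back` flushes each cacheline of an object with `_mm_clflush`, and `wait_for_write_back` issues `_mm_sfence`. The struct and function names are hypothetical stand-ins for the work a PM transaction runtime performs. Recovery and log truncation are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__)
#include <immintrin.h>  /* _mm_clflush, _mm_sfence */
#endif

/* Hypothetical persistent object and its redo log (slide names). */
struct pobj_t { int data; bool init; };
struct redo_log { int data; bool init; };

/* write_back: flush every 64-byte cacheline overlapping [p, p + len). */
static void write_back(const void *p, size_t len) {
#if defined(__x86_64__)
    uintptr_t end = (uintptr_t)p + len;
    for (uintptr_t a = (uintptr_t)p & ~(uintptr_t)63; a < end; a += 64)
        _mm_clflush((const void *)a);
#else
    (void)p; (void)len;  /* non-x86 fallback: no flush issued */
#endif
}

/* wait_for_write_back: fence so earlier flushes are ordered before later stores. */
static void wait_for_write_back(void) {
#if defined(__x86_64__)
    _mm_sfence();
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

/* Hand-expanded TM_BEGIN(); pobj.data = 42; pobj.init = true; TM_END();
 * Returns the number of epochs used. */
static int tx_set_pobj(struct pobj_t *pobj, struct redo_log *rlog) {
    assert(pobj != NULL && rlog != NULL);
    /* Epoch 1: log entries stored and persisted. */
    rlog->init = true;
    rlog->data = 42;
    write_back(rlog, sizeof *rlog);
    wait_for_write_back();
    /* Epoch 2: variables stored and persisted. */
    pobj->init = true;
    pobj->data = 42;
    write_back(pobj, sizeof *pobj);
    wait_for_write_back();
    return 2;
}
```

Because the log holds the new values, it is a redo log. That is why the whole transaction fits in one log epoch and one data epoch.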

  24. Runtimes cause write amplification
        - PMFS, Mnemosyne: log every PM write
        - PMFS, NVML: clear the log
        - Auxiliary structures: < 5% of writes to PM
        - Non-temporal writes: Mnemosyne logs, PMFS user data
      [Figure: write amplification (percentage) by runtime]
