Integrating DPDK/SPDK with Storage Applications for Improved Performance

 
@ Vishnu Itta
@ Mayank Patel
 
Containerized Storage
for Containers
 
 
 
Agenda

Problem Statement
Current state of legacy applications and its issues
Possible solutions
Integrating DPDK related libraries with legacy applications
Approach of client/server model to use SPDK
Future work


Problem Statement
 
 
State of legacy storage applications
 
 
Highly dependent on locks for synchronization
Heavy system calls, context switches
Dependency on the kernel for scheduling
Frequent memory allocations
Dependency on locks, and heavy page faults due to the huge memory requirement
Multiple copies of data between userspace and the device
 
Leads to:
Doesn't scale with current hardware trends
Context switches
IO latencies
 
Possible solutions
 
 
 
Lockless data structures
Reactor pattern
Usage of HUGE pages
Integrating with the userspace NVMe driver from SPDK
 
Leads to:
Scalability with CPUs
Fewer context switches
Better performance
 
 
Integrating DPDK related libraries with storage applications
 
 
 
 
Linking DPDK with application
 
 
 
Shared library: CONFIG_RTE_BUILD_SHARED_LIB=y
Static library: CONFIG_RTE_BUILD_SHARED_LIB=n

Using the rte.ext*.mk makefiles from dpdk/mk:
For building applications: rte.extapp.mk
For shared libraries: rte.extlib.mk, rte.extshared.mk
For building object files: rte.extobj.mk
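A minimal external-application makefile built on these fragments might look as follows; a sketch only, and the application name, source list, and target triple here are illustrative assumptions, not taken from the slides:

```makefile
# Sketch of building an external app against DPDK's legacy make system.
# RTE_SDK points at the DPDK source tree; RTE_TARGET names the build dir.
ifeq ($(RTE_SDK),)
$(error "Please define the RTE_SDK environment variable")
endif
RTE_TARGET ?= x86_64-native-linuxapp-gcc

include $(RTE_SDK)/mk/rte.vars.mk

APP = legacy_storage_app        # hypothetical application name
SRCS-y := main.c                # hypothetical source file
CFLAGS += -O3

include $(RTE_SDK)/mk/rte.extapp.mk
```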
 
Integration of DPDK libraries with Application libraries
 
 
 
Usage of constructors in libraries

Force linking of unused libraries with the shared library:
LDFLAGS += --no-as-needed -lrte_mempool --as-needed

rte_eal_init() - to initialize the EAL - called from the legacy application
 
 
 
 
Memory allocation with rte_eal library
 
Memory allocation from HUGE pages

No overhead of address translation in the kernel

Issues with this approach in legacy multi-threaded applications:
Spinlock contention with multiple threads
Dedicated core per thread
 
 
 
 
Cache-based allocation with mempool library
 
Allocator of fixed-size objects

Ring-based handler for managing object blocks

Each get/put request does CAS operations
 
 
Ring library to synchronize thread operations

One thread to process operations
Message passing between the core thread and other threads through a RING buffer
Single CAS operation
Issue:
Dedicated CPU core per thread

But, is there a way to integrate SPDK with the current state of legacy applications without much redesign?
 
Approach of client/server model to use SPDK
 
 
vhost-user + virtio

Fast movement of data between processes (zero copy)
Based on shared memory with lockless ring buffers
Processes exchange data without involving the kernel

[Diagram: Legacy App (vhost-user client) and SPDK (vhost-user server) sharing a table with data pointers plus avail and used ring buffers]
 
vhost-user client impl

Minimalistic lib has been implemented for prototyping:
https://github.com/openebs/vhost-user
It is meant for embedding into a legacy application, replacing read/write calls
Provides a simple API for read and write ops, sync and async variants
The storage device to open is a UNIX domain socket with a listening SPDK vhost server, instead of a traditional block device
 
Results

Our baseline is SPDK with its NVMe userspace driver
The vhost-user library can achieve around 90% of native SPDK's performance if tuned properly
Said differently: the overhead of the virtio library can be as low as 10%
For comparison, SPDK with libaio, which simulates what could have been achieved with a traditional IO interface, achieves 80%
 
Future work

While the results may not be appealing enough, this is not the end
SPDK scales nicely with the number of CPUs. If vhost-user can scale too, that could widen the gap between traditional IO interfaces and vhost-user
Requires more work on the vhost-user library (adding support for multiple vrings in order to issue IOs to SPDK in parallel)
The concern remains whether legacy apps could fully utilize this potential without a major rewrite

QUESTIONS?