Enhancing Database Accelerators with OpenCAPI Technology

1. Adopting OpenCAPI for High Bandwidth Database Accelerators

Authors: Jian Fang¹, Yvo T.B. Mulder¹, Kangli Huang¹, Yang Qiao¹, Xianwei Zeng¹, Jan Hidders², Jinho Lee³, H. Peter Hofstee¹,³

H2RC @ SC'17, Denver, USA
Speaker: Jian Fang (j.fang-1@tudelft.nl)
November 17th, 2017
 
2. Netezza Data Appliance Architecture

Source: The Netezza data appliance architecture: A platform for high performance data warehousing and analytics
3. S-Blade in Netezza

(Figure: data flow through an S-Blade: compressed data, decompressed data, filtered data)

Source: The Netezza data appliance architecture: A platform for high performance data warehousing and analytics
4. Netezza Data Appliance Architecture (Bottleneck)

Source: The Netezza data appliance architecture: A platform for high performance data warehousing and analytics
5. DB with FPGAs: What is New?

• Databases move from Disk to Memory
• Databases move from Disk to Flash
• Do FPGAs still help? Yes: faster data movement

Source: https://www.datanami.com/2015/10/21/neo4j-touts-10x-performance-boost-of-graphs-on-ibm-power-fpgas/
6. OpenCAPI Helps

• OpenCAPI brings FPGAs memory-scale bandwidth
  • OpenCAPI 3.0 (x8) -> 25 GB/s; 100 GB/s in total with 4 channels
  • OpenCAPI 4.0 (x32) -> 100 GB/s in total with 1 channel
• Shared memory
  • Address translation
  • Saves extra copies
• Targets more than just computation-intensive applications

High Bandwidth • Low Latency • Shared Memory

Source: http://opencapi.org
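The link rates quoted above follow from the lane count: each OpenCAPI lane signals at 25 Gbit/s. A back-of-the-envelope sketch (raw signaling rate only; protocol overhead is ignored here):

```python
# Raw OpenCAPI link bandwidth from lane count.
# Assumes 25 Gbit/s per lane, as in the 25.78125 GT/s signaling class.
GBIT_PER_LANE = 25

def link_gbytes_per_s(lanes):
    """Raw link bandwidth in GB/s for a given lane count."""
    return lanes * GBIT_PER_LANE / 8   # bits -> bytes

print(link_gbytes_per_s(8))    # OpenCAPI 3.0 x8  -> 25.0 GB/s
print(link_gbytes_per_s(32))   # OpenCAPI 4.0 x32 -> 100.0 GB/s
```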
7. Accelerating DBs with OpenCAPI

• Decompress-Filter
• Hash-Join
• Merge-Sorter
• ...
8. Decompress-Filter

• Parquet format
  • Partitionable
  • Supports GZIP, LZO, Snappy, ...
• Snappy (de)compression algorithm
  • Based on LZ77, byte-oriented
  • Low compression ratio, but fast (de)compression speed
• Computation-bound
  • Highly data-dependent
  • Multiple engines to keep up with the bandwidth
  • Trade-off between stronger-but-fewer engines and simpler-but-more engines (64 KB history for each engine)
• Memory access pattern
  • Sequential read for each stream (engine)

Do we need compression & decompression?
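The data dependency mentioned above is visible in the core of any LZ77-style decoder such as Snappy's: a back-reference may copy bytes the decoder has only just produced. A minimal sketch, using a simplified, hypothetical token stream rather than the real Snappy wire format:

```python
# Simplified LZ77-style decoding (the core of Snappy decompression).
# Token format is illustrative, NOT the real Snappy encoding:
#   ("literal", b"...")          - emit raw bytes
#   ("copy", offset, length)     - copy `length` bytes starting `offset`
#                                  bytes back in the output
def decompress(tokens):
    out = bytearray()
    for tok in tokens:
        if tok[0] == "literal":
            out += tok[1]
        else:
            _, offset, length = tok
            start = len(out) - offset
            for i in range(length):          # byte-by-byte, because a copy
                out.append(out[start + i])   # may overlap its own output
    return bytes(out)

# Overlapping copy: each output byte depends on a byte just written.
print(decompress([("literal", b"ab"), ("copy", 2, 6)]))  # b'abababab'
```

This serial output dependency is exactly why a single decompression engine is hard to speed up, and why the slide proposes multiple engines (each with its own 64 KB history) to keep up with the link bandwidth.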
9. Hash-Join

• Memory-bound
  • Low locality of the data and multiple passes of data transfers
  • The internal memory (BRAM) is too small to store the hash table
• Memory access pattern
  • Sequentially read the relations
  • Randomly write/read the hash table
  • Granularity matters during random accesses:
    Require: 40 B tuple | Access: 64 B cacheline | Wasted: 24 B (~40% waste)

Fang J, et al. Analyzing In-Memory Hash Joins: Granularity Matters, ADMS 2017.
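The waste figure above is simple arithmetic: random accesses fetch whole cachelines, so any tuple smaller than the access granularity drags unused bytes along. A quick sketch with the slide's numbers:

```python
# Fraction of fetched bytes wasted when a tuple is read through
# fixed-size cachelines (numbers from the slide: 40 B tuple, 64 B line).
import math

def wasted_fraction(tuple_bytes, cacheline_bytes):
    lines = math.ceil(tuple_bytes / cacheline_bytes)  # cachelines touched
    fetched = lines * cacheline_bytes
    return (fetched - tuple_bytes) / fetched

print(wasted_fraction(40, 64))   # 0.375 -> 24 of every 64 bytes wasted
```

37.5% of the fetched bandwidth carries no useful data, which is why granularity matters so much for the random hash-table accesses.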
10. Merge-Sorter

• Need a strong sorter for the final pass
• Memory access pattern
  • Sequential read within each stream, but the stream to read next is chosen data-dependently (effectively at random)
• Solutions
  • Even-odd sorter to continuously produce multiple tuples per cycle
  • Multi-stream buffering to feed this beast
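The access pattern above can be sketched in software with a heap-based k-way merge. This is only a stand-in for the hardware design (an even-odd merge network emitting multiple tuples per cycle, fed by multi-stream buffers); the names and structure are illustrative:

```python
# k-way merge of sorted streams. Reads are sequential within each
# stream, but which stream wins next is data-dependent, so the memory
# accesses hop between streams, which is what the multi-stream buffers
# on the slide are there to hide.
import heapq

def merge_streams(streams):
    """Merge several individually sorted lists into one sorted output."""
    heap = [(s[0], i, 0) for i, s in enumerate(streams) if s]
    heapq.heapify(heap)
    out = []
    while heap:
        value, sid, pos = heapq.heappop(heap)    # winning stream varies
        out.append(value)
        if pos + 1 < len(streams[sid]):           # advance sequentially
            heapq.heappush(heap, (streams[sid][pos + 1], sid, pos + 1))
    return out

print(merge_streams([[1, 4, 7], [2, 5, 8], [3, 6, 9]]))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
```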
 
11. Summary

• Databases need to move data at an ever faster rate
• With OpenCAPI, FPGAs can help DBs more
• Challenges of high-bandwidth accelerator design
• Three examples
 
12. Authors

Jian Fang (TU Delft), Yvo T.B. Mulder (TU Delft), Kangli Huang (TU Delft), Yang Qiao (TU Delft), Xianwei Zeng (TU Delft), Jan Hidders (Vrije Universiteit Brussel), Jinho Lee (IBM Research), H. Peter Hofstee (TU Delft & IBM Research)
13. Thank You

More Detail:
• Progress with Power Systems and CAPI
  https://ibm.ent.box.com/v/OpenPOWERWorkshopMicro50/file/239719608792
• Leveraging the bandwidth of OpenCAPI with reconfigurable logic
  https://indico-jsc.fz-juelich.de/event/55/other-view?view=standard

Contact Me: j.fang-1@tudelft.nl
