Enabling AVX-512 Assembly Instructions in Valgrind by Tatyana Volnina (Mineeva)

 
Enabling AVX-512 assembly
instructions in Valgrind
 
Tatyana Volnina (Mineeva)
 
Agenda
 
AVX-512 instruction summary
AVX-512 implementation in Valgrind
Q/A
 
AVX-512 instructions
 
 
Vector register layout
 
Masking
 
Masked vector sum:
 
 
AVX-512 features
 
New instruction prefix (EVEX):
Instruction-specific rounding mode
Embedded broadcast
New memory displacement encoding
8-bit and 16-bit vector elements
 
AVX-512 subsets
- Instruction set is already enabled
N 
- Number of instructions to add
 
AVX-512 Valgrind
 
 
General Approach
 
Separate AVX-512 code
Reuse existent IRs
Do not affect non-AVX512 runs
 
AVX-512 performance haven’t been measured yet
 
AVX-2 Valgrind
Tool instrumentation
(up to
 256 bit IR)
User’s
binary
 
AVX-512 Valgrind
Tool instrumentation
(up to 
512 bit
 IR)
User’s
binary
 
AVX-512 instruction handling
 
Refer to the data structures
in Valgrind core
 
AVX-512 Valgrind tests
 
Valgrind core:
Nas Parallel Benchmarks - one failure on an existent Valgrind limitation
HPC workloads
Memcheck:
ISPC and PMDK regression tests
Intel Inspector memory tests
Helgrind:
Data race
 
benchmarks (Lawrence Livermore)
AVX-2 and AVX-512 reports matched
 
VG-to-PIN comparison
 
Problem: If an instruction is emulated incorrectly under Valgrind, it can
pass unnoticed for a long time
Solution: dump register values under Valgind and non-Valgrind run and
compare them
 
Wrote logging tools for Valgrind and Intel PIN:
-
Heavyweight and not user-friendly
-
Invaluable to detect emulation errors
 
Next steps
 
Upstream
Test on user applications
Which CPUs need enabling?
Which analysis tools need enabling?
 
Questions
 
Slide Note

There are a few clarification notes on the slides below.

Embed
Share

Explore the implementation of AVX-512 instructions in Valgrind through detailed agendas, instruction layouts, masking techniques, new features, subsets, and methodologies. Discover how Valgrind enables parsing of assembly instructions for AVX-512 and AVX-2, facilitating code separation and IR generation.

  • AVX-512
  • Valgrind
  • Assembly Instructions
  • Tatyana Volnina
  • Implementation

Uploaded on Oct 06, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Enabling AVX-512 assembly instructions in Valgrind Tatyana Volnina (Mineeva) 1

  2. Agenda AVX-512 instruction summary AVX-512 implementation in Valgrind Q/A 2

  3. AVX-512 instructions 3

  4. Vector register layout AVX-512 AVX-2 SSE (x86-64) ZMM0 YMM0 XMM0 ZMM1 YMM1 XMM1 ZMM15 YMM15 XMM15 ZMM16 YMM16 XMM16 ZMM31 YMM31 XMM31 bits 512 256 255 128 127 0 4

  5. Masking Masked vector sum: ZMM1 (8 x 64-bit ints) 1 2 3 4 5 6 7 8 + ZMM2 (8 x 64-bit ints) 4 4 3 3 2 2 1 1 k1 (8 bits) 0 1 0 0 1 1 0 1 = = Zero-mode sum 0 6 0 0 7 8 0 9 Merge-mode sum (ZMM1 as base) 1 6 3 4 7 8 7 8 5

  6. AVX-512 features New instruction prefix (EVEX): Instruction-specific rounding mode Embedded broadcast New memory displacement encoding 8-bit and 16-bit vector elements 6

  7. AVX-512 subsets AVX-512 subsets Total missing F CD VL DQ BW ER PF IFMA VBMI VNNI VPOPCNTDQ VBMI2 BITALG VPCLMULQDQ GFNI VAES VP2INTERSECT BF16 Knight s Landing 0 Skylake 0 2 4 4 2 8 3 1 3 4 Ice Lake 31 2 4 4 2 8 3 1 3 4 2 Tiger Lake Sapphire Rapids 33 4 2 3 9 - Instruction set is already enabled N - Number of instructions to add 7

  8. AVX-512 Valgrind 8

  9. General Approach Separate AVX-512 code Reuse existent IRs Do not affect non-AVX512 runs AVX-512 performance haven t been measured yet 9

  10. AVX-2 Valgrind Parse assembly instruction User s binary Up to AVX-2 Translate into intermediate representation Tool instrumentation (up to 256 bit IR) Up to 256 bits Generate assembly Up to SSE 10

  11. AVX-512 Valgrind Parse assembly instruction User s binary Up to AVX-512 Translate into intermediate representation Tool instrumentation (up to 512 bit IR) Up to 512 bits Generate assembly Up to SSE or AVX-512 intrinsics 11

  12. AVX-512 instruction handling Describe assembly instructions in .csv file Generate regression tests Generate C data structures Refer to the data structures in Memcheck Did not work Refer to the data structures in Valgrind core 12

  13. AVX-512 Valgrind tests Valgrind core: Nas Parallel Benchmarks - one failure on an existent Valgrind limitation HPC workloads Memcheck: ISPC and PMDK regression tests Intel Inspector memory tests Helgrind: Data race benchmarks (Lawrence Livermore) AVX-2 and AVX-512 reports matched 13

  14. VG-to-PIN comparison Problem: If an instruction is emulated incorrectly under Valgrind, it can pass unnoticed for a long time Solution: dump register values under Valgind and non-Valgrind run and compare them Wrote logging tools for Valgrind and Intel PIN: - Heavyweight and not user-friendly - Invaluable to detect emulation errors 14

  15. Next steps Upstream Test on user applications Which CPUs need enabling? Which analysis tools need enabling? 15

  16. Questions 16

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#