Introduction to Thrust Parallel Algorithms Library
Thrust is a high-level parallel algorithms library that provides a performance-portable abstraction layer for programming with CUDA. It is easy to use, ships with the CUDA Toolkit, and offers features such as host_vector, device_vector, algorithm selection, and memory management. With a large set of algorithms…
Parallel Chi-square Test for Feature Selection in Categorical Data
The chi-square test is a popular method for feature selection in categorical data with classification labels. Computing the chi-square values of all features in parallel is more efficient than computing them serially. The process involves creating…
Parallel Implementations of Chi-Square Test for Feature Selection
The chi-square test is an effective method for feature selection with categorical data and classification labels. It ranks features by their chi-square values or p-values, which indicate each feature's importance. Parallel processing techniques, such as a GPU implementation in CUDA, can significantly speed up…
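The ranking step can be sketched as follows, under a simplifying assumption that every feature is binary and the labels are binary. Each test then has (2 - 1)(2 - 1) = 1 degree of freedom, and the p-value has the closed form P(X > x) = erfc(√(x / 2)). When all features share the same degrees of freedom, sorting by descending chi-square gives the same order as sorting by ascending p-value. General degrees of freedom would need a regularized incomplete gamma function, which the C++ standard library does not provide. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Upper-tail p-value of a chi-square statistic with exactly 1 degree of freedom
// (e.g. a 2x2 contingency table), via P(X > x) = erfc(sqrt(x / 2)).
double p_value_df1(double chi2) {
    return std::erfc(std::sqrt(chi2 / 2.0));
}

// Feature indices ordered from most to least important (largest chi-square
// first); stable_sort keeps tied features in their original order.
std::vector<std::size_t> rank_features(const std::vector<double>& chi2) {
    std::vector<std::size_t> order(chi2.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return chi2[a] > chi2[b]; });
    return order;
}
```

For example, a statistic of about 3.841 gives p ≈ 0.05, the usual 5% threshold for one degree of freedom.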
PuReMD Design - Initialization, Interactions, and Experimental Results
The PuReMD design initializes neighbor lists, bond lists, hydrogen bond lists, and the coefficients of the QEq matrix for bonded interactions. It also implements non-bonded interactions such as charge equilibration, Coulomb forces, and van der Waals forces. The process includes the generation…
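To make the neighbor-list step concrete, here is a generic brute-force sketch: it records every atom pair closer than a cutoff. This is not PuReMD's implementation. It assumes an open (non-periodic) box and O(N²) pair checks, whereas production molecular dynamics codes typically bin atoms into cells and handle periodic images. The Atom struct and build_neighbor_list are hypothetical names.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct Atom { double x, y, z; };  // hypothetical position-only atom record

// Half neighbor list: every pair (i, j) with i < j whose distance is below cutoff.
std::vector<std::pair<std::size_t, std::size_t>>
build_neighbor_list(const std::vector<Atom>& atoms, double cutoff) {
    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    const double cutoff2 = cutoff * cutoff;  // compare squared distances, no sqrt
    for (std::size_t i = 0; i < atoms.size(); ++i)
        for (std::size_t j = i + 1; j < atoms.size(); ++j) {
            const double dx = atoms[i].x - atoms[j].x;
            const double dy = atoms[i].y - atoms[j].y;
            const double dz = atoms[i].z - atoms[j].z;
            if (dx * dx + dy * dy + dz * dz < cutoff2) pairs.emplace_back(i, j);
        }
    return pairs;
}
```

Storing each pair once (i < j) halves the list. Force loops then apply each pair interaction to both atoms.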
Communication Costs in Distributed Sparse Tensor Factorization on Multi-GPU Systems
This research evaluates the communication costs of distributed sparse tensor factorization on multi-GPU systems. It covers background on tensors, tensor factorization methods such as CP-ALS, and the communication requirements in RefacTo. The motivation highlights the dominance of…
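For reference, here are the three-way CP model and the alternating least squares (ALS) update for its first factor matrix, in commonly used notation rather than notation taken from the slides. In this notation, A = [a_1 … a_R], X_(1) is the mode-1 unfolding, ⊙ is the Khatri-Rao product, ∗ is the elementwise (Hadamard) product, and † is the pseudoinverse. B and C are updated analogously. In a distributed setting, the updated factor rows typically have to be exchanged among the GPUs whose nonzeros reference them, which is where communication arises.

```latex
\mathcal{X} \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r,
\qquad
\mathbf{A} \leftarrow \mathbf{X}_{(1)} \left( \mathbf{C} \odot \mathbf{B} \right)
\left( \mathbf{C}^{\top}\mathbf{C} \ast \mathbf{B}^{\top}\mathbf{B} \right)^{\dagger}
```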
Update on Tools Integration, Measurement, and Modeling
TAU, a performance analysis framework, is being ported to ARM64 Linux and Power 8 Linux environments with updated instrumentation features. It supports sampling-based measurement and integrates with various libraries for performance tracking. Additionally, the TAU interface enables energy…
OpenACC Compiler for CUDA: A Source-to-Source Implementation
This open-source OpenACC compiler for NVIDIA GPUs takes a source-to-source approach, translating OpenACC code to CUDA so that detailed machine-specific optimizations are left to the mature CUDA compiler. It accepts C as its input language and uses the CUDA API to produce executable files.
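The kind of input such a compiler consumes looks like the annotated loop below, a minimal saxpy sketch. The code is plain C and also valid C++. A source-to-source OpenACC compiler would turn the loop into a CUDA kernel plus host code that copies x to the GPU and copies y in and back out. A compiler without OpenACC support ignores the unrecognized pragma and runs the loop serially with the same result. The function name saxpy is illustrative.

```cpp
/* y = a * x + y. copyin(x[0:n]) moves x to the device only; copy(y[0:n])
   moves y to the device and back, because the loop updates it. */
void saxpy(int n, float a, const float* x, float* y) {
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```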
Fast Noncontiguous GPU Data Movement in Hybrid MPI+GPU Environments
This research focuses on fast, efficient noncontiguous data movement between GPUs in hybrid MPI+GPU environments. The study explores techniques such as MPI derived datatypes, which support noncontiguous message passing and improve communication performance in GPU-accelerated systems. By…
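An MPI derived datatype lets the sender describe a noncontiguous layout instead of packing it by hand. For example, MPI_Type_vector(count, blocklength, stride, oldtype, &newtype) describes count blocks of blocklength elements whose starts are stride elements apart. The sketch below shows, in plain serial C++, the gather ("pack") that such a vector type implies, such as extracting one column of a row-major matrix. pack_vector is a hypothetical helper, not an MPI function, and it says nothing about how the presented work packs data on the GPU.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: gathers `count` blocks of `blocklength` elements whose
// starts are `stride` elements apart (the layout of an MPI vector datatype,
// with stride measured in elements of the base type) into a contiguous buffer.
template <typename T>
std::vector<T> pack_vector(const T* base, int count, int blocklength, int stride) {
    std::vector<T> packed;
    packed.reserve(static_cast<std::size_t>(count) * blocklength);
    for (int b = 0; b < count; ++b)
        for (int k = 0; k < blocklength; ++k)
            packed.push_back(base[static_cast<std::size_t>(b) * stride + k]);
    return packed;
}
```

For a 3 x 4 row-major matrix holding 0 to 11, pack_vector(m + 1, 3, 1, 4) yields column 1, that is {1, 5, 9}.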