High Performance Computing

Delve into parallel computing hardware, programming models, performance analysis, and program parallelization options. Explore different parallel architectures, develop efficient algorithms, analyze performance, and parallelize tasks effectively. Learn about parallel processing concepts, programming principles, and basic communication operations in high-performance computing.


Uploaded on Feb 17, 2025



Presentation Transcript


  1. High Performance Computing (410241) Subject Teacher: Prof. Vijay More. Examination Scheme: In-Semester Assessment: 30; End-Semester Assessment: 70. Prof Vijay More, MET's IOE, BKC, Adgaon Nashik

  2. Course Objectives: To study parallel computing hardware and programming models. To be conversant with performance analysis and modeling of parallel programs. To understand the options available to parallelize programs. To know the operating system requirements for handling parallelization.

  3. Course Outcomes: On completion of the course, the student will be able to: describe different parallel architectures, interconnection networks, and programming models; develop an efficient parallel algorithm to solve a given problem; analyze and measure the performance of modern parallel computing systems; and build the logic to parallelize a programming task.

  4. Unit 1 Parallel Processing Concepts. Introduction: Motivating Parallelism, Scope of Parallel Computing. Parallel Programming Platforms: Implicit Parallelism, Trends in Microprocessor Architectures, Limitations of Memory System Performance, Dichotomy of Parallel Computing Platforms, Physical Organization of Parallel Platforms, Communication Costs in Parallel Machines, Scalable Design Principles. Architectures: N-wide Superscalar Architectures, Multi-core Architecture.

  5. Unit 2 Parallel Programming. Principles of Parallel Algorithm Design: Preliminaries, Decomposition Techniques, Characteristics of Tasks and Interactions, Mapping Techniques for Load Balancing, Methods for Containing Interaction Overheads, Parallel Algorithm Models. The Age of Parallel Processing, the Rise of GPU Computing, A Brief History of GPUs, Early GPU Computing.

  6. Unit 3 Basic Communication Operations. One-to-All Broadcast and All-to-One Reduction, All-to-All Broadcast and Reduction, All-Reduce and Prefix-Sum Operations, Scatter and Gather, All-to-All Personalized Communication, Circular Shift, Improving the Speed of Some Communication Operations.

  7. Unit 4 Analytical Models of Parallel Programs. Analytical Models: Sources of Overhead in Parallel Programs, Performance Metrics for Parallel Systems, The Effect of Granularity on Performance, Scalability of Parallel Systems, Minimum Execution Time and Minimum Cost-Optimal Execution Time. Dense Matrix Algorithms: Matrix-Vector Multiplication, Matrix-Matrix Multiplication.

  8. Unit 5 Parallel Algorithms: Sorting and Graphs. Issues in Sorting on Parallel Computers, Bubble Sort and its Variants, Parallelizing Quicksort, All-Pairs Shortest Paths, Algorithms for Sparse Graphs, Parallel Depth-First Search, Parallel Best-First Search.

  9. Unit 6 CUDA Architecture. CUDA Architecture, Using the CUDA Architecture, Applications of CUDA. Introduction to CUDA C: write and launch CUDA C kernels, manage GPU memory, manage communication and synchronization, parallel programming in CUDA C.

  10. Text Books: 1. Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar, "Introduction to Parallel Computing", 2nd edition, Addison-Wesley, 2003, ISBN: 0-201-64865-2. 2. Jason Sanders, Edward Kandrot, "CUDA by Example", Addison-Wesley, ISBN-13: 978-0-13-138768-3.

  11. Reference Books: 1. Kai Hwang, "Scalable Parallel Computing", McGraw Hill, 1998, ISBN: 0070317984. 2. Shane Cook, "CUDA Programming: A Developer's Guide to Parallel Computing with GPUs", Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2013, ISBN: 9780124159884. 3. David Culler, Jaswinder Pal Singh, "Parallel Computer Architecture: A Hardware/Software Approach", Morgan Kaufmann, 1999, ISBN: 978-1-55860-343-1. 4. Rod Stephens, "Essential Algorithms", Wiley, ISBN: 978-1-118-61210-1.

  12. Introduction to Parallel Computing: 1. Motivating Parallelism 2. Scope of Parallel Computing

  13. Motivating Parallelism The role of parallelism in accelerating computing speeds has been recognized for several decades. Its role in providing a multiplicity of datapaths and increased access to storage elements has been significant in commercial applications.

  14. The scalable performance and lower cost of parallel platforms are reflected in the wide variety of applications that have recently adopted them. The processor (CPU) is the active part of the computer, which does all the work of data manipulation and decision making.

  15. The datapath is the hardware that performs all the required data-processing operations, for example, the ALU, registers, and internal buses. Control is the hardware that tells the datapath what to do, in terms of switching, operation selection, data movement between ALU components, etc.

  16. Developing parallel hardware and software has traditionally been time- and effort-intensive. If one views this in the context of rapidly improving uniprocessor speeds, one is tempted to question the need for parallel computing.

  17. There are some well-defined trends in hardware design which indicate that uniprocessor (or implicitly parallel) architectures may not be able to sustain the rate of realizable performance increments in the future.

  18. This is the result of a number of fundamental physical and computational limitations. The emergence of standardized parallel programming environments, libraries, and hardware has significantly reduced the time to (parallel) solution.

  19. The Computational Power Argument Moore's law states [1965]: "The complexity for minimum component costs has increased at a rate of roughly a factor of two per year. Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years." That means that by 1975, the number of components per integrated circuit for minimum cost would be 65,000.

  20. Gordon Moore was at Fairchild R&D in 1962. Moore attributed this doubling rate to the exponential behavior of die sizes, finer minimum dimensions, and "circuit and device cleverness".

  21. In 1975, he revised this law as follows: "There is no room left to squeeze anything out by being clever. Going forward from here we have to depend on the two size factors - bigger dies and finer dimensions." He revised his rate of circuit-complexity doubling to 18 months and projected from 1975 onwards at this reduced rate.

  22. A die, in the context of integrated circuits, is a small block of semiconducting material on which a given functional circuit is fabricated. By 2004, clock frequencies had gotten fast enough (around 3 GHz) that any further increases would have caused the chips to melt from the heat they generated. So while manufacturers continued to increase the number of transistors per chip, they no longer increased the clock frequencies. Instead, they started putting multiple processor cores on the chip.

  23. The logical alternative is to rely on parallelism, both implicit and explicit. Most serial processors rely extensively on implicit parallelism.

  24. Implicit parallelism is a characteristic of a programming language that allows a compiler or interpreter to automatically exploit the parallelism inherent in the computations expressed by some of the language's constructs. A pure implicitly parallel language does not need special directives, operators, or functions to enable parallel execution.

  25. The Memory/Disk Speed Argument While clock rates of high-end processors have increased at roughly 40% per year over the past decade, DRAM access times have only improved at the rate of roughly 10% per year over this interval. This mismatch in speed causes significant performance bottlenecks.

  26. Parallel platforms provide increased bandwidth to the memory system. Parallel platforms also provide higher aggregate caches.

  27. Principles of locality of data reference and bulk access, which guide parallel algorithm design, also apply to memory optimization. Some of the fastest-growing applications of parallel computing utilize not their raw computational speed, but rather their ability to pump data to memory and disk faster.

  28. Locality of Reference Property: A particular portion of the memory address space is accessed frequently by a program during its execution in any time window, e.g. the innermost loop. There are three dimensions of the locality property: 1. Temporal locality 2. Spatial locality 3. Sequential locality

  29. 1. Temporal Locality Items referenced recently are likely to be referenced again in the near future, e.g. by loops, stacks, temporary variables, or subroutines. Once a loop is entered (or a subroutine is called), a small code segment is referenced many times repeatedly. This temporal locality is clustered in recently used areas.

  30. 2. Spatial Locality It is the tendency of a process to access items whose addresses are near one another, e.g. tables and arrays, which involve access to areas that are clustered together. Program segments containing subroutines and macros are kept together in the same neighbourhood of memory space.

  31. 3. Sequential Locality Execution of instructions in a program follows a sequential order unless out-of-order instructions are encountered. The ratio of in-order execution to out-of-order execution is generally 5 to 1.

  32. The Data Communication Argument As the network evolves, the vision of the Internet as one large computing platform has emerged. This view is exploited by applications such as SETI@home and Folding@home. In many other applications (typically databases and data mining), the volume of data is such that it cannot be moved. Any analyses of this data must be performed over the network using parallel techniques.

  33. The search for extraterrestrial intelligence (SETI@home) is the collective name for a number of activities undertaken to search for intelligent extraterrestrial life. Folding@home is a distributed computing project for disease research that simulates protein folding, computational drug design, and other types of molecular dynamics. The project uses the idle processing resources of thousands of personal computers owned by volunteers who have installed the software on their systems.

  34. Scope of Parallel Computing Applications Parallelism finds use in very diverse application domains, for motivating reasons that range from improved application performance to cost considerations.

  35. Applications in Engineering and Design Design of airfoils (optimizing lift, drag, stability), internal combustion engines (optimizing charge distribution, burn), high-speed circuits (layouts for delays and capacitive and inductive effects), and structures (optimizing structural integrity, design parameters, cost, etc.). Design and simulation of micro- and nano-scale systems. Process optimization, operations research.

  36. Scientific Applications Functional and structural characterization of genes and proteins. Advances in computational physics and chemistry have led to new materials, a better understanding of chemical pathways and chemical bonds, and more efficient processes.

  37. Applications in astrophysics have explored the evolution of galaxies, thermonuclear processes, and the analysis of extremely large datasets from telescopes. Weather modeling, mineral prospecting, flood prediction, etc., are other important applications. Bioinformatics and astrophysics also present some of the most challenging problems with respect to analyzing extremely large datasets.

  38. Commercial Applications Some of the largest parallel computers power Wall Street: risk analysis, portfolio management, and automated trading. Data mining and analysis for optimizing business and marketing decisions. Large-scale servers (mail and web servers) are often implemented using parallel platforms.

  39. Applications such as information retrieval and search are typically powered by large clusters. Cloud computing inherently runs on distributed systems, processed by parallel systems over a widespread network. Big Data is also an emerging technology that integrates various parallel and distributed systems over the network.

  40. Computer-aided engineering (CAE): Automotive design and testing, transportation, structural and mechanical design. Chemical engineering: Process and molecular design. Digital content creation (DCC) and distribution: Computer-aided graphics in film and media. Economics/financial: Wall Street risk analysis, portfolio management, automated trading. Electronic design and automation (EDA): Electronic component design and verification.

  41. Geosciences and geo-engineering: Oil and gas exploration and reservoir modeling. Mechanical design and drafting: 2D and 3D design and verification, mechanical modeling. Defence and energy: Nuclear stewardship, basic and applied research. Government labs: Basic and applied research. University/academic: Basic and applied research. Weather forecasting: Near-term and climate/earth modeling.

  42. Applications in Computer Systems Network intrusion detection, cryptography, and multiparty computations are some of the core users of parallel computing techniques. Embedded systems increasingly rely on distributed control algorithms.

  43. A modern automobile consists of a number of processors communicating to perform complex tasks for optimizing handling and performance. Conventional structured peer-to-peer networks impose overlay networks and utilize algorithms directly from parallel computing.

  44. QA Session Q 7b) Share your thoughts about how HPC will help to promote the "MAKE IN INDIA" initiative. (5) May '16 Q 1a) What are the applications of parallel computing? (4) Dec '16
