Exploring the Dichotomy of Parallel Computing Platforms
Dichotomy of Parallel Computing Platforms explores the communication models and control structures of parallel platforms, covering their logical and physical organization as well as the levels of granularity at which parallelism can be expressed. The discussion also covers Flynn's classical taxonomy, which classifies multi-processor computer architectures along the instruction and data dimensions.
Dichotomy of Parallel Computing Platforms and Communication Model of Parallel Platforms. Prof. V. B. More, MET's IOE BKC, Nashik
Dichotomy of Parallel Computing Platforms. Parallel platforms have both a logical and a physical organization. The physical organization is the actual hardware organization, whereas the logical organization is the platform as seen from the programmer's perspective.
Dichotomy of Parallel Computing Platforms. From the programmer's point of view, there are two important components of parallel computing: control structures, the various ways of expressing parallel tasks; and the communication model, the mechanism for specifying interaction between the parallel tasks.
Control Structure of Parallel Programs. Parallelism can be expressed at various levels of granularity, from the instruction level (fine grain) to processes (coarse grain). Between these levels lies a range of models, each supported by a corresponding architecture. Based on this diversity in the formation of parallel tasks, appropriate control structure models can be applied.
Processing units in parallel computers either operate under the centralized control of a single control unit or work independently. Based on this aspect, parallel computers can be classified according to Flynn's taxonomy.
Flynn's Classical Taxonomy (1966). Flynn's classical taxonomy classifies multi-processor computer architectures based on two independent dimensions: (i) instruction and (ii) data. Each of these dimensions can have only one of two possible states: (a) single or (b) multiple.
Flynn's Classical Taxonomy (1966). SISD: Single Instruction, Single Data; SIMD: Single Instruction, Multiple Data; MISD: Multiple Instruction, Single Data; MIMD: Multiple Instruction, Multiple Data.
Single Instruction, Single Data (SISD). A serial (non-parallel) computer. Single instruction: only one instruction stream is being acted on by the CPU during any one clock cycle.
Single Instruction, Single Data (SISD). Single data: only one data stream is being used as input during any one clock cycle. Deterministic execution. This is the oldest and, even today, the most common type of computer. Examples: older generation mainframes, minicomputers and workstations; most modern-day PCs.
Single Instruction, Multiple Data (SIMD). A type of parallel computer. Single instruction: all processing units execute the same instruction at any given clock cycle. Multiple data: each processing unit can operate on a different data element.
Single Instruction, Multiple Data (SIMD). Best suited for specialized problems characterized by a high degree of regularity, such as graphics/image processing. Synchronous (lockstep) and deterministic execution. Two varieties: processor arrays and vector pipelines.
Single Instruction, Multiple Data (SIMD). Examples of processor arrays: Connection Machine CM-2, MasPar MP-1 & MP-2, ILLIAC IV. Examples of vector pipelines: IBM 9000, Cray X-MP, Y-MP & C90, Fujitsu VP, NEC SX-2, Hitachi S820, ETA10. Most modern computers, particularly those with graphics processing units (GPUs), employ SIMD instructions and execution units.
Abstract SIMD model (figure): a control unit (CU) issues a single instruction stream (IS), loaded from the host, to processing elements PE1 through PEn; each PE operates on its own data stream (DS) held in its local memory (LM1 through LMn).
Operational SIMD model (figure): a control unit drives processing elements PE0 through PEN-1, each pairing a processor (Proc. 0 to Proc. N-1) with a memory module (Mem. 0 to Mem. N-1), all linked by an interconnection network.
At each step, i.e. on each global clock pulse, all processors execute the same instruction, each on different data (single instruction stream, multiple data streams).
SIMD machines are helpful for performing vector calculations. For example, to add two vectors of n elements each, approximately n/2 PEs will execute the same instruction simultaneously.
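To make the lockstep idea concrete, here is a minimal C sketch (not from the slides) using x86 SSE intrinsics; it assumes a CPU with SSE support, and the function name add_vectors is just illustrative. A single _mm_add_ps instruction adds four float elements of the two vectors at once, which is the "same instruction, different data" pattern described above.

#include <immintrin.h>
#include <stdio.h>

/* Add two float vectors four elements at a time with SSE. */
void add_vectors(const float *a, const float *b, float *c, int n)
{
    int i = 0;
    for (; i + 4 <= n; i += 4) {                  /* 4 lanes per instruction */
        __m128 va = _mm_loadu_ps(a + i);          /* load a[i..i+3] */
        __m128 vb = _mm_loadu_ps(b + i);          /* load b[i..i+3] */
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb)); /* c[i..i+3] = a + b */
    }
    for (; i < n; i++)                            /* scalar tail for leftovers */
        c[i] = a[i] + b[i];
}

int main(void)
{
    float a[6] = {1, 2, 3, 4, 5, 6}, b[6] = {10, 20, 30, 40, 50, 60}, c[6];
    add_vectors(a, b, c, 6);
    for (int i = 0; i < 6; i++)
        printf("%g ", c[i]);
    printf("\n");
    return 0;
}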
Multiple Instruction, Single Data (MISD): A single data stream is fed into multiple processing units. Each processing unit operates on the data independently via independent instruction streams. Few actual examples of this class of parallel computer have ever existed. One is the experimental Carnegie-Mellon C.mmp computer (1971).
Multiple Instruction, Single Data (MISD): Some uses might be: multiple frequency filters operating on a single signal stream; multiple cryptography algorithms attempting to crack a single coded message.
Multiple Instruction, Multiple Data (MIMD) Currently, the most common type of parallel computer. Most modern computers fall into this category. Multiple Instruction: every processor may be executing a different instruction stream Multiple Data: every processor may be working with a different data stream
Multiple Instruction, Multiple Data (MIMD). Execution can be synchronous or asynchronous, deterministic or non-deterministic. Examples: most current supercomputers, networked parallel computer clusters and "grids", multi-processor SMP computers, multi-core PCs. Note: many MIMD architectures also include SIMD execution sub-components.
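To contrast with SIMD, the following hedged sketch (not part of the slides) shows MIMD-style execution with POSIX threads: two threads simultaneously run different instruction streams, a sum and a maximum, on different data. The task names sum_task and max_task are illustrative; compile with e.g. gcc -pthread.

#include <pthread.h>
#include <stdio.h>

static void *sum_task(void *arg)          /* instruction stream 1: summation */
{
    int *data = (int *)arg;
    long sum = 0;
    for (int i = 0; i < 4; i++) sum += data[i];
    printf("sum = %ld\n", sum);
    return NULL;
}

static void *max_task(void *arg)          /* instruction stream 2: maximum */
{
    double *data = (double *)arg;
    double max = data[0];
    for (int i = 1; i < 4; i++) if (data[i] > max) max = data[i];
    printf("max = %g\n", max);
    return NULL;
}

int main(void)
{
    int ints[4] = {1, 2, 3, 4};
    double dbls[4] = {2.5, 9.1, 4.7, 0.3};
    pthread_t t1, t2;
    pthread_create(&t1, NULL, sum_task, ints);   /* different program ... */
    pthread_create(&t2, NULL, max_task, dbls);   /* ... on different data  */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}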
SIMD-MIMD Comparison. SIMD computers require less hardware than MIMD computers (single control unit). However, since SIMD processors are specially designed, they tend to be expensive and have long design cycles. Not all applications are naturally suited to SIMD processors.
SIMD-MIMD Comparison. In contrast, platforms supporting the SPMD paradigm can be built from inexpensive off-the-shelf components with relatively little effort in a short amount of time. MPI is primarily for SPMD/MIMD. HPF is an example of a SIMD interface.
A cluster computer offers a low-cost alternative to supercomputers, obtaining higher processing power by interconnecting a large number of processing nodes. Technically, in terms of Flynn's original classification, cluster computing must be classified as a multiple instruction-stream, multiple data-stream (MIMD) architecture, since each computer in the cluster executes its own program.
If the same program is running over all the computers in the cluster, the processing is in single program, multiple data-stream (SPMD) mode.
Basic structure of an SPMD program
For example, here a controller process performs a different task (e.g. reading, checking and distributing initial data) from a worker process:

main(int argc, char **argv)
{
    if (process is a controller process)
        Controller( /* Arguments */ );   /* call controller process */
    else
        Worker( /* Arguments */ );       /* call worker process */
}
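A runnable version of this pattern is sketched below using MPI (a minimal sketch under stated assumptions, not the author's original code; the helper names do_controller_work and do_worker_work are illustrative). The single program is launched on every process, e.g. with mpicc spmd.c -o spmd and mpirun -np 4 ./spmd, and each process branches on its rank:

#include <mpi.h>
#include <stdio.h>

static void do_controller_work(void) { printf("controller: distributing data\n"); }
static void do_worker_work(int rank)  { printf("worker %d: computing\n", rank); }

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* which copy of the program am I? */

    if (rank == 0)              /* rank 0 acts as the controller process */
        do_controller_work();
    else                        /* all other ranks act as worker processes */
        do_worker_work(rank);

    MPI_Finalize();
    return 0;
}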
Class Example: What kind of machine was the classroom example? A. SISD B. SIMD C. MISD D. MIMD
Some Terminology. Task: A logically discrete section of computational work; typically a program or program-like set of instructions that is executed by a processor. Parallel Task: A task that can be executed by multiple processors safely and produces correct results.
Some Terminology. Serial Execution: Execution of a program sequentially, one instruction or statement at a time. This can be visualized as a one-processor machine. However, virtually all parallel programs have sections that must be executed serially.
Some Terminology. Parallel Execution: Execution of a program by more than one task, with each task being able to execute the same or a different statement at the same moment in time. Pipelining: Breaking a task into chunks performed by different processing units, much like an assembly line; a type of parallel computing.
Some Terminology. Shared Memory: From a strictly hardware point of view, it describes a computer architecture where all processors have direct access to common physical memory. In a programming sense, it describes a model where all parallel tasks have the same "picture" of memory and can directly address and access the same logical memory locations regardless of where the physical memory actually exists.
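As a small illustration of the shared-memory programming model (an assumption-laden sketch, not part of the original slides), the following C fragment uses an OpenMP parallel loop so that all threads directly read and write the same array in a common address space, with no explicit data exchange; compile with an OpenMP-capable compiler, e.g. gcc -fopenmp.

#include <stdio.h>

int main(void)
{
    int shared_data[8];

    #pragma omp parallel for          /* threads divide the loop among themselves */
    for (int i = 0; i < 8; i++)
        shared_data[i] = i * i;       /* each thread writes part of the shared array */

    for (int i = 0; i < 8; i++)       /* all results are visible in the same memory */
        printf("%d ", shared_data[i]);
    printf("\n");
    return 0;
}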
Some Terminology. Symmetric Multi-Processor (SMP): A hardware architecture where multiple processors share a single address space and access to all resources; shared-memory computing. Distributed Memory: Refers to network-based memory access for physical memory that is not common. As a programming model, tasks can only logically "see" local machine memory and must use communications to access memory on other machines where other tasks are executing.
Some Terminology. Communications: Parallel tasks typically need to exchange data. There are several ways this can be done, such as through a shared memory bus or over a network; the actual event of data exchange is referred to as communications, regardless of the method used.
Some Terminology. Synchronization: The coordination of parallel tasks in real time, usually associated with communications, at a point in an application where a task may not proceed further until another task (or tasks) reaches the same or a logically equivalent point. Synchronization usually involves waiting by at least one task, and can therefore cause a parallel application's wall-clock execution time to increase.
Some Terminology. Granularity: In parallel computing, granularity is a qualitative measure of the ratio of computation to communication. Coarse: relatively large amounts of computational work are done between communication events. Fine: relatively small amounts of computational work are done between communication events.
Some Terminology. Observed Speedup: The observed speedup of a code that has been parallelized is defined as: (wall-clock time of serial execution) / (wall-clock time of parallel execution). (Wall-clock time is the actual time taken from the start of a computer program to its end.) It is one of the simplest and most widely used indicators of a parallel program's performance.
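For example, if a hypothetical program takes 120 seconds of wall-clock time to run serially and 15 seconds when parallelized, its observed speedup is 120 / 15 = 8.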
Some Terminology. Parallel Overhead: The amount of time required to coordinate parallel tasks, as opposed to doing useful work. Parallel overhead can include factors such as: task start-up time, synchronizations, data communications, software overhead imposed by parallel compilers, libraries, tools, the operating system, etc., and task termination time.
Some Terminology. Massively Parallel: Refers to the hardware that comprises a given parallel system having many processors. The meaning of "many" keeps increasing; the largest parallel computers can be composed of processors ranging into the hundreds of thousands. Embarrassingly Parallel: Solving many similar but independent tasks simultaneously; no need for coordination between the tasks.
Some Terminology. Scalability: Refers to a parallel system's ability to demonstrate an increase in parallel speedup with the addition of more processors. Factors that contribute to scalability include: hardware (memory, CPU, bandwidths and network communications), the application algorithm, and the characteristics of your specific application and coding.
Some Terminology. Multi-core Processors: Multiple processor cores on a single chip. Cluster Computing: Combination of a large number of homogeneous systems comprising processors and networks to build a parallel system. Supercomputing / High Performance Computing: Use of the world's fastest, largest machines to solve large problems.
Communication Model of Parallel Platforms. There are two primary forms of data exchange between parallel tasks: (i) accessing a shared data space and (ii) exchanging messages. Platforms that provide a shared data space are called shared-address-space machines or multiprocessors. Their characteristics: 1. consistency problems; 2. a global view of memory.
Communication Model of Parallel Platforms. Platforms that support messaging are called message-passing platforms or multicomputers. Their characteristics: 1. no consistency problems; 2. lack of a global view of memory; 3. communication is costly.
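For contrast with the shared-address-space case, here is a minimal message-passing sketch in C with MPI (illustrative only, not taken from the slides): because there is no global view of memory, rank 0 must explicitly send its data to rank 1. Run with at least two processes, e.g. mpirun -np 2 ./a.out.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                          /* data lives only in rank 0's local memory */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {                  /* rank 1 cannot read rank 0's memory directly */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d via a message\n", value);
    }

    MPI_Finalize();
    return 0;
}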
Shared-Address-Space Platforms. Part (or all) of the memory is accessible to all processors. Processors interact by modifying data objects stored in this shared address space.
Shared-Address-Space Platforms. If the time taken by a processor to access any memory word in the system (global or local) is identical, the platform is classified as a uniform memory access (UMA) machine; otherwise it is a non-uniform memory access (NUMA) machine.
NUMA & UMA Shared-Address-Space Platforms. Typical shared-address-space architectures (figure): (a) a uniform-memory-access shared-address-space computer, in which processors (P) reach the memories (M) through an interconnection network;
NUMA & UMA Shared-Address-Space Platforms. Typical shared-address-space architectures (figure): (b) a uniform-memory-access shared-address-space computer with caches (C) and memories (M);
NUMA & UMA Shared-Address-Space Platforms. Typical shared-address-space architectures (figure): (c) a non-uniform-memory-access shared-address-space computer with cache (C) and local memory (M) attached to each processor (P).