CS260 Parallel Algorithms: Theory and Practice Review

Slide Note
Embed
Share

This review covers essential topics from the CS260 Parallel Algorithms course by Yihan Sun, focusing on key concepts such as scheduler programs, cost models, reduce and scan techniques, PRAM models, atomic primitives, small algorithms, the master theorem, and sorting algorithms like Quicksort and Mergesort. The content emphasizes the practical implementation of these parallel algorithms, offering insights on optimizing performance and problem-solving strategies.


Uploaded on Sep 29, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. CS260 Parallel Algorithms: Theory and Practice Yihan Sun Review and Summary

  2. Takeaway You won t remember all the algorithms and bounds, but some General techniques How you get the implementation faster Where to find an algorithm in the course material 2

  3. Scheduler Program The program generate threads (tasks) The scheduler maps each thread to a processor (e.g., whenever a processor is available) Scheduler 3

  4. Cost model: work-depth For all computations, draw a DAG A->B means that B can be performed only when A has been finished Work: the total number of operations Depth (span): the longest length of chain Shows the dependency of operations in the algorithm ?(? ?+ ?) time 4

  5. Reduce and scan Divide and conquer Shrink the problem size In practice, an easy way to deal with it: E.g., 10? chunks 1 2 3 4 5 6 7 8 9 10 11 12 6 24 33 15 1 21 55 3 36 10 45 6 78 28 66 15 5

  6. Models PRAM Fork-join Binary or N-ary 6

  7. Atomic primitives Compare and swap Test and set Priority write Fetch and add 7

  8. Some small algorithms Matrix multiplication Connect it to IO efficient algorithms List ranking Random mate also useful in connected component Flattening multiple arrays Using prefix sum 8

  9. Master theorem Draw the recursion tree See if it s root-dominated or leaf-dominated 9

  10. Sorting algorithms Quicksort Filter/partitioning/packing in parallel Also useful in a number of other algorithms (e.g., BFS) Mergesort Parallel merging Also useful in sample sort algorithm Connecting to the sample sort algorithm mentioned in the I/O efficient algorithm lecture 10

  11. Deterministic parallelism Race How to avoid race Let algorithm behave exactly the same as the sequential version 11

  12. Graph algorithms BFS frontier-based algorithms Consider adding shortcuts for high-diameter graphs Connected component Random mate Strongly connected component Using reachability to decide connectivity Prefix doubling SSSP Find frontier, process neighbors, move to the next round 12

  13. Tree algorithms Join-based tree algorithms Process two subtrees recursively in parallel Join them back Augmentation Store some extra info in tree nodes for fast range query Persistence & multi-versioning Copy nodes on the path to preserve the previous version Let each thread work on a snapshot 13

  14. I/O efficient algorithms Consider the cost of moving blocks into/out of your memory B at a time, M as the total capacity to hold for free computation E.g., B = cache line size, M = cache size Cache-oblivious algorithms Algorithms does not need to know B or M But analysis will use B and M Divide and conquer until the subproblem fits in the small memory 14

  15. Scheduling Work-stealing scheduler # steals: ?(??) Binary-forking Depth = ? #processors = ? Practical and used in many real-world systems 15

  16. Algorithm techniques Divide-and-conquer Solve subproblems in parallel Shrink problem sizes Recurse on smaller problems Randomization Random mate Quicksort Random priorities Deterministic parallelism, prefix doubling, treaps, 16

  17. Algorithm techniques Round-based algorithms Perform all computations that do not conflict with each other in a round Until all required work has been done Prefix doubling In round ?, process 2? elements Augmented trees For range-based sum Path-copying Achieve multi-versioning 17

  18. Implementation tricks Coarsening Helps to avoid overhead of invoking parallel tasks The threshold depends on the amount of work of base case Nested parallel for loops Usually only need to parallelize the outmost one Make each parallel task large enough I/O efficiency How many cache misses your algorithm will cause? 18

  19. // a 32-bit hash function inline uint32_t hash32(uint32_t a) { a = (a+0x7ed55d16) + (a<<12); a = (a^0xc761c23c) ^ (a>>19); a = (a+0x165667b1) + (a<<5); a = (a+0xd3a2646c) ^ (a<<9); a = (a+0xfd7046c5) + (a<<3); a = (a^0xb55a4f09) ^ (a>>16); return a; } Implementation tricks Always avoid Allocating new space during parallel execution E.g., not using std::vector (or reserve enough space) Using (the default) random number generator Write a hash function instead Heavy conflicts Avoid race Avoid concurrent write (try to use atomic primitives) Avoid atomic primitives for (i = 1 to n) random[i]=hash(i); 19

  20. FINAL EXAM 20

  21. Exam 48-hour take-home exam You are free to use any sources or references but absolutely must cite them. You must not communicate with anyone else other than me Hand-writing is acceptable for the final Deadline: 5pm March 17th Pickup: WCH308 Send an email to me to ask for an electronic version Submission: Drop off at Yihan s office, WCH308. If I m not in, you can write the time on the front page, sign it, and slide it under the door. Send via email, but only if it s an electronic version. 21

  22. Exam 6 problems, 130 points Choose up to 110 points, and earn up to 100 points Explicitly mark the problems you choose otherwise the first 110 points you work on will be considered 22

  23. Quick suggestions for the exam If you can model the problem as an algorithm we ve seen in class, you can directly use the result in the lectures E.g., call prefix sum algorithm on xxxx, which needs xx work and xx depth , call graph connectivity algorithm on xxxx, which needs xx work and xx depth For any algorithm you design Explain correctness (unless it is very, very, very straightforward)! Usually only a few sentences is enough 23

  24. Some other things No class on Wednesday Prepare for your project presentation HW2 grading ready Average & median about 72/80 Use iEval (today 13th) 24

  25. Some other things Presentation Audience is your classmates: give background Preliminary results Reserve presentation slot Paper review hints: Slides of lecture 2 Summary + evaluation + conclusion 25

Related


More Related Content