Understanding Algorithmic Complexity Measures and the Master Method
In this study, we explore key concepts in algorithmic complexity, including recurrences, matrix multiplication, merge sort, and tableau construction. We delve into the Master Method for solving recurrences, examining Cases 1, 2, and 3, and providing solutions for each scenario. Additionally, we discuss speedup, parallelism, and the critical path length in algorithmic analysis.
Download Presentation
Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
E N D
Presentation Transcript
LECTURE 2 Recurrences (Review) Recurrences (Review) Matrix Multiplication Matrix Multiplication Merge Sort Merge Sort Tableau Construction Tableau Construction Conclusion Conclusion 1
Algorithmic Complexity Measures TP = execution time on P processors T1 = work T1 = span* LOWER BOUNDS TP T1/P TP T1 * Also called critical-path length or computational depth.
Speedup Definition: T1/TP = speedup on P processors. If T1/TP = (P) P, we have linear speedup; = P, we have perfect linear speedup; > P, we have superlinear speedup, which is not possible in our model, because of the lower bound TP T1/P.
Parallelism Because we have the lower bound TP T1, the maximum possible speedup given T1 and T1 is T1/T1 = parallelism = the average amount of work per step along the span. July 13, 2006 4
The Master Method The Master Method for solving recurrences applies to recurrences of the form T(n) = aT(n/b) + f(n) , where a 1, b > 1, and fis asymptotically positive. * IDEA: Compare nlogba with f(n). * The unstated base case is T(n) = (1) for sufficiently small n. 5
Master Method CASE 1 T(n) = aT(n/b) + f(n) nlogba f (n) Specifically, f(n) = O(nlogba ) for some constant > 0. Solution: T(n) = (nlogba) . 6
Master Method CASE 2 T(n) = aT(n/b) + f(n) nlogba f (n) Specifically, f(n) = (nlogba lgkn) for some constant k 0. Solution: T(n) = (nlogba lgk+1n) . 7
Master Method CASE 3 T(n) = aT(n/b) + f(n) nlogba f (n) Specifically, f(n) = (nlogba + ) for some constant > 0 andf(n) satisfies the regularity condition that af(n/b) cf(n) for some constant c < 1. Solution: T(n) = (f(n)) . 8
Master Method Summary T(n) = aT(n/b) + f(n) CASE 1: f(n) = O(nlogba ), constant > 0 T(n) = (nlogba) . CASE 2: f(n) = (nlogba lgkn), constant k 0 T(n) = (nlogba lgk+1n) . CASE 3: f(n) = (nlogba + ), constant > 0, and regularity condition T(n) = (f(n)) . 9
Master Method Quiz T(n) = 4 T(n/2) + n nlogba = n2 n)CASE 1:T(n) = (n2). T(n) = 4 T(n/2) + n2 nlogba = n2 = n2 lg0n)CASE 2:T(n) = (n2lg n). T(n) = 4 T(n/2) + n3 nlogba = n2 n3)CASE 3:T(n) = (n3). T(n) = 4 T(n/2) + n2/lg n Master method does not apply! 10
LECTURE 2 Recurrences (Review) Recurrences (Review) Matrix Multiplication Matrix Multiplication Merge Sort Merge Sort Tableau Construction Tableau Construction Conclusion Conclusion 11
Square-Matrix Multiplication a11a12 a21a22 a1n a2n b11b12 b21b22 b1n b2n c11c12 c21c22 c1n c2n = cnn C ann A bnn B an1an2 bn1bn2 cn1cn2 n cij= aik bkj k = 1 Assume for simplicity that n = 2k. 12
Recursive Matrix Multiplication Divide and conquer C11 C12 A11 A12 B11 B12 = C21 C22 A21 A22 B21 B22 A11B11A11B12 A12B21A12B22 = + A21B11A21B12 A22B21A22B22 8 multiplications of (n/2) (n/2) matrices. 1 addition of n n matrices. 13
Matrix Multiply in Pseudo-Cilk cilk void Mult(*C, *A, *B, n) { float *T = Cilk_alloca(n*n*sizeof(float)); h hbase case & partition matrices i i spawn Mult(C11,A11,B11,n/2); spawn Mult(C12,A11,B12,n/2); spawn Mult(C22,A21,B12,n/2); spawn Mult(C21,A21,B11,n/2); spawn Mult(T11,A12,B21,n/2); spawn Mult(T12,A12,B22,n/2); spawn Mult(T22,A22,B22,n/2); spawn Mult(T21,A22,B21,n/2); sync; spawn Add(C,T,n); sync; return; } C = A B Absence of type declarations. 14
Matrix Multiply in Pseudo-Cilk cilk void Mult(*C, *A, *B, n) { float *T = Cilk_alloca(n*n*sizeof(float)); h hbase case & partition matrices i i spawn Mult(C11,A11,B11,n/2); spawn Mult(C12,A11,B12,n/2); spawn Mult(C22,A21,B12,n/2); spawn Mult(C21,A21,B11,n/2); spawn Mult(T11,A12,B21,n/2); spawn Mult(T12,A12,B22,n/2); spawn Mult(T22,A22,B22,n/2); spawn Mult(T21,A22,B21,n/2); sync; spawn Add(C,T,n); sync; return; } C = A B Coarsen base cases for efficiency. 15
Matrix Multiply in Pseudo-Cilk cilk void Mult(*C, *A, *B, n) { float *T = Cilk_alloca(n*n*sizeof(float)); h hbase case & partition matrices i i spawn Mult(C11,A11,B11,n/2); spawn Mult(C12,A11,B12,n/2); spawn Mult(C22,A21,B12,n/2); spawn Mult(C21,A21,B11,n/2); spawn Mult(T11,A12,B21,n/2); spawn Mult(T12,A12,B22,n/2); spawn Mult(T22,A22,B22,n/2); spawn Mult(T21,A22,B21,n/2); sync; spawn Add(C,T,n); sync; return; } C = A B Also need a row- size argument for array indexing. Submatrices are produced by pointer calculation, not copying of elements. 16
Matrix Multiply in Pseudo-Cilk cilk void Mult(*C, *A, *B, n) { float *T = Cilk_alloca(n*n*sizeof(float)); h hbase case & partition matrices i i spawn Mult(C11,A11,B11,n/2); spawn Mult(C12,A11,B12,n/2); spawn Mult(C22,A21,B12,n/2); spawn Mult(C21,A21,B11,n/2); spawn Mult(T11,A12,B21,n/2); spawn Mult(T12,A12,B22,n/2); spawn Mult(T22,A22,B22,n/2); spawn Mult(T21,A22,B21,n/2); sync; spawn Add(C,T,n); sync; return; } C = A B C = C + T cilk void Add(*C, *T, n) { h h base case & partition matrices i i spawn Add(C11,T11,n/2); spawn Add(C12,T12,n/2); spawn Add(C21,T21,n/2); spawn Add(C22,T22,n/2); sync; return; } 17
Work of Matrix Addition cilk void Add(*C, *T, n) { h hbase case & partition matrices i i spawn Add(C11,T11,n/2); spawn Add(C12,T12,n/2); spawn Add(C21,T21,n/2); spawn Add(C22,T22,n/2); sync; return; } A1(n) = ? 4A1(n/2) + (1) = (n2) Work: CASE 1 nlogba = nlog24 = n2 (1) . 18
Span of Matrix Addition cilk void Add(*C, *T, n) { h hbase case & partition matrices i i spawn Add(C11,T11,n/2); spawn Add(C12,T12,n/2); spawn Add(C21,T21,n/2); spawn Add(C22,T22,n/2); sync; return; } } cilk void Add(*C, *T, n) { h hbase case & partition matrices i i spawn Add(C11,T11,n/2); spawn Add(C12,T12,n/2); spawn Add(C21,T21,n/2); spawn Add(C22,T22,n/2); sync; return; maximum A1(n/2) + (1) = (lg n) Span: A1(n) = ? CASE 2 nlogba = nlog21 = 1 )f(n) = (nlogba lg0n) . 19
Work of Matrix Multiplication cilk void Mult(*C, *A, *B, n) { float *T = Cilk_alloca(n*n*sizeof(float)); h hbase case & partition matrices i i spawn Mult(C11,A11,B11,n/2); spawn Mult(C12,A11,B12,n/2); spawn Mult(T21,A22,B21,n/2); sync; spawn Add(C,T,n); sync; return; } 8 8M1(n/2) +A1(n) + (1) = 8M1(n/2) + (n2) = (n3) CASE 1 Work: M1(n) = ? nlogba = nlog28 = n3 (n2) . 20
Span of Matrix Multiplication cilk void Mult(*C, *A, *B, n) { float *T = Cilk_alloca(n*n*sizeof(float)); h hbase case & partition matrices i i spawn Mult(C11,A11,B11,n/2); spawn Mult(C12,A11,B12,n/2); spawn Mult(T21,A22,B21,n/2); sync; spawn Add(C,T,n); sync; return; } } cilk void Mult(*C, *A, *B, n) { float *T = Cilk_alloca(n*n*sizeof(float)); h hbase case & partition matrices i i spawn Mult(C11,A11,B11,n/2); spawn Mult(C12,A11,B12,n/2); spawn Mult(T21,A22,B21,n/2); sync; spawn Add(C,T,n); sync; return; 8 M1(n/2) + A1(n) + (1) = M1(n/2) + (lg n) = (lg2 n) CASE 2 Span: M1(n) = ? nlogba = nlog21 = 1 )f(n) = (nlogba lg1n) . 21
Parallelism of Matrix Multiply M1(n) = (n3) M1(n) = (lg2n) Work: Span: M1(n) M1(n) = (n3/lg2n) Parallelism: For 1000 1000 matrices, parallelism (103)3/102 = 107. 22
Stack Temporaries cilk void Mult(*C, *A, *B, n) { h hbase case & partition matrices i i spawn Mult(C11,A11,B11,n/2); spawn Mult(C12,A11,B12,n/2); spawn Mult(T21,A22,B21,n/2); sync; spawn Add(C,T,n); sync; return; } float *T = Cilk_alloca(n*n*sizeof(float)); In hierarchical-memory machines (especially chip multiprocessors), memory accesses are so expensive that minimizing storage often yields higher performance. IDEA: Trade off parallelism for less storage. 23
No-Temp Matrix Multiplication cilk void MultA(*C, *A, *B, n) { // C = C + A * B h hbase case & partition matrices i i spawn MultA(C11,A11,B11,n/2); spawn MultA(C12,A11,B12,n/2); spawn MultA(C22,A21,B12,n/2); spawn MultA(C21,A21,B11,n/2); sync; spawn MultA(C21,A22,B21,n/2); spawn MultA(C22,A22,B22,n/2); spawn MultA(C12,A12,B22,n/2); spawn MultA(C11,A12,B21,n/2); sync; return; } Saves space, but at what expense? 24
Work of No-Temp Multiply cilk void MultA(*C, *A, *B, n) { // C = C + A * B h hbase case & partition matrices i i spawn MultA(C11,A11,B11,n/2); spawn MultA(C12,A11,B12,n/2); spawn MultA(C22,A21,B12,n/2); spawn MultA(C21,A21,B11,n/2); sync; spawn MultA(C21,A22,B21,n/2); spawn MultA(C22,A22,B22,n/2); spawn MultA(C12,A12,B22,n/2); spawn MultA(C11,A12,B21,n/2); sync; return; } 8 M1(n/2) + (1) CASE 1 Work: M1(n) = ? = (n3) 25
Span of No-Temp Multiply cilk void MultA(*C, *A, *B, n) { // C = C + A * B h hbase case & partition matrices i i spawn MultA(C11,A11,B11,n/2); spawn MultA(C12,A11,B12,n/2); spawn MultA(C22,A21,B12,n/2); spawn MultA(C21,A21,B11,n/2); sync; spawn MultA(C21,A22,B21,n/2); spawn MultA(C22,A22,B22,n/2); spawn MultA(C12,A12,B22,n/2); spawn MultA(C11,A12,B21,n/2); sync; return; } } M1(n) = ? Span: cilk void MultA(*C, *A, *B, n) { // C = C + A * B h hbase case & partition matrices i i spawn MultA(C11,A11,B11,n/2); spawn MultA(C12,A11,B12,n/2); spawn MultA(C22,A21,B12,n/2); spawn MultA(C21,A21,B11,n/2); sync; spawn MultA(C21,A22,B21,n/2); spawn MultA(C22,A22,B22,n/2); spawn MultA(C12,A12,B22,n/2); spawn MultA(C11,A12,B21,n/2); sync; return; maximum maximum 2 M1(n/2) + (1) = (n) CASE 1 26
Parallelism of No-Temp Multiply M1(n) = (n3) M1(n) = (n) Work: Span: M1(n) M1(n) = (n2) Parallelism: For 1000 1000 matrices, parallelism (103)3/103 = 106. Faster in practice! 27
Testing Synchronization Cilk language feature: A programmer can check whether a Cilk procedure is synched (without actually performing a sync) by testing the pseudovariableSYNCHED: SYNCHED = 0 ) some spawned children might not have returned. SYNCHED = 1 ) all spawned children have definitely returned. 28
Best of Both Worlds cilk void Mult1(*C, *A, *B, n) {// multiply & store h hbase case & partition matrices i i spawn Mult1(C11,A11,B11,n/2); // multiply & store spawn Mult1(C12,A11,B12,n/2); spawn Mult1(C22,A21,B12,n/2); spawn Mult1(C21,A21,B11,n/2); if (SYNCHED) { spawn MultA1(C11,A12,B21,n/2); // multiply & add spawn MultA1(C12,A12,B22,n/2); spawn MultA1(C22,A22,B22,n/2); spawn MultA1(C21,A22,B21,n/2); } else { float *T = Cilk_alloca(n*n*sizeof(float)); spawn Mult1(T11,A12,B21,n/2); // multiply & store spawn Mult1(T12,A12,B22,n/2); spawn Mult1(T22,A22,B22,n/2); spawn Mult1(T21,A22,B21,n/2); sync; spawn Add(C,T,n); // C = C + T } sync; return; } This code is just as parallel as the original, but it only uses more space if runtime parallelism actually exists. 29
Ordinary Matrix Multiplication cij= k = 1 n aik bkj IDEA: Spawn n2 inner products in parallel. Compute each inner product in parallel. Work: (n3) Span: (lgn) Parallelism: (n3/lgn) BUT, this algorithm exhibits poor locality and does not exploit the cache hierarchy of modern microprocessors, especially CMP s. 30
LECTURE 2 Recurrences (Review) Recurrences (Review) Matrix Multiplication Matrix Multiplication Merge Sort Merge Sort Tableau Construction Tableau Construction Conclusion Conclusion 31
Merging Two Sorted Arrays void Merge(int *C, int *A, int *B, int na, int nb) { while (na>0 && nb>0) { if (*A <= *B) { *C++ = *A++; na--; } else { *C++ = *B++; nb--; } } while (na>0) { *C++ = *A++; na--; } while (nb>0) { *C++ = *B++; nb--; } } Time to merge n elements = ? (n). 3 3 12 12 19 19 46 46 4 4 14 14 21 21 23 23 32
Merge Sort cilk void MergeSort(int *B, int *A, int n) { if (n==1) { B[0] = A[0]; } else { int *C; C = (int*) Cilk_alloca(n*sizeof(int)); spawn MergeSort(C, A, n/2); spawn MergeSort(C+n/2, A+n/2, n-n/2); sync; Merge(B, C, C+n/2, n/2, n-n/2); } } 14 3 4 12 merge 19 21 33 46 3 12 19 46 4 14 21 33 merge 3 19 12 46 4 33 14 21 merge 19 3 12 46 33 4 21 14 33
Work of Merge Sort cilk void MergeSort(int *B, int *A, int n) { if (n==1) { B[0] = A[0]; } else { int *C; C = (int*) Cilk_alloca(n*sizeof(int)); spawn MergeSort(C, A, n/2); spawn MergeSort(C+n/2, A+n/2, n-n/2); sync; Merge(B, C, C+n/2, n/2, n-n/2); } } 2T1(n/2) + (n) Work: T1(n) = ? = (n lg n) CASE 2 nlogba = nlog22 = n)f(n) = (nlogba lg0n) . 34
Span of Merge Sort cilk void MergeSort(int *B, int *A, int n) { if (n==1) { B[0] = A[0]; } else { int *C; C = (int*) Cilk_alloca(n*sizeof(int)); spawn MergeSort(C, A, n/2); spawn MergeSort(C+n/2, A+n/2, n-n/2); sync; Merge(B, C, C+n/2, n/2, n-n/2); } } T1(n/2) + (n) CASE 3 = (n) Span: T1(n) = ? nlogba = nlog21 = 1 (n) . 35
Parallelism of Merge Sort T1(n) = (n lg n) T1(n) = (n) Work: PUNY ! Span: T1(n) T1(n) = (lg n) Parallelism: We need to parallelize the merge! 36
Parallel Merge na/2 0 na A A[na/2] A[na/2] Recursive merge Recursive merge Binary search B na nb A[na/2] A[na/2] 0 nb j+1 j KEY IDEA: If the total number of elements to be merged in the two arrays is n = na + nb, the total number of elements in the larger of the two recursive merges is at most ? (3/4)n . 37
Parallel Merge cilk void P_Merge(int *C, int *A, int *B, int na, int nb) { if (na < nb) { spawn P_Merge(C, B, A, nb, na); } else if (na==1) { if (nb == 0) { C[0] = A[0]; } else { C[0] = (A[0]<B[0]) ? A[0] : B[0]; /* minimum */ C[1] = (A[0]<B[0]) ? B[0] : A[0]; /* maximum */ } } else { int ma = na/2; int mb = BinarySearch(A[ma], B, nb); spawn P_Merge(C, A, B, ma, mb); spawn P_Merge(C+ma+mb, A+ma, B+mb, na-ma, nb-mb); sync; } } Coarsen base cases for efficiency. 38
Span of P_Merge cilk void P_Merge(int *C, int *A, int *B, int na, int nb) { if (na < nb) { } else { int ma = na/2; int mb = BinarySearch(A[ma], B, nb); spawn P_Merge(C, A, B, ma, mb); spawn P_Merge(C+ma+mb, A+ma, B+mb, na-ma, nb-mb); sync; } } T1(3n/4) + (lg n) CASE 2 = (lg2n) Span: T1(n) = ? nlogba = nlog4/31 = 1 )f(n) = (nlogba lg1n) . 39
Work of P_Merge cilk void P_Merge(int *C, int *A, int *B, int na, int nb) { if (na < nb) { } else { int ma = na/2; int mb = BinarySearch(A[ma], B, nb); spawn P_Merge(C, A, B, ma, mb); spawn P_Merge(C+ma+mb, A+ma, B+mb, na-ma, nb-mb); sync; } } HA IRY ! T1(n) = ? T1( n) + T1((1 )n) + (lg n), where 1/4 3/4 . CLAIM:T1(n) = (n) . Work: 40
Analysis of Work Recurrence T1(n) = T1( n) + T1((1 )n) + (lg n), where 1/4 3/4 . Substitution method: Inductive hypothesis is T1(k) c1k c2lg k, where c1,c2 > 0. Prove that the relation holds, and solve for c1 and c2. T1(n) = T1( n) + T1((1 )n) + (lg n) c1( n) c2lg( n) + c1((1 )n) c2lg((1 )n) + (lg n) 41
Analysis of Work Recurrence T1(n) = T1( n) + T1((1 )n) + (lg n), where 1/4 3/4 . T1(n) = T1( n) + T1((1 )n) + (lg n) c1( n) c2lg( n) + c1(1 )n c2lg((1 )n) + (lg n) 42
Analysis of Work Recurrence T1(n) = T1( n) + T1((1 )n) + (lg n), where 1/4 3/4 . T1(n) = T1( n) + T1((1 )n) + (lg n) c1( n) c2lg( n) + c1(1 )n c2lg((1 )n) + (lg n) c1n c2lg( n) c2lg((1 )n) + (lg n) c1n c2 ( lg( (1 )) + 2lg n ) + (lg n) c1n c2 lg n (c2(lg n + lg( (1 ))) (lg n)) c1n c2 lg n by choosing c1 and c2 large enough. 43
Parallelism of P_Merge T1(n) = (n) T1(n) = (lg2n) Work: Span: T1(n) T1(n) = (n/lg2n) Parallelism: 44
Parallel Merge Sort cilk void P_MergeSort(int *B, int *A, int n) { if (n==1) { B[0] = A[0]; } else { int *C; C = (int*) Cilk_alloca(n*sizeof(int)); spawn P_MergeSort(C, A, n/2); spawn P_MergeSort(C+n/2, A+n/2, n-n/2); sync; spawn P_Merge(B, C, C+n/2, n/2, n-n/2); } } 45
Work of Parallel Merge Sort cilk void P_MergeSort(int *B, int *A, int n) { if (n==1) { B[0] = A[0]; } else { int *C; C = (int*) Cilk_alloca(n*sizeof(int)); spawn P_MergeSort(C, A, n/2); spawn P_MergeSort(C+n/2, A+n/2, n-n/2); sync; spawn P_Merge(B, C, C+n/2, n/2, n-n/2); } } T1(n) = 2 T1(n/2) + (n) = (n lg n) Work: CASE 2 46
Span of Parallel Merge Sort cilk void P_MergeSort(int *B, int *A, int n) { if (n==1) { B[0] = A[0]; } else { int *C; C = (int*) Cilk_alloca(n*sizeof(int)); spawn P_MergeSort(C, A, n/2); spawn P_MergeSort(C+n/2, A+n/2, n-n/2); sync; spawn P_Merge(B, C, C+n/2, n/2, n-n/2); } } T1(n/2) + (lg2n) CASE 2 = (lg3n) Span: T1(n) = ? nlogba = nlog21 = 1 )f(n) = (nlogba lg2n) . 47
Parallelism of Merge Sort T1(n) = (n lg n) T1(n) = (lg3n) Work: Span: T1(n) T1(n) = (n/lg2n) Parallelism: 48
LECTURE 2 Recurrences (Review) Recurrences (Review) Matrix Multiplication Matrix Multiplication Merge Sort Merge Sort Tableau Construction Tableau Construction Conclusion Conclusion 49
Tableau Construction Problem: Fill in an n n tableau A, where A[i, j] = f(A[i, j 1], A[i 1, j], A[i 1, j 1]). Dynamic programming Longest common subsequence Edit distance Time warping 00 01 02 03 04 05 06 07 10 11 12 13 14 15 16 17 20 21 22 23 24 25 26 27 30 31 32 33 34 35 36 37 40 41 42 43 44 45 46 47 50 51 52 53 54 55 56 57 Work: (n2). 60 61 62 63 64 65 66 67 70 71 72 73 74 75 76 77 50