Understanding Dynamic Programming in the Context of Knapsack and Edit Distance Problems
This content delves into the Knapsack problem, which involves selecting objects to maximize value while staying within a weight limit, and the Edit Distance problem, which focuses on finding the minimal number of edit operations to convert one string to another. Dynamic programming is used to solve these problems efficiently, with detailed explanations, algorithms, and examples provided.
Download Presentation
Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
E N D
Presentation Transcript
CMPT 706 - Algorithms for Big Data Dynamic Programming March 26, 2020 Dynamic Programming 1-1
The Knapsack problem Approximation Algorithms 1-2
The Knapsack problem Input: A set of n objects each having values {vi}i=1 n and weight {wi}i=1 n A weight limit W. - all wi s and W are positive integers. Output: Select objects of total weight at most W such that the sum of values is maximized. Idea: Suppose we decided on all objects except for the last one. Should we add the last object? Two cases: 1. Take a solution that does not involve the last object. 2. If there is room for the last object, i.e., total weight is less than W-wn, then we can add it to the solution this adds value vn . Dynamic Programming 1-3
The Knapsack problem What is the runtime of the algorithm? Input: A set of n objects each having values {vi}i=1 n and weight {wi}i=1 n A weight limit W - all wi s and W are positive integers Output: Select objects of total weight at most W such that the sum of values is maximized. n*W iterations O(1) steps in each iteration Algorithm: 1. Create a matrix of dimensions n x W M[ i, w ] will have the optimal value up to weight w using items 1 i 2. For w = 1 W Set M[ 0 , w ] = 0 // no items used 3. For i=1 n For each w = 1 W Set M[ i , w ] = max ( M[i-1, w], M[i-1, w - wi ] + vi ) 4. Return M[n ,W] Total runtime is O(n W) Dynamic Programming 1-4
The Knapsack problem Example: Let the values be 1,3,4,2, the weights 1,1,3,2, and W = 5 W 0 1 2 3 4 5 i 0 0 0 0 0 0 0 1 0 1 1 1 1 1 (w=1,v=1) 2 0 3 4 4 4 4 (w=1,v=3) 3 0 3 4 4 7 8 (w=3,v=4) 4 0 3 4 5 7 8 (w=2,v=2) Dynamic Programming 1-5
The Edit Distance Problem Approximation Algorithms 1-6
The Edit Distance Problem Input: Two strings X and Y Goal: find smallest number of edit operations required to convert X to Y. Examples: X = G T A G C G G C G Y = G T A A C G G C G X = G T A G C G G C G Y = G T A A C G G C G One mismatch Gap in X / Insertion in Y X = G T A G C G C G Y = G T A G C G G C G X = G T A G C G C G Y = G T A G C G G C G Gap in Y / Insertion in X X = G T A G C G C G Y = G T G C G C G X = G T A G C G C G Y = G T G C G C G Dynamic Programming 1-7
The Edit Distance Problem Input: Two strings X and Y Goal: Find an optimal alignment of X and Y, i.e. smallest number of edit operations required to convert X to Y. Another example: X = G C G T A T G A G G C T A A C G C Y = G C T A T G C G G C T A T A C G C Optimal alignment X = G C G T A T G A G G C T A A C G C Y = G C T A T G C G G C T A T A C G C Dynamic Programming 1-8
The Hamming Distance A related (simpler) notion: Two strings X and Y of the same length Goal: smallest number of mismatches between X to Y. Examples: X = G T A G C G G C G G A C Y = G T A A C G G C G G A T X = S T R O N G E R Y = S T R E N G T H Dynamic Programming 1-9
The Edit Distance Problem Claim: For two strings X, Y of equal length EditDist(X, Y) <= HammingDist(X, Y) Proof: can use substitution only. What about the other direction? Is there any non-trivial relation? Example: X = ababababababababab Y = bababababababababa Q: What is the edit distance between X and Y? A: 2 - we can remove a from the beginning of X and add a to the end. Q: What is the Hamming distance between X and Y? - n
The Edit Distance Problem Input: Two strings X and Y Goal: Find an optimal alignment of X and Y, i.e. smallest number of edit operations required to convert X to Y. Operations are: R replace I insert into X D delete from X X = G C G T A T G A G G C T A A C G C Y = G C T A T G C G G C T A T A C G C delete replace insert Dynamic Programming 1-11
The Edit Distance Problem Input: Two strings X of length n, and Y of length m Goal: Find an optimal alignment of X and Y, i.e. smallest number of edit operations required to convert X to Y. Algorithm: Let s consider the substrings X[1 i] and Y[1 j]. Suppose that (by recursion) we can solve the shorter subproblems. We have 4 options to handle the last symbols if X[i] = Y[j] use the optimal alignment for X[1 i-1] and Y[1 j-1] Otherwise Set X[i] to be Y[j] Delete X[i] from X Append Y[j] to X Dynamic Programming 1-12
The Edit Distance Problem Input: Two strings: X of length n and Y of length m Goal: Find an optimal alignment of X and Y, i.e. smallest number of edit operations required to convert X to Y. Idea: Define a matrix D[0 n,0 .m] 1. Set D[i,0] = i for all i=0 n 2. Set D[0,j] = j for all j=0 m 3. For i,j both non-zeros define ( D[ i-1 , j ] + 1 // Delete X[i] D[i,j] = min ( D[ i, j-1 ] + 1 // Insert Y[j] after X[i] ( D[ i-1 , j-1 ] + 1(X[i] Y[j]) // Match or Mismatch | 1 if X[i] Y[j] Here 1(X[i] Y[j]) = < | 0 if X[i] = Y[j] Dynamic Programming 1-13
The Edit Distance Problem Input: Two strings: X of length n and Y of length m Goal: Find an optimal alignment of X and Y, i.e. smallest number of edit operations required to convert X to Y. Example: [doing online] X = BLOCK Y = BOOK: A solution: (1) remove L and (2) replace C with O. i B L O C K j 0 1 2 3 4 5 B 1 Min(2,2,0) = 0 Min(3,1,2) = 1 Min(4,2,3) = 2 Min(5,3,4) = 3 Min(6,4,5) = 4 O 2 Min(1,3,2) = 1 Min(2,2,1) = 1 Min(3,2,1) = 1 Min(4,2,3) = 2 Min(5,3,4) = 3 O 3 Min(2,4,3) = 2 Min(2,3,2) = 2 Min(2,3,1) = 1 Min(3,2,2) = 2 Min(4,3,3) = 3 K 4 Min(3,5,4) =3 Dynamic Programming Min(3,4,3) = 3 Min(2,4,3) = 2 Min(3,3,2) = 2 Min(4,3,2) = 2 1-14
The Edit Distance Problem Input: Two strings: X of length n and Y of length m Goal: Find an optimal alignment of X and Y, i.e. smallest number of edit operations required to convert X to Y. What is the runtime of the algorithm? Dist(X,Y): |X| = n, |Y| = m 1. If n = 0, return m 2. If m = 0, return n 3. Otherwise compute d1 = Dist( X[1 n-1] , Y[1 m] ) + 1 compute d2 = Dist( X[1 n] , Y[1 m-1] ) + 1 compute d3 = Dist( X[1 n-1] , Y[1 m-1] ) + 1(X[i] Y[j]) return min(d1, d2, d3) Let s say m=n Then T(n) >= 3T(n-1) T(n) = 3T(n-1) =32*T(n-2) =33*T(n-3) =310*T(n-10) =3n-1*T(1) = O(3n) Dynamic Programming 1-15
The Edit Distance Problem Input: Two strings: X of length n and Y of length m Goal: Find the edit distance between X and Y What is the runtime? Dynamix programming approach: |X| = n, |Y| = m Define a matrix D[0 n,0 .m] 1. Set D[i,0] = i for all i=0 n 2. Set D[0,j] = j for all j=0 m 3. For i=1 n and for j=1 m D[i,j] = min( D[ i-1 , j ] + 1 // Delete X[i] 4. Return D[n,m] n*m iterations in step 2 O(1) steps in each iteration Q: How do you find the optimal sequence of operations that turns X into Y D[ i, j-1 ] + 1 // Insert Y[j] after X[i] D[ i-1 , j-1 ] + 1(X[i] Y[j]) // Match or Mismatch Dynamic Programming 1-16
The Edit Distance Problem An alternative view: We can think about the matrix and a shortest path problem on the weighted grid graph from D[0,0] to D[n,m]. i B L O C K j B O O K On horizontal/vertical edges the weight is 1. On the diagonal edges the weight is either 0 (if match) or 1 (if mismatch) Dynamic Programming 1-17
The Edit Distance Problem Weighted variant: We can also consider the problem where different operations have different costs. For example, mismatch-cost(A, E) < mismatch-cost(B, Q) Inserting/deleting a letter more than mismatch Homework: solve the weighted variant of EditDistance. i B L O C K j 0 1 2 3 4 5 B 1 O 2 O 3 K 4 Dynamic Programming 1-18
Reading for next time Exercises from the Book: 6.7, 6.8, 6.11 Huffman Code 1-19