Dynamic Programming in the Context of Knapsack and Edit Distance Problems

 
CMPT 706 - Algorithms for Big Data
 
Dynamic Programming
March 26, 2020
 
Dynamic Programming
 
1-1
 
The Knapsack problem
 
 
Approximation Algorithms
 
1-2
The Knapsack problem
Dynamic Programming
1-3
 
Input
: A set of n objects each having values {v
i
}
i=1…n
 and weight {w
i
}
i=1…n
A weight limit W. - all w
i
’s and W are positive integers.
Output
: Select objects of total weight at most W such that the sum of
values is maximized.
 
Idea
: Suppose we decided on all objects except for the last one.
Should we add the last object?
 
Two cases:
1.
Take a solution that does not involve the last object.
2.
If there is room for the last object, i.e., total weight is less than W-w
n
,
then we can add it to the solution– this adds value v
n
 .
The Knapsack problem
Dynamic Programming
1-4
 
Input
: A set of n objects each having values {v
i
}
i=1…n
 and weight {w
i
}
i=1…n
A weight limit W  - all w
i
’s and W are positive integers
Output
: Select objects of total weight at most W such that the sum of
values is maximized.
 
Algorithm
:
1.
Create a matrix of dimensions n x W
 
M[ i, w ] will have the optimal value up to weight w using items 1…i
2.
For w = 1…W
 
Set M[ 0 , w ] = 0 // no items used
3.
For i=1…n
 
For each w = 1…
W
  
Set M[ i , w ] = max ( M[i-1, w], M[i-1, w - w
i
 ] + v
i
 )
4.
Return M[n ,W]
What is the runtime of the algorithm?
n*W iterations
O(1) steps in each iteration
Total runtime is O(n W)
The Knapsack problem
Dynamic Programming
1-5
 
Example
:
Let the values be 1,3,4,2,   the weights  1,1,3,2,    and W = 5
 
The Edit Distance Problem
 
 
Approximation Algorithms
 
1-6
The Edit Distance Problem
Dynamic Programming
1-7
 
Input
: Two strings X and Y
Goal
: find smallest number of 
edit operations
 required to convert X to Y.
 
Examples
:
 
 
One mismatch
X = G T A G C G G C G
Y = G T A A C G G C G
X = G T A G C G C G
Y = G T A 
G C G 
G C G
Gap in X /
Insertion in Y
X = G T A G C 
 G C G
Y = G T A G C 
G
 G C G
X = G T A 
G
 C G G C G
Y = G T A 
A
 C G G C G
X = G T A G C G C G
Y = G T 
G C G
 C G
Gap in Y /
Insertion in X
X = G T 
A
 G C G C G
Y = G T 
G C G C G
The Edit Distance Problem
Dynamic Programming
1-8
 
Input
: Two strings X and Y
Goal
: Find an optimal alignment of X and Y, i.e. smallest number of 
edit
operations
 required to convert X to Y.
Another example
:
 
 
Optimal alignment
X = 
G C G T A T G A G G C T A A C G C
Y = 
G C T A T G C G G C T A T A C G C
X = G C 
G
 T A T G 
A
 G G C T A 
 A C G C
Y = G C 
 T A T G 
C
 G G C T A 
T
 A C G C
The Hamming Distance
Dynamic Programming
1-9
 
A related (simpler) notion
: Two strings X and Y of the same length
Goal
: smallest number of 
mismatches 
between X to Y.
 
Examples
:
 
 
X = G T A 
G
 C G G C G G A 
C
Y = G T A 
A
 C G G C G G A 
T
X = S T R 
O
 N G 
E R
Y = S T R 
E 
 N G 
T H
The Edit Distance Problem
 
Claim
: For two strings X, Y of equal length
 
 EditDist(X, Y) <= HammingDist(X, Y)
Proof
: can use substitution only.
 
What about the other direction? Is there any non-trivial relation?
 
Example
:
X = ababababababababab
Y = bababababababababa
Q
: What is the edit distance between X and Y?
A
: 2 - we can remove a from the beginning of X and add a to the end.
Q
: What is the Hamming distance between X and Y? -  n
The Edit Distance Problem
Dynamic Programming
1-11
 
Input
: Two strings X and Y
Goal
: Find an optimal alignment of X and Y, i.e. smallest number of 
edit
operations
 required to convert X to Y.
 
Operations are
:
R – replace
I – insert into X
D – delete from X
 
 
X = G C 
G
 T A T G 
A
 G G C T A 
 A C G C
Y = G C 
 T A T G 
C
 G G C T A 
T
 A C G C
The Edit Distance Problem
Dynamic Programming
1-12
 
Input
: Two strings X of length n, and Y of length m
Goal
: Find an optimal alignment of X and Y, i.e. smallest number of 
edit
operations
 required to convert X to Y.
 
Algorithm
:
Let’s consider the substrings X[1…i] and Y[1…j].
Suppose that (by recursion) we can solve the shorter subproblems.
We have 4 options to handle the last symbols
if X[i] = Y[j] 
 
use the optimal alignment for X[1…i-1] and Y[1…j-1]
Otherwise
Set X[i] to be Y[j]
Delete X[i] from X
Append Y[j] to X
The Edit Distance Problem
Dynamic Programming
1-13
 
Input
: Two strings: X of length n and Y of length m
Goal
: Find an optimal alignment of X and Y, i.e. smallest number of 
edit
operations
 required to convert X to Y.
Idea
:
Define a matrix D[0…n,0….m]
1.
Set D[i,0] = i for all i=0…n
2.
Set D[0,j] = j for all j=0…m
3.
For i,j both non-zeros define
  
          (
 
D[ i-1 , j ] + 1 
 
// Delete X[i]
 
 D[i,j] = min (
 
D[ i, j-1 ] + 1 
 
// Insert Y[j] after X[i]
  
          (
 
D[ i-1 , j-1 ] + 1
(X[i]
 Y[j]) 
// Match or Mismatch
  
 |   1 if X[i]
 Y[j]
Here  1
(X[i]
 Y[j])
 =  
 
<
 
  
 |   0 if X[i]
 =
 Y[j]
The Edit Distance Problem
Dynamic Programming
1-14
 
Input
: Two strings: X of length n and Y of length m
Goal
: Find an optimal alignment of X and Y, i.e. smallest number of
edit operations
 required to convert X to Y.
Example
: [doing online]
X = BLOCK   Y = BOOK:  A solution: (1) remove L and (2) replace C with O.
The Edit Distance Problem
Dynamic Programming
1-15
 
Input
: Two strings: X of length n and Y of length m
Goal
: Find an optimal alignment of X and Y, i.e. smallest number of 
edit
operations
 required to convert X to Y.
 
Dist(X,Y)
: |X| = n, |Y| = m
1.
If n = 0, return m
2.
If m = 0, return n
3.
Otherwise
 
 
compute d
1
 = Dist( X[1…n-1] , Y[1…m] ) + 1
 
 
compute d
2
 = Dist( X[1…n] , Y[1…m-1] ) + 1
 
 
compute d
3
 = Dist( X[1…n-1] , Y[1…m-1] ) + 1
(X[i]
 Y[j])
 
return min(d
1
, d
2
, d
3
)
What is the runtime of the algorithm?
Let’s say m=n
Then T(n) >= 3T(n-1)
T(n) = 3T(n-1)
=3
2
*T(n-2)
=3
3
*T(n-3)…
=3
10
*T(n-10)…
=3
n-1
*T(1) = O(3
n
)
The Edit Distance Problem
Dynamic Programming
1-16
 
Input
: Two strings: X of length n and Y of length m
Goal
: Find the edit distance between X and Y
 
Dynamix programming approach
: |X| = n, |Y| = m
Define a matrix D[0…n,0….m]
1.
Set D[i,0] = i for all i=0…n
2.
Set D[0,j] = j for all j=0…m
3.
For i=1…n and for j=1…m
 
  
D[i,j] = min( 
 
D[ i-1 , j ] + 1 
 
// Delete X[i]
    
D[ i, j-1 ] + 1 
 
// Insert Y[j] after X[i]
    
D[ i-1 , j-1 ] + 1
(X[i]
 Y[j]) 
// Match or Mismatch
4.
Return D[n,m]
What is the runtime?
n*m iterations in step 2
O(1) steps in each iteration
Q
: How do you find the optimal sequence
of operations that turns X into Y
The Edit Distance Problem
Dynamic Programming
1-17
 
An alternative view
: We can think about the matrix and a shortest path
problem on the weighted grid graph from D[0,0] to D[n,m].
 
 
 
 
 
 
 
On horizontal/vertical edges the weight is 1.
On the diagonal edges the weight is either 0 (if match) or 1 (if mismatch)
The Edit Distance Problem
Dynamic Programming
1-18
 
Weighted variant
: We can also consider the problem where different
operations have different costs.
For example, mismatch-cost(A, E) < mismatch-cost(B, Q)
Inserting/deleting a letter more than mismatch
Homework
: solve the weighted variant of EditDistance.
 
Reading for next time
 
Exercises from the Book:
 
6.7, 6.8, 6.11
 
Huffman Code
 
1-19
Slide Note
Embed
Share

This content delves into the Knapsack problem, which involves selecting objects to maximize value while staying within a weight limit, and the Edit Distance problem, which focuses on finding the minimal number of edit operations to convert one string to another. Dynamic programming is used to solve these problems efficiently, with detailed explanations, algorithms, and examples provided.

  • Dynamic Programming
  • Knapsack Problem
  • Edit Distance
  • Algorithms
  • Optimization

Uploaded on Sep 19, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. CMPT 706 - Algorithms for Big Data Dynamic Programming March 26, 2020 Dynamic Programming 1-1

  2. The Knapsack problem Approximation Algorithms 1-2

  3. The Knapsack problem Input: A set of n objects each having values {vi}i=1 n and weight {wi}i=1 n A weight limit W. - all wi s and W are positive integers. Output: Select objects of total weight at most W such that the sum of values is maximized. Idea: Suppose we decided on all objects except for the last one. Should we add the last object? Two cases: 1. Take a solution that does not involve the last object. 2. If there is room for the last object, i.e., total weight is less than W-wn, then we can add it to the solution this adds value vn . Dynamic Programming 1-3

  4. The Knapsack problem What is the runtime of the algorithm? Input: A set of n objects each having values {vi}i=1 n and weight {wi}i=1 n A weight limit W - all wi s and W are positive integers Output: Select objects of total weight at most W such that the sum of values is maximized. n*W iterations O(1) steps in each iteration Algorithm: 1. Create a matrix of dimensions n x W M[ i, w ] will have the optimal value up to weight w using items 1 i 2. For w = 1 W Set M[ 0 , w ] = 0 // no items used 3. For i=1 n For each w = 1 W Set M[ i , w ] = max ( M[i-1, w], M[i-1, w - wi ] + vi ) 4. Return M[n ,W] Total runtime is O(n W) Dynamic Programming 1-4

  5. The Knapsack problem Example: Let the values be 1,3,4,2, the weights 1,1,3,2, and W = 5 W 0 1 2 3 4 5 i 0 0 0 0 0 0 0 1 0 1 1 1 1 1 (w=1,v=1) 2 0 3 4 4 4 4 (w=1,v=3) 3 0 3 4 4 7 8 (w=3,v=4) 4 0 3 4 5 7 8 (w=2,v=2) Dynamic Programming 1-5

  6. The Edit Distance Problem Approximation Algorithms 1-6

  7. The Edit Distance Problem Input: Two strings X and Y Goal: find smallest number of edit operations required to convert X to Y. Examples: X = G T A G C G G C G Y = G T A A C G G C G X = G T A G C G G C G Y = G T A A C G G C G One mismatch Gap in X / Insertion in Y X = G T A G C G C G Y = G T A G C G G C G X = G T A G C G C G Y = G T A G C G G C G Gap in Y / Insertion in X X = G T A G C G C G Y = G T G C G C G X = G T A G C G C G Y = G T G C G C G Dynamic Programming 1-7

  8. The Edit Distance Problem Input: Two strings X and Y Goal: Find an optimal alignment of X and Y, i.e. smallest number of edit operations required to convert X to Y. Another example: X = G C G T A T G A G G C T A A C G C Y = G C T A T G C G G C T A T A C G C Optimal alignment X = G C G T A T G A G G C T A A C G C Y = G C T A T G C G G C T A T A C G C Dynamic Programming 1-8

  9. The Hamming Distance A related (simpler) notion: Two strings X and Y of the same length Goal: smallest number of mismatches between X to Y. Examples: X = G T A G C G G C G G A C Y = G T A A C G G C G G A T X = S T R O N G E R Y = S T R E N G T H Dynamic Programming 1-9

  10. The Edit Distance Problem Claim: For two strings X, Y of equal length EditDist(X, Y) <= HammingDist(X, Y) Proof: can use substitution only. What about the other direction? Is there any non-trivial relation? Example: X = ababababababababab Y = bababababababababa Q: What is the edit distance between X and Y? A: 2 - we can remove a from the beginning of X and add a to the end. Q: What is the Hamming distance between X and Y? - n

  11. The Edit Distance Problem Input: Two strings X and Y Goal: Find an optimal alignment of X and Y, i.e. smallest number of edit operations required to convert X to Y. Operations are: R replace I insert into X D delete from X X = G C G T A T G A G G C T A A C G C Y = G C T A T G C G G C T A T A C G C delete replace insert Dynamic Programming 1-11

  12. The Edit Distance Problem Input: Two strings X of length n, and Y of length m Goal: Find an optimal alignment of X and Y, i.e. smallest number of edit operations required to convert X to Y. Algorithm: Let s consider the substrings X[1 i] and Y[1 j]. Suppose that (by recursion) we can solve the shorter subproblems. We have 4 options to handle the last symbols if X[i] = Y[j] use the optimal alignment for X[1 i-1] and Y[1 j-1] Otherwise Set X[i] to be Y[j] Delete X[i] from X Append Y[j] to X Dynamic Programming 1-12

  13. The Edit Distance Problem Input: Two strings: X of length n and Y of length m Goal: Find an optimal alignment of X and Y, i.e. smallest number of edit operations required to convert X to Y. Idea: Define a matrix D[0 n,0 .m] 1. Set D[i,0] = i for all i=0 n 2. Set D[0,j] = j for all j=0 m 3. For i,j both non-zeros define ( D[ i-1 , j ] + 1 // Delete X[i] D[i,j] = min ( D[ i, j-1 ] + 1 // Insert Y[j] after X[i] ( D[ i-1 , j-1 ] + 1(X[i] Y[j]) // Match or Mismatch | 1 if X[i] Y[j] Here 1(X[i] Y[j]) = < | 0 if X[i] = Y[j] Dynamic Programming 1-13

  14. The Edit Distance Problem Input: Two strings: X of length n and Y of length m Goal: Find an optimal alignment of X and Y, i.e. smallest number of edit operations required to convert X to Y. Example: [doing online] X = BLOCK Y = BOOK: A solution: (1) remove L and (2) replace C with O. i B L O C K j 0 1 2 3 4 5 B 1 Min(2,2,0) = 0 Min(3,1,2) = 1 Min(4,2,3) = 2 Min(5,3,4) = 3 Min(6,4,5) = 4 O 2 Min(1,3,2) = 1 Min(2,2,1) = 1 Min(3,2,1) = 1 Min(4,2,3) = 2 Min(5,3,4) = 3 O 3 Min(2,4,3) = 2 Min(2,3,2) = 2 Min(2,3,1) = 1 Min(3,2,2) = 2 Min(4,3,3) = 3 K 4 Min(3,5,4) =3 Dynamic Programming Min(3,4,3) = 3 Min(2,4,3) = 2 Min(3,3,2) = 2 Min(4,3,2) = 2 1-14

  15. The Edit Distance Problem Input: Two strings: X of length n and Y of length m Goal: Find an optimal alignment of X and Y, i.e. smallest number of edit operations required to convert X to Y. What is the runtime of the algorithm? Dist(X,Y): |X| = n, |Y| = m 1. If n = 0, return m 2. If m = 0, return n 3. Otherwise compute d1 = Dist( X[1 n-1] , Y[1 m] ) + 1 compute d2 = Dist( X[1 n] , Y[1 m-1] ) + 1 compute d3 = Dist( X[1 n-1] , Y[1 m-1] ) + 1(X[i] Y[j]) return min(d1, d2, d3) Let s say m=n Then T(n) >= 3T(n-1) T(n) = 3T(n-1) =32*T(n-2) =33*T(n-3) =310*T(n-10) =3n-1*T(1) = O(3n) Dynamic Programming 1-15

  16. The Edit Distance Problem Input: Two strings: X of length n and Y of length m Goal: Find the edit distance between X and Y What is the runtime? Dynamix programming approach: |X| = n, |Y| = m Define a matrix D[0 n,0 .m] 1. Set D[i,0] = i for all i=0 n 2. Set D[0,j] = j for all j=0 m 3. For i=1 n and for j=1 m D[i,j] = min( D[ i-1 , j ] + 1 // Delete X[i] 4. Return D[n,m] n*m iterations in step 2 O(1) steps in each iteration Q: How do you find the optimal sequence of operations that turns X into Y D[ i, j-1 ] + 1 // Insert Y[j] after X[i] D[ i-1 , j-1 ] + 1(X[i] Y[j]) // Match or Mismatch Dynamic Programming 1-16

  17. The Edit Distance Problem An alternative view: We can think about the matrix and a shortest path problem on the weighted grid graph from D[0,0] to D[n,m]. i B L O C K j B O O K On horizontal/vertical edges the weight is 1. On the diagonal edges the weight is either 0 (if match) or 1 (if mismatch) Dynamic Programming 1-17

  18. The Edit Distance Problem Weighted variant: We can also consider the problem where different operations have different costs. For example, mismatch-cost(A, E) < mismatch-cost(B, Q) Inserting/deleting a letter more than mismatch Homework: solve the weighted variant of EditDistance. i B L O C K j 0 1 2 3 4 5 B 1 O 2 O 3 K 4 Dynamic Programming 1-18

  19. Reading for next time Exercises from the Book: 6.7, 6.8, 6.11 Huffman Code 1-19

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#