Dynamic Programming in Various Algorithms
Jeremy Lin has invented a time machine that predicts $GME stock prices for upcoming days. The challenge is to determine the best trading strategy given the price predictions and constraints on the number of trades allowed. Alongside, there are announcements related to academic activities like homework posts, midterms grading, and upcoming sessions. The content also covers algorithmic problems like the Knapsack Problem and Stronger Dynamic Programming. It emphasizes efficient coding practices, use of pseudo-code, and specific guidelines for algorithm descriptions. Additionally, it touches upon topics like RNA secondary structure and sequence alignment in dynamic programming.
- Dynamic Programming
- Algorithmic Problems
- Stock Trading
- Academic Announcements
- Algorithmic Optimization
Download Presentation
Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
E N D
Presentation Transcript
Quiz Jeremy Lin has created a time machine. Now, he knows exactly the price of $GME for the next ? days, which are ?1,?2, ,??. Suppose Jeremy can only trade $GME for at most ? times. So, what is the best trading? Let ??,?be the network at ?-th day using ? trades. ?? ?? ??,?= max ?? 1,?,max ??,? 1 . ?<? This solution technically is wrong. What is the mistake? I omitted initial cases. 1
CSE 421 Dynamic Programming RNA, Sequence Alignment Yin Tat Lee 2
Announcement HW5 will be posted tonight. Sorry for the late. Swati OH is moved to Sunday (virtual). See website. Midterm is graded. Check your score on Canvas. Come to any OH for any grading mistakes. Come to my OH to get back the midterm (for in-person midterm) I will post the midterm solution tonight. 3
Midterm Here is the statistics percentile Q1 Q2 Q3 Q4 Q5 Total 25% 12 (75%) 17.5 (72%) 6.5 (65%) 3 (20%) 18 (72%) 66 (73%) 50% 14 (88%) 21 (87.5%) 9 (90%) 8 (53%) 24 (96%) 72.8 (81%) 75% 16 (100%) 21 (87.5%) 10 (100%) 14.5 (97%) 25 (100%) 80.8 (90%) Q4 is the hardest one. It is modified from some problem in a programming contest. If you get >= 50/90 in this midterm, you are on track for a 3.4 (depending on your homework) 4
Midterm ? 2log2? is not ?(?3). Note that ? 2log2?= ?1+log ? which is not even polynomial. Some write ?log24. Please write ?2 instead. Don t write a large ambiguous paragraph to describe your algo. Use pseudo code instead. We will deduct point in final. MST takes ? ?log? or ?(? + ?log?), but not ?(?log?). Q5, you don t need to do induction. 5
Knapsack Problem Given ?objects and a "knapsack. Item ? weighs ??> 0 kilograms and has value ??> 0. Knapsack has capacity of ? kilograms. Goal: fill knapsack so as to maximize total value. Item 1 2 Value Weight Ex: OPT is { 3, 4 } with value 40. 1 6 1 2 W = 11 3 4 5 18 22 28 5 6 7 Greedy: repeatedly add item with maximum ratio ??/??. Ex: { 5, 2, 1 } achieves only value = 35 greedy not optimal. 6
Stronger DP (Strengthening Hypothesis) Let ???(?,?) = Max value of subsets of items 1, ,? of weight ? Case 1: ???(?,?) selects item ? In this case, ??? ?,? = ??+ ???(? 1,? ??) Take best of the two Case 2: ??? ?,? does not select item ? In this case, ??? ?,? = ???(? 1,?). Therefore, If ? = 0 If ??> ? o.w., 0 ??? ? 1,? max(??? ? 1,? ,??+ ??? ? 1,? ??) ??? ?,? = 7
RNA Secondary Structure RNA: A String B = b1b2 bn over alphabet { A, C, G, U }. Secondary structure. RNA is single-stranded so it tends to loop back and form base pairs with itself. This structure is essential for understanding behavior of molecule. C A Ex: GUCGAUUGAGCGAAUGUAACAACGUGGCUACGGCGAGA A A A U G C C G U A A G G U A U U A G A C G C U G C G C G A G C G A U G complementary base pairs: A-U, C-G 9
RNA Secondary Structure (Formal) Secondary structure. A set of pairs ? = { ??,??} that satisfy: [Matching]: ? is a matching. [Valid]: each pair in ? is: ? ?, ? ?, ? ?, or ? ?. [No sharp turns]: The ends of each pair are separated by at least 4 intervening bases. If ??,?? ?, then ? < ? 4. [Non-crossing] If (??,??) and (??,??) are two pairs in ?, then we cannot have ? < ? < ? < ?. Free energy: Usual hypothesis is that an RNA molecule will maximize total free energy. approximate by number of base pairs Goal: Given an RNA molecule B = b1b2 bn, find a secondary structure S that maximizes the number of base pairs. 10
Secondary Structure (Examples) G G G G G G G C U C U G C G C U C A U A U A G U A U A U A base pair A U G U G G C C A U A U G G G C A U A G U U G G C C A U G 4 ok sharp turn crossing 11
DP: First Attempt First attempt. Let ???(?) = maximum number of base pairs in a secondary structure of the substring b1b2 bn. Suppose ?? is matched with ?? in ??? ? . What IH should we use? match bt and bn t n 1 Difficulty: This naturally reduces to two subproblems Finding secondary structure in ?1, ,?? 1, i.e., OPT(t-1) Finding secondary structure in ??+1, ,?? 1, ??? 12
DP: Second Attempt Definition: ??? ?,? = maximum number of base pairs in a secondary structure of the substring ??,??+1, ,?? The most important part of a correct DP; It fixes IH Case 1: If ? ? 4. ??? ?,? = 0 by no-sharp turns condition. Case 2: Base ?? is not involved in a pair. ??? ?,? = ???(?,? 1) Case 3: Base ?? pairs with ?? for some ? ? < ? 4 non-crossing constraint decouples resulting sub-problems ? ?<? 4{ 1 + ???(?,? 1) + ???(? + 1,? 1) } ??? ?,? = max 13
Recursive Code Let M[i,j]=empty for all i,j. Compute-OPT(i,j){ if (j-i <= 4) return 0; if (M[i,j] is empty) M[i,j]=Compute-OPT(i,j-1) for t=i to j-5 do if (??,?? is in {A-U, U-A, C-G, G-C}) M[i,j]=max(M[i,j], 1+Compute-OPT(i,t-1) + Compute-OPT(t+1,j-1)) return M[j] } Does this code terminate? 14
Formal Induction Let ???(?,?) = maximum number of base pairs in a secondary structure of the substring ??,??+1, ,?? Base Case: ???(?,?) = 0 for all ?,? where ? ? 4. IH: For some 4, Suppose we have computed ???(?,?) for all ?,? where ? ? . IS: Goal: We find ???(?,?) for all ?,? where ? ? = + 1. Fix ?,? such that ? ? = + 1. Case 1: Base ?? is not involved in a pair. ??? ?,? = ???(?,? 1) [this we know by IH since ? ? 1 = ] Case 2: Base ?? pairs with ?? for some ? ? < ? 4 ??? ?,? = max ? ?<? 4{ 1 + ???(?,? 1) + ???(? + 1,? 1) } We know by IH since difference 15
Bottom-up DP 4 0 0 0 for = 1, 2, , n-1 for i = 1, 2, , n-1 j = i + if ( <= 4) M[i,j]=0; else M[i,j]=M[i,j-1] for t=i to j-5 do if (??,?? is in {A-U, U-A, C-G, G-C}) M[i,j]=max(M[i,j], 1+ M[i,t-1] + M[t+1,j-1]) 3 0 0 i 2 0 1 6 7 8 9 j return M[1, n] } Running Time: ?(?3) (It is also okay to loop over ?,? or ?,?) 16
Quiz Solution Let ?(?,?,?,?) be true if and only if the numbers ?1,?2, ,?? can be partitioned into three sets whose sums are ?,?,?. If ? > 0, we have ? ?,?,?,? = ? ? 1,? ??,?,? ?? ? ? 1,?,? ??,? ?? ? ? 1,?,?,? ?? If ? = 0, we have ? 0,?,?,? = True if ? = ? = ? = 0, False otherwise. 18
Sequence Alignment (Edit distance)
Word Alignment How similar are two strings? ocurrance occurrence o c u r r a n c e - o c c u r r e n c e 5 mismatches, 1 gap o c - u r r a n c e o c c u r r e n c e 1 mismatch, 1 gap o c - u r r - a n c e o c c u r r e - n c e 0 mismatches, 3 gaps 20
Edit Distance Edit distance. [Levenshtein 1966, Needleman-Wunsch 1970] Cost = # of gaps + #mismatches. How to formalize the question. Applications. Basis for Unix diff and Word correct in editors. Speech recognition. Computational biology. C T G A C C T A C C T - C T G A C C T A C C T C C T G A C T A C A T C C T G A C - T A C A T Cost: 5 Cost: 3 21
Sequence Alignment Given two strings ?1, ,?? and ?1, ,?? find an alignment with minimum number of mismatch and gaps. An alignment is a set of ordered pairs (??1,??1), ??2,??2, such that ?1< ?2< and ?1< ?2< Example: CTACCG vs. TACATG. Sol: We aligned x2-y1, x3-y2, x4-y3, x5-y4, x6-y6. x1 x2 x3 x4 x5 x6 C T A C C - G - T A C A T G So, the cost is 3. y1 y2 y3 y4 y5 y6 22
DP for Sequence Alignment Let ???(?,?) be min cost of aligning ?1, ,?? and ?1, ,?? Case 1: OPT matches ??,?? Then, pay mis-match cost if ?? ?? + min cost of aligning ?1, ,?? 1 and ?1, ,?? 1 i.e., ???(? 1,? 1) Case 2: OPT leaves ?? unmatched Then, pay gap cost for ?? + ??? ? 1,? Case 3: OPT leaves ?? unmatched Then, pay gap cost for ?? + ???(?,? 1) 23
Bottom-up DP Sequence-Alignment(m, n, x1x2...xm, y1y2...yn) { for i = 0 to m M[0, i] = i for j = 0 to n M[j, 0] = j for i = 1 to m for j = 1 to n M[i, j] = min( (xi=yj ? 0:1) + M[i-1, j-1], 1 + M[i-1, j], 1 + M[i, j-1]) return M[m, n] } Analysis: (??) time and space. Computational biology: m = n = 1,000,000. 1000 billions ops OK, but 1TB array? 24
M[i, j] = min( (xi=yj ? 0:1) + M[i-1, j-1], 1 + M[i-1, j], 1 + M[i, j-1]) Induction What is the order of induction? (i.e. why there is no loop?) We can do induction on ? + ?. (Alternatively, we can induct on the step of the algorithm) 25
Optimizing Memory We just need to use the last (row) of computed values. Sequence-Alignment(m, n, x1x2...xm, y1y2...yn) { for i = 0 to m M[0, i] = i for j = 0 to n M[j, 0] = j for i = 1 to m for j = 1 to n M[i, j] = min( (xi=yj ? 0:1) + M[i-1, j-1], 1 + M[i-1, j], 1 + M[i, j-1]) return M[m, n] } Just need ? 1,? rows to compute M[i,j] 26
DP with ?(? + ?) memory Keep an Old array containing values of the last row Fill out the new values in a New array Copy new to old at the end of the loop Sequence-Alignment(m, n, x1x2...xm, y1y2...yn) { for i = 0 to m O[i] = i for i = 1 to m N[0]=i for j = 1 to n N[j] = min( (xi=yj ? 0:1) + O[j-1], 1 + O[j], 1 + N[j-1]) for j = 1 to n O[j]=N[j] return N[n] } M[i-1, j-1] M[i-1, j] M[i, j-1] 27
M[i, j] = min( (xi=yj ? 0:1) + M[i-1, j-1], 1 + M[i-1, j], 1 + M[i, j-1]) Shortest Path Edit distance is the distance between (0,0) and (?,?) of the following graph. All horizontal edges has cost 1. All vertical edges has cost 1. The cost of edges from (? 1,? 1) to (?,?) is 1?? ?? The graph is a DAG. Question: How to recover the alignment (or how to find the shortest path) without using ?? space? 28
How to recover the alignment? Idea 1: Suffices to find the point a shortest path pass on the middle row. (m,n) (m/2,j) m/2 Why? Divide and Conquer! (0,0) Idea 2: ?0,0 (?,?)= min ?0,0 (?/2,?)+ ??/2,? (?,?) ? Find(i1,j1,i2,j2) { // Due to spacing, ignored boundary cases Let ? = (??+ ??)/? Compute ???,?? (?,??) for all ? via Sequence-Alignment. Compute ??,? (??,??) for all ? via similar algo run backward. Let ? = argmin????,?? (?,??)+ ??,?? (??,??) ??= Find(i1,j1,k,j) ??= Find(k,j,i2,j2) return ??,?? } 29
Lesson Advantage of a bottom-up DP: It is much easier to optimize the space. By the way, edit distance can be computed in ?(? min ?,? ) if edit distance ? ?2 can be computed in ?( log2?) (1980). can be approximated by log factor in ?(?1+?) (~2010). cannot be solved in ?(?2 ?) exactly (2015). can be approximated by O(1) factor in ?(?2 2/7) (~2018). can be approximated by O(1) factor in ?(?1+?) (~2020). 30
Longest Path in a DAG Goal: Given a DAG G, find the longest path. Recall: A directed graph G is a DAG if it has no cycle. This problem is NP-hard for general directed graphs: - It has the Hamiltonian Path as a special case 2 3 6 5 4 7 1 32
DP for Longest Path in a DAG Q: What is the right ordering? Remember, we have to use that G is a DAG, ideally in defining the ordering We saw that every DAG has a topological sorting So, let s use that as an ordering. 2 3 1 2 3 4 5 6 7 6 5 4 7 1 33
DP for Longest Path in a DAG Suppose we have labelled the vertices such that (?,?) is a directed edge only if ? < ?. 1 2 3 4 5 6 7 Let ???(?) = length of the longest path ending at ? Suppose OPT(j) is ?1,?2, ?2,?3, , ?? 1,??, ??,? , then Obs 1: ?1 ?2 ?? ?. Obs 2: ?1,?2, ?2,?3, , ?? 1,?? is the longest path ending at ??. ??? ? = 1 + ??? ??. 34
DP for Longest Path in a DAG Suppose we have labelled the vertices such that (?,?) is a directed edge only if ? < ?. Let ???(?) = length of the longest path ending at ? 0 1 + If ? is a source o.w. ??? ? = ?: ?,? an edge???(?) max 35
Outputting the Longest Path Let G be a DAG given with a topological sorting: For all edges (?,?) we have i<j. Initialize Parent[j]=-1 for all j. Compute-OPT(j){ if (in-degree(j)==0) return 0 if (M[j]==empty) M[j]=0; for all edges (i,j) if (M[j] < 1+Compute-OPT(i)) M[j]=1+Compute-OPT(i) Parent[j]=i return M[j] } Let M[k] be the maximum of M[1], ,M[n] While (Parent[k]!=-1) Print k k=Parent[k] Record the entry that we used to compute OPT(j) 36