Introduction to Arrays and Recursion in C++
Arrays and recursion play a vital role in designing algorithms on sequences in programming. This introduction covers the implementation of searching in arrays, binary search, merge sort, and the concept of searching in sorted arrays using recursion. The use of recursion helps reduce comparisons and enhances efficiency in searching algorithms for both non-decreasing and non-increasing sorted arrays. The binary search algorithm is demonstrated through code examples, showcasing the effectiveness of dividing the search region in half to optimize the search process.
Presentation Transcript
An Introduction to Programming through C++, Abhiram G. Ranade. Ch. 16: Arrays and Recursion
Arrays and Recursion Recursion is very useful for designing algorithms on sequences Sequences will be stored in arrays Topics Binary Search Merge Sort
Searching an array Input: A: int array of length n, x (called key ) : int Output: true if x is present in A, false otherwise. Natural algorithm: scan through the array and return true if found. for(int i=0; i<n; i++){ if(A[i] == x) return true; } return false; Time consuming: Entire array scanned if the element is not present, Half array scanned on the average if it is present. Can we possibly do all this with fewer operations?
Searching a sorted array sorted array: (non decreasing order) A[0] ≤ A[1] ≤ ... ≤ A[n-1] sorted array: (non increasing order) A[0] ≥ A[1] ≥ ... ≥ A[n-1] How do we search in a sorted array (non increasing or non decreasing)? Does the sortedness help in searching?
Searching for x in a non decreasing sorted array A[0..n-1] Key idea for reducing comparisons: First compare x with the middle element A[n/2] of the array. Suppose x < A[n/2]: x is also smaller than A[n/2..n-1], because of sorting x if present will be present only in A[0..n/2-1]. So in the rest of the algorithm we will only search first half of A. Suppose x >= A[n/2]: x if present will be present in A[n/2..n-1] Note: x may be present in first half too, In the rest of the algorithm we will only search second half. How to search the halves ? Recurse!
Plan We will write a function Bsearch which will search a region of an array instead of the entire array. Region: specified using 2 numbers: starting index S, length of region L When L == 1, we are searching a length 1 array. So check if that element, A[S] == x. Otherwise, compare x to the middle element of A[S..S+L-1] Middle element: A[S + L/2] Algorithm is called Binary search , because size of the region to be searched gets roughly halved.
The code bool Bsearch(int A[], int S, int L, int x) // Search for x in A[S..S+L-1] { if(L == 1) return A[S] == x; int H = L/2; if(x < A[S+H]) return Bsearch(A, S, H, x); else return Bsearch(A, S+H, L-H, x); } int main(){ int A[8]={-1, 2, 2, 4, 10, 12, 30, 30}; cout << Bsearch(A,0,8,11) << endl; // searches for 11. }
How does the algorithm execute? A = {-1, 2, 2, 4, 10, 12, 30, 30} First call: Bsearch(A, 0, 8, 11) comparison: 11 < A[0+8/2] = A[4] = 10 Is false. Second call: Bsearch(A, 4, 4, 11) comparison: 11 < A[4+4/2] = A[6] = 30 Is true. Third call: Bsearch(A, 4, 2, 11) comparison: 11 < A[4+2/2] = A[5] = 12 Is true. Fourth call: Bsearch(A, 4, 1, 11) Base case. Return 11 == A[4]. A[4] = 10, so false.
Proof of correctness 1 Claim: Bsearch(A,S,L,x) returns true iff x is present in A[S..S+L-1], where 0 <= S and S+L-1 < length of A, and A is an array sorted in non decreasing order. Proof: Induction over L. Base case L = 1. Obvious. Otherwise: L > 1. Algorithm first computes H = L/2. Note that 0 < H < L. If x < A[S+H], then x if present must be in A[S..S+H-1]. So algorithm must call Bsearch(A, S, H, x), which it does. The length argument, H, is smaller than L. So by induction call returns correctly. If x >= A[S+H], then x if present must be present in A[S+H..S+L-1]. So algorithm must call Bsearch(A, S+H, L-H, x), which it does. The length argument, L-H, is smaller than L. So by induction call returns correctly. Hence the algorithm will work correctly for all L.
Remarks If you are likely to search an array frequently, it is useful to first sort it. The time to sort the array will be compensated by the time saved in subsequent searches. How do you sort an array in the first place? Next. Binary search can be written without recursion. Exercise. Even professional programmers make mistakes when writing binary search. Should condition use x <= A[H] or x < A[H]? Need to ensure correct even if length is odd. Precise subranges to be searched and precise lengths to be searched should be exactly correct. Very important to write down precisely what the function does: searches A[S..S+L-1]; be careful about -1 etc.
Estimating time taken General idea: standard operations take 1 cycle. Arithmetic, comparison, copying one word, address calculation, pointer dereference. Convenient idealization. We characterize running time as a function of an agreed upon problem size n: n = number of keys to be sorted in a sorting problem. n = size of matrices in matrix multiplication. We worry only about how the time grows as the problem size increases: e.g. the time is linear in n or is quadratic. For large enough problem size, linear, e.g. 100n, is better than n^2/2. Computers deal with large problems. If time taken is different for different inputs of the same size, we consider the max time amongst them. We want to claim: No matter what the input is, the time is linear in n.
Sorting Selection Sort (Chapter 14) Find smallest in A[0..n-1]. Exchange it with A[0]. Find smallest in A[1..n-1]. Exchange it with A[1]. ... Selection sort time: we count comparisons (Other operations will take proportional time.) n-1 comparisons to find smallest, n-2 comparisons to find second smallest, ... Total n(n-1)/2. About n^2. (Quadratic) Algorithms requiring fewer comparisons are known: About n log n. One such algorithm is Merge sort.
Mergesort idea To sort a long sequence: Break up the sequence into two small sequences. Sort each small sequence. (Recurse!) Somehow merge the sorted sequences into a single long sequence. Hope: merging sorted sequences is easier than sorting the large sequence. Our hope is correct, as we will see soon!
Example Suppose we want to sort the sequence 50, 29, 87, 23, 25, 7, 64 Break it into two sequences. 50, 29, 87, 23 and 25, 7, 64. Sort both We get 23, 29, 50, 87 and 7, 25, 64. Merge Goal is to get 7, 23, 25, 29, 50, 64, 87.
Merge sort void mergesort(int S[], int n){ // Sorts sequence S of length n. if(n==1) return; int U[n/2], V[n-n/2]; // local arrays for(int i=0; i<n/2; i++) U[i]=S[i]; for(int i=0; i<n-n/2; i++) V[i]=S[i+n/2]; mergesort(U,n/2); mergesort(V,n-n/2); // Merge sorted U, V into S. merge(U, n/2, V, n-n/2, S, n); // U, V merge into original array S. }
Merging example U: 23, 29, 50, 87. V: 7, 25, 64. S: The smallest overall must move into S. Smallest overall = smaller of smallest in U and smallest in V. So after movement we get: U: 23, 29, 50, 87. V: -, 25, 64. S: 7.
What do we do next? U: 23, 29, 50, 87. V: -, 25, 64. S: 7. Now we need to move the second smallest into S. Second smallest: smallest in U,V after smallest has moved out. smaller of what is at the front of U, V. So we get: U: -, 29, 50, 87. V: -, 25, 64. S: 7, 23.
General strategy While both U, V contain a number: Move smallest from those at the head of U,V to the end of S. If only U contains numbers: move all to end of S. If only V contains numbers: move all to end of S. uf: index denoting which element of U is currently at the front. U[0..uf-1] have moved out. vf: similarly for V. sb: index denoting where next element should move into S next (sb: back of S) S[0..sb-1] contain elements that have moved in earlier.
Merging two sequences void merge(int U[], int p, int V[], int q, int S[], int n){ // S should receive all elements of U,V, in sorted order. int uf=0, vf=0; // uf, vf : front of u, v for(int sb=0; sb < p + q; sb++){ // sb = back of s //Invariant: s[0..sb-1] contain smallest sb, // u[uf..p-1], v[vf..q-1] contain rest if(uf<p && vf<q){ // both U,V are non empty if(U[uf] < V[vf]){ S[sb] = U[uf]; uf++;} else{ S[sb] = V[vf]; vf++;} } else if(uf < p){ // only U is non empty S[sb] = U[uf]; uf++; } else{ // only V is non empty S[sb] = V[vf]; vf++; } } }
Time Analysis: merging Time required to merge two sequences of length p, q: Loop runs for p+q iterations. In each iteration a fixed number of operations are performed. So time is proportional to p+q. Time proportional to n, if n=p+q.
Time analysis: sorting T_i = maximum time required for mergesort to sort any sequence of length i. T_1 ≤ c, where c is some constant. T_n ≤ dn + 2T_{n/2} + en. dn: time required to create U, V from S. T_{n/2}: time to sort sequences of length n/2. Assume n/2 is an integer. en: upper bound on time to merge sequences of net length n. T_n ≤ fn + 2T_{n/2} for f = d+e. The inequality applies to T_{n/2} also: T_{n/2} ≤ fn/2 + 2T_{n/4}. So T_n ≤ fn + 2(fn/2 + 2T_{n/4}) = 2fn + 4T_{n/4}. Continuing we get T_n ≤ kfn + 2^k T_{n/2^k}. If n = 2^k, or k = log_2 n: T_n ≤ fn log_2 n + nT_1 ≤ fn log_2 n + nc. Thus T_n ≤ gn log_2 n for some constant g.
Remarks Mergesort is much faster than selection sort in practice.
The eight queens puzzle Place eight queens on a chess board so that no queen captures another . A queen can capture anything that is in a square exactly to the East, West, North, South, NE, NW, SE, SW. Queens should be in distinct rows, distinct columns, and distinct diagonals Good example of constraint satisfaction problem . Solution uses recursion.
Can we represent the problem mathematically? Frame a question of the form: Find numbers x,y,z... such that they satisfy constraints ... Constraints: equalities, inequalities, It should be possible to interpret the numbers in terms of queen positions on the board.
Mathematical representation 1 For each board square (i,j), let x_ij encode whether a queen is present or not present in that square. x_ij = 1: queen present. x_ij = 0: queen absent. At most one queen in each row i: x_i1 + x_i2 + ... + x_in ≤ 1. Similarly for other conditions. Solve!
Example: 3x3 board Row conditions: x11+x12+x13 ≤ 1, x21+x22+x23 ≤ 1, x31+x32+x33 ≤ 1. Column conditions: x11+x21+x31 ≤ 1, x12+x22+x32 ≤ 1, x13+x23+x33 ≤ 1. Diagonal conditions: x21+x32 ≤ 1, x11+x22+x33 ≤ 1, x12+x23 ≤ 1, x12+x21 ≤ 1, x13+x22+x31 ≤ 1, x23+x32 ≤ 1. Place 3 queens: x11+x12+x13+x21+x22+x23+x31+x32+x33 = 3. 0-1 constraint: All x_ij ∈ {0, 1}.
Another representation What do we want to find? Positions for 8 queens. Position in 2d space: 2 numbers, (x, y). Are we looking for 16 numbers then? Real or integers? Integers, in the range 1..8. Distinct columns: All x_i should be distinct. Distinct rows: All y_i should be distinct. Distinct diagonals: For all i,j where i ≠ j: |x_i - x_j| ≠ |y_i - y_j|
Other examples and variations on constraint satisfaction problems "Find x_1, x_2, ..., x_n such that ..." is called a constraint satisfaction problem. 8 queens problem is a constraint satisfaction problem. Another example: solving equations: find x such that 3x^2 + 4x + 1 = 0. We may in addition have an objective function: of all x_i satisfying the constraints, report one that maximizes some given function f(x_1, x_2, ..., x_n). Example of a constraint satisfaction problem with an objective function: finding GCD of x,y. Find r such that r divides x, and r divides y, and f(r)=r is maximum.
Solving constraint satisfaction problems Perform algebraic manipulation and deduce solution. Quadratic equation: factorize... Try all possibilities: works if each variable that we want to solve for has a finite domain. 8 queens formulation 1: 64 variables, each either 0 or 1. 8 queens formulation 2: 16 variables, each from {1,2,...,8}. We construct each candidate solution and check if it satisfies our constraints. If yes, we report it. How to construct all solutions? Aren't there a huge number of them? Number of candidate solutions for 8 queens formulation 1: 2^64. Number of candidate solutions for 8 queens formulation 2: 8^16. 8^16 = 2^48 < 2^64. Can we reduce this number further?
Formulation 3 No column can hold more than 1 queen; else there will be captures. We want 8 queens, so we need at least 1 queen in each of the 8 columns. So place exactly 1 queen in each column. New representation: Let y_i = row position of queen in column i. What conditions should y_1, y_2, ..., y_8 satisfy? Distinct columns condition: Automatically satisfied. Distinct rows condition: y_i should be distinct. Distinct diagonals condition: For all i,j, i ≠ j: |y_i - y_j| ≠ |i-j|. Size of search space: 8^8, much smaller than previous 8^16 or 2^64.
A program for 4 queens int y[4]; // y[i]: row position of column i queen // For all ways of placing all queens: for(y[0] = 0; y[0] < 4; y[0]++){ for(y[1] = 0; y[1] < 4; y[1]++){ for(y[2] = 0; y[2] < 4; y[2]++){ for(y[3] = 0; y[3] < 4; y[3]++){ if(!capture(y,4)) cout <<y[0]<<y[1]<<y[2]<<y[3]<<endl; } } } }
Function to check for capture bool capture(int y[], int n){ // Decides whether any queen captures any // other. n = board size. // check for all pairs j>k for(int j=1; j<n; j++){ for(int k=0; k<j; k++){ if((y[j] == y[k]) || (abs(j-k) == abs(y[j]-y[k]))) return true; } } return false; } // Loop invariant?
Will the same idea work for any n? Many programming languages will not allow you to nest more than a certain number of loops. In other constraint satisfaction problems, the number of variables to be selected could be very large, making it difficult to do so much nesting. Recursion comes to our rescue! We will write a recursive program which will not have much nesting but will have the same effect as writing a program with nesting.
A different view of searching through the candidate configurations S = Set of all possible ways ("configurations") to place queens, one queen per column. = All possible ways to assign values to y_0, y_1, ..., y_7. Suppose n = 3. S has 27 elements: {000, 001, 002, 010, 011, 012, 020, ..., 222}. Algorithm outline for n = 3: Store first configuration in y[0..2], then call capture. Store second configuration in y[0..2], then call capture. ... Store 27th configuration in y[0..2], then call capture. Searching the set S of configurations
How to search S Observation: S = S_0 ∪ S_1 ∪ ... ∪ S_{n-1}, where S_i = set of configurations in which queen in column 0 is in row i, and other queens anywhere. 3 Queens: S_0 = {000,001,002,010,011,012,020,021,022}. Searching S = searching S_0, ..., S_{n-1}. But S_i is also a union of smaller sets (recursion!): S_i = S_{i,0} ∪ S_{i,1} ∪ ... ∪ S_{i,n-1}, where S_{i,j} = set of configurations in which queen in column 0 is in row i, and queen in column 1 in row j, and other queens anywhere. What is S_{0,2} for the 3 queen problem? {020, 021, 022}
General case Notational change: We will write S(x) rather than S_x. S(i_0, i_1, ..., i_{k-1}): Configurations with queens in first k columns in rows i_j for j=0..k-1. S(i_0, i_1, ..., i_{k-1}) = S(i_0, i_1, ..., i_{k-1}, 0) ∪ S(i_0, i_1, ..., i_{k-1}, 1) ∪ ... ∪ S(i_0, i_1, ..., i_{k-1}, n-1)
How to search S (contd.) void search(int n, int y[], int k){ // n = number of queens, also length of array y. // Function searches subspace S(y[0],y[1],...y[k-1]) of all candidate // positions and of these prints those in which there is no capture. if(k == n){ // base case if(!capture(y,k)){ for(int j=0; j<k; j++) cout << y[j]; cout << endl; } } else{ // Search S(y[0],...y[k-1]) recursively for(int j=0; j<n; j++){ y[k] = j; // red decomposition given earlier search(n, y, k+1); } } }
How to search S (contd) The function search has an important post condition: it does not modify y[0..k-1]. Only because of this can we merely set y[k] = j before recursion. The main program is natural. int main(){ const int n=8; int y[n]; search(n, y, 0); }
Recursion tree for n=3 Root: Search(3,[-,-,-],0). Children of the root: Search(3,[0,-,-],1), Search(3,[1,-,-],1), Search(3,[2,-,-],1). Children of Search(3,[1,-,-],1): Search(3,[1,0,-],2), Search(3,[1,1,-],2), Search(3,[1,2,-],2). One leaf under Search(3,[1,2,-],2): Search(3,[1,2,0],3). The remaining subtrees are elided.
An improvement: Early check S(i_0, i_1, ..., i_{k-1}): Set of configurations such that queen in column j is in row i_j for j=0..k-1. If any of the first k queens capture each other, we need not worry about the remaining n-k queens: clearly there is no non capturing configuration in this subspace. Key idea: whenever we place the kth queen, we first check if it captures any of the previous queens. Only if it does not do we bother to explore the set of configurations further.
Checking if the kth queen captures any previous queens bool lastCaptures(int y[], int k){ // check whether queen in column k // captures any in columns 0..k-1. for(int j=0; j<k; j++){ if((y[j] == y[k]) || (abs(j-k) == abs(y[j]-y[k]))) return true; } return false; }
Search function and main program void search(int n, int y[], int k){ if(k == n){ for(int j=0; j<k; j++) cout << y[j]; cout << endl; } else { for(int j=0; j<n; j++){ y[k] = j; if(!lastCaptures(y,k)) search(n,y,k+1); } } } // Precisely state what this function expects and does. // main program: // as before.
Remarks Recursion is very powerful even with arrays. The idea of mergesort (divide input into parts, sort each part, then combine) is called divide-conquer-combine. Divide-conquer-combine is useful for other problems besides sorting. "Try out all possibilities" works for many problems; see the problems at the end of the chapter. Early condition checking also works quite often.
Exercise 1 Suppose the comparison in our binary search code used <= rather than <. Give an instance (set of input values) for which the algorithm will produce a wrong answer. Where would the proof not work?
Exercise 2 Suppose the binary search code returns the index of the last element it examines, rather than a Boolean value. What value would this be, assuming (1) the element x being searched is not present in the array, (2) there is exactly one occurrence of x in the array, (3) there are multiple occurrences of x?
Exercise 3 Instead of specifying the subarray to be searched by giving the starting index and the length, you might give the starting and ending indices. Write the code using this, and prove its correctness.
Exercise 4 Suppose I represent a set of integers using an array that holds the integers in sorted order. Given two such arrays representing two sets, give an algorithm that prints their union. (The common elements must be printed just once.) Your algorithm should run in time proportional to the size of the union. Suppose the elements were stored without sorting. How many comparisons do you think the natural algorithm would perform? Does sorting help?
Exercise 5 A seller receives bids for an item from n buyers. The seller examines the bids and sells to the highest bidder. Each buyer b bids H_b if she is happy, and S_b if she is sad. Buyers are happy/sad with equal probability, and independently of each other. Write a program that reads in n and the values H_b and S_b for each buyer and prints the expected value of the winning bid. Your program should do this in two ways, and print the two answers on separate lines. 1. The probability space has 2^n points, corresponding to the different ways in which the buyers can be happy or sad. Your program should go over each point and add up the contribution of different points to the expectation. This should be done using a recursive function, similar to what is used in n queens. 2. What is the probability that the highest in all the numbers H, S actually becomes the winning bid? What can you say going down the sorted order of bids? Develop this idea into a very fast algorithm and code that. For sorting, use the standard library function as discussed in Section 22.3.2. To sort an array of structures, you should use a lambda expression as an extra argument, as shown at the top of page 321. It is shown for vectors, but is the same for arrays. Lambda expressions are discussed earlier in the book, and you may find it interesting to understand them fully.
Exercise 6 The basic idea of binary search can be used even when the values over which to search are not given explicitly. Here is an example. The input consists of numbers x_1, x_2, ..., x_n which are lengths of consecutive billboards located along a road. We have an additional input, k, giving the number of painters available to paint the boards. Each board requires time proportional to its length to paint. We must assign some number of consecutive boards to each painter such that the maximum sum of the length of boards assigned to any painter is as small as possible, i.e. so as to finish in minimum time. Observe that it is easy to check whether in T time the k painters can finish the job. If the check succeeds, you can decide to check for a lower time T. Develop this idea into an algorithm that determines the minimum time. Your algorithm should do about log n checks only.