Understanding Basic Data Structures for Efficient Data Memorization

Slide Note

Memorization is a fundamental function in computing, and the efficiency of this process depends on the data structure used. Learn about different ways of memorization, structuring memory units, and the implementation of stacks for storing and retrieving data efficiently.

gafford_a Follow

Uploaded on Oct 03, 2024 | 0 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript

Basic Data Structures Stack and Queue List Bucket and Hash

Memorize the Data Memorization is a basic function of computers The efficiency on the utility depends on the way (structure) of memorization cost for memorization, cost for search - write the data to a book as they come - make shelves of books We should choose the ways, according to the objectives to store the data Way of memorization is said data structure

The Way of Memory The unit of memory can keep a value - write the data to a book as they come allocate memory of some quantity, and write data from memory with the smallest indices array, stack, queue - make shelves of books structure the memory units by linking them with indices/pointers list, heap, binary tree, bucket, hash See them, one by one

Memory by Array Question: we want to memorize the data, that come one by one. How do we memorize them? Answer: write the latest at the next of the current end However, memory has no function of keeping the time the value is written Even if they have, it is not easy to find the last one; there are huge number of memory units thus, we use a memory as a variable keeping the position of the unit of written last

Stack See an example V V array counter The structure of a pair of array and counter (for the last position) is called stack (the counter is called a stack pointer )

Delete a Value Next, we think about reading and deleting the values written to the memory - read an arbitral value, and delete it read the last one (at stack pointer), and decrease stack pointer - delete xxx -th value copy the last position to xxx , and decrease stack pointer - delete the values xxx scanning the array is needed stack pointer V V V V V array

A Stack Subroutine Implementation of stack Use with stack and value intSTACK_push (STACK *S, int a){ if (S->t == S->max) return (1); // overflow error S->h[S->t] = a; S->t ++; return (0); } typedef struct { int *h;// array for data int end; // size of array int t; // counter } STACK intSTACK_pop (STACK *S, int *a){ if (S->t == 0) return (1); // underflow error S->t--; *a = S->h[S->t]; return (0); } t end V V V V V h[]

Examples of Usage Reverse the given string ABCDEFGH undo function for word processors end h[]

Column: Stack without Overflow Stack has a limit given by the array size In some case, we don t know the amount of data to be stacked (such as, read file and memorize all the numbers in the file; it s OK if we are allowed to scan the file before the execution ) When an overflow occurs, we make a new stack of a larger size, and copy the old one to the new one However, if we increment the size by one, overflow occurs in every insertion, and thus wastful

Column: Stack without Overflow (2) When we make a new stack, doubling the size is efficient Once overflow occurs, the number of cells of stacks existing in the memory at the same time is bounded by the number of values times three The total cost for copy is also bounded by the twice the current number of values no loss in the sense of time complexity

Column: Stack without Overflow (3) A code is written as this voidSTACK_push (STACK *S, int a){ if (S->t == S->max){ // overflow error int i, *h = malloc (sizeof(int)*max*2); // using realloc is easy for (i=0 ; i<S->t ; i++) h[i] = S->h[i]; free (S->h); S->h = h; } S->h[S->t] = a; S->t ++; }

F I L O Read and delete an arbitral one value (= the last one) used in the case, for example, a user put s on the display and computer deletes all when the button is pushed () - The value written last is read first - Such a data structure is called FILO First In Last Out counter 5 V V V V V array

Queue; FIFO In some case, we want to read first the value written first (FIFO; First In First Out ) For example, delete s in the order of putting counter services follow this rule (customer = value) Such a data structure is called queue counter 5 V V V V V array

Counters for Queue A queue needs a pointer to indicate the place at which the value is written first so, we need two counters (pointers), for the position to be read, and the position to be written the position to be read is called head that to be written is called tail head tail V V V V V array

Overflow Stack overflows when numbers come more than its size Queue overflows after inserting n+1 values, even though we deleted many values Set the tail to the head of the array, when overflow head also When the tail passes the head, really overflow occurs head tail V V V V V array

Adjustment for Passing When the tail catches up the head, something happens (all cells are written some values) this situation is the same as the empty queue! we cannot distinguish them Ways out of this are + prepare flag to distinguish them (one bit) + not write the last cell (size of queue will be n-1) head tail V V V V V V V V V V array

A Subroutine for Queue An implementation of queue is the following they input a queue structure and the value to be written intQUEUE_ins (QUEUE *Q, int a){ if ((Q->t +1) % Q->end == Q->s) return (1); // overflow error Q->h[Q->t] = a; Q->t = ( Q->t +1 ) % Q->end; return (0); } typedef struct { int *h;// array for data int end; // size of array int s; // counter for head int t; // counter for tail } QUEUE intQUEUE_ext ( QUEUE *S, int *a){ if (Q->s == Q->t) return (1); // underflow error *a = Q->h[Q->s]; Q->s = ( Q->s +1 ) % Q->end; return (0); } t s end V V V h[]

Example of Usages Input numbers one by one, and output five of them at once, at some points Draw the trajectory of the mouse cursor, with the fixed length (store the locations of mouse cursor in queue (ex., 30 locations in each second), and delete the ones before the specified period) end h[]

List: Ins/Del with Keeping the Order Arrays are simple and useful, but need much cost for keeping the ordering Can we have advantage for the ordering, with possibly lose some advantages on other functions For example, random access can be lost (we can access to the k-th element in constant time for any k) + customers in the line of a counter service, with allowing cancel and breaking into the line + edition of document; insert/delete/move words, sentences, and sections, even pictures, in the sequence of letters (and objects)

Idea: Simulate a Chain A (real) chain is useful, in such a situation however, finding the kth is not light simulate this structure in the computer In a chain, the neighboring relations are fixed, but the place is not. Thus, each ring (cell) of the chain can be located at any place in the memory, and adjacency has to be kept Each cell has to store its neighbor (previous, and next) thus, each cell has three values 1 5 7 3 + the value + the previous cell (position, or pointer) + the next cell (position, or pointer)

Strategy for Insertion/deletion When detach/insert a ring of a chain, we, cut the relation to the neighbors (of the ring) for a cell, change the adjacent relation of its neighbors For insertion, change the adjacency relation of the cells, on the place to be inserted (the inserting cell becomes a new neighbor) For deletion, directly connect each other, the neighbors of removing cell Both can be done in constant time when the target cell is given, and does not change the order 1 5 7 3

Structure Using Pointer Define this structure and allocate one block for each request no limit for the size (length), while arrays have Note: definition of LIST needs LIST itself, thus we need a trick of using _LIST_ typedef struct { LIST *prv; // pointer LIST *nxt; // pointer int h; // value } LIST The head and tail of a list has to be kept in memory, otherwise the list will be lost in the large memory space typedef struct _LIST_ { struct _LIST_ *prv; // pointer struct _LIST_ *nxt; // pointer int h; // value } LIST (a LIST structure can point head/tail

Code (Initialization) For initialization, prepare a LIST structure as the root of the list, and set nxt and prv to itself, to represent an empty list int LIST_init ( LIST *L ){ L->prv = L; L->nxt = L; } After inserting several cells to this empty list, the nxt/prv of points head / tail 1 5 7 3

Insertion Insertion is done with giving (the pointers to) the cell to be inserted, and the cell just before the place to be inserted change the pointers of the cells on the place int LIST_ins ( LIST *l, LIST *p ){ p->nxt->prv = l; l->nxt = p->nxt; p->nxt = l; l->prv = p; } 1 5 7 3 Notice that the order of changing the pointers is crucial In some bad orderings, the operation will not be done correctly

Deletion Deletion of a given cell is done by connecting the previous cell and the next cell by pointers int LIST_del ( LIST *l ){ l->nxt->prv = l->prv; l->prv->nxt = l->nxt; } 1 5 7 3 The pointers of l need not to be modified, since it is out (further, if we want to recover the cell in the list, in future, we can immediately identify the place to be recovered by looking at the non-deleted pointers)

In Usual Textbooks Generally, head/tail are supposed to point NULL nxt/prv is NULL, then that is the end list 1 5 7 3 Theoretically beautifully, but bothering for programming Insertion/deletion concerned with the edge needs an exception Insert before NULL, set the prv of NULL to X are impossible, so we have to avoid them by several if statements with considering the place to be operated

Loop along a List Tracing a list can be done by going to the nxt repeatedly, starting from LIST *p; int e; for ( p= ->nxt ; p!= ; p=p->nxt ){ e = p->h; } 1 5 7 3 Opposite direction is done by using prv

Recover a Cell A cell just removed (the neighbors are not operated) can be recovered by inserting it to the position at which the cell was int LIST_recov ( LIST *l ){ LIST_ins ( l, l->prv); } 1 5 7 3 The position is stored at prv/nxt In this way, we can recover all removed cells in the opposite order of the removal Removed cells can be identified by setting prv := NULL still nxt indicates the position 1 5 7 3

Usages of List Insert 1 to n to a list one by one so that i is inserted to the position next to j that is randomly chosen from 1 i-1 (random permutation is generated in linear time) Jobs in a time scale, new job comes, and some jobs will be canceled

A Sophisticated Usage We have n pairs of values: (x1,y1) , , (xn,yn) We want to know the nk /m th largest x value in the pairs whose y has rank of at most k, for each k, k (=1, ,m) Straightforward method spends O( n2 log n) time A sophisticated algorithm using a list spends only O( n( m+log n)) time

A Sophisticated Usage (2) Make a list of pairs sorted by values of x Store pointers to nk/m th largest values for all k Make a list of pairs sorted by values of y, and trace the list from the largest; at the same time, the currently visited pair is removed from the first list Update the positions of rank of 1/m th to m/m th This can be done by shifting the positions to right or left, according to the value of removed cell (we know the number of cells on the left side and right side) Intuitively, the complexity is; O( n2 log n) O( n( m+log n)) thus n times faster

Single Link List If we are always given the previous cell for insertion/deletion, we do not have to have pointers to the previous cell (only to the next cell) 1 5 7 3 Operations will be limited, memory/program/speed will be efficient + Making slides? + Merging sorted sequences of numbers

List Realized by Array List is realizable by arrays of cells instead of bothering pointers (on memory allocation, segmentation fault, ) Advantage: cells have indices, thus we can easily allocate weight/extra value for all cells just allocating an array Disadvantage: array needs cost to re-size Many applications in the real world needs fixed number of cells, thus no disadvantage typedef struct { int *prv; // index to previous int *nxt; // index to next int *h; // value } ALIST In this case, all cells are stored in one structure

Example of Array List The i-th cells of arrays h, prv, and nxt are h, prv, and nxt of cell i Consider the first/last cell as the root of the list typedef struct { int *prv; // index to previous int *nxt; // index to next int *h; // value } ALIST 0 1 2 3 4 ( ) V V V V h prv 2 4 0 3 1 nxt 0 1 3 4 2

Bucket Queues and lists are useful but not so for the search ex) find all values of 1 digit Some structure would make the search efficient A simple case is classification by values, since we usually want to find values near by the given key

Idea of Bucket We prepare one structure (array, list, etc.) for each value, then the values are classified Ex) we are given numbers from 0 to 99, and classify them according to their digits in ten s place Each structure is realized by a list, since we don t know the size of the structures after inputting all the numbers 0 1 2 3 4 5 6 7 8 9

Usage of Bucket Sorting numbers by their digits in ten s place Transposition of a sparse matrix A: 1,9,5 B: 1,2 C: 1,6,7 D: 4,5 0 1 2 3 4 5 6 7 8 9

Application: Radix Sort Buckets can sort the numbers by a digit in linear time Using this, we repeatedly sort the numbers from lower-orders to higher orders, with keeping the past order in the ties We need two buckets, but they can be one; we insert all numbers in the first bucket, keeping the ordering, and re-sort by scanning the bucket 0 1 2 3 4 5 6 7 8 9

Hash Buckets are useful when the values are classified finely, however, then, we need many buckets we need huge memory moreover, scanning the bucket takes long time then, are their any trade-off; such as high, but not perfect, accuracy, and non-large memory Further, we can restrict ourselves to just find this value neither larger than this , nor between XXX and YYY

Idea of Hash Consider the case of string data Question Is string Sinside this bucket? will be answered quickly, if we prepare buckets for all possibilities needs much memory space However, we can assume usually that strings are not many So, let s use the first two letters for the bucket classification Two strings will be in the same bucket even if they have different third letters; this reduces the memory for buckets if a bucket is empty, checking its inside is light however, if it contains many, check involves long scan so the operation will be heavy

Bucket for Strings Doesn t bucket accept only of numbers? Yes! Thus we have to convert a string to a number (index), and classify the strings according to the index 1: ABCABC 2: ABBBBB 3: CCCBBB First two letters is converted to a number, such as when alphabet size is three, suppose that A=0,B=1,C=2 and regard a string as a 3-digit number AB 1, CC 8 0 1 2 3 4 5 6 7 8 9

Bias of Distribution Sometimes, first two letters are not uniform in the data ex., English words ( st , th , and re are frequent) some buckets will have many, and the others will have few then, can we use some good mapping functions instead of first two letters ? (such function is called hash function, the functional value of a data is called hash key, or hash value) further, considering the real world applications, similar value should have (much) different hash values Ex) For x1,x2,x3 , , the modulo of (x1)1+(x2)2+(x3)3and ((x1+1)x2+1)x3 by (#buckets)

Determine the Size How can we determine the size of hash? (#buckets) Basically, any bucket should have few, in particular, 1 or 2 values originally, we have n values, thus O(n) is acceptable for #buckets Then, we can set them to constant multiplication of n particularly, we have no loss of space complexity As same as stacks, we double the size when the hash overflows no loss on the time complexity

Summary Stack and queue: combination of array and counters adopts sequentially coming data List: store the adjacency relation between data so that the order of the data is kept efficiently on insertion and deletion Bucket: make the search easy by classifying the data by their values Hash: buckets with hash keys for keeping both classification accuracy and memory efficiency high

Understanding Basic Data Structures for Efficient Data Memorization

Download Presentation

Presentation Transcript

Related

More Related Content