Database Implementation Issues in Programming Studio

undefined
 
DATABASE
IMPLEMENTATION
ISSUES
 
CSCE 315 – PROGRAMMING STUDIO, SPRING 2019
ROBERT LIGHTFOOT
 
Slides adapted from those used by
Jennifer Welch and John Keyser
 
DATABASE IMPLEMENTATION
 
Typically, we assume databases are very large, used by
many people, etc.
So, specialized algorithms are usually used for databases
Efficiency
Reliability
 
STORING DATA
 
Other terminology for implementation
Relation is a 
table
Tuple is a 
record
Attribute is a 
field
 
STORING A RECORD (TUPLE)
 
Often can assume all the fields are fixed (maximum)
length.
For efficiency, usually concatenate all fields in each tuple.
Variable length: store max length possible, plus one bit
for termination
Store the offsets for concatenation in a schema
 
EXAMPLE: TUPLE STORAGE
 
Senator
Name – variable character (100 + 1 bytes)
State – fixed character (2 bytes)
YearsInSenate – integer (1 byte)
Party – variable character (11 + 1 bytes)
 
MORE ON TUPLES/RECORDS
 
So, schema would store:
Name: 0
State: 101
YearsInSenate: 103
Party: 104
Note that HW/efficiency considerations might give
minimum sizes for each field
e.g. multiple of 4 or 8 bytes
 
VARIABLE LENGTH FIELDS
 
Storing max size may be problematic
Usually nowhere close – waste space
Could make record too large for a “unit” of storage (i.e., disk
block)
Storing pointers may be problematic
Record is not known size, thus variable length records
Not much locality in memory
Extra storage for pointer
Dynamic memory allocation is slow
 
VARIABLE LENGTH FIELDS
 
Compromise: “local” pointers
Store fixed-length records, followed by variable-length
Variable data still kept locally
Header stores info about variable fields
Pointer to start of each
Not great if data in variable field will change
 
RECORD HEADERS
 
Might want to store additional key information in 
header
of each record
Schema information (or pointer to schema)
Record size (if variable length)
Timestamp of last modification
 
RECORD HEADERS AND BLOCKS
 
Records grouped into 
blocks
Correspond with a “unit” of disk/storage
Header information with record positions
Also might list which relation it is part of.
Concatenate records
Header
Record 1
Record n
Record 2
 
ADDRESSES
 
Addresses of (pointers to) data often
represented
Two types of address
Location in database (on disk)
Location in memory
Translation table
 usually kept to map items
currently in virtual memory to the overall
database.
Pointer swizzling: updating pointers to refer to disk
vs. memory locations
 
RECORDS AND BLOCKS
 
Sometimes want records to 
span
 blocks
Generally try to keep related records in the same block,
but not always possible
Record too large for one block
Too much wasted space
Split parts are called 
fragments
Header information of record
Is it a fragment
Store pointers to previous/next fragments
 
ADDING, DELETING, MODIFYING
RECORDS
 
Insertion
If order doesn’t matter, just find a block with enough free space
Later come back to storing tables
If want to keep in order:
If room in block, just do insertion sort
If need new block, go to overflow block
Might rearrange records between blocks
Other variations
 
ADDING, DELETING, MODIFYING
RECORDS
 
Deletion
If want to keep space, may need to shift records around in block
to fill gap created
Can use “tombstone” to mark deleted records
Modifying
For fixed-length, straightforward
For variable-length, like adding (if length increases) or deleting
(if length decreases)
 
KEEPING TRACK OF TABLES
 
We have a bunch of records stored (somehow).
We need to query them (SELECT * FROM table
WHERE condition)
Scanning every block/record is far too slow
Could store each table in a subset of blocks
Saves time, but still slow
Use an 
index
INDEX
A general term – applies to other settings (e.g.,
index of a file, index of a dictionary, index of
search engine)
It is a separate file 
describing
 the main data file
Describe here means giving random access to the data
file by some search keys
Example:
 
INDEX IN DATABASE TERMINOLOGY
 
Special data structures to find all records that satisfy
some condition
Possible indexes
Simple index on sorted data
Secondary index on unsorted file
Trees (B-trees)
Hash Tables
 
SORTED FILES
 
Sort records of the relation according to field (attribute)
of interest.
Attribute of interest is 
search key
Might not be a “true” 
key
Index stores (K,a) values
K = search key
a = address of record with K
 
DENSE INDEX
 
One index entry per record
Useful if records are huge, and index can be small enough to fit
in memory
Can search efficiently and then examine/retrieve single
record only
1
7
5
7
10
12
18
18
18
27
30
65
44
43
35
73
1
7
5
7
10
12
18
18
18
27
30
65
44
43
35
73
 
SPARSE INDEX
(ON SEQUENTIAL FILE)
 
Store an index for only every n records
Use that to find the one before, then search sequentially.
1
7
5
7
10
12
18
18
18
27
30
65
44
43
35
73
1
7
12
27
44
 
MULTIPLE INDICES
 
Indices in hierarchy
B-trees are an example
1
7
5
7
10
12
18
18
18
27
30
65
44
43
35
73
1
7
12
27
44
1
27
 
DUPLICATE KEYS
 
Can cause issues, in both dense and sparse indexes, need
to account for
1
7
5
7
10
12
18
18
18
27
30
65
44
43
35
73
1
7
12
27
44
 
WHAT IF NOT SORTED?
 
Can be the case when we want two or more indices on
the same data
e.g. Senator.name, Senator.party
Must be dense (sparse would make no sense)
Can sort the 
index 
by the search key
This 
second level index
 can be sparse
 
EXAMPLE – SECONDARY INDEX
1
7
5
7
10
12
18
18
18
27
30
65
44
43
35
73
1
7
5
7
10
12
18
18
18
27
30
65
44
43
35
73
1
7
12
27
44
1
27
 
BUCKETS
 
If there are lots of repeated keys, can use buckets
Buckets are in between the secondary index and the data
file
One entry in index per key – points to bucket file
Bucket file lists all records with that key
 
EXAMPLE – BUCKETS
 
1
7
5
10
12
18
18
18
27
30
65
44
43
35
73
1
7
12
27
44
7
 
PUTTING IT ALL TOGETHER
 
To handle variable length records (e.g. VARCHAR(x)),
we now have a header for 
every
 record
If everything were fixed length and we did not care about
wasted space, only per-table header would suffice
To deal with long disk latency, group as many
records in a disk block as possible
The rest of the block is empty
We need a block header to keep track
 
header
 
A disk block containing records from a table
Record 1
Record 2
Record 3
 
PUTTING IT ALL TOGETHER (CONTD.)
 
Insert/Delete/Update operations put this design into test
Delete is easier, just mark as deleted
Insert/Update might require a lot of “move-to-right” operations
for making room for new/increased data
But empty spaces in most blocks, the “move-to-right”
operation does not propagate too much
Usually an empty spot is found in the current or surrounding
blocks
If no surrounding blocks found, 
overflow blocks
 can be used
Thus, insert/update operations does not undo sorted files
 
PUTTING IT ALL TOGETHER - INDICES
 
Index is a separate file along with the data file
Often small enough to fit in the RAM
The main purpose of indices is to avoid scanning all
disk blocks
Each index entry contains a key and a pointer to the record
containing the key
Index often resides in the RAM, while the record in the disk
Two types of indices:
Dense Index (applies to sorted or unsorted files)
Sparse (only for sorted files)
 
STORAGE CONSIDERATIONS
 
Memory Hierarchy
Cache
Main Memory
Secondary storage (disk)
Tertiary storage (e.g. tape)
Smaller amounts but faster access
Need to organize information to minimize “cache misses”
 
STORAGE CONSIDERATIONS:
MAKING THINGS EFFICIENT
 
Placing records together in blocks for group fetch
Prefetching
Prediction algorithm
Parallelism/Striping
placing across multiple disks to read/write faster
Example: RAID
Mirroring
double the reliability
 
STORAGE CONSIDERATIONS
MAKING IT RELIABLE
 
Checksums
Mirroring disks
Parity bits
RAID levels
Mostly, out of the scope of this course
Slide Note
Embed
Share

Key topics covered in the slides include database implementation issues, storing data efficiently, and strategies for handling variable length fields in tuple storage. The presentation delves into specialized algorithms for database efficiency and reliability, terminology related to database implementation, and considerations for storing data like tuples and records. Various challenges and compromises in managing variable length fields in storage systems are also discussed.

  • Database Implementation
  • Efficiency
  • Reliability
  • Storage Strategies
  • Variable Length Fields

Uploaded on Sep 01, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. DATABASE IMPLEMENTATION ISSUES CSCE 315 PROGRAMMING STUDIO, SPRING 2019 ROBERT LIGHTFOOT Slides adapted from those used by Jennifer Welch and John Keyser

  2. DATABASE IMPLEMENTATION Typically, we assume databases are very large, used by many people, etc. So, specialized algorithms are usually used for databases Efficiency Reliability

  3. STORING DATA Other terminology for implementation Relation is a table Tuple is a record Attribute is a field

  4. STORING A RECORD (TUPLE) Often can assume all the fields are fixed (maximum) length. For efficiency, usually concatenate all fields in each tuple. Variable length: store max length possible, plus one bit for termination Store the offsets for concatenation in a schema

  5. EXAMPLE: TUPLE STORAGE Senator Name variable character (100 + 1 bytes) State fixed character (2 bytes) YearsInSenate integer (1 byte) Party variable character (11 + 1 bytes) 116 0 101 103 104

  6. MORE ON TUPLES/RECORDS So, schema would store: Name: 0 State: 101 YearsInSenate: 103 Party: 104 Note that HW/efficiency considerations might give minimum sizes for each field e.g. multiple of 4 or 8 bytes

  7. VARIABLE LENGTH FIELDS Storing max size may be problematic Usually nowhere close waste space Could make record too large for a unit of storage (i.e., disk block) Storing pointers may be problematic Record is not known size, thus variable length records Not much locality in memory Extra storage for pointer Dynamic memory allocation is slow

  8. VARIABLE LENGTH FIELDS Compromise: local pointers Store fixed-length records, followed by variable-length Variable data still kept locally Header stores info about variable fields Pointer to start of each Not great if data in variable field will change

  9. RECORD HEADERS Might want to store additional key information in header of each record Schema information (or pointer to schema) Record size (if variable length) Timestamp of last modification

  10. RECORD HEADERS AND BLOCKS Records grouped into blocks Correspond with a unit of disk/storage Header information with record positions Also might list which relation it is part of. Concatenate records Header Record 1 Record 2 Record n

  11. ADDRESSES Addresses of (pointers to) data often represented Two types of address Location in database (on disk) Location in memory Translation table usually kept to map items currently in virtual memory to the overall database. Pointer swizzling: updating pointers to refer to disk vs. memory locations

  12. RECORDS AND BLOCKS Sometimes want records to span blocks Generally try to keep related records in the same block, but not always possible Record too large for one block Too much wasted space Split parts are called fragments Header information of record Is it a fragment Store pointers to previous/next fragments

  13. ADDING, DELETING, MODIFYING RECORDS Insertion If order doesn t matter, just find a block with enough free space Later come back to storing tables If want to keep in order: If room in block, just do insertion sort If need new block, go to overflow block Might rearrange records between blocks Other variations

  14. ADDING, DELETING, MODIFYING RECORDS Deletion If want to keep space, may need to shift records around in block to fill gap created Can use tombstone to mark deleted records Modifying For fixed-length, straightforward For variable-length, like adding (if length increases) or deleting (if length decreases)

  15. KEEPING TRACK OF TABLES We have a bunch of records stored (somehow). We need to query them (SELECT * FROM table WHERE condition) Scanning every block/record is far too slow Could store each table in a subset of blocks Saves time, but still slow Use an index

  16. INDEX A general term applies to other settings (e.g., index of a file, index of a dictionary, index of search engine) It is a separate file describing the main data file Describe here means giving random access to the data file by some search keys Example: A B C Apple Apricot Ball Banana Blazer Cannot Cat Code Coil . Index file Data file

  17. INDEX IN DATABASE TERMINOLOGY Special data structures to find all records that satisfy some condition Possible indexes Simple index on sorted data Secondary index on unsorted file Trees (B-trees) Hash Tables

  18. SORTED FILES Sort records of the relation according to field (attribute) of interest. Attribute of interest is search key Might not be a true key Index stores (K,a) values K = search key a = address of record with K

  19. DENSE INDEX One index entry per record Useful if records are huge, and index can be small enough to fit in memory Can search efficiently and then examine/retrieve single record only 1 5 7 7 10 12 18 18 18 27 30 35 43 44 65 73 1 5 7 7 10 12 18 18 18 27 30 35 43 44 65 73

  20. SPARSE INDEX (ON SEQUENTIAL FILE) Store an index for only every n records Use that to find the one before, then search sequentially. 1 7 12 27 44 1 5 7 7 10 12 18 18 18 27 30 35 43 44 65 73

  21. MULTIPLE INDICES Indices in hierarchy B-trees are an example 1 27 1 7 12 27 44 1 5 7 7 10 12 18 18 18 27 30 35 43 44 65 73

  22. DUPLICATE KEYS Can cause issues, in both dense and sparse indexes, need to account for 1 7 12 27 44 1 5 7 7 10 12 18 18 18 27 30 35 43 44 65 73

  23. WHAT IF NOT SORTED? Can be the case when we want two or more indices on the same data e.g. Senator.name, Senator.party Must be dense (sparse would make no sense) Can sort the index by the search key This second level index can be sparse

  24. EXAMPLE SECONDARY INDEX 1 27 1 7 12 27 44 1 5 7 7 10 12 18 18 18 27 30 35 43 44 65 73 5 35 18 43 12 44 73 1 65 10 7 18 30 27 7 18

  25. BUCKETS If there are lots of repeated keys, can use buckets Buckets are in between the secondary index and the data file One entry in index per key points to bucket file Bucket file lists all records with that key

  26. EXAMPLE BUCKETS 1 7 12 27 44 1 5 7 10 12 18 27 30 35 43 44 65 73 7 18 18

  27. PUTTING IT ALL TOGETHER To handle variable length records (e.g. VARCHAR(x)), we now have a header for every record If everything were fixed length and we did not care about wasted space, only per-table header would suffice To deal with long disk latency, group as many records in a disk block as possible The rest of the block is empty We need a block header to keep track header Record 3 Record 2 Record 1 A disk block containing records from a table

  28. PUTTING IT ALL TOGETHER (CONTD.) header Record 3 Record 2 Record 1 Insert/Delete/Update operations put this design into test Delete is easier, just mark as deleted Insert/Update might require a lot of move-to-right operations for making room for new/increased data But empty spaces in most blocks, the move-to-right operation does not propagate too much Usually an empty spot is found in the current or surrounding blocks If no surrounding blocks found, overflow blocks can be used Thus, insert/update operations does not undo sorted files

  29. PUTTING IT ALL TOGETHER - INDICES Index is a separate file along with the data file Often small enough to fit in the RAM The main purpose of indices is to avoid scanning all disk blocks Each index entry contains a key and a pointer to the record containing the key Index often resides in the RAM, while the record in the disk Two types of indices: Dense Index (applies to sorted or unsorted files) Sparse (only for sorted files)

  30. STORAGE CONSIDERATIONS Memory Hierarchy Cache Main Memory Secondary storage (disk) Tertiary storage (e.g. tape) Smaller amounts but faster access Need to organize information to minimize cache misses

  31. STORAGE CONSIDERATIONS: MAKING THINGS EFFICIENT Placing records together in blocks for group fetch Prefetching Prediction algorithm Parallelism/Striping placing across multiple disks to read/write faster Example: RAID Mirroring double the reliability

  32. STORAGE CONSIDERATIONS MAKING IT RELIABLE Checksums Mirroring disks Parity bits RAID levels Mostly, out of the scope of this course

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#