Dangling Tuples

What is a Dangling Tuple?
 
In a DBMS, a tuple that does not participate in a natural join is
called a dangling tuple. It may indicate a consistency problem in
the database. By another definition, a tuple with a foreign key
value that does not appear in the referenced relation is known as a
dangling tuple. Referential integrity constraints specify exactly
when dangling tuples indicate a problem.
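As a minimal illustration (using hypothetical employees and departments relations joined on the department name), a dangling tuple can be found by checking which rows have no join partner:

```python
# Sketch (hypothetical relations): find dangling tuples, i.e. rows whose
# join-attribute value has no partner in the other relation of a natural join.
employees = [("alice", "sales"), ("bob", "hr"), ("eve", "ops")]
departments = [("sales", "bldg1"), ("hr", "bldg2")]

def dangling(rows, key_index, other_keys):
    """Rows of one relation that would not participate in the join."""
    return [r for r in rows if r[key_index] not in other_keys]

dept_names = {d[0] for d in departments}
print(dangling(employees, 1, dept_names))  # [('eve', 'ops')]
```

Here ('eve', 'ops') is dangling because 'ops' never appears in the referenced relation.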
 
Multivalued Dependency
 
Multivalued dependency occurs when two attributes in a table are
independent of each other but, both depend on a third attribute.
A multivalued dependency consists of at least two attributes that are
dependent on a third attribute that's why it always requires at least
three attributes.
 
BIKE_MODEL   MANUF_YEAR   COLOR
M2011        2008         White
M2001        2008         Black
M3001        2013         White
M3001        2013         Black
M4006        2017         White
M4006        2017         Black

Here, the columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL
and independent of each other.
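The multivalued dependency BIKE_MODEL ↠ COLOR can be checked mechanically: for every BIKE_MODEL value, the rows must equal the cross product of that model's COLOR values and MANUF_YEAR values. A small sketch (the helper name mvd_holds is our own):

```python
# Sketch: check a multivalued dependency X ->> Y in a relation with
# attributes (X, Y, Z). It holds iff, for each X value, the rows equal
# the cross product of that X's Y-values and Z-values.
from collections import defaultdict

def mvd_holds(rows, x, y, z):
    ys, zs, seen = defaultdict(set), defaultdict(set), defaultdict(set)
    for r in rows:
        ys[r[x]].add(r[y])
        zs[r[x]].add(r[z])
        seen[r[x]].add((r[y], r[z]))
    return all(seen[k] == {(a, b) for a in ys[k] for b in zs[k]} for k in ys)

bikes = [("M2011", 2008, "White"), ("M2001", 2008, "Black"),
         ("M3001", 2013, "White"), ("M3001", 2013, "Black"),
         ("M4006", 2017, "White"), ("M4006", 2017, "Black")]
print(mvd_holds(bikes, 0, 2, 1))  # True: BIKE_MODEL ->> COLOR holds
```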
 
Transaction System
 
A transaction refers to a small unit of a program that consists of
various low-level tasks. Every transaction in a DBMS must maintain
the ACID properties: A (Atomicity), C (Consistency), I (Isolation),
and D (Durability). ACID must be maintained so as to ensure the
completeness, accuracy, and integrity of data.
 
ACID Properties
 
 
Atomicity:
 
By this, we mean that either the entire transaction takes place at
once or doesn’t happen at all. There is no midway i.e. transactions do
not occur partially. Each transaction is considered as one unit and
either runs to completion or is not executed at all. It involves the
following two operations.
Abort: If a transaction aborts, changes made to the database are
not visible.
Commit: If a transaction commits, changes made are visible.
Atomicity is also known as the ‘All or nothing rule’.
 
Consider the following transaction T consisting of T1 and T2:
transfer of 100 from account X to account Y.
 
 
 
 
 
If the transaction fails after completion of T1 but before completion
of T2 (say, after write(X) but before write(Y)), then the amount has
been deducted from X but not added to Y. This results in an
inconsistent database state. Therefore, the transaction must be
executed in its entirety in order to ensure the correctness of the
database state.
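The all-or-nothing behaviour can be sketched with a hypothetical in-memory transfer that snapshots state and restores it on failure (an illustration of the idea, not how a real DBMS implements atomicity):

```python
# Sketch (hypothetical in-memory accounts): either both writes take effect
# or neither does. A snapshot stands in for the DBMS's undo machinery.
accounts = {"X": 500, "Y": 200}

def transfer(src, dst, amount, fail_midway=False):
    snapshot = dict(accounts)      # saved state used to undo on abort
    try:
        accounts[src] -= amount    # write(X)
        if fail_midway:
            raise RuntimeError("failure between write(X) and write(Y)")
        accounts[dst] += amount    # write(Y)
    except Exception:
        accounts.clear()
        accounts.update(snapshot)  # abort: changes are not visible

transfer("X", "Y", 100, fail_midway=True)
print(accounts)  # {'X': 500, 'Y': 200} -- unchanged, as if T never ran
transfer("X", "Y", 100)
print(accounts)  # {'X': 400, 'Y': 300} -- T ran to completion
```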
 
Consistency:
 
This means that integrity constraints must be maintained so that the
database is consistent before and after the transaction. It refers to the
correctness of a database. Referring to the example above,
The total amount before and after the transaction must be
maintained.
Total before T occurs = 500 + 200 = 700.
Total after T occurs = 400 + 300 = 700.
Therefore, the database is consistent. Inconsistency occurs in
case T1 completes but T2 fails. As a result, T is incomplete.
 
Isolation:
 
This property ensures that multiple transactions can occur
concurrently without leading to inconsistency of the database
state. Transactions occur independently, without interference.
Changes occurring in a particular transaction will not be visible to
any other transaction until that change is written to memory or has
been committed. This property ensures that concurrent execution of
transactions results in a state equivalent to one achieved had they
been executed serially in some order.
Let X = 500, Y = 500.
 
Durability:
 
This property ensures that once the transaction has completed
execution, the updates and modifications to the database are stored
in and written to disk and they persist even if a system failure occurs.
These updates now become permanent and are stored in non-volatile
memory. The effects of the transaction, thus, are never lost.
 
Important points: responsibility for maintaining each property

Property       Responsibility
Atomicity      Transaction Manager
Consistency    Application programmer
Isolation      Concurrency Control Manager
Durability     Recovery Manager
 
Serializability
 
Serializability of schedules ensures that a non-serial schedule is
equivalent to some serial schedule. It allows transactions to execute
concurrently, with their operations interleaved, while still producing
a result equivalent to executing them one after another.
 
Schedules and Serializable Schedules in DBMS
 
A schedule is a series of operations from multiple transactions
arranged in execution order.
R(X) means reading the value X, and W(X) means writing the value X.
 
 
Schedules in DBMS are of two types:
 
Serial Schedule
 - A schedule in which only one transaction is
executed at a time, i.e., one transaction is executed completely
before starting another transaction.
 
Example

Transaction-1    Transaction-2
R(a)
W(a)
R(b)
W(b)
                 R(b)
                 W(b)
                 R(a)
                 W(a)
Non-serial Schedule
 -
 
A schedule in which the transactions are interleaving or
interchanging. There are several transactions executing
simultaneously as they are being used in performing real-world
database operations. These transactions may be working on the
same piece of data. Hence, the serializability of non-serial schedules
is a major concern so that our database is consistent before and
after the execution of the transactions.
 
Example

Transaction-1    Transaction-2
R(a)
W(a)
                 R(b)
                 W(b)
R(b)
                 R(a)
W(b)
                 W(a)
 
What is a serializable schedule?
 
A non-serial schedule is called a serializable schedule if it can be
converted to an equivalent serial schedule. In simple words, if a
non-serial schedule and a serial schedule produce the same result,
then the non-serial schedule is called a serializable schedule.
 
Testing of Serializability
 
To test the serializability of a schedule, we can use a Serialization
Graph or Precedence Graph. A serialization graph is simply a directed
graph over the transactions of a schedule.
It can be defined as a graph G(V, E) consisting of a set of directed
edges E = {E1, E2, E3, ..., En} and a set of vertices V = {V1, V2, V3,
..., Vn}. Each edge corresponds to a pair of conflicting operations
(READ or WRITE) performed by two different transactions.
 
Cont..
 
Ti -> Tj means that transaction Ti performs a conflicting read or
write before transaction Tj.
 
Example
 
If there is a cycle in the precedence graph, then the schedule is
non-serializable, because the cycle indicates that one transaction is
dependent on the other transaction and vice versa. It also means
that there are one or more conflicting pairs of operations in the
transactions. On the other hand, if there is no cycle, the non-serial
schedule is serializable.
 
What is a conflicting pair in transactions?
 
Two operations inside a schedule are called conflicting if they meet
these three conditions:
1. They belong to two different transactions.
2. They are working on the same data item.
3. At least one of them is a WRITE operation.
To conclude, let's take two operations on a data item "a". The
conflicting pairs are:
1. READ(a) - WRITE(a)
2. WRITE(a) - WRITE(a)
3. WRITE(a) - READ(a)
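These conflict rules are exactly what a precedence graph encodes. A sketch (helper names are our own) that builds the graph from a schedule of (transaction, operation, item) steps and tests it for a cycle:

```python
# Sketch: build a precedence graph from a schedule and test for cycles.
# Two steps conflict when they touch the same item, come from different
# transactions, and at least one of them is a write.
from collections import defaultdict

def precedence_edges(schedule):
    edges = set()
    for i, (ti, op1, x) in enumerate(schedule):
        for tj, op2, y in schedule[i + 1:]:
            if ti != tj and x == y and "W" in (op1, op2):
                edges.add((ti, tj))   # Ti's conflicting op precedes Tj's
    return edges

def has_cycle(edges):
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
    state = {}
    def dfs(v):                       # depth-first search for a back edge
        state[v] = "visiting"
        for w in adj[v]:
            if state.get(w) == "visiting" or (w not in state and dfs(w)):
                return True
        state[v] = "done"
        return False
    return any(v not in state and dfs(v) for v in list(adj))

# T1 and T2 interleaved on item a: W1(a) before R2(a), and W2(a) before R1(a)
s = [("T1", "W", "a"), ("T2", "R", "a"), ("T2", "W", "a"), ("T1", "R", "a")]
print(has_cycle(precedence_edges(s)))  # True -> not conflict-serializable
```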
 
Recoverability

A transaction may not execute completely due to a hardware failure,
a system crash, or software issues. In that case, we have to roll back
the failed transaction. But some other transaction may also have used
values produced by the failed transaction, so we have to roll back
those transactions as well.
 
Recoverable Schedules
 
Schedules in which transactions commit only after all transactions
whose changes they read commit are called recoverable schedules. In
other words, if some transaction Tj is reading a value updated or
written by some other transaction Ti, then the commit of Tj must
occur after the commit of Ti.
Example:
The given schedule follows the order Ti -> Tj => C1 -> C2.
Transaction T1 is executed before T2, so there is no chance of a
conflict. R1(x) appears before W1(x) and transaction T1 commits
before T2, i.e., the first transaction performs the first update on
data item x; hence the given schedule is recoverable.
 
S1: R1(x), W1(x), R2(x), R1(y), R2(y), W2(x), W1(y), C1, C2;
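The recoverability condition, that a reader must commit after every writer it read from, can be checked mechanically. A sketch using the schedule S1 above (helper names are our own):

```python
# Sketch: a schedule is recoverable if every transaction commits only
# after each transaction it read from has committed.
# Steps are (txn, op) for commits and (txn, op, item) otherwise.
def is_recoverable(schedule):
    last_writer = {}   # item -> transaction that last wrote it
    reads_from = {}    # reader -> set of writers whose values it read
    commit_order = []
    for step in schedule:
        txn, op = step[0], step[1]
        if op == "W":
            last_writer[step[2]] = txn
        elif op == "R":
            writer = last_writer.get(step[2])
            if writer and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
        elif op == "C":
            commit_order.append(txn)
    pos = {t: i for i, t in enumerate(commit_order)}
    return all(pos[w] < pos[r]
               for r, writers in reads_from.items() for w in writers
               if r in pos and w in pos)

s1 = [("T1", "R", "x"), ("T1", "W", "x"), ("T2", "R", "x"),
      ("T1", "R", "y"), ("T2", "R", "y"), ("T2", "W", "x"),
      ("T1", "W", "y"), ("T1", "C"), ("T2", "C")]
print(is_recoverable(s1))  # True: C1 occurs before C2
```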
 
Example 2

Consider the following schedule involving two transactions T1 and T2:

T1        T2
R(A)
W(A)
          W(A)
          R(A)
commit
          commit

This is a recoverable schedule, since T1 commits before T2, which
makes the value read by T2 correct.
 
Irrecoverable Schedule
 
The table below shows a schedule with two transactions: T1 reads
and writes A, and that value is read and written by T2. T2 commits.
But later on, T1 fails, so we have to roll back T1. Since T2 has read
the value written by T1, it should also be rolled back, but we have
already committed it. So this schedule is an irrecoverable schedule.
When Tj reads the value updated by Ti and Tj commits before Ti
commits, the schedule is irrecoverable.
 
Recoverable with Cascading Rollback
 
The table below shows a schedule with two transactions: T1 reads
and writes A, and that value is read and written by T2. But later on,
T1 fails, so we have to roll back T1. Since T2 has read the value
written by T1, it should also be rolled back. As it has not committed,
we can roll back T2 as well. So this is recoverable with cascading
rollback. Therefore, if Tj reads a value updated by Ti and the commit
of Tj is delayed until the commit of Ti, the schedule is called
recoverable with cascading rollback.
 
Cascadeless Recoverable Rollback
 
The table below shows a schedule with two transactions: T1 reads
and writes A and commits, and that value is read by T2. If T1 fails
before commit, no other transaction has read its value, so there is no
need to roll back any other transaction. So this is a cascadeless
recoverable schedule. In short, if Tj reads a value updated by Ti only
after Ti has committed, the schedule is cascadeless recoverable.
 
Log and log records
 
The log is a sequence of log records, recording all the update
activities in the database. Logs for each transaction are maintained
in stable storage. Any operation performed on the database is
recorded in the log. Prior to performing any modification to the
database, an update log record is created to reflect that modification.
An update log record, represented as <Ti, Xj, V1, V2>, has these fields:
1. Transaction identifier: Unique identifier of the transaction that
performed the write operation.
2. Data item: Unique identifier of the data item written.
3. Old value: Value of the data item prior to the write.
4. New value: Value of the data item after the write operation.
 
Other type of log records are:
 
1. <Ti start>: It contains information about when a transaction Ti
starts.
2. <Ti commit>: It contains information about when a transaction Ti
commits.
3. <Ti abort>: It contains information about when a transaction Ti
aborts.
 
Undo and Redo Operations
 
Because every database modification must be preceded by the creation
of a log record, the system has available both the old value prior to
the modification of the data item and the new value that is to be
written. This allows the system to perform redo and undo operations
as appropriate:
1. Undo: Using a log record, sets the data item specified in the log
record to its old value.
2. Redo: Using a log record, sets the data item specified in the log
record to its new value.
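A minimal sketch of undo and redo applied to a single update log record <Ti, Xj, V1, V2> (the in-memory db dict is a stand-in for the database):

```python
# Sketch: applying an update log record <Ti, Xj, V1, V2>.
# 'db' is an in-memory stand-in for the database.
db = {"A": 10}
record = ("T1", "A", 10, 50)  # T1 changed A from old value 10 to new value 50
db["A"] = 50                  # the modification the record describes

def undo(rec, db):
    _ti, item, old, _new = rec
    db[item] = old            # restore the value prior to the write

def redo(rec, db):
    _ti, item, _old, new = rec
    db[item] = new            # reapply the value written

undo(record, db)
print(db["A"])  # 10
redo(record, db)
print(db["A"])  # 50
```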
 
Checkpoint
 
A checkpoint is a mechanism by which all the previous logs are
removed from the system and permanently stored on the storage disk.
The checkpoint is like a bookmark: during the execution of
transactions, such checkpoints are marked, and as each transaction
executes, log records are created for its steps.
When a checkpoint is reached, the transaction's updates are written
to the database, and the log file up to that point is removed. The log
file is then updated with the steps of transactions until the next
checkpoint, and so on.
The checkpoint is used to declare a point before which the DBMS was
in a consistent state and all transactions were committed.
 
Recovery using Checkpoint
 
The recovery system reads the log files from end to start, i.e., from
T4 back to T1.
The recovery system maintains two lists: a redo-list and an undo-list.
A transaction is put on the redo-list if the recovery system sees a log
containing both <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>;
all transactions on the redo-list are redone. Transactions that have a
<Tn, Start> record but no <Tn, Commit> are placed on the undo-list
and rolled back.
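Classifying transactions into the two lists can be sketched with a single scan of the log (the log format here is a simplified stand-in):

```python
# Sketch: after a crash, scan the log and classify transactions.
# Committed transactions go on the redo-list; transactions that started
# but never committed go on the undo-list. (Log format is simplified.)
log = ["<T1 start>", "<T1 commit>", "<T2 start>", "<T3 start>", "<T3 commit>"]

started, committed = set(), set()
for rec in log:
    txn, action = rec.strip("<>").split()
    if action == "start":
        started.add(txn)
    elif action == "commit":
        committed.add(txn)

redo_list = sorted(committed)            # redo using new values in the log
undo_list = sorted(started - committed)  # undo using old values in the log
print(redo_list, undo_list)  # ['T1', 'T3'] ['T2']
```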
 
Concurrency Control
 
Concurrency Control is provided in a database to:
Enforce isolation among transactions.
Preserve database consistency through consistency-preserving
execution of transactions.
Resolve read-write and write-read conflicts.
 
Concurrency Control Techniques are

1. Two-phase locking protocol
2. Timestamp ordering protocol
3. Multiversion concurrency control
4. Validation concurrency control
 
Two Phase Locking
 
Locking is an operation that secures either permission to read or
permission to write a data item. Two-phase locking is a process used
to gain ownership of shared resources without creating the possibility
of deadlock. The three activities taking place in the two-phase update
algorithm are:

(i) Lock Acquisition
(ii) Modification of Data
(iii) Release Lock
 
Cont..
 
Two-phase locking prevents deadlock from occurring in distributed
systems by having a transaction release all the resources it has
acquired whenever it cannot obtain every resource it requires without
waiting for another process to finish using a lock. This means that no
process is ever in a state where it is holding some shared resources
while waiting for another process to release a shared resource that it
requires, so deadlock cannot occur due to resource contention. A
transaction in the two-phase locking protocol passes through one of
two phases:
 
Cont..
 
Growing Phase:
 In this phase a transaction can only acquire locks but
cannot release any lock. The point when a transaction acquires all the
locks it needs is called the Lock Point.
Shrinking Phase:
 In this phase a transaction can only release locks but
cannot acquire any.
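The two-phase rule can be sketched as a guard on lock operations: once any lock is released, further acquisitions are rejected (class and method names are our own):

```python
# Minimal sketch of the two-phase rule: once a transaction releases any
# lock (entering the shrinking phase), it may not acquire another.
class TwoPhaseTxn:
    def __init__(self):
        self.locks = set()
        self.shrinking = False

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: acquire after release")
        self.locks.add(item)        # growing phase

    def release(self, item):
        self.shrinking = True       # the lock point has passed
        self.locks.discard(item)

t = TwoPhaseTxn()
t.acquire("A")
t.acquire("B")                      # growing phase
t.release("A")                      # shrinking phase begins
try:
    t.acquire("C")
except RuntimeError as e:
    print(e)                        # 2PL violation: acquire after release
```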
 
Time Stamp Ordering
 
A timestamp is a tag that can be attached to any transaction or any
data item, denoting the specific time at which the transaction or the
data item was last used in any way. A timestamp can be implemented
in two ways. One is to directly assign the current value of the clock
to the transaction or data item. The other is to attach the value of a
logical counter that is incremented as new timestamps are required.
The timestamps of a data item are of two types:
(i) W-timestamp(X): The latest time at which the data item X has
been written.
(ii) R-timestamp(X): The latest time at which the data item X has
been read. These two timestamps are updated each time a
successful read/write operation is performed on the data item X.
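The basic timestamp-ordering checks built on these two timestamps can be sketched as follows (a simplification: a rejected operation would normally roll the transaction back and restart it with a new timestamp):

```python
# Sketch of the basic timestamp-ordering checks per data item.
w_ts = {"X": 0}  # W-timestamp(X)
r_ts = {"X": 0}  # R-timestamp(X)

def read(txn_ts, item):
    if txn_ts < w_ts[item]:
        return "rollback"                 # a younger txn already wrote X
    r_ts[item] = max(r_ts[item], txn_ts)  # update R-timestamp(X)
    return "ok"

def write(txn_ts, item):
    if txn_ts < r_ts[item] or txn_ts < w_ts[item]:
        return "rollback"                 # a younger txn already used X
    w_ts[item] = txn_ts                   # update W-timestamp(X)
    return "ok"

print(write(5, "X"))  # ok: W-timestamp(X) becomes 5
print(read(3, "X"))   # rollback: an older transaction cannot read it
print(read(7, "X"))   # ok: R-timestamp(X) becomes 7
```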
 
Multiversion Concurrency Control
 
Multiversion schemes keep old versions of data items to increase
concurrency.
Multiversion two-phase locking: Each successful write results in the
creation of a new version of the data item written. Timestamps are
used to label the versions. When a read(X) operation is issued, an
appropriate version of X is selected based on the timestamp of the
transaction.
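Version selection for a read can be sketched as picking the latest version whose write timestamp does not exceed the reading transaction's timestamp (the version labels here are hypothetical):

```python
# Sketch: multiversion reads. Each version of X carries the timestamp of
# the write that created it; read(X) returns the latest version not newer
# than the reading transaction's timestamp.
versions = [(5, "v1"), (10, "v2"), (20, "v3")]  # (W-timestamp, value)

def read_version(txn_ts, versions):
    eligible = [(ts, v) for ts, v in versions if ts <= txn_ts]
    return max(eligible)[1]  # newest version the transaction may see

print(read_version(12, versions))  # v2
print(read_version(25, versions))  # v3
```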
 
Validation Concurrency Control
 
The optimistic approach is based on the assumption that the majority
of the database operations do not conflict. The optimistic approach
requires neither locking nor time stamping techniques. Instead, a
transaction is executed without restrictions until it is committed.
Using an optimistic approach, each transaction moves through 2 or 3
phases, referred to as read, validation and write.
 
Cont..
 
(i) During the read phase, the transaction reads the database,
executes the needed computations, and makes the updates to a
private copy of the database values. All update operations of the
transaction are recorded in a temporary update file, which is not
accessed by the remaining transactions.
(ii) During the validation phase, the transaction is validated to ensure
that the changes made will not affect the integrity and consistency of
the database. If the validation test is positive, the transaction goes to
the write phase. If the validation test is negative, the transaction is
restarted and the changes are discarded.
(iii) During the write phase, the changes are permanently applied to
the database.
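The validation test can be sketched as checking that no concurrently committed transaction wrote an item in this transaction's read set; only then is the private copy applied (helper names are our own):

```python
# Sketch: optimistic validation. A transaction's private updates are
# applied only if no other concurrently committed transaction wrote an
# item that this transaction read; otherwise it must restart.
db = {"A": 1, "B": 2}

def validate_and_write(read_set, private_updates, others_write_sets):
    if any(read_set & ws for ws in others_write_sets):
        return False               # validation fails: discard private copy
    db.update(private_updates)     # write phase: apply the private copy
    return True

print(validate_and_write({"A"}, {"A": 99}, [{"B"}]))  # True: no overlap
print(db["A"])                                        # 99
print(validate_and_write({"A"}, {"A": 5}, [{"A"}]))   # False: conflict on A
print(db["A"])                                        # still 99
```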
 
RDBMS
 
A relational database is a type of database that stores and provides
access to data points that are related to one another. Relational
databases are based on the relational model, an intuitive,
straightforward way of representing data in tables. In a relational
database, each row in the table is a record with a unique ID called
the key. The columns of the table hold attributes of the data, and
each record usually has a value for each attribute, making it easy to
establish the relationships among data points.
 
 
 
Concept of Table Spaces
 
A table space is a storage structure containing tables, indexes, large
objects, and long data. They are used to organize data in a database
into logical storage groupings that relate to where data is stored on a
system. Table spaces are stored in database partition groups.
 
Segments, Extents, and Blocks
 
Oracle allocates database space for all data in a database. The units
of logical database allocation are data blocks, extents, and segments,
with data blocks at the finest level of granularity.
 
 
Data Blocks
 
At the finest level of granularity, Oracle stores data in data blocks
(also called logical blocks, Oracle blocks, or pages). One data block
corresponds to a specific number of bytes of physical database space
on disk. You set the data block size for every Oracle database when
you create the database. This data block size should be a multiple of
the operating system's block size, within the maximum limit. Oracle
data blocks are the smallest units of storage that Oracle can use or
allocate.
In contrast, all data at the physical, operating-system level is stored
in bytes. Each operating system has what is called a block size.
Oracle requests data in multiples of Oracle blocks, not operating
system blocks. Therefore, you should set the Oracle block size to a
multiple of the operating system block size to avoid unnecessary I/O.
Oracle manages the storage space in the datafiles of a database in
units called data blocks. A data block is the smallest unit of I/O used
by a database.
 
Extents
 
The next level of logical database space is called an extent. An extent
is a specific number of contiguous data blocks that is allocated for
storing a specific type of information.
 
Segment
 
The level of logical database storage above an extent is called a
segment. A segment is a set of extents that have been allocated for a
specific type of data structure, and that are all stored in the same
tablespace. For example, each table's data is stored in its own data
segment, while each index's data is stored in its own index segment.
Oracle allocates space for segments in extents; therefore, when the
existing extents of a segment are full, Oracle allocates another extent
for that segment. Because extents are allocated as needed, the
extents of a segment may or may not be contiguous on disk.
Segments can span files, but individual extents cannot.
 
Dedicated Server
 
A dedicated server is a kind of web hosting where a business rents a
full computer server with Internet access from a data center. Unlike a
conventional computer server in an office closet, the dedicated server
is kept in a climate-controlled data center where network and
Internet uptime are kept up by expert engineers. A business can rent
dedicated servers instead of purchasing and setting up a computer
server at its office and employing additional staff to oversee and
maintain the server. This permits businesses to eliminate the hassles
of dealing with day-to-day server issues and focus on their growth.
 
Multi-Threaded Server
 
A multithreaded server is any server that has more than one thread.
Because a transport requires its own thread, multithreaded servers
also have multiple transports. The number of thread-transport pairs
that a server contains defines the number of requests that the server
can handle in parallel.
 
 
 
Distributed Database
 
A distributed database is a database that consists of two or more files
located in different sites either on the same network or on entirely
different networks. Portions of the database are stored in multiple
physical locations and processing is distributed among multiple
database nodes.
 
 
Hierarchical Queries
 
A hierarchical query operates on rows in which one or more column
values correspond to nodes within a logical structure of parent-child
relationships. If parent rows have multiple children, sibling
relationships exist among child rows of the same parent.
 
Inline Query
 
An inline query is a type of subquery present in the FROM clause of
a SQL statement as a data source. A subquery is classified by where
it appears: if it is present in the SELECT list, it is called a
"sub-select"; if it is present in the FROM clause, it is called an
"inline query" or "inline view".
 
Flashback
 
Flashback Query allows users to see a view of past data. If some data
or a table has been deleted by the user, the flashback query provides
an opportunity to view that data again and perform manipulations
on it.
Flashback queries rely on a flashback area, in which the deleted data
is stored so that it can be viewed if needed in the future.
To use the flashback query feature, the server must be configured for
automatic undo management. If the system uses the traditional
rollback approach, flashback queries cannot be performed.
We can enable flashback query using the DBMS_FLASHBACK
package. This package enables us to view past data by specifying the
system change number or the exact time in the past.
Slide Note
Embed
Share

Learn about dangling tuples in DBMS, which are tuples that do not participate in a natural join, potentially indicating consistency issues in the database. Referential integrity constraints play a key role in identifying and addressing dangling tuples.


Uploaded on Mar 06, 2024 | 1 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Dangling Tuples

  2. What is Dangling Tuples In DBMS if there is a tuple that does not participate in a natural join we called it as dangling tuple . It may gives indication consistency problem in the database. Another definition of dangling problem tuple is that a tuple with a foreign key value that not appear in the referenced relation is known as dangling tuple. In DBMS Referential integrity constraints specify us exactly when dangling tuples indicate problem.

  3. Multivalued Dependency Multivalued dependency occurs when two attributes in a table are independent of each other but, both depend on a third attribute. A multivalued dependency consists of at least two attributes that are dependent on a third attribute that's why it always requires at least three attributes. BIKE_MODEL MANUF_YEAR COLOR M2011 2008 White M2001 2008 Black M3001 2013 White M3001 2013 Black M4006 2017 White M4006 2017 Black Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent of each other.

  4. Transaction System The transaction refers to a small unit of any given program that consists of various low-level tasks. Every transaction in DBMS must maintain ACID A (Atomicity), C (Consistency), I (Isolation), D (Durability). One must maintain ACID so as to ensure completeness, accuracy, and integrity of data.

  5. ACID Properties

  6. Atomicity: By this, we mean that either the entire transaction takes place at once or doesn t happen at all. There is no midway i.e. transactions do not occur partially. Each transaction is considered as one unit and either runs to completion or is not executed at all. It involves the following two operations. Abort: If a transaction aborts, changes made to the database are not visible. Commit: If a transaction commits, changes made are visible. Atomicity is also known as the All or nothing rule . Consider the following transaction T consisting of T1 and T2: Transfer of 100 from account X to account Y.

  7. If the transaction fails after completion of T1 but before completion of T2.( say, after write(X) but before write(Y)), then the amount has been deducted from X but not added to Y. This results in an inconsistent database state. Therefore, the transaction must be executed in its entirety in order to ensure the correctness of the database state.

  8. Consistency: This means that integrity constraints must be maintained so that the database is consistent before and after the transaction. It refers to the correctness of a database. Referring to the example above, The total amount before and after the transaction must be maintained. Total before T occurs Total after T occurs = Therefore, the database is consistent. Inconsistency occurs in case T1 completes but T2 fails. As a result, T is incomplete. = 500 400 + + 200 300 = = 700. 700.

  9. Isolation: This concurrently without leading to the inconsistency of the database state. Transactions occur independently Changes occurring in a particular transaction will not be visible to any other transaction until that particular change in that transaction is written to memory or has been committed. This property ensures that the execution of transactions concurrently will result in a state that is equivalent to a state achieved these were executed serially in some Let X= 500, Y = 500. property ensures that multiple transactions can occur without interference. order.

  10. Durability: This property ensures that once the transaction has completed execution, the updates and modifications to the database are stored in and written to disk and they persist even if a system failure occurs. These updates now become permanent and are stored in non-volatile memory. The effects of the transaction, thus, are never lost.

  11. Important points: Responsibility for maintaining properties Property Atomicity Transaction Manager Consistency Application programmer Isolation Concurrency Control Manager Durability Recovery Manager

  12. Serializability Serializability of schedules ensures that a non-serial schedule is equivalent to a serial schedule. It helps in maintaining the transactions to execute simultaneously without interleaving one another.

  13. Schedules and Serializable Schedules in DBMS Schedules are a series of operations performing one transaction to the other. R(X) means Reading the value: X; and W(X) means Writing the value: X.

  14. Schedules in DBMS are of two types: Serial Schedule -A schedule in which only one transaction is executed at a time, i.e., one transaction is executed completely before starting another transaction.

  15. Example Transaction-1 Transaction-2 R(a) W(a) R(b) W(b) R(b) W(b) R(a) W(a)

  16. Non-serial Schedule- A schedule in which the transactions are interleaving or interchanging. There are several transactions executing simultaneously as they are being used in performing real-world database operations. These transactions may be working on the same piece of data. Hence, the serializability of non-serial schedules is a major concern so that our database is consistent before and after the execution of the transactions.

  17. Example Transaction-1 Transaction-2 R(a) W(a) R(b) W(b) R(b) R(a) W(b) W(a)

  18. What is a serializable schedule? A non-serial schedule is called a serializable schedule if it can be converted to its equivalent serial schedule. In simple words, if a non-serial schedule and a serial schedule result in the same then the non-serial schedule is called a serializable schedule.

  19. Testing of Serializability To test the serializability of a schedule, we can use Serialization Graph or Precedence Graph. A serialization Graph is nothing but a Directed Graph ofthe entire transactions ofa schedule. It can be defined as a Graph G(V, E) consisting of a set of directed- edges E = {E1, E2, E3, ..., En} and a set of vertices V = {V1, V2, V3, ...,Vn}. The set of edges contains one of the two operations - READ, WRITE performed by acertain transaction.

  20. Cont.. Ti -> Tj, means Transaction-Ti is either performing read or write before the transaction-Tj.

  21. Example If there is a cycle present in the serialized graph then the schedule is non-serializable because the cycle resembles that one transaction is dependent on the other transaction and vice versa. It also means that there are one or more conflicting pairs of operations in the transactions. On the other hand, no-cycle means that the non-serial schedule is serializable.

  22. What is a conflicting pair in transactions? Two operations inside a schedule are called conflicting if they meet these three conditions: 1.They belong to two different transactions. 2.They are working on the same data piece. 3.One of them is performing the WRITE operation. To conclude, let s take two operations on data: "a". The conflicting pairs are: 1.READ(a) - WRITE(a) 2.WRITE(a) - WRITE(a) 3.WRITE(a) - READ(a)

  23. Recoverability Recoverability A transaction may not execute completely due to hardware failure, system crash or software issues. In that case, we have to roll back the failed transaction. But some other transaction may also have used values produced by the failed transaction. So, we have to roll back those transactions as well.

  24. Recoverable Schedules Schedules in which transactions commit only after all transactions whose changes they read commit are called recoverable schedules. In other words, if some transaction Tjis reading value updated or written by some other transaction Ti, then the commit of Tjmust occur after the commit of Ti. Example- Given schedule follows order of Ti->Tj => C1->C2. Transaction T1 is executed before T2 hence there is no chances of conflict occur. R1(x) appears before W1(x) and transaction T1 is committed before T2 i.e. completion of first transaction performed first update on data item x, hence given schedule is recoverable. S1: R1(x), W1(x), R2(x), R1(y), R2(y), W2(x), W1(y), C1, C2;

  25. Example 2 Example 2 Consider the following schedule involving two transactions T1and T2. T1 T2 R(A) W(A) W(A) R(A) commit commit This is a recoverable schedule since T1commits before T2, that makes the value read by T2correct.

  26. Irrecoverable Schedule The table below shows a schedule with two transactions, T1 reads and writes A and that value is read and written by T2. T2 commits. But later on, T1 fails. So we have to rollback T1. Since T2 has read the value written by T1, it should also be rollbacked. But we have already committed that. So this schedule is irrecoverable schedule. When Tj is reading the value updated by Ti and Tj is committed before committing of Ti, the schedule will be irrecoverable.

  27. Recoverable with Cascading Rollback The table below shows a schedule with two transactions, T1 reads and writes A and that value is read and written by T2. But later on, T1 fails. So we have to rollback T1. Since T2 has read the value written by T1, it should also be rollbacked. As it has not committed, we can rollback T2 as well. So it is recoverable with cascading rollback. Therefore, if Tj is reading value updated by Ti and commit of Tj is delayed till commit of Ti, the schedule is called recoverable with cascading rollback.

  28. Cascadeless Recoverable Rollback Consider a schedule with two transactions in which T1 reads and writes A and commits, and only then is that value read by T2. If T1 fails before its commit, no other transaction has read its value, so there is no need to roll back any other transaction. This is a cascadeless recoverable schedule. So, if Tj reads a value updated by Ti only after Ti has committed, the schedule is cascadeless recoverable.
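The distinctions above can be checked mechanically. Below is a minimal sketch, assuming a schedule is given as a list of (transaction, action, item) tuples; the function and variable names are illustrative, not from any real DBMS:

```python
# Sketch: classify a schedule as recoverable and/or cascadeless.
# A schedule is recoverable if every reader of uncommitted data commits
# after the writer; it is cascadeless if no uncommitted value is ever read.

def classify(schedule):
    writes = {}          # item -> transaction holding an uncommitted write
    reads_from = {}      # reader -> set of writers it read uncommitted data from
    commit_pos = {}      # transaction -> position of its commit in the schedule
    dirty_read = False   # True if any transaction read an uncommitted value

    for i, (txn, action, item) in enumerate(schedule):
        if action == "W":
            writes[item] = txn
        elif action == "R":
            writer = writes.get(item)
            if writer is not None and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
                dirty_read = True
        elif action == "C":
            commit_pos[txn] = i
            for it, w in list(writes.items()):   # its writes are now committed
                if w == txn:
                    del writes[it]

    recoverable = all(
        commit_pos.get(w, float("inf")) <= commit_pos.get(r, float("inf"))
        for r, ws in reads_from.items() for w in ws
    )
    return {"recoverable": recoverable, "cascadeless": not dirty_read}

# S1 from slide 24: T2 reads x while T1's write is uncommitted,
# but T1 commits first, so S1 is recoverable (not cascadeless).
s1 = [("T1", "R", "x"), ("T1", "W", "x"), ("T2", "R", "x"), ("T1", "R", "y"),
      ("T2", "R", "y"), ("T2", "W", "x"), ("T1", "W", "y"),
      ("T1", "C", None), ("T2", "C", None)]
print(classify(s1))
```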

  29. Log and log records The log is a sequence of log records, recording all the update activities in the database. Logs for each transaction are maintained in stable storage. Any operation performed on the database is recorded in the log. Prior to performing any modification to the database, an update log record is created to reflect that modification. An update log record, represented as <Ti, Xj, V1, V2>, has these fields: 1. Transaction identifier: unique identifier of the transaction that performed the write operation. 2. Data item: unique identifier of the data item written. 3. Old value: value of the data item prior to the write. 4. New value: value of the data item after the write operation.

  30. Other type of log records are: 1.<Ti start>: It contains information about when a transaction Ti starts. 2.<Ti commit>: It contains information about when a transaction Ti commits. 3.<Ti abort>: It contains information about when a transaction Ti aborts.

  31. Undo and Redo Operations Because every database modification must be preceded by the creation of a log record, the system has available both the old value of the data item (prior to the modification) and the new value that is to be written. This allows the system to perform redo and undo operations as appropriate: 1. Undo: using a log record, set the data item specified in the record to its old value. 2. Redo: using a log record, set the data item specified in the record to its new value.
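Using the <Ti, Xj, V1, V2> record layout from slide 29, undo and redo can be sketched in a few lines; the database here is just a dictionary, for illustration:

```python
# Sketch: undo/redo from update log records <Ti, Xj, V1, V2>.
# The record layout follows the slides; everything else is illustrative.

def undo(db, record):
    """Restore the old value V1 of the data item named in the record."""
    txn, item, old, new = record
    db[item] = old

def redo(db, record):
    """Reapply the new value V2 of the data item named in the record."""
    txn, item, old, new = record
    db[item] = new

db = {"X": 5}
rec = ("T1", "X", 5, 9)   # T1 wrote X: old value 5, new value 9
redo(db, rec)
print(db["X"])            # 9
undo(db, rec)
print(db["X"])            # 5
```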

  32. Checkpoint The checkpoint is a mechanism by which all the previous logs are removed from the system and permanently stored on disk. A checkpoint is like a bookmark. During the execution of transactions, such checkpoints are marked, and log records are written for the steps of each transaction. When execution reaches a checkpoint, the transactions' updates are written to the database, and the log up to that point is discarded. The log is then filled with subsequent transaction steps until the next checkpoint, and so on. The checkpoint declares a point before which the DBMS was in a consistent state and all transactions were committed.

  33. Recovery using Checkpoint The recovery system reads the log files from the end to the start, i.e. from T4 back to T1. It maintains two lists: a redo-list and an undo-list. A transaction is put into the redo-list if the recovery system sees both <Tn, Start> and <Tn, Commit> in the log; all transactions in the redo-list are redone. A transaction is put into the undo-list if the log contains <Tn, Start> but no <Tn, Commit>; all transactions in the undo-list are undone and their log records are erased.
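The redo/undo classification rule can be sketched as a single scan over the log records seen since the checkpoint; the record shapes and names below are illustrative:

```python
# Sketch: build redo- and undo-lists from the log after a checkpoint.
# Committed transactions are redone; started-but-uncommitted ones are undone.

def recovery_lists(log):
    started, committed = set(), set()
    for txn, action in log:
        if action == "start":
            started.add(txn)
        elif action == "commit":
            committed.add(txn)
    redo_list = committed            # saw both <Tn, Start> and <Tn, Commit>
    undo_list = started - committed  # saw <Tn, Start> but no commit
    return redo_list, undo_list

log = [("T1", "start"), ("T1", "commit"), ("T2", "start"),
       ("T3", "start"), ("T3", "commit"), ("T4", "start")]
redo_list, undo_list = recovery_lists(log)
print(sorted(redo_list))   # ['T1', 'T3']
print(sorted(undo_list))   # ['T2', 'T4']
```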

  34. Concurrency Control Concurrency Control is provided in a database to: Enforce isolation among transactions. Preserve database consistency through consistency preserving execution of transactions. Resolve read-write and write-read conflicts.

  35. Concurrency Control Techniques are 1.Two-phase locking Protocol 2.Time stamp ordering Protocol 3.Multi version concurrency control 4.Validation concurrency control

  36. Two Phase Locking Locking is an operation which secures permission to read, or permission to write, a data item. Two phase locking is a process used to gain ownership of shared resources without creating the possibility of deadlock. The 3 activities taking place in the two phase update algorithm are: (i) Lock Acquisition (ii) Modification of Data (iii) Release of Locks

  37. Cont.. Conservative two phase locking prevents deadlock from occurring in distributed systems by releasing all the resources it has acquired if it is not possible to acquire all the resources required without waiting for another process to finish using a lock. This means that no process is ever in a state where it is holding some shared resources and waiting for another process to release a shared resource which it requires, so deadlock cannot occur due to resource contention. A transaction in the Two Phase Locking Protocol can be in one of 2 phases:

  38. Cont.. Growing Phase: In this phase a transaction can only acquire locks but cannot release any lock. The point when a transaction acquires all the locks it needs is called the Lock Point. Shrinking Phase: In this phase a transaction can only release locks but cannot acquire any.
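The growing/shrinking discipline can be sketched as a small class that refuses lock acquisition once the first lock has been released; the class and method names are illustrative, not from any real DBMS:

```python
# Sketch of the two phases of 2PL: once a transaction releases its first
# lock (the lock point has passed), it may not acquire any further locks.

class TwoPhaseTxn:
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False   # False = growing phase

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot lock in shrinking phase")
        self.locks.add(item)

    def release(self, item):
        self.shrinking = True    # entering the shrinking phase
        self.locks.discard(item)

t = TwoPhaseTxn("T1")
t.acquire("A"); t.acquire("B")   # growing phase
t.release("A")                   # shrinking phase begins
try:
    t.acquire("C")               # violates the protocol
except RuntimeError as e:
    print(e)
```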

  39. Time Stamp Ordering A timestamp is a tag that can be attached to any transaction or any data item, denoting a specific time at which the transaction or the data item was last used in some way. A timestamp can be implemented in 2 ways. One is to directly assign the current value of the clock to the transaction or data item. The other is to attach the value of a logical counter that is incremented as new timestamps are required. The timestamp of a data item can be of 2 types: (i) W-timestamp(X): the latest time at which the data item X was written. (ii) R-timestamp(X): the latest time at which the data item X was read. These 2 timestamps are updated each time a successful read/write operation is performed on the data item X.
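A minimal sketch of the basic timestamp-ordering checks, assuming transaction timestamps are plain integers and a rejected operation rolls the transaction back; the names here are illustrative:

```python
# Sketch: basic timestamp-ordering protocol. An operation is rejected if it
# would conflict with an operation of a younger (higher-timestamp) transaction.

r_ts = {}   # R-timestamp(X): latest read of each item
w_ts = {}   # W-timestamp(X): latest write of each item

def read(txn_ts, item):
    if txn_ts < w_ts.get(item, 0):
        return "rollback"          # item was already overwritten by a younger txn
    r_ts[item] = max(r_ts.get(item, 0), txn_ts)
    return "ok"

def write(txn_ts, item):
    if txn_ts < r_ts.get(item, 0) or txn_ts < w_ts.get(item, 0):
        return "rollback"          # a younger txn already read or wrote the item
    w_ts[item] = txn_ts
    return "ok"

print(read(5, "X"))    # ok: nothing has written X yet
print(write(3, "X"))   # rollback: transaction 5 already read X
```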

  40. Multiversion Concurrency Control Multiversion schemes keep old versions of data items to increase concurrency. Multiversion 2 phase locking: each successful write results in the creation of a new version of the data item written. Timestamps are used to label the versions. When a read(X) operation is issued, an appropriate version of X is selected based on the timestamp of the transaction.
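"An appropriate version" commonly means the version with the largest write timestamp not exceeding the reader's timestamp. A small sketch under that assumption, with illustrative names:

```python
# Sketch: multiversion read. Each write of X creates a new (timestamp, value)
# version; a reader sees the newest version no younger than itself.

versions = [(2, "v1"), (5, "v2"), (9, "v3")]   # versions of X: (write ts, value)

def mv_read(txn_ts, versions):
    visible = [(ts, val) for ts, val in versions if ts <= txn_ts]
    return max(visible)[1] if visible else None

print(mv_read(7, versions))   # v2: the version written at timestamp 5
print(mv_read(1, versions))   # None: no version existed yet
```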

  41. Validation Concurrency Control The optimistic approach is based on the assumption that the majority of the database operations do not conflict. The optimistic approach requires neither locking nor time stamping techniques. Instead, a transaction is executed without restrictions until it is committed. Using an optimistic approach, each transaction moves through 2 or 3 phases, referred to as read, validation and write.

  42. Cont.. (i) During the read phase, the transaction reads the database, executes the needed computations, and makes the updates to a private copy of the database values. All update operations of the transaction are recorded in a temporary update file, which is not accessed by the remaining transactions. (ii) During the validation phase, the transaction is validated to ensure that the changes made will not affect the integrity and consistency of the database. If the validation test is positive, the transaction goes to the write phase. If the validation test is negative, the transaction is restarted and the changes are discarded. (iii) During the write phase, the changes are permanently applied to the database.
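The three optimistic phases can be sketched as follows, with the validation test left abstract as a callback; all names are illustrative:

```python
# Sketch of the optimistic read/validate/write cycle: updates go to a
# private copy and are applied permanently only if validation succeeds.

def run_optimistic(db, updates, validate):
    private = dict(db)             # read phase: compute on a private copy
    private.update(updates)
    if not validate(db, private):  # validation phase
        return db, "restarted"     # discard the changes; restart the txn
    db.update(updates)             # write phase: apply permanently
    return db, "committed"

db = {"A": 1}
db, status = run_optimistic(db, {"A": 2}, lambda old, new: True)
print(status, db["A"])   # committed 2
```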

  43. RDBMS A relational database is a type of database that stores and provides access to data points that are related to one another. Relational databases are based on the relational model, an intuitive, straightforward way of representing data in tables. In a relational database, each row in the table is a record with a unique ID called the key. The columns of the table hold attributes of the data, and each record usually has a value for each attribute, making it easy to establish the relationships among data points.

  44. RDBMS vs DBMS
RDBMS: data is stored in table format | DBMS: data is stored in file format
RDBMS: multiple data elements are accessible together | DBMS: data elements are accessed individually
RDBMS: data in tables are linked together | DBMS: no connection between data
RDBMS: normalisation is supported | DBMS: normalisation is not supported
RDBMS: supports distributed databases | DBMS: no support for distributed databases
RDBMS: stores large amounts of data | DBMS: stores small quantities of data

  45. RDBMS: redundancy of data is reduced with the help of keys and indexes | DBMS: data redundancy is common
RDBMS: supports multiple users | DBMS: supports a single user
RDBMS: multiple layers of security while handling data | DBMS: only low security while handling data
RDBMS: software and hardware requirements are higher | DBMS: software and hardware requirements are low
RDBMS examples: Oracle, SQL Server | DBMS examples: XML, Microsoft Access

  46. Concept of Table Spaces A table space is a storage structure containing tables, indexes, large objects, and long data. They are used to organize data in a database into logical storage groupings that relate to where data is stored on a system. Table spaces are stored in database partition groups.

  47. Segments, Extents and Blocks Oracle allocates database space for all data in a database. The units of logical database allocation are data blocks, extents, and segments: a segment consists of one or more extents, and an extent consists of contiguous data blocks.

  48. Data Blocks At the finest level of granularity, Oracle stores data in data blocks (also called logical blocks, Oracle blocks, or pages). One data block corresponds to a specific number of bytes of physical database space on disk. You set the data block size for every Oracle database when you create the database; this size should be a multiple of the operating system's block size, within the maximum limit. Oracle data blocks are the smallest units of storage that Oracle can use or allocate. In contrast, all data at the physical, operating-system level is stored in bytes, and each operating system has its own block size. Oracle requests data in multiples of Oracle blocks, not operating system blocks; therefore, you should set the Oracle block size to a multiple of the operating system block size to avoid unnecessary I/O. Oracle manages the storage space in the datafiles of a database in units called data blocks, the smallest unit of I/O used by a database.

  49. Extents The next level of logical database space is called an extent. An extent is a specific number of contiguous data blocks that is allocated for storing a specific type of information.
