Dangling Tuples

What is a Dangling Tuple?
 
In a DBMS, a tuple that does not participate in a natural join is
called a dangling tuple. It may indicate a consistency problem in
the database. By another definition, a tuple with a foreign key
value that does not appear in the referenced relation is known as a
dangling tuple. Referential integrity constraints specify exactly
when dangling tuples indicate a problem.
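As a minimal illustration (using hypothetical employees and departments relations joined on the department name), a dangling tuple can be found by checking which rows have no join partner:

```python
# Sketch (hypothetical relations): find dangling tuples, i.e. rows whose
# join-attribute value has no partner in the other relation of a natural join.
employees = [("alice", "sales"), ("bob", "hr"), ("eve", "ops")]
departments = [("sales", "bldg1"), ("hr", "bldg2")]

def dangling(rows, key_index, other_keys):
    """Rows of one relation that would not participate in the join."""
    return [r for r in rows if r[key_index] not in other_keys]

dept_names = {d[0] for d in departments}
print(dangling(employees, 1, dept_names))  # [('eve', 'ops')]
```

Here ('eve', 'ops') is dangling because 'ops' never appears in the referenced relation.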
 
Multivalued Dependency
 
Multivalued dependency occurs when two attributes in a table are
independent of each other but, both depend on a third attribute.
A multivalued dependency consists of at least two attributes that are
dependent on a third attribute that's why it always requires at least
three attributes.
 
BIKE_MODEL   MANUF_YEAR   COLOR
M2011        2008         White
M2001        2008         Black
M3001        2013         White
M3001        2013         Black
M4006        2017         White
M4006        2017         Black

Here, the columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL
and independent of each other.
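The multivalued dependency BIKE_MODEL ↠ COLOR can be checked mechanically: for every BIKE_MODEL value, the rows must equal the cross product of that model's COLOR values and MANUF_YEAR values. A small sketch (the helper name mvd_holds is our own):

```python
# Sketch: check a multivalued dependency X ->> Y in a relation with
# attributes (X, Y, Z). It holds iff, for each X value, the rows equal
# the cross product of that X's Y-values and Z-values.
from collections import defaultdict

def mvd_holds(rows, x, y, z):
    ys, zs, seen = defaultdict(set), defaultdict(set), defaultdict(set)
    for r in rows:
        ys[r[x]].add(r[y])
        zs[r[x]].add(r[z])
        seen[r[x]].add((r[y], r[z]))
    return all(seen[k] == {(a, b) for a in ys[k] for b in zs[k]} for k in ys)

bikes = [("M2011", 2008, "White"), ("M2001", 2008, "Black"),
         ("M3001", 2013, "White"), ("M3001", 2013, "Black"),
         ("M4006", 2017, "White"), ("M4006", 2017, "Black")]
print(mvd_holds(bikes, 0, 2, 1))  # True: BIKE_MODEL ->> COLOR holds
```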
 
Transaction System
 
A transaction refers to a small unit of a program that consists of
various low-level tasks. Every transaction in a DBMS must maintain
the ACID properties: A (Atomicity), C (Consistency), I (Isolation),
and D (Durability). ACID must be maintained so as to ensure the
completeness, accuracy, and integrity of data.
 
ACID Properties
 
 
Atomicity:
 
By this, we mean that either the entire transaction takes place at
once or doesn’t happen at all. There is no midway i.e. transactions do
not occur partially. Each transaction is considered as one unit and
either runs to completion or is not executed at all. It involves the
following two operations.
Abort: If a transaction aborts, changes made to the database are
not visible.
Commit: If a transaction commits, changes made are visible.
Atomicity is also known as the ‘All or nothing rule’.
 
Consider the following transaction T consisting of T1 and T2:
transfer of 100 from account X to account Y.
 
 
 
 
 
If the transaction fails after completion of T1 but before completion
of T2 (say, after write(X) but before write(Y)), then the amount has
been deducted from X but not added to Y. This results in an
inconsistent database state. Therefore, the transaction must be
executed in its entirety in order to ensure the correctness of the
database state.
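The all-or-nothing behaviour can be sketched with a hypothetical in-memory transfer that snapshots state and restores it on failure (an illustration of the idea, not how a real DBMS implements atomicity):

```python
# Sketch (hypothetical in-memory accounts): either both writes take effect
# or neither does. A snapshot stands in for the DBMS's undo machinery.
accounts = {"X": 500, "Y": 200}

def transfer(src, dst, amount, fail_midway=False):
    snapshot = dict(accounts)      # saved state used to undo on abort
    try:
        accounts[src] -= amount    # write(X)
        if fail_midway:
            raise RuntimeError("failure between write(X) and write(Y)")
        accounts[dst] += amount    # write(Y)
    except Exception:
        accounts.clear()
        accounts.update(snapshot)  # abort: changes are not visible

transfer("X", "Y", 100, fail_midway=True)
print(accounts)  # {'X': 500, 'Y': 200} -- unchanged, as if T never ran
transfer("X", "Y", 100)
print(accounts)  # {'X': 400, 'Y': 300} -- T ran to completion
```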
 
Consistency:
 
This means that integrity constraints must be maintained so that the
database is consistent before and after the transaction. It refers to the
correctness of a database. Referring to the example above,
The total amount before and after the transaction must be
maintained.
Total before T occurs = 500 + 200 = 700.
Total after T occurs = 400 + 300 = 700.
Therefore, the database is consistent. Inconsistency occurs in
case T1 completes but T2 fails. As a result, T is incomplete.
 
Isolation:
 
This property ensures that multiple transactions can occur
concurrently without leading to inconsistency of the database
state. Transactions occur independently, without interference.
Changes occurring in a particular transaction will not be visible to
any other transaction until that change is written to memory or has
been committed. This property ensures that concurrent execution of
transactions results in a state equivalent to one achieved had they
been executed serially in some order.
Let X = 500, Y = 500.
 
Durability:
 
This property ensures that once the transaction has completed
execution, the updates and modifications to the database are stored
in and written to disk and they persist even if a system failure occurs.
These updates now become permanent and are stored in non-volatile
memory. The effects of the transaction, thus, are never lost.
 
Important points: responsibility for maintaining each property

Property       Responsibility
Atomicity      Transaction Manager
Consistency    Application programmer
Isolation      Concurrency Control Manager
Durability     Recovery Manager
 
Serializability
 
Serializability of schedules ensures that a non-serial schedule is
equivalent to some serial schedule. It allows transactions to execute
concurrently, with their operations interleaved, while still producing
a result equivalent to executing them one after another.
 
Schedules and Serializable Schedules in DBMS
 
A schedule is a series of operations from multiple transactions
arranged in execution order.
R(X) means reading the value X, and W(X) means writing the value X.
 
 
Schedules in DBMS are of two types:
 
Serial Schedule
 - A schedule in which only one transaction is
executed at a time, i.e., one transaction is executed completely
before starting another transaction.
 
Example

Transaction-1    Transaction-2
R(a)
W(a)
R(b)
W(b)
                 R(b)
                 W(b)
                 R(a)
                 W(a)
Non-serial Schedule
 -
 
A schedule in which the transactions are interleaving or
interchanging. There are several transactions executing
simultaneously as they are being used in performing real-world
database operations. These transactions may be working on the
same piece of data. Hence, the serializability of non-serial schedules
is a major concern so that our database is consistent before and
after the execution of the transactions.
 
Example

Transaction-1    Transaction-2
R(a)
W(a)
                 R(b)
                 W(b)
R(b)
                 R(a)
W(b)
                 W(a)
 
What is a serializable schedule?
 
A non-serial schedule is called a serializable schedule if it can be
converted to an equivalent serial schedule. In simple words, if a
non-serial schedule and a serial schedule produce the same result,
then the non-serial schedule is called a serializable schedule.
 
Testing of Serializability
 
To test the serializability of a schedule, we can use a Serialization
Graph or Precedence Graph. A serialization graph is simply a directed
graph over the transactions of a schedule.
It can be defined as a graph G(V, E) consisting of a set of directed
edges E = {E1, E2, E3, ..., En} and a set of vertices V = {V1, V2, V3,
..., Vn}. Each edge corresponds to a pair of conflicting operations
(READ or WRITE) performed by two different transactions.
 
Cont..
 
Ti -> Tj means that transaction Ti performs a conflicting read or
write before transaction Tj.
 
Example
 
If there is a cycle in the precedence graph, then the schedule is
non-serializable, because the cycle indicates that one transaction is
dependent on the other transaction and vice versa. It also means
that there are one or more conflicting pairs of operations in the
transactions. On the other hand, if there is no cycle, the non-serial
schedule is serializable.
 
What is a conflicting pair in transactions?
 
Two operations inside a schedule are called conflicting if they meet
these three conditions:
1. They belong to two different transactions.
2. They are working on the same data item.
3. At least one of them is a WRITE operation.
To conclude, let's take two operations on a data item "a". The
conflicting pairs are:
1. READ(a) - WRITE(a)
2. WRITE(a) - WRITE(a)
3. WRITE(a) - READ(a)
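These conflict rules are exactly what a precedence graph encodes. A sketch (helper names are our own) that builds the graph from a schedule of (transaction, operation, item) steps and tests it for a cycle:

```python
# Sketch: build a precedence graph from a schedule and test for cycles.
# Two steps conflict when they touch the same item, come from different
# transactions, and at least one of them is a write.
from collections import defaultdict

def precedence_edges(schedule):
    edges = set()
    for i, (ti, op1, x) in enumerate(schedule):
        for tj, op2, y in schedule[i + 1:]:
            if ti != tj and x == y and "W" in (op1, op2):
                edges.add((ti, tj))   # Ti's conflicting op precedes Tj's
    return edges

def has_cycle(edges):
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
    state = {}
    def dfs(v):                       # depth-first search for a back edge
        state[v] = "visiting"
        for w in adj[v]:
            if state.get(w) == "visiting" or (w not in state and dfs(w)):
                return True
        state[v] = "done"
        return False
    return any(v not in state and dfs(v) for v in list(adj))

# T1 and T2 interleaved on item a: W1(a) before R2(a), and W2(a) before R1(a)
s = [("T1", "W", "a"), ("T2", "R", "a"), ("T2", "W", "a"), ("T1", "R", "a")]
print(has_cycle(precedence_edges(s)))  # True -> not conflict-serializable
```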
 
Recoverability

A transaction may not execute completely due to a hardware failure,
a system crash, or software issues. In that case, we have to roll back
the failed transaction. But some other transaction may also have used
values produced by the failed transaction, so we have to roll back
those transactions as well.
 
Recoverable Schedules
 
Schedules in which transactions commit only after all transactions
whose changes they read commit are called recoverable schedules. In
other words, if some transaction Tj is reading a value updated or
written by some other transaction Ti, then the commit of Tj must
occur after the commit of Ti.
Example:
The given schedule follows the order Ti -> Tj => C1 -> C2.
Transaction T1 is executed before T2, so there is no chance of a
conflict. R1(x) appears before W1(x) and transaction T1 commits
before T2, i.e., the first transaction performs the first update on
data item x; hence the given schedule is recoverable.
 
S1: R1(x), W1(x), R2(x), R1(y), R2(y), W2(x), W1(y), C1, C2;
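The recoverability condition, that a reader must commit after every writer it read from, can be checked mechanically. A sketch using the schedule S1 above (helper names are our own):

```python
# Sketch: a schedule is recoverable if every transaction commits only
# after each transaction it read from has committed.
# Steps are (txn, op) for commits and (txn, op, item) otherwise.
def is_recoverable(schedule):
    last_writer = {}   # item -> transaction that last wrote it
    reads_from = {}    # reader -> set of writers whose values it read
    commit_order = []
    for step in schedule:
        txn, op = step[0], step[1]
        if op == "W":
            last_writer[step[2]] = txn
        elif op == "R":
            writer = last_writer.get(step[2])
            if writer and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
        elif op == "C":
            commit_order.append(txn)
    pos = {t: i for i, t in enumerate(commit_order)}
    return all(pos[w] < pos[r]
               for r, writers in reads_from.items() for w in writers
               if r in pos and w in pos)

s1 = [("T1", "R", "x"), ("T1", "W", "x"), ("T2", "R", "x"),
      ("T1", "R", "y"), ("T2", "R", "y"), ("T2", "W", "x"),
      ("T1", "W", "y"), ("T1", "C"), ("T2", "C")]
print(is_recoverable(s1))  # True: C1 occurs before C2
```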
 
Example 2

Consider the following schedule involving two transactions T1 and T2:

T1        T2
R(A)
W(A)
          W(A)
          R(A)
commit
          commit

This is a recoverable schedule, since T1 commits before T2, which
makes the value read by T2 correct.
 
Irrecoverable Schedule
 
The table below shows a schedule with two transactions: T1 reads
and writes A, and that value is read and written by T2. T2 commits.
But later on, T1 fails, so we have to roll back T1. Since T2 has read
the value written by T1, it should also be rolled back, but we have
already committed it. So this schedule is an irrecoverable schedule.
When Tj reads the value updated by Ti and Tj commits before Ti
commits, the schedule is irrecoverable.
 
Recoverable with Cascading Rollback
 
The table below shows a schedule with two transactions: T1 reads
and writes A, and that value is read and written by T2. But later on,
T1 fails, so we have to roll back T1. Since T2 has read the value
written by T1, it should also be rolled back. As it has not committed,
we can roll back T2 as well. So this is recoverable with cascading
rollback. Therefore, if Tj reads a value updated by Ti and the commit
of Tj is delayed until the commit of Ti, the schedule is called
recoverable with cascading rollback.
 
Cascadeless Recoverable Rollback
 
The table below shows a schedule with two transactions: T1 reads
and writes A and commits, and that value is read by T2. If T1 fails
before commit, no other transaction has read its value, so there is no
need to roll back any other transaction. So this is a cascadeless
recoverable schedule. In short, if Tj reads a value updated by Ti only
after Ti has committed, the schedule is cascadeless recoverable.
 
Log and log records
 
The log is a sequence of log records, recording all the update
activities in the database. Logs for each transaction are maintained
in stable storage. Any operation performed on the database is
recorded in the log. Prior to performing any modification to the
database, an update log record is created to reflect that modification.
An update log record, represented as <Ti, Xj, V1, V2>, has these fields:
1. Transaction identifier: Unique identifier of the transaction that
performed the write operation.
2. Data item: Unique identifier of the data item written.
3. Old value: Value of the data item prior to the write.
4. New value: Value of the data item after the write operation.
 
Other type of log records are:
 
1. <Ti start>: It contains information about when a transaction Ti
starts.
2. <Ti commit>: It contains information about when a transaction Ti
commits.
3. <Ti abort>: It contains information about when a transaction Ti
aborts.
 
Undo and Redo Operations
 
Because every database modification must be preceded by the creation
of a log record, the system has available both the old value prior to
the modification of the data item and the new value that is to be
written. This allows the system to perform redo and undo operations
as appropriate:
1. Undo: Using a log record, sets the data item specified in the log
record to its old value.
2. Redo: Using a log record, sets the data item specified in the log
record to its new value.
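A minimal sketch of undo and redo applied to a single update log record <Ti, Xj, V1, V2> (the in-memory db dict is a stand-in for the database):

```python
# Sketch: applying an update log record <Ti, Xj, V1, V2>.
# 'db' is an in-memory stand-in for the database.
db = {"A": 10}
record = ("T1", "A", 10, 50)  # T1 changed A from old value 10 to new value 50
db["A"] = 50                  # the modification the record describes

def undo(rec, db):
    _ti, item, old, _new = rec
    db[item] = old            # restore the value prior to the write

def redo(rec, db):
    _ti, item, _old, new = rec
    db[item] = new            # reapply the value written

undo(record, db)
print(db["A"])  # 10
redo(record, db)
print(db["A"])  # 50
```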
 
Checkpoint
 
A checkpoint is a mechanism by which all the previous logs are
removed from the system and permanently stored on the storage disk.
The checkpoint is like a bookmark: during the execution of
transactions, such checkpoints are marked, and as each transaction
executes, log records are created for its steps.
When a checkpoint is reached, the transaction's updates are written
to the database, and the log file up to that point is removed. The log
file is then updated with the steps of transactions until the next
checkpoint, and so on.
The checkpoint is used to declare a point before which the DBMS was
in a consistent state and all transactions were committed.
 
Recovery using Checkpoint
 
The recovery system reads the log files from end to start, i.e., from
T4 back to T1.
The recovery system maintains two lists: a redo-list and an undo-list.
A transaction is put on the redo-list if the recovery system sees a log
containing both <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>;
all transactions on the redo-list are redone. Transactions that have a
<Tn, Start> record but no <Tn, Commit> are placed on the undo-list
and rolled back.
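Classifying transactions into the two lists can be sketched with a single scan of the log (the log format here is a simplified stand-in):

```python
# Sketch: after a crash, scan the log and classify transactions.
# Committed transactions go on the redo-list; transactions that started
# but never committed go on the undo-list. (Log format is simplified.)
log = ["<T1 start>", "<T1 commit>", "<T2 start>", "<T3 start>", "<T3 commit>"]

started, committed = set(), set()
for rec in log:
    txn, action = rec.strip("<>").split()
    if action == "start":
        started.add(txn)
    elif action == "commit":
        committed.add(txn)

redo_list = sorted(committed)            # redo using new values in the log
undo_list = sorted(started - committed)  # undo using old values in the log
print(redo_list, undo_list)  # ['T1', 'T3'] ['T2']
```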
 
Concurrency Control
 
Concurrency Control is provided in a database to:
Enforce isolation among transactions.
Preserve database consistency through consistency-preserving
execution of transactions.
Resolve read-write and write-read conflicts.
 
Concurrency Control Techniques are

1. Two-phase locking protocol
2. Timestamp ordering protocol
3. Multiversion concurrency control
4. Validation concurrency control
 
Two Phase Locking
 
Locking is an operation that secures either permission to read or
permission to write a data item. Two-phase locking is a process used
to gain ownership of shared resources without creating the possibility
of deadlock. The three activities taking place in the two-phase update
algorithm are:

(i) Lock Acquisition
(ii) Modification of Data
(iii) Release Lock
 
Cont..
 
Two-phase locking prevents deadlock from occurring in distributed
systems by having a transaction release all the resources it has
acquired whenever it cannot obtain every resource it requires without
waiting for another process to finish using a lock. This means that no
process is ever in a state where it is holding some shared resources
while waiting for another process to release a shared resource that it
requires, so deadlock cannot occur due to resource contention. A
transaction in the two-phase locking protocol passes through one of
two phases:
 
Cont..
 
Growing Phase:
 In this phase a transaction can only acquire locks but
cannot release any lock. The point when a transaction acquires all the
locks it needs is called the Lock Point.
Shrinking Phase:
 In this phase a transaction can only release locks but
cannot acquire any.
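The two-phase rule can be sketched as a guard on lock operations: once any lock is released, further acquisitions are rejected (class and method names are our own):

```python
# Minimal sketch of the two-phase rule: once a transaction releases any
# lock (entering the shrinking phase), it may not acquire another.
class TwoPhaseTxn:
    def __init__(self):
        self.locks = set()
        self.shrinking = False

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: acquire after release")
        self.locks.add(item)        # growing phase

    def release(self, item):
        self.shrinking = True       # the lock point has passed
        self.locks.discard(item)

t = TwoPhaseTxn()
t.acquire("A")
t.acquire("B")                      # growing phase
t.release("A")                      # shrinking phase begins
try:
    t.acquire("C")
except RuntimeError as e:
    print(e)                        # 2PL violation: acquire after release
```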
 
Time Stamp Ordering
 
A timestamp is a tag that can be attached to any transaction or any
data item, denoting the specific time at which the transaction or the
data item was last used in any way. A timestamp can be implemented
in two ways. One is to directly assign the current value of the clock
to the transaction or data item. The other is to attach the value of a
logical counter that is incremented as new timestamps are required.
The timestamps of a data item are of two types:
(i) W-timestamp(X): The latest time at which the data item X has
been written.
(ii) R-timestamp(X): The latest time at which the data item X has
been read. These two timestamps are updated each time a
successful read/write operation is performed on the data item X.
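The basic timestamp-ordering checks built on these two timestamps can be sketched as follows (a simplification: a rejected operation would normally roll the transaction back and restart it with a new timestamp):

```python
# Sketch of the basic timestamp-ordering checks per data item.
w_ts = {"X": 0}  # W-timestamp(X)
r_ts = {"X": 0}  # R-timestamp(X)

def read(txn_ts, item):
    if txn_ts < w_ts[item]:
        return "rollback"                 # a younger txn already wrote X
    r_ts[item] = max(r_ts[item], txn_ts)  # update R-timestamp(X)
    return "ok"

def write(txn_ts, item):
    if txn_ts < r_ts[item] or txn_ts < w_ts[item]:
        return "rollback"                 # a younger txn already used X
    w_ts[item] = txn_ts                   # update W-timestamp(X)
    return "ok"

print(write(5, "X"))  # ok: W-timestamp(X) becomes 5
print(read(3, "X"))   # rollback: an older transaction cannot read it
print(read(7, "X"))   # ok: R-timestamp(X) becomes 7
```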
 
Multiversion Concurrency Control
 
Multiversion schemes keep old versions of data items to increase
concurrency.
Multiversion two-phase locking: Each successful write results in the
creation of a new version of the data item written. Timestamps are
used to label the versions. When a read(X) operation is issued, an
appropriate version of X is selected based on the timestamp of the
transaction.
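Version selection for a read can be sketched as picking the latest version whose write timestamp does not exceed the reading transaction's timestamp (the version labels here are hypothetical):

```python
# Sketch: multiversion reads. Each version of X carries the timestamp of
# the write that created it; read(X) returns the latest version not newer
# than the reading transaction's timestamp.
versions = [(5, "v1"), (10, "v2"), (20, "v3")]  # (W-timestamp, value)

def read_version(txn_ts, versions):
    eligible = [(ts, v) for ts, v in versions if ts <= txn_ts]
    return max(eligible)[1]  # newest version the transaction may see

print(read_version(12, versions))  # v2
print(read_version(25, versions))  # v3
```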
 
Validation Concurrency Control
 
The optimistic approach is based on the assumption that the majority
of the database operations do not conflict. The optimistic approach
requires neither locking nor time stamping techniques. Instead, a
transaction is executed without restrictions until it is committed.
Using an optimistic approach, each transaction moves through 2 or 3
phases, referred to as read, validation and write.
 
Cont..
 
(i) During the read phase, the transaction reads the database,
executes the needed computations, and makes the updates to a
private copy of the database values. All update operations of the
transaction are recorded in a temporary update file, which is not
accessed by the remaining transactions.
(ii) During the validation phase, the transaction is validated to ensure
that the changes made will not affect the integrity and consistency of
the database. If the validation test is positive, the transaction goes to
the write phase. If the validation test is negative, the transaction is
restarted and the changes are discarded.
(iii) During the write phase, the changes are permanently applied to
the database.
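The validation test can be sketched as checking that no concurrently committed transaction wrote an item in this transaction's read set; only then is the private copy applied (helper names are our own):

```python
# Sketch: optimistic validation. A transaction's private updates are
# applied only if no other concurrently committed transaction wrote an
# item that this transaction read; otherwise it must restart.
db = {"A": 1, "B": 2}

def validate_and_write(read_set, private_updates, others_write_sets):
    if any(read_set & ws for ws in others_write_sets):
        return False               # validation fails: discard private copy
    db.update(private_updates)     # write phase: apply the private copy
    return True

print(validate_and_write({"A"}, {"A": 99}, [{"B"}]))  # True: no overlap
print(db["A"])                                        # 99
print(validate_and_write({"A"}, {"A": 5}, [{"A"}]))   # False: conflict on A
print(db["A"])                                        # still 99
```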
 
RDBMS
 
A relational database is a type of database that stores and provides
access to data points that are related to one another. Relational
databases are based on the relational model, an intuitive,
straightforward way of representing data in tables. In a relational
database, each row in the table is a record with a unique ID called
the key. The columns of the table hold attributes of the data, and
each record usually has a value for each attribute, making it easy to
establish the relationships among data points.
 
 
 
Concept of Table Spaces
 
A table space is a storage structure containing tables, indexes, large
objects, and long data. They are used to organize data in a database
into logical storage groupings that relate to where data is stored on a
system. Table spaces are stored in database partition groups.
 
Segments, Extents, and Blocks
 
Oracle allocates database space for all data in a database. The units
of logical database allocation are data blocks, extents, and segments,
with data blocks at the finest level of granularity.
 
 
Data Blocks
 
At the finest level of granularity, Oracle stores data in data blocks
(also called logical blocks, Oracle blocks, or pages). One data block
corresponds to a specific number of bytes of physical database space
on disk. You set the data block size for every Oracle database when
you create the database. This data block size should be a multiple of
the operating system's block size, within the maximum limit. Oracle
data blocks are the smallest units of storage that Oracle can use or
allocate.
In contrast, all data at the physical, operating-system level is stored
in bytes. Each operating system has what is called a block size.
Oracle requests data in multiples of Oracle blocks, not operating
system blocks. Therefore, you should set the Oracle block size to a
multiple of the operating system block size to avoid unnecessary I/O.
Oracle manages the storage space in the datafiles of a database in
units called data blocks. A data block is the smallest unit of I/O used
by a database.
 
Extents
 
The next level of logical database space is called an extent. An extent
is a specific number of contiguous data blocks that is allocated for
storing a specific type of information.
 
Segment
 
The level of logical database storage above an extent is called a
segment. A segment is a set of extents that have been allocated for a
specific type of data structure, and that are all stored in the same
tablespace. For example, each table's data is stored in its own data
segment, while each index's data is stored in its own index segment.
Oracle allocates space for segments in extents; therefore, when the
existing extents of a segment are full, Oracle allocates another extent
for that segment. Because extents are allocated as needed, the
extents of a segment may or may not be contiguous on disk.
Segments can span files, but individual extents cannot.
 
Dedicated Server
 
A dedicated server is a kind of web hosting where a business rents a
full computer server with Internet access from a data center. Unlike a
conventional computer server in an office closet, the dedicated server
is kept in a climate-controlled data center where network and
Internet uptime are kept up by expert engineers. A business can rent
dedicated servers instead of purchasing and setting up a computer
server at its office and employing additional staff to oversee and
maintain the server. This permits businesses to eliminate the hassles
of dealing with day-to-day server issues and focus on their growth.
 
Multi-Threaded Server
 
A multithreaded server is any server that has more than one thread.
Because a transport requires its own thread, multithreaded servers
also have multiple transports. The number of thread-transport pairs
that a server contains defines the number of requests that the server
can handle in parallel.
 
 
 
Distributed Database
 
A distributed database is a database that consists of two or more files
located in different sites either on the same network or on entirely
different networks. Portions of the database are stored in multiple
physical locations and processing is distributed among multiple
database nodes.
 
 
Hierarchical Queries
 
A hierarchical query operates on rows in which one or more column
values correspond to nodes within a logical structure of parent-child
relationships. If parent rows have multiple children, sibling
relationships exist among child rows of the same parent.
 
Inline Query
 
An inline query is a type of subquery present in the FROM clause of
a SQL statement as a data source. A subquery is classified by where
it appears: if it is present in the SELECT list, it is called a
"sub-select"; if it is present in the FROM clause, it is called an
"inline query" or "inline view".
 
Flashback
 
Flashback Query allows users to see a view of past data. If some data
or a table has been deleted by the user, the flashback query provides
an opportunity to view that data again and perform manipulations
on it.
Flashback queries rely on a flashback area, in which the deleted data
is stored so that it can be viewed if needed in the future.
To use the flashback query feature, the server must be configured for
automatic undo management. If the system uses the traditional
rollback approach, flashback queries cannot be performed.
We can enable flashback query using the DBMS_FLASHBACK
package. This package enables us to view past data by specifying the
system change number or the exact time in the past.
Slide Note
Embed
Share

Learn about dangling tuples in DBMS, which are tuples that do not participate in a natural join, potentially indicating consistency issues in the database. Referential integrity constraints play a key role in identifying and addressing dangling tuples.


Uploaded on Mar 06, 2024 | 1 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Dangling Tuples

  2. What is Dangling Tuples In DBMS if there is a tuple that does not participate in a natural join we called it as dangling tuple . It may gives indication consistency problem in the database. Another definition of dangling problem tuple is that a tuple with a foreign key value that not appear in the referenced relation is known as dangling tuple. In DBMS Referential integrity constraints specify us exactly when dangling tuples indicate problem.

  3. Multivalued Dependency Multivalued dependency occurs when two attributes in a table are independent of each other but, both depend on a third attribute. A multivalued dependency consists of at least two attributes that are dependent on a third attribute that's why it always requires at least three attributes. BIKE_MODEL MANUF_YEAR COLOR M2011 2008 White M2001 2008 Black M3001 2013 White M3001 2013 Black M4006 2017 White M4006 2017 Black Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent of each other.

  4. Transaction System The transaction refers to a small unit of any given program that consists of various low-level tasks. Every transaction in DBMS must maintain ACID A (Atomicity), C (Consistency), I (Isolation), D (Durability). One must maintain ACID so as to ensure completeness, accuracy, and integrity of data.

  5. ACID Properties

  6. Atomicity: By this, we mean that either the entire transaction takes place at once or doesn t happen at all. There is no midway i.e. transactions do not occur partially. Each transaction is considered as one unit and either runs to completion or is not executed at all. It involves the following two operations. Abort: If a transaction aborts, changes made to the database are not visible. Commit: If a transaction commits, changes made are visible. Atomicity is also known as the All or nothing rule . Consider the following transaction T consisting of T1 and T2: Transfer of 100 from account X to account Y.

  7. If the transaction fails after completion of T1 but before completion of T2.( say, after write(X) but before write(Y)), then the amount has been deducted from X but not added to Y. This results in an inconsistent database state. Therefore, the transaction must be executed in its entirety in order to ensure the correctness of the database state.

  8. Consistency: This means that integrity constraints must be maintained so that the database is consistent before and after the transaction. It refers to the correctness of a database. Referring to the example above, The total amount before and after the transaction must be maintained. Total before T occurs Total after T occurs = Therefore, the database is consistent. Inconsistency occurs in case T1 completes but T2 fails. As a result, T is incomplete. = 500 400 + + 200 300 = = 700. 700.

  9. Isolation: This concurrently without leading to the inconsistency of the database state. Transactions occur independently Changes occurring in a particular transaction will not be visible to any other transaction until that particular change in that transaction is written to memory or has been committed. This property ensures that the execution of transactions concurrently will result in a state that is equivalent to a state achieved these were executed serially in some Let X= 500, Y = 500. property ensures that multiple transactions can occur without interference. order.

  10. Durability: This property ensures that once the transaction has completed execution, the updates and modifications to the database are stored in and written to disk and they persist even if a system failure occurs. These updates now become permanent and are stored in non-volatile memory. The effects of the transaction, thus, are never lost.

  11. Important points: Responsibility for maintaining properties Property Atomicity Transaction Manager Consistency Application programmer Isolation Concurrency Control Manager Durability Recovery Manager

  12. Serializability Serializability of schedules ensures that a non-serial schedule is equivalent to a serial schedule. It helps in maintaining the transactions to execute simultaneously without interleaving one another.

  13. Schedules and Serializable Schedules in DBMS Schedules are a series of operations performing one transaction to the other. R(X) means Reading the value: X; and W(X) means Writing the value: X.

  14. Schedules in DBMS are of two types: Serial Schedule -A schedule in which only one transaction is executed at a time, i.e., one transaction is executed completely before starting another transaction.

  15. Example Transaction-1 Transaction-2 R(a) W(a) R(b) W(b) R(b) W(b) R(a) W(a)

  16. Non-serial Schedule- A schedule in which the transactions are interleaving or interchanging. There are several transactions executing simultaneously as they are being used in performing real-world database operations. These transactions may be working on the same piece of data. Hence, the serializability of non-serial schedules is a major concern so that our database is consistent before and after the execution of the transactions.

  17. Example Transaction-1 Transaction-2 R(a) W(a) R(b) W(b) R(b) R(a) W(b) W(a)

  18. What is a serializable schedule? A non-serial schedule is called a serializable schedule if it can be converted to its equivalent serial schedule. In simple words, if a non-serial schedule and a serial schedule result in the same then the non-serial schedule is called a serializable schedule.

  19. Testing of Serializability To test the serializability of a schedule, we can use Serialization Graph or Precedence Graph. A serialization Graph is nothing but a Directed Graph ofthe entire transactions ofa schedule. It can be defined as a Graph G(V, E) consisting of a set of directed- edges E = {E1, E2, E3, ..., En} and a set of vertices V = {V1, V2, V3, ...,Vn}. The set of edges contains one of the two operations - READ, WRITE performed by acertain transaction.

  20. Cont.. Ti -> Tj, means Transaction-Ti is either performing read or write before the transaction-Tj.

  21. Example If there is a cycle present in the serialized graph then the schedule is non-serializable because the cycle resembles that one transaction is dependent on the other transaction and vice versa. It also means that there are one or more conflicting pairs of operations in the transactions. On the other hand, no-cycle means that the non-serial schedule is serializable.

  22. What is a conflicting pair in transactions? Two operations inside a schedule are called conflicting if they meet these three conditions: 1.They belong to two different transactions. 2.They are working on the same data piece. 3.One of them is performing the WRITE operation. To conclude, let s take two operations on data: "a". The conflicting pairs are: 1.READ(a) - WRITE(a) 2.WRITE(a) - WRITE(a) 3.WRITE(a) - READ(a)

  23. Recoverability Recoverability A transaction may not execute completely due to hardware failure, system crash or software issues. In that case, we have to roll back the failed transaction. But some other transaction may also have used values produced by the failed transaction. So, we have to roll back those transactions as well.

  24. Recoverable Schedules Schedules in which transactions commit only after all transactions whose changes they read commit are called recoverable schedules. In other words, if some transaction Tjis reading value updated or written by some other transaction Ti, then the commit of Tjmust occur after the commit of Ti. Example- Given schedule follows order of Ti->Tj => C1->C2. Transaction T1 is executed before T2 hence there is no chances of conflict occur. R1(x) appears before W1(x) and transaction T1 is committed before T2 i.e. completion of first transaction performed first update on data item x, hence given schedule is recoverable. S1: R1(x), W1(x), R2(x), R1(y), R2(y), W2(x), W1(y), C1, C2;

  25. Example 2 Example 2 Consider the following schedule involving two transactions T1and T2. T1 T2 R(A) W(A) W(A) R(A) commit commit This is a recoverable schedule since T1commits before T2, that makes the value read by T2correct.

  26. Irrecoverable Schedule The table below shows a schedule with two transactions, T1 reads and writes A and that value is read and written by T2. T2 commits. But later on, T1 fails. So we have to rollback T1. Since T2 has read the value written by T1, it should also be rollbacked. But we have already committed that. So this schedule is irrecoverable schedule. When Tj is reading the value updated by Ti and Tj is committed before committing of Ti, the schedule will be irrecoverable.

  27. Recoverable with Cascading Rollback The table below shows a schedule with two transactions, T1 reads and writes A and that value is read and written by T2. But later on, T1 fails. So we have to rollback T1. Since T2 has read the value written by T1, it should also be rollbacked. As it has not committed, we can rollback T2 as well. So it is recoverable with cascading rollback. Therefore, if Tj is reading value updated by Ti and commit of Tj is delayed till commit of Ti, the schedule is called recoverable with cascading rollback.

  28. Cascadeless Recoverable Rollback Consider a schedule with two transactions in which T1 reads and writes A and commits, and only then is that value read by T2. If T1 fails before its commit, no other transaction has read its value, so there is no need to roll back any other transaction. This is a cascadeless recoverable schedule. So, if Tj reads a value updated by Ti only after Ti has committed, the schedule is cascadeless recoverable.
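The distinctions above can be checked mechanically. Below is a minimal sketch, assuming a schedule is given as a list of (transaction, action, item) tuples; the function and variable names are illustrative, not from any real DBMS:

```python
# Sketch: classify a schedule as recoverable and/or cascadeless.
# A schedule is recoverable if every reader of uncommitted data commits
# after the writer; it is cascadeless if no uncommitted value is ever read.

def classify(schedule):
    writes = {}          # item -> transaction holding an uncommitted write
    reads_from = {}      # reader -> set of writers it read uncommitted data from
    commit_pos = {}      # transaction -> position of its commit in the schedule
    dirty_read = False   # True if any transaction read an uncommitted value

    for i, (txn, action, item) in enumerate(schedule):
        if action == "W":
            writes[item] = txn
        elif action == "R":
            writer = writes.get(item)
            if writer is not None and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
                dirty_read = True
        elif action == "C":
            commit_pos[txn] = i
            for it, w in list(writes.items()):   # its writes are now committed
                if w == txn:
                    del writes[it]

    recoverable = all(
        commit_pos.get(w, float("inf")) <= commit_pos.get(r, float("inf"))
        for r, ws in reads_from.items() for w in ws
    )
    return {"recoverable": recoverable, "cascadeless": not dirty_read}

# S1 from slide 24: T2 reads x while T1's write is uncommitted,
# but T1 commits first, so S1 is recoverable (not cascadeless).
s1 = [("T1", "R", "x"), ("T1", "W", "x"), ("T2", "R", "x"), ("T1", "R", "y"),
      ("T2", "R", "y"), ("T2", "W", "x"), ("T1", "W", "y"),
      ("T1", "C", None), ("T2", "C", None)]
print(classify(s1))
```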

  29. Log and log records The log is a sequence of log records, recording all the update activities in the database. Logs for each transaction are maintained in stable storage. Any operation performed on the database is recorded in the log. Prior to performing any modification to the database, an update log record is created to reflect that modification. An update log record, represented as <Ti, Xj, V1, V2>, has these fields: 1. Transaction identifier: unique identifier of the transaction that performed the write operation. 2. Data item: unique identifier of the data item written. 3. Old value: value of the data item prior to the write. 4. New value: value of the data item after the write operation.

  30. Other type of log records are: 1.<Ti start>: It contains information about when a transaction Ti starts. 2.<Ti commit>: It contains information about when a transaction Ti commits. 3.<Ti abort>: It contains information about when a transaction Ti aborts.

  31. Undo and Redo Operations Because every database modification must be preceded by the creation of a log record, the system has available both the old value of the data item (prior to the modification) and the new value that is to be written. This allows the system to perform redo and undo operations as appropriate: 1. Undo: using a log record, set the data item specified in the record to its old value. 2. Redo: using a log record, set the data item specified in the record to its new value.
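Using the <Ti, Xj, V1, V2> record layout from slide 29, undo and redo can be sketched in a few lines; the database here is just a dictionary, for illustration:

```python
# Sketch: undo/redo from update log records <Ti, Xj, V1, V2>.
# The record layout follows the slides; everything else is illustrative.

def undo(db, record):
    """Restore the old value V1 of the data item named in the record."""
    txn, item, old, new = record
    db[item] = old

def redo(db, record):
    """Reapply the new value V2 of the data item named in the record."""
    txn, item, old, new = record
    db[item] = new

db = {"X": 5}
rec = ("T1", "X", 5, 9)   # T1 wrote X: old value 5, new value 9
redo(db, rec)
print(db["X"])            # 9
undo(db, rec)
print(db["X"])            # 5
```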

  32. Checkpoint The checkpoint is a mechanism by which all the previous logs are removed from the system and permanently stored on disk. A checkpoint is like a bookmark. During the execution of transactions, such checkpoints are marked, and log records are written for the steps of each transaction. When execution reaches a checkpoint, the transactions' updates are written to the database, and the log up to that point is discarded. The log is then filled with subsequent transaction steps until the next checkpoint, and so on. The checkpoint declares a point before which the DBMS was in a consistent state and all transactions were committed.

  33. Recovery using Checkpoint The recovery system reads the log files from the end to the start, i.e. from T4 back to T1. It maintains two lists: a redo-list and an undo-list. A transaction is put into the redo-list if the recovery system sees both <Tn, Start> and <Tn, Commit> in the log; all transactions in the redo-list are redone. A transaction is put into the undo-list if the log contains <Tn, Start> but no <Tn, Commit>; all transactions in the undo-list are undone and their log records are erased.
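The redo/undo classification rule can be sketched as a single scan over the log records seen since the checkpoint; the record shapes and names below are illustrative:

```python
# Sketch: build redo- and undo-lists from the log after a checkpoint.
# Committed transactions are redone; started-but-uncommitted ones are undone.

def recovery_lists(log):
    started, committed = set(), set()
    for txn, action in log:
        if action == "start":
            started.add(txn)
        elif action == "commit":
            committed.add(txn)
    redo_list = committed            # saw both <Tn, Start> and <Tn, Commit>
    undo_list = started - committed  # saw <Tn, Start> but no commit
    return redo_list, undo_list

log = [("T1", "start"), ("T1", "commit"), ("T2", "start"),
       ("T3", "start"), ("T3", "commit"), ("T4", "start")]
redo_list, undo_list = recovery_lists(log)
print(sorted(redo_list))   # ['T1', 'T3']
print(sorted(undo_list))   # ['T2', 'T4']
```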

  34. Concurrency Control Concurrency Control is provided in a database to: Enforce isolation among transactions. Preserve database consistency through consistency preserving execution of transactions. Resolve read-write and write-read conflicts.

  35. Concurrency Control Techniques are 1.Two-phase locking Protocol 2.Time stamp ordering Protocol 3.Multi version concurrency control 4.Validation concurrency control

  36. Two Phase Locking Locking is an operation which secures permission to read, or permission to write, a data item. Two phase locking is a process used to gain ownership of shared resources without creating the possibility of deadlock. The 3 activities taking place in the two phase update algorithm are: (i) Lock Acquisition (ii) Modification of Data (iii) Release of Locks

  37. Cont.. Conservative two phase locking prevents deadlock from occurring in distributed systems by releasing all the resources it has acquired if it is not possible to acquire all the resources required without waiting for another process to finish using a lock. This means that no process is ever in a state where it is holding some shared resources and waiting for another process to release a shared resource which it requires, so deadlock cannot occur due to resource contention. A transaction in the Two Phase Locking Protocol can be in one of 2 phases:

  38. Cont.. Growing Phase: In this phase a transaction can only acquire locks but cannot release any lock. The point when a transaction acquires all the locks it needs is called the Lock Point. Shrinking Phase: In this phase a transaction can only release locks but cannot acquire any.
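The growing/shrinking discipline can be sketched as a small class that refuses lock acquisition once the first lock has been released; the class and method names are illustrative, not from any real DBMS:

```python
# Sketch of the two phases of 2PL: once a transaction releases its first
# lock (the lock point has passed), it may not acquire any further locks.

class TwoPhaseTxn:
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False   # False = growing phase

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot lock in shrinking phase")
        self.locks.add(item)

    def release(self, item):
        self.shrinking = True    # entering the shrinking phase
        self.locks.discard(item)

t = TwoPhaseTxn("T1")
t.acquire("A"); t.acquire("B")   # growing phase
t.release("A")                   # shrinking phase begins
try:
    t.acquire("C")               # violates the protocol
except RuntimeError as e:
    print(e)
```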

  39. Time Stamp Ordering A timestamp is a tag that can be attached to any transaction or any data item, denoting a specific time at which the transaction or the data item was last used in some way. A timestamp can be implemented in 2 ways. One is to directly assign the current value of the clock to the transaction or data item. The other is to attach the value of a logical counter that is incremented as new timestamps are required. The timestamp of a data item can be of 2 types: (i) W-timestamp(X): the latest time at which the data item X was written. (ii) R-timestamp(X): the latest time at which the data item X was read. These 2 timestamps are updated each time a successful read/write operation is performed on the data item X.
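A minimal sketch of the basic timestamp-ordering checks, assuming transaction timestamps are plain integers and a rejected operation rolls the transaction back; the names here are illustrative:

```python
# Sketch: basic timestamp-ordering protocol. An operation is rejected if it
# would conflict with an operation of a younger (higher-timestamp) transaction.

r_ts = {}   # R-timestamp(X): latest read of each item
w_ts = {}   # W-timestamp(X): latest write of each item

def read(txn_ts, item):
    if txn_ts < w_ts.get(item, 0):
        return "rollback"          # item was already overwritten by a younger txn
    r_ts[item] = max(r_ts.get(item, 0), txn_ts)
    return "ok"

def write(txn_ts, item):
    if txn_ts < r_ts.get(item, 0) or txn_ts < w_ts.get(item, 0):
        return "rollback"          # a younger txn already read or wrote the item
    w_ts[item] = txn_ts
    return "ok"

print(read(5, "X"))    # ok: nothing has written X yet
print(write(3, "X"))   # rollback: transaction 5 already read X
```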

  40. Multiversion Concurrency Control Multiversion schemes keep old versions of data items to increase concurrency. Multiversion 2 phase locking: each successful write results in the creation of a new version of the data item written. Timestamps are used to label the versions. When a read(X) operation is issued, an appropriate version of X is selected based on the timestamp of the transaction.
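"An appropriate version" commonly means the version with the largest write timestamp not exceeding the reader's timestamp. A small sketch under that assumption, with illustrative names:

```python
# Sketch: multiversion read. Each write of X creates a new (timestamp, value)
# version; a reader sees the newest version no younger than itself.

versions = [(2, "v1"), (5, "v2"), (9, "v3")]   # versions of X: (write ts, value)

def mv_read(txn_ts, versions):
    visible = [(ts, val) for ts, val in versions if ts <= txn_ts]
    return max(visible)[1] if visible else None

print(mv_read(7, versions))   # v2: the version written at timestamp 5
print(mv_read(1, versions))   # None: no version existed yet
```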

  41. Validation Concurrency Control The optimistic approach is based on the assumption that the majority of the database operations do not conflict. The optimistic approach requires neither locking nor time stamping techniques. Instead, a transaction is executed without restrictions until it is committed. Using an optimistic approach, each transaction moves through 2 or 3 phases, referred to as read, validation and write.

  42. Cont.. (i) During the read phase, the transaction reads the database, executes the needed computations, and makes the updates to a private copy of the database values. All update operations of the transaction are recorded in a temporary update file, which is not accessed by the remaining transactions. (ii) During the validation phase, the transaction is validated to ensure that the changes made will not affect the integrity and consistency of the database. If the validation test is positive, the transaction goes to the write phase. If the validation test is negative, the transaction is restarted and the changes are discarded. (iii) During the write phase, the changes are permanently applied to the database.
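The three optimistic phases can be sketched as follows, with the validation test left abstract as a callback; all names are illustrative:

```python
# Sketch of the optimistic read/validate/write cycle: updates go to a
# private copy and are applied permanently only if validation succeeds.

def run_optimistic(db, updates, validate):
    private = dict(db)             # read phase: compute on a private copy
    private.update(updates)
    if not validate(db, private):  # validation phase
        return db, "restarted"     # discard the changes; restart the txn
    db.update(updates)             # write phase: apply permanently
    return db, "committed"

db = {"A": 1}
db, status = run_optimistic(db, {"A": 2}, lambda old, new: True)
print(status, db["A"])   # committed 2
```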

  43. RDBMS A relational database is a type of database that stores and provides access to data points that are related to one another. Relational databases are based on the relational model, an intuitive, straightforward way of representing data in tables. In a relational database, each row in the table is a record with a unique ID called the key. The columns of the table hold attributes of the data, and each record usually has a value for each attribute, making it easy to establish the relationships among data points.

  44. RDBMS vs DBMS
RDBMS: data is stored in table format | DBMS: data is stored in file format
RDBMS: multiple data elements are accessible together | DBMS: data elements are accessed individually
RDBMS: data in tables are linked together | DBMS: no connection between data
RDBMS: normalisation is supported | DBMS: normalisation is not supported
RDBMS: supports distributed databases | DBMS: no support for distributed databases
RDBMS: stores large amounts of data | DBMS: stores small quantities of data

  45. RDBMS: redundancy of data is reduced with the help of keys and indexes | DBMS: data redundancy is common
RDBMS: supports multiple users | DBMS: supports a single user
RDBMS: multiple layers of security while handling data | DBMS: only low security while handling data
RDBMS: software and hardware requirements are higher | DBMS: software and hardware requirements are low
RDBMS examples: Oracle, SQL Server | DBMS examples: XML, Microsoft Access

  46. Concept of Table Spaces A table space is a storage structure containing tables, indexes, large objects, and long data. They are used to organize data in a database into logical storage groupings that relate to where data is stored on a system. Table spaces are stored in database partition groups.

  47. Segments, Extents and Blocks Oracle allocates database space for all data in a database. The units of logical database allocation are data blocks, extents, and segments: a segment consists of one or more extents, and an extent consists of contiguous data blocks.

  48. Data Blocks At the finest level of granularity, Oracle stores data in data blocks (also called logical blocks, Oracle blocks, or pages). One data block corresponds to a specific number of bytes of physical database space on disk. You set the data block size for every Oracle database when you create the database; this size should be a multiple of the operating system's block size, within the maximum limit. Oracle data blocks are the smallest units of storage that Oracle can use or allocate. In contrast, all data at the physical, operating-system level is stored in bytes, and each operating system has its own block size. Oracle requests data in multiples of Oracle blocks, not operating system blocks; therefore, you should set the Oracle block size to a multiple of the operating system block size to avoid unnecessary I/O. Oracle manages the storage space in the datafiles of a database in units called data blocks, the smallest unit of I/O used by a database.

  49. Extents The next level of logical database space is called an extent. An extent is a specific number of contiguous data blocks that is allocated for storing a specific type of information.
