This chapter explores the basic issues and research challenges having key
importance to the performance of DRTDBS followed by the contributions and
organization of this thesis.
1.2 PERFORMANCE ISSUES AND RESEARCH CHALLENGES
The implementation of DRTDBS is difficult due to the conflicting requirements
of maintaining data consistency and meeting transaction’s deadlines [97,148]. The
difficulty comes from the unpredictability of the transactions’ response times
[70,71,114]. Each distributed transaction accessing a data item takes a variable
amount of time due to concurrency control, I/O and communication delays. While
maintaining the consistency of underlying database, scheduling and management
of the system resources in DRTDBS should also take into account the timing
constraints. Access to CPU, main memory, I/O devices and shared data should be
managed to make the best effort to satisfy the transaction deadlines.
One of the most important issues in design of DRTDBS is transaction
scheduling [18,68,88]. The transaction scheduling in DRTDBS involves both the
CPU scheduling and the data scheduling and is done according to the priorities
assigned to the transactions. As a result, the role of the priority assignment
policy becomes an important issue in deciding the performance of the system
because priorities determine the order of the transactions to access resources
which in turn affects their likeliness to meet the deadlines. In traditional
databases, when conflicts occur, the preferences tend to be based either on
fairness or on resource consumption [151]. However, the transaction scheduling in
DRTDBS is done according to the urgency of the transactions that decides their
priorities. The priority assignment problem has been addressed by very few
researches [86]. Generally, the priority of a transaction is determined on the basis
of its deadline such as in earliest deadline first (EDF) [85,93] priority assignment
policy; both fairness and maximum resource utilization become secondary goal
[130]. This can cause two problems. First, more CPU resource is wasted if closer
to completion transactions are aborted in favor of higher priority transactions [68].
Second, longer transactions may be harder to finish creating a starvation problem
[63]. Execution of a global transaction in a distributed system requires the
2
execution of cohorts on different sites. Most heuristics [66,67,85] for priority
assignment in DRTDBS consider that subtasks (cohorts) of a transaction are
executed sequentially. Except ultimate deadline (UD), other heuristics are not
suitable when the subtasks (cohorts) of a transaction are executed in parallel. The
UD also becomes ineffective when data contention is non - trivial [85].
The atomic commit protocols play a key role in supporting global atomicity for
the distributed real time transactions [9,13,82,83,101]. These protocols are used
to ensure that all cohorts agree on the final outcome of the transaction. They
typically require exchange of multiple messages [127] in multiple phases among
the participating sites, and also require to maintain logs of data to recover from
failures [59,80,117]. This significantly increases the execution time of the
transactions and can adversely affect the system’s ability to meet transaction
deadlines. Due to distributed nature of the transactions and in presence of other
sources of unpredictability such as data access conflicts, uneven distribution of
transactions over the sites, variable local CPU scheduling time, communication
delay, failure of coordinator and cohort’s sites etc., it is not easy to meet the
deadline of all transactions in DRTDBS. The unpredictability in the commitment
phase makes it more serious because the blocking time of the waiting cohorts due
to execute-commit conflict may become longer. Hence, due to unique
characteristics of the committing transactions [81] and unpredictability in the
commitment process, design of an efficient commit protocol is another important
issue that affects the performance of DRTDBS [49].
Important database system resources are main memory, CPU, disk and data
items [20,104,105]. Before the start of execution of a transaction, buffer space in
main memory is allocated for the transaction [62]. When the main memory is
running low, a transaction may be blocked from execution. The amount of memory
available in the system thus limits the number of concurrently executable
transactions. In large scale real time database systems, the execution of
transactions will be significantly slowed down if available memory is low. So, the
effective use of available main memory space in data intensive applications is
another challenging issue. During the execution of a transaction, temporary
records are created to maintain the status of the transaction’s execution. These
3
temporary records are kept in the main memory until the transaction commits.
This consumes a substantial amount of main memory. Since, unpredictability in
the commitment phase may make the transaction to stay for a long period in the
system; memory will be held up for a long period and will be not available for other
transactions. So, this necessitates the design of commit protocols that save
memory by creating less temporary objects.
The design and implementation of DRTDBS introduce several other
interesting problems. Among these problems, predictability and consistency are
fundamental to real time transaction processing, but sometimes these require
conflicting actions [70,84,124]. To ensure consistency, we may have to block
certain transactions. Blocking of these transactions, however, may cause
unpredictable transaction execution and may lead to the violation of timing
constraints. There are a number of other sources of unpredictability such as
communication delays, site failures [22] and transaction’s interaction with the
underlying operating system and I/O subsystems [16]. Other design issues of
DRTDBS are data access mechanism and invariance, new metrics for database
correctness and performance, maintaining global system information, security,
fault tolerance, failure recovery etc. Again, there is also no adequately designed
technique for scheduling the CPU as being the primary resource in the DRTDBS
[6].
Although, a lot of research has been done on these issues, there still exist
many challenging and unresolved issues. Due to the heterogeneity of these
issues, we have confined our work to only some of these issues. Our work
involves design of new priority assignment policies and commit protocols and the
comparison of their performance with existing policies/protocols. We assumed that
the transactions are firm real time and data items accessed by the transactions
are known before the start of execution of the transactions. Two locking
approaches are used by the transactions to obtain a lock on data items viz., static
two phase locking (S2PL) and dynamic two phase locking (D2PL). The deadlock
freedom and lower communication overhead of locking by using S2PL makes it
attractive for DRTDBS [73,75,132]. So, we used S2PL with higher priority
concurrency control algorithm to access data in mutually exclusive way.
4
1.3 CONTRIBUTIONS OF THE THESIS
The work reported in this thesis provides better solutions for some of the
issues mentioned above. Major contributions of this thesis may be described as
follows:
(i) Development of a simulator for both the main memory resident and the
disk resident DRTDBS.
(ii) A new scheme to determine the priorities of cohorts executing in parallel
along with the method to compute the deadlines of the global and the
local transactions have been proposed. In our scheme, each cohort is
assigned an initial priority which is inversely proportional to the number of
locks required by the cohort at its execution site. A temporary
intermediate priority of the cohort is calculated when a data contention
occurs and initial priority of newly arrived cohort (T
A
) is higher than the
priority of lock holding cohort (T
L
). The intermediate priorities are based
on the remaining execution time needed by T
L
and the slack time
available with T
A
. This minimizes the abort of near completion low priority
lock holding cohorts. The proposed scheme has been compared with EDF
priority assignment policy
(iii) The dependencies created due to read/update type locks have been
redefined, and then a static two phase locking with high priority (S2PL-
HP) based commit protocol named as SWIFT has been proposed. The
performance of SWIFT has been compared with 2SC and PROMPT for
both main memory resident and disk resident databases with and without
communication delay. The performance of SWIFT protocol has also been
analyzed (i) for partial read-only optimization and (ii) for the case when
cohorts of the same transaction (siblings) are allowed to send messages
to each other.
(iv) A new locking scheme has been developed for the database model that
permits two types of write operations: blind write and update. All types of
dependencies that may arise by allowing a committing cohort to lend its
data to an executing cohort have been redefined. A memory efficient
5
commit protocol (MECP) has been proposed on the basis of the new
locking scheme, and its performance has been compared with 2SC and
PROMPT.
A list of the author’s research publication is given at the end of the thesis.
1.4 ORGANIZATION OF THE THESIS
Chapter 2 of the thesis reviews the past and ongoing work in the areas of
priority assignment policies, commit protocols and effective use of available main
memory space in data intensive applications. The design of new priority
assignment policies for the allocation of CPU and data to cohorts has been
discussed in chapter 3. Chapter 4 deals with the design and performance
evaluation of SWIFT, a new commit protocol. Chapter 5 of the thesis describes
the design of a memory efficient commit protocol (MECP) to reduce the storage
space requirement for intermediate (temporary) records and compares its
performance with other commit protocols. Finally, chapter 6 concludes the thesis
and proposes some research directions.
6
CHAPTER - 2
BACKGROUND AND LITERATURE REVIEW
2.1 INTRODUCTION
Databases and database systems have become an essential component of
everyday life in modern society. In the course of a day, most of us encounter
several activities that involve some interaction with databases. Nowadays,
because of the information technology revolution, fast access to information and
its efficient management are key to the success of any activity such as business
and other similar ones [50,60,99]. Today’s business applications are not the old-
styled batch applications; rather they do their data processing activities on-line
[123]. Modern electronic services and electronic commerce applications,
characterized by high volume of transactions, cannot survive without on-line
support of computer systems and database technology [7]. Therefore, database
systems (DBSs) play an important role in managing the information of fast
growing current businesses environment.
In this chapter, we review the current status of research in the area of
DRTDBS and identify important performance issues which are unique to their
study.
2.2 DISTRIBUTED DATABASE SYSTEM
DBS can be viewed as a collection of the data items which are shared by
many users [109,140]. They are designed to manage huge amount of the data.
The management of data basically involves the definition of structures for its
storage and provision of mechanisms for manipulation of this stored information.
Thus, a DBS is a collection of objects, which satisfy the need of users besides a
set of integrity constraints. Database Systems can be broadly classified as
centralized or distributed. The centralized database systems are those that run on
a single computer system. Such systems may range from single-user database
systems running on personal computers to high-performance database systems
running on mainframe systems. The distributed database systems (DDBS) consist
7
of a collection of sites, connected together via some means of communication
networks, in which, each site is a database system site in its own right but the
sites have agreed to work together, so that a user at any site can access data
from anywhere in the network, exactly as if, the data are all stored at the user’s
own site [34]. We can define DDBS as a collection of multiple logically interrelated
databases distributed over a computer network, and a distributed database
management system (DDBMS) which manages distributed databases while
making the distribution transparent to the user [147]. DDBS bring the advantages
of distributed computing to the database management domain and fit more
naturally in the decentralized structure of many business organizations.
2.3 DISTRIBUTED REAL TIME DATABASE SYSTEM
Real Time systems (RTS) are those for which correctness depends not only
on the logical properties of the produced results, but also on the temporal
properties of these results [7,118,149]. Typically, real time systems are associated
with critical applications, in which human lives or expensive machineries may be
at stake [79,87]. Hence, in such systems, an action performed too late (or too
early) or a computation which uses temporally invalid data may be useless and
sometimes harmful even if such an action or computation is functionally correct.
As RTS continue to evolve, their applications become more and more complex,
and often require timely access and predictable processing of massive amounts of
real time data [144]. The database systems, which are especially designed for the
efficient processing of these types of real time data, are referred to as distributed
real-time database systems (DRTDBS). DRTDBS can be viewed as an
amalgamation of the conventional DDBS and RTS, and like a conventional DDBS,
it has to process distributed transactions and guarantee their basic correctness
criteria [64,16,128]. Thus, DRTDBS are collection of multiple, logically interrelated
databases distributed over a computer network where transactions have explicit
timing constraints, usually in the form of deadlines [90,91]. In such a system, data
items shared among transactions are spread over remote locations. Accessing the
shared data items must be controlled in order to maintain database’s logical
consistency by applying a concurrency control algorithm. At the same time,
8
transactions have to be scheduled according to their timeliness to finish within
their timing constraints, i.e., transaction processing in DRTDBS must satisfy both
database logical consistency and timing constraints. What makes DRTDBS
different from a conventional real-time system is the requirement of preserving the
consistency of data besides considering the real-time requirements of the
transaction [14]. Satisfying the timing constraints of various real time activities in
distributed systems may be difficult due to the distributed nature of the
transactions.
2.3.1 Distributed Real Time Transaction
When the user programs for interaction with database, partially ordered sets
of read and write operations are generated [36]. This sequence of operations on
the database is called a transaction and it transforms the current consistent state
of the database system into a new consistent state. In DRTDBS, there are two
types of transactions: global and local. Global transactions are distributed real-
time transaction executed at more than one site whereas the local transactions
are executed at the generating site only. A common model of a distributed
transaction is that there is one process called coordinator which is executed at the
site where the transaction is submitted and a set of other processes called cohorts
that execute on behalf of the transaction at other sites that are accessed by the
transaction.
In fact, the distributed real time transaction processing is a form of transaction
processing that supports transactions whose operations are distributed among
different computers or among different databases from different vendors. So, in a
distributed real time transaction, the operations are executed at the site where the
required data item resides and is associated with time constraints. Transfer of
money from one account to another, reservation of train tickets, filing of tax
returns, entering marks on a student's grade sheet etc. are some of the examples
of distributed real time transactions. The transaction is an atomic unit of work,
which is either completed in it’s entirely or not at all. Hence, a distributed commit
protocol is needed to guarantee the uniform commitment of distributed transaction
execution [115]. The commit operation implies that the transaction is successful,
9
and, hence all of its updates should be incorporated into the database
permanently. An abort operation indicates that the transaction has failed, and
hence, requires the database management system to cancel or abolish all of its
effects in the database system. In short, a transaction is an "all or nothing" unit of
the execution.
2.3.2 Types of Distributed Real Time Transaction
In general, Real time transactions are classified into three types namely hard,
soft and firm. No hard real time transaction should have its deadline missed, and
its deadline must be guaranteed by the system. On the other hand, any deadline
violations of the soft real time transactions may only result in the performance
degradation of the system. The major performance metric for the soft real time
transaction is the number or percentage of deadline violations or their average or
worst case response time. The firm real time transactions are a special kind of soft
real time transactions except that firm real time transactions will be killed when
their deadline expire. The performance metrics is the number or percentage of
deadline violations. The completion of a real time transaction might contribute a
value to the system. The relationship between the value imparted by a real time
transaction and its completion time can be considered as a value function of the
time. After the soft real time transaction misses its deadline, its value might
decrease with time. A firm real time transaction loses its value after its deadline
expires. When a hard real time transaction misses its deadline, its value becomes
negative. It means that a catastrophe might occur.
2.3.3 Distributed Transaction Execution Model
There are two types of distributed transaction execution model, i.e., sequential
and parallel [4,14]. In the sequential execution model, there can be at most one
cohort of a transaction at each execution site, and only one cohort can be active
at a time. After the successful completion of one operation, next operation in the
sequence is executed by the appropriate cohort. At the end of execution of the
last operation, the transaction can be committed. In the parallel execution model,
the coordinator of the transaction spawns all cohorts together and sends them for
10
execution on respective sites [14]. All cohorts then execute in parallel. The
assumption here is that the operations performed by one cohort during its
execution at one site are independent of the results of the operations performed
by some other cohort at some other site. In other words, the sibling cohorts do not
require any information from each other to share [144].
2.3.4 Locking Mechanism
One of the fundamental properties of a transaction is isolation. When several
transactions execute concurrently in the database, the isolation property must be
preserved. To ensure this, the system must control the interaction among the
concurrent transactions; this control is achieved through concurrency control
schemes [109,140]. Some of the main concurrency control techniques such as
two phase locking (2PL) are based on the concept of locking of data items. Locks
are used to ensure the noninterference property of concurrently executing
transactions and to guarantee serializability of the schedules. A transaction is said
to follow the two-phase locking protocol, if all locking operations precedes the first
unlock operation in the transaction. Such a transaction can be divided into two
phases: an expanding or growing (first) phase, during which new locks on data
items can be acquired but none can be released; and a shrinking (second) phase,
during which existing locks can be released but no new locks can be acquired
[95]. It ensures serializability, but not deadlock freedom. The two phase locking
can be static or dynamic. The working principle of static two phase locking (S2PL)
is similar to dynamic two phase locking (D2PL) except for the procedure of setting
locks [73]. In D2PL, transactions acquire locks to access data items on demand
and release locks upon termination or commit [133]. In S2PL, the required locks of
a transaction are assumed to be known before its execution [136]. The prior
knowledge of the required data items by a transaction is easy to address in
DRTDBS as it is generally agreed that the behavior and the data items to be
accessed by real-time transactions, especially hard real-time transactions, are
much more well-defined and predictable. So, as a result of the better defined
nature of real time transactions, it is not uncommon to assume that the locking
information of a transaction is known before its processing. For example, in
11
priority ceiling protocol (PCP), the locks required by the transactions must be
known before their arrivals with predefined priorities [119]. A transaction has to
obtain all its required locks before the start of execution. If any one of its locks is
being used by another transaction, it is blocked and releases all seized locks. The
locks to be accessed by a transaction at each site can be packed into a single
message for transmission. In DRTDBS, the number of messages for setting the
locks is generally smaller for distributed S2PL than for D2PL. So, the number of
messages and the time delay for remote locking can be significantly reduced.
There is no local deadlock and a distributed deadlock is much easier to resolve
with S2PL than with D2PL. S2PL protocol is deadlock free [121] because blocked
transactions cannot hold locks. In the last two decades, a lot of work has been
done to compare D2PL with S2PL [86]. Most researchers agree that D2PL is a
better choice for conventional non-real-time database systems than S2PL
because of the followings reasons [65].
(i) Smaller probability of lock conflicts due to the shorter average lock holding
time in D2PL; and
(ii) Difficulty in determining the required locks before the processing of
transaction
However, the meaning of “better” performance in DRTDBS is quite different
from that in conventional non-real-time database systems. In the conventional
database systems, the major performance measures are mean system throughput
and mean system response time. On the contrary, minimizing the number of
missed deadlines is the main concern in DRTDBS.
In non-real-time S2PL, a transaction is blocked if any of its required locks are
seized in conflicting mode by any other transactions. While it is being blocked,
some of its required locks which may be free initially, can be seized by other
transactions. Thus, even when the original conflicting locks are released, the
transaction may be blocked by other transactions which arrive after it. So, the
blocking time of higher priority transaction can be arbitrarily long due to prolonged
blocking as a result of waiting for multiple locks [75]. An alternative for
12
concurrency control in DRTDBS is to use real time S2PL (RT-S2PL). In RT-S2PL,
each lock in the database is defined with a priority equal to the priority of the
highest priority transaction which is waiting for that lock. All the locks of the data
items to be accessed by a transaction have to be set in appropriate modes before
processing of the transaction. If any of the required locks is in a conflicting mode
or has a priority higher than that of the requesting transaction; none of the
required locks will be set and the transaction will be blocked instead. However, for
the locks with lower priorities, their priorities will be updated to that of the
requesting transaction. These features of RT-S2PL make it attractive for DRTDBS
[73,75,132]. In RT-S2PL protocols, the problem of locking-induced thrashing can
be prevented because lock requesting transactions can be blocked due to a lock
conflict [137,138,139].
Based on S2PL and H2PL, static two phase locking with high priority (S2PL-
HP) is used in the present work. Here, the lock requesting cohort waits for the
data item to be released when the requested data item is held by one or more
higher priority cohorts in a conflicting mode. On the other hand, if the data item is
held by a lower priority cohort in a conflicting way, the lower priority cohort is
aborted and requesting cohort is granted the desired locks.
2.3.5 Issues in Distributed Real Time Database Systems
The time expressed in the form of a deadline is a critical factor to be
considered in distributed real time transaction [102]. The completion of
transactions on or before its deadline is one of the most important performance
objectives of DRTDBS. There are several important factors that contribute to the
difficulty in meeting the deadlines of a distributed transaction [38]. One of the most
significant factors is the data conflict among transactions [26]. The data conflict
that occurs among executing transactions is referred to as executing-executing
conflict. The conflict involving executing-committing transactions is termed as
executing-committing conflict. The executing-executing conflicts are resolved by
distributed concurrency control protocols. A number of real time concurrency
control protocols have been proposed in the past. When a data conflict occurs
between an executing and a committing transaction, a commit protocol has to
13
work with concurrency control protocol to handle it and to ensure the transaction
atomicity. The traditional commit protocols block the lock requesting transactions
until the lock holding transaction releases the lock. The blocked transactions may
seriously affect the performance of DRTDBS, especially when failures occur
during the commitment phase.
Some of the other issues in DRTDBS are scheduling of distributed
transactions, optimizing the use of memory, management of distributed
transactions, deadline assignment strategies, difficulty in maintaining global
system information, possibility of distributed deadlocks etc. [17]. Among these
issues, priority assignment policy for transaction scheduling, commit protocol and
memory optimization are the only issues considered in this thesis. In the following
sections, we will review the literature which addresses these issues.
2.4 PRIORITY ASSIGNMENT POLICY
Usually, a real time database system is a part of a large and complex real
time system. The tasks in RTS and transactions in DRTDBS are similar in the
sense that both are units of work as well as units of scheduling [61,70,71].
However, tasks and transactions are different computational concepts and their
differences affect how they should be scheduled and processed. Unlike
transactions, tasks in real time systems do not consider consistency of the data
items used. Though many real time task scheduling techniques are still used for
scheduling real time transactions, the transaction scheduling in real time database
systems needs a different approach than that of which is used in scheduling tasks
in the real time systems. The following sub sections next to it review the literature
on task/transaction scheduling in centralized and distributed environments
respectively.
2.4.1 Priority Assignment Policies in Centralized Environment
The priority assignment techniques proposed for centralized real time systems
can be broadly classified into three categories: static, dynamic or hybrid of both. A
scheduling algorithm is said to be static, if priorities are assigned to tasks once
and for all. A scheduling algorithm is said to be dynamic, if the priority of a task
14