cflCopyright by Hidenori Nakazato 1993 ISSUES ON SYNCHRONIZING AND SCHEDULING TASKS IN REAL-TIME DATABASE SYSTEMS BY HIDENORI NAKAZATO B.Engr., Waseda University, 1982 M.S., University of Illinois, 1989 THESIS Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate College of the University of Illinois at Urbana-Champaign, 1993 Urbana, Illinois ISSUES ON SYNCHRONIZING AND SCHEDULING TASKS IN REAL-TIME DATABASE SYSTEMS Hidenori Nakazato, Ph.D. Department of Computer Science University of Illinois at Urbana-Champaign, 1993 Kwei-Jay Lin, Advisor Real-time systems have timing requirements. In database systems, database operations are performed in a sequence so as to maintain database consistency. By combining the features of a real-time system and a database system, a real-time database system must satisfy requirements from both models. Unfortunately, the requirements from the two models are not always compatible. In order to satisfy the timing requirements, transactions must be scheduled in a temporally predictable fashion. On the other hand, some transactions may have to be suspended in order to maintain the database consistency. The suspension causes disturbances in scheduling and may result in temporally unpredictable behavior. The priority ceiling protocol has been proposed to satisfy the timing requirements under the existence of suspension. However, the priority ceiling protocol does not maintain database consistency. We propose a set of algorithms integrating scheduling and concurrency control in order to maintain database consistency and still satisfy the timing requirements. We describe the properties of the algorithms and compare them with the priority ceiling protocol. For real-time database systems with monitoring and controlling operations, there are additional requirements for data consistency. We therefore define external consistency and temporal consistency. In order to satisfy these additional requirements, we propose design strategies for the temporal aspect of a real-time database system. Finally, we describe how to assign specific deadlines to transactions from the timing requirements while utilizing the processor effectively. iii To my parents, Hideo and Tomi, for their love and encouragement. And to my mother, Ai, peacefully sleeping in Heaven. iv Acknowledgements I would like to express my sincere gratitude to my Ph.D. thesis advisor Professor Kwei-Jay Lin for his guidance, encouragement, and patience without which this thesis would not be possible. I especially appreciate his spending much time with me discussing problems, research, and life. I appreciate the hospitality of his family. I would like to thank the other members of my thesis committee: Professors Geneva Belford, Andrew Chien, Jane Liu, and Marianne Winslett, and Doctor Tony Ng for their suggestions, time, and interest. Special thanks go to Professor Jane Liu. She took care of me while Professor Kwei-Jay Lin was absent and gave me a chance to work on interesting problems. She also gave me many opportunities to meet people in the real-time system research community. This will be a great help in my career. I am grateful to all the members of the real-time research group for their friendship. It was a pleasant experience to be surrounded by those wonderful people. Exchanging research ideas and discussing problems with them have enriched my knowledge. In particular, the discussions with Min-Ih Chen and Ching-Chih Han brought me important ideas for my research work. I highly appreciate the hospitality of Professor Saburo Muroga and his family. I thank Professor Dave Liu and other tennis friends for providing me chances to refresh myself. Also I thank Kathy Johnson for her help in much paperwork and enjoyable conversation. Without them, my stay in this non-native culture must not be this enjoyable. The love and encouragement of my family sustained me over ups and downs in the pursuit of the Ph.D. degree. My parents taught me the importance of knowledge and education. I cannot be too grateful for their unconditional love and support. I thank my brother Hiroyuki for his support of our parents and for his wonderful personality. Talking with him over international phone line made me feel like I was at home. Without the financial support of the Oki Electric Industry, it would have been impossible to complete this thesis. They gave me the opportunity to pursue the Ph.D. degree. I greatly appreciate their generosity. v Table of Contents Chapter 1 Introduction : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 1 1.1 Real-Time Systems : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 1 1.2 Database Systems : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 2 1.3 Real-Time Database Systems : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 3 1.4 Period Assignment : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 5 1.5 Organization of the Thesis : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 6 2 Past Research : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 7 2.1 Scheduling in Real-Time Systems : : : : : : : : : : : : : : : : : : : : : : : : : : : 7 2.2 Concurrency Control : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 9 2.3 Priority Ceiling Protocols : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 10 2.3.1 The Original Priority Ceiling Protocol : : : : : : : : : : : : : : : : : : : : 11 2.3.2 Priority Ceiling Protocols with Dynamic Priority Assignment : : : : : : : 15 3 The Convex Ceiling Protocol : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 19 3.1 Priority Ceiling Function : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 19 3.2 The Convex Ceiling Protocol : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 22 3.3 Properties of CCP : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 25 3.3.1 Single-Blocking and No-Deadlock Properties : : : : : : : : : : : : : : : : : 25 3.3.2 Serializability : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 30 3.4 The Schedulability Condition : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 32 3.4.1 The Schedulability Condition for the RM Priority Assignment : : : : : : 32 3.4.2 The Schedulability Condition for General Static Priority Assignment : : : 33 3.5 Comparison with PCP+2PL : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 37 4 Optimization and Extension of CCP : : : : : : : : : : : : : : : : : : : : : : : : 41 4.1 Optimality of the RM Priority Assignment : : : : : : : : : : : : : : : : : : : : : 41 4.1.1 Worst Case Blocking Lengths of CCP : : : : : : : : : : : : : : : : : : : : 43 4.1.2 Optimality of the RM Priority Assignment in CCP : : : : : : : : : : : : : 45 4.2 Type-Specific CCP : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 48 4.2.1 Algorithm : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 49 4.2.2 Properties of TCCP : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 52 5 Dynamic Priority Convex Ceiling Protocol : : : : : : : : : : : : : : : : : : : : 55 5.1 Algorithm : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 55 5.2 Properties of ECCP : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 58 5.2.1 Single-Blocking and No-Deadlock Properties : : : : : : : : : : : : : : : : : 58 5.2.2 Serializability : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 64 5.3 The Schedulability Condition : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 66 vi 6 Other Consistency Requirements : : : : : : : : : : : : : : : : : : : : : : : : : : 68 6.1 External Consistency : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 70 6.1.1 System Thresholds : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 71 6.1.2 Maintaining External Consistency : : : : : : : : : : : : : : : : : : : : : : 72 6.2 Temporal Consistency : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 74 6.2.1 Using Time Stamps : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 75 6.2.2 Using Locks : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 76 6.2.3 Using Priority Ceiling Functions : : : : : : : : : : : : : : : : : : : : : : : 78 6.2.4 Example : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 79 6.3 Summary : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 79 7 The Period Assignment Problem : : : : : : : : : : : : : : : : : : : : : : : : : : 81 7.1 System Model : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 81 7.2 Period Assignment with External Tasks Only : : : : : : : : : : : : : : : : : : : : 83 7.2.1 Maximizing the Weighted Sum of External Consistencies : : : : : : : : : 84 7.2.2 Maximizing the Uniform External Consistency : : : : : : : : : : : : : : : 88 7.3 Period Assignment with Both External and Internal Tasks : : : : : : : : : : : : : 90 7.3.1 Maximizing the Weighted Sum of External Consistencies : : : : : : : : : 90 7.3.2 Maximizing the Uniform External Consistency : : : : : : : : : : : : : : : 93 7.4 Examples : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 94 8 Conclusions : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 97 8.1 Summary of Results : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 98 8.2 Directions for Future Research : : : : : : : : : : : : : : : : : : : : : : : : : : : : 100 Bibliography : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 102 Vita : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 105 vii List of Symbols Symbol Explanation Ti A task. A task is a program segment that implements a set of functions. A task may be executed once or repeatedly. Ji A job. Job Ji is an execution of task Ti. ei The execution time of Ti. pi The period of Ti when Ti is a periodic task. U The total processor utilization of a system. aej A data object, a data item, or a resource. Li(aej) A lock operation by Ji on aej (or the time the lock operation is performed). Ui(aej) An unlock operation by Ji on aej (or the time the unlock operation is performed). Dj A demand class. It is described by a pair: (data item, access type). ssi The original priority of Ji. ss\Lambda i The augmented priority of Ji. It is the execution priority of Ji. *i The preemption level of Ti. Ti with a shorter relative deadline is assigned a higher preemption level. OEj The priority ceiling of aej in PCP, CCP and ECCP and the priority ceiling of Dj in TCCP. In PCP and CCP, it is the highest original priority of tasks that access aej. In TCCP, it is the highest original priority of tasks that access Dj. In ECCP, it is the highest preemption level of tasks that access aej. ^Ri(t) The highest priority ceiling of all data items locked by Ji at time t. \Phi i(t) The priority ceiling function of Ji (or Ti). It is a function of time. \Omega i(t) The maximum remainder priority. It is the maximum priority ceiling of data items that Ji (or Ti) uses after time t. CP CPi (t) The highest priority ceiling among resources locked by jobs other than Ji in the system at time t. CCCPi (t) The maximum \Phi j(t) of all jobs Jj 's in the system except Ji. (It is used in CCP.) CT CCPi (t) The maximum \Phi j(t) of all jobs Jj 's in the system except Ji. (It is used in TCCP.) CECCPi (t) The maximum \Phi j(t) of all jobs Jj's in the system except Ji. (It is used in ECCP.) tiniti (aej) The initial access instant of Ji on aej. It is the time Ji accesses data item aej for the first time. tfini (aej) The final access instant of Ji on aej. It is the time Ji accesses data item aej for the last time. tiniti (Dj) The initial access instant of Ji on Dj. It is the time Ji accesses a pair: (data item, access type) that belongs to Dj for the first time. viii tfini (Dj) The final access instant of Ji on Dj. It is the time Ji accesses a pair: (data item, access type) that belongs to Dj for the last time. Bi The worst case blocking length of Ti. filh The effective blocking unit of Jl against Jh. It is the maximum duration for which Jl may block Jh. ii The blocking task set of Ti. It is the set of tasks that may block jobs of Ti. RUi The upper resource set of Ti. It is the set of all data items accessed by tasks whose priorities are higher than or equal to the priority of Ti. RLi The lower resource set of Ti. It is the set of all data items accessed by tasks whose priorities are lower than the priority of Ti. RBi The blocking resource set of Ti. It is the set of all data items that may cause jobs of Ti to be blocked. Ri The set of data items accessed by Ti. T Li The set of tasks whose priorities are lower than the priority of Ti. `A The accuracy threshold. `T The tolerance threshold. `E The effect threshold. v The maximum evolution rate of an external data object. \Phi Ri (t) The read priority ceiling function of Ti. \Phi Wi (t) The write priority ceiling function of Ti. \Omega Ri (t) The read maximum remainder priority of Ti. It is the maximum priority ceiling of data items to be read by Ti after time t. \Omega Wi (t) The write maximum remainder priority of Ti. It is the maximum priority ceiling of data items to be updated by Ti after time t. O/i An external data object. It is a value about a certain aspect of an object in the physical world. oi The external period of O/i. O/i has a new value in every oi. wi The importance of i-th internal data object. ri The goodness of the external consistency of i-th internal data object. (= oi=pi) ai The maximum acceptable period to update i-th internal data object. bi The minimum degree of the external consistency of i-th internal data object. (= oi=ai) fi (= 1=bi) Xi The set of external tasks that derived data object aei depends on. ix Chapter 1 Introduction 1.1 Real-Time Systems As computer performance increases and hardware cost continues to drop, many computers are being adopted in embedded systems to monitor or to control physical instruments and devices. Since the state of the physical world is constantly evolving, embedded systems must be designed to keep up with the physical world. Keeping up with the physical world means that each piece of work to be done has a deadline. Systems that have timing requirements are called real-time systems. An example of a real-time system is an air traffic control system. In a conventional non-real-time system, we say that the system works correctly if it produces logically correct results from input data. However, the correctness of the results from a real-time system depends not only on logical correctness but also on temporal correctness. By temporal correctness, we mean that all the timing requirements specified by the application are satisfied. If a collision detection subsystem in an air traffic control system does not produce a collision alarm in time, two airplanes may collide. If a control instruction for a robot arm on an assembly line is not produced in time, the arm may destroy a product. In the conventional real-time system design process, the temporal behavior of a system needs to be extensively tested and examined after the implementation is completed [Gla80]. The test is necessary because no mechanism to guarantee the timing requirements is provided in the design. Even after extensive testing, the system may still have some probability of violating the timing requirements. In addition, a system usually evolves as time goes by. Whenever the system is modified, another round of tests is required to verify its temporal behavior. Because of the cost involved in the examination process and the possibility of timing violations even after testing, alternative approaches are being sought [Mok83, Sta88]. In these new approaches, some mechanisms to guarantee meeting the timing requirements are implanted in the system. One such mechanism is to adopt scheduling algorithms that can analytically guarantee the temporal behavior of the system. Much research has been conducted on scheduling algorithms for real1 time systems [LSS87, LL73, Mok83, MC70]. The scheduling algorithms for real-time systems usually provide closed form schedulability conditions. Such a scheduling algorithm guarantees that timing constraints are satisfied, provided that its schedulability condition is satisfied. Sha and Goodenough [SG90] show examples which apply the schedulability conditions to real-time system design in order to guarantee timing behavior of the system. 1.2 Database Systems In database systems, many concurrent executions of transactions share data objects. For example, many travel agents access the airline reservation system simultaneously. The shared data access must be controlled using concurrency control algorithms in order to avoid undesirable interference among the executions. We assume each transaction produces "correct" results if it is executed by itself. However, when there is an interference, the same transaction may produce "incorrect" results. For example, suppose two transactions update the same bank account. Transaction T1 withdraws $100 from account A and deposits it to account B. Transaction T2 deposits $50 to account A. Suppose account A's balance is $1000 and account B's balance is $500 at the beginning. If T1 is completed before T2, the final balance is $950 and $600 for accounts A and B, respectively. However, if the two transaction executions interleave as follows: T1 reads the balance ($1000) of A, T2 reads the balance ($1000) of A, T1 writes the new balance ($900) into A, T2 writes the new balance ($1050) into A, T1 reads the balance ($500) of B, T1 writes the new balance ($600) into B, the final balances are $1050 and $600 for accounts A and B, respectively. Thus, if transaction executions are interleaved and interfere with each other, the results may not be correct. It has been shown that a serializable schedule [BHG87] which has the same execution ordering of interfering operations as some serial execution will produce correct results. In the above example, account A is read by T2, written by T1, and then written by T2. From this sequence of events, we cannot decide the serial ordering of the two transactions. Therefore, the above example is not serializable. Executing transactions in a serializable schedule is considered to be 2 mandatory in most conventional database systems. We say that the database maintains logical consistency if transactions are executed in a serializable fashion. Although database systems have always been designed to provide adequate performance, the main concern in non-real-time settings has been logical consistency. To prevent the database from becoming logically inconsistent, many concurrency control and error recovery protocols have been designed [EGLT76, BHG87]. In contrast, the database performance issue is often treated as an optimization problem which is studied only after a logically consistent system has been built. 1.3 Real-Time Database Systems Many database applications have timing constraints. In an air traffic control system, a large amount of data on airplanes, airports, and weather conditions must be maintained. The system also needs to send instructions to airplanes by certain deadlines in order to maintain their safety. Computers that monitor the stock market or conduct program trading manage huge amounts of stock information and require timely executions of transactions. When a client asks for stock prices and requests a transaction, the system must respond in a timely fashion in order to have the result the client expects. A factory automation system may contain an inventory control sub-system. As requests to the inventory controller come from many assembly lines, concurrency control is necessary. In order to keep the assembly lines in uninterrupted operation, the inventory controller must respond in a timely fashion to each line's request. A real-time computation is not correct unless it can satisfy its timing requirements. A database system is not correct unless the system maintains its logical consistency. Thus, a realtime database system may not be correct unless the system can satisfy the timing requirements as well as the logical consistency. Data objects are shared among transactions in a database system. We must control access to the shared data objects to maintain the logical consistency using a concurrency control algorithm. Controlling data access means that some transaction executions may require exclusive access to the data objects, and other transaction executions may be suspended while waiting for the data objects. The suspension introduces unpredictability into the temporal behavior of the transactions. Mok [Mok83] has shown that the problem of deciding whether a set of periodically 3 executed transactions with exclusive accesses to shared data objects can be completed before their deadlines is NP-hard. In other words, we may not be able to find a schedule satisfying all deadlines of the transaction set in a reasonable amount of time even if such a schedule exists. We need a scheduling algorithm that can produce good but sub-optimal performances, and still maintain logical consistency as well as meet all timing requirements. Many researchers [AGM88, BMHD89, CJL89, HCL90, HSTR89, LS90] have taken heuristic approaches to integrating scheduling and concurrency control. However, their algorithms may not guarantee timely completions even for critical transactions. We believe that guaranteeing a predictable performance for critical transactions is one of the major design goals for realtime systems. Sha et al. [SRL90] have proposed an algorithm that can guarantee meeting all timing constraints even when there are exclusive accesses to shared data objects. However, their algorithm does not guarantee logical consistency. In this thesis, we propose scheduling algorithms that maintain logical consistency and have a predictable temporal behavior. The algorithms proposed by Sha et al. and proposed in this thesis assume that data objects accessed by transactions are known before the transactions start executions. This assumption may be too restrictive for conventional database management systems. However, it may be acceptable for real-time database systems because the designers of real-time systems must know in what environment the systems operate in order to design systems satisfying timing requirements. In addition to the timing constraints and logical consistency, there are other requirements for real-time database systems. Many embedded systems monitor the physical world and generate control instructions. To generate a control instruction, several sensor readouts are used. Since the control instructions are generated to adapt to the change in the physical world, the information used to generate the instruction must be up-to-date. Otherwise, the instruction may be obsolete. We call this requirement of up-to-dateness external consistency [Lin89]. The data values used to produce a control instruction may be read from several sensors. Ideally, the system should view a snapshot of the physical world, i.e., all data values should be read simultaneously. However, it is not always possible. Instead, we should minimize the discrepancy of update times between data values that are used by one execution of a transaction. Song and Liu [SL90, SL92] call the discrepancy dispersion. If the dispersion of the data values 4 used by a transaction is reasonably small, we say the data values are temporally consistent [LLS88]. Some real-time systems require the temporal consistency to be maintained. External consistency and temporal consistency are additional requirements for real-time database systems. We will discuss these two consistency issues from the system design perspective in this thesis. 1.4 Period Assignment A system needs to read sensor data, do computations, store data in storage devices, and generate control instructions. These activities are made up of many program segments, each of which implements a set of functions as shown in the example in Section 1.2. We call such a program segment a task. We use a task as a unit to enforce the timing requirements. As stated earlier, real-time database systems must satisfy timing requirements. These system timing requirements are usually transformed into timing constraints for each task. We need a systematic way to assign timing requirements to tasks from the timing requirements of the system. In many applications, the timing requirements are given as ranges and not as fixed values. For example, a control instruction for a robot arm may need to be generated at least every one hundred milliseconds. But a shorter interval is usually acceptable. From the range, we need to decide a specific period to each task [NL91]. When we assign periods to tasks, we want to divide system resources among tasks so that the system provides the most useful results while satisfying the timing requirements of the system. Since some data in the physical world change their values more frequently than others, the tasks recording these values should also be executed more frequently. Airplanes closer to airports need to be tracked with shorter periods than farther-away ones. On the other hand, computations in a real-time system may be more sensitive to changes in certain data than others, which in turn defines the different timing constraints on different tasks. For example, although a thermometer may produce a new temperature reading every second, computations may need to use the reading only once in every ten minutes. Given that most real-time systems have insufficient computing resources, we need to intelligently divide the computing resources among different tasks in order for the database system to produce the "best" result. In this thesis, we have studied strategies that provide effective utilization of the system resources. 5 1.5 Organization of the Thesis The goal of our research is to study design methodologies for real-time database systems. We have stated some of the requirements and the problems of real-time database systems in this chapter. In this thesis, we focus on the temporal aspect of real-time database systems. We propose scheduling and concurrency control algorithms to satisfy timing requirements and logical consistency. We also describe a design strategy to maintain external and temporal consistency and a method to assign timing constraints to tasks to effectively utilize resources. In the next chapter, we review related work in real-time system scheduling and database concurrency control. We describe in detail priority ceiling protocols as they are closely related to the algorithms we propose in this thesis. In Chapter 3, we propose an integrated scheduling and concurrency control algorithm for real-time database systems that guarantees a predictable temporal behavior and a serializable schedule. The algorithm in the chapter uses a static priority assignment scheduling algorithm. We prove the optimality of using the rate monotonic priority assignment [LL73] in the algorithm among static priority assignments in Chapter 4. Also in Chapter 4, we extend the algorithm to systems that allow different access types, such as read and write, for their data. We further extend the algorithm to a dynamic priority assignment scheme in Chapter 5. We then describe the impact of external consistency and temporal consistency on defining the temporal behavior of tasks in Chapter 6. In Chapter 7, we propose a method to assign timing constraints to tasks in order to utilize the CPU effectively. The thesis is concluded and future work discussed in Chapter 8. 6 Chapter 2 Past Research A real-time computer system has a set of tasks which are executed either only once or repeatedly. We use Ti to express the i-th task in the system. We call a single execution of a task a job. We express a job of a task Ti by Ji. Each real-time job has a ready time and a deadline which are usually specified in the task timing definition. A job can only be executed after its ready time and must be completed before the deadline. A job is active after the job becomes ready for execution, or arrives, and before the job execution completes. We show a job execution with rectangles as in Figure 2.1. In the diagram, time progresses from left to right. An up arrow expresses the arrival of a job and a down arrow expresses the deadline of the job. A task can be either periodic or aperiodic. A periodic task becomes active repeatedly in a constant interval. Monitoring and controlling tasks are usually performed periodically. A job of an aperiodic task is activated by pre-defined but irregular events. For example, the event can be an operator pushing a button or temperature exceeding a threshold value. For periodic tasks, computations usually must be finished by the end of each period. For aperiodic tasks, deadlines are individually defined. 2.1 Scheduling in Real-Time Systems Scheduling algorithms for real-time systems must provide predictable temporal behavior of jobs. A commonly used strategy in real-time scheduling algorithms is the priority-driven approach. job start completion time ready time preemption active deadline Figure 2.1: Execution of a Job 7 Using priority-driven algorithms, each job is assigned a priority based on some attributes of the job, such as the deadline and the period length. If a job execution can be interrupted anytime to allow the execution of other jobs, we say the scheduling policy is preemptive. The system always executes the ready job with the highest priority first if a priority-driven preemptive algorithm is used. The rate monotonic (RM) algorithm and the earliest deadline first (EDF) algorithm are two well-known priority-driven preemptive scheduling algorithms. RM is used for periodic tasks and assigns priorities to tasks according to period lengths of the tasks. A task with a shorter period is assigned a higher priority. Each job of a task is scheduled using the priority of the task. Since every job of a task has the same priority, we call this priority assignment static. Liu and Layland [LL73] show that the RM priority assignment is optimal among all static priority assignments in the sense that if a task set is feasible using the priority-driven preemptive approach with a static priority assignment, the task set is also feasible with RM. We say a task set is feasible in a schedule if all jobs of the task set can complete before their deadlines. EDF handles both periodic and aperiodic tasks and assigns priorities according to the deadlines of jobs. A job with an earlier deadline is assigned a higher priority. With the EDF priority assignment, jobs of the same task do not have the same priority. The priority assignment thus is dynamic. Liu and Layland [LL73] also show that the EDF priority assignment is optimal among dynamic priority assignments. EDF can schedule more tasks than RM [LL73] without missing the tasks' deadlines. It may however cause more jobs to miss their deadlines than RM under overload conditions because the algorithm executes jobs with missed deadlines first. On the other hand, RM has a predictable behavior under overload since tasks have fixed priorities. Suppose there are n periodic tasks, T1; T2; \Delta \Delta \Delta ; Tn. Let pj and ej be the period and the execution time of task Tj. We define ej=pj as the processor utilization of Tj. It has been shown [LL73] that if the total processor utilization of n periodic tasks in a processor is less than or equal to n(21=n \Gamma 1), i.e., U j nX i=1 ei pi ^ n(2 1=n \Gamma 1); (2.1) RM guarantees that the task set is feasible. In fact, the RM algorithm can often schedule a set of tasks with a total utilization greater than the said bound. Lehoczky et al. [LSD89] have derived an exact condition for a task set to be feasible by RM. Assuming the tasks are 8 pre-sorted so that p1 ! p2 ! \Delta \Delta \Delta ! pn, the feasibility condition is expressed with Li j min0!t^p i iX j=1 ej t & t pj ' ^ 1; for all i, 1 ^ i ^ n: (2.2) In Eq.(2.2), we need to find the minimum in the continuous interval of 0 ! t ^ pi. However, we only need to look at discrete t because of the characteristics of the ceiling function. We can rewrite Eq.(2.2) as follows: Li j mint2S i iX j=1 ej t & t pj ' ^ 1; for all i, 1 ^ i ^ n; (2.3) where Si = fkpjjj = 1; 2; \Delta \Delta \Delta ; i; k = 1; 2; \Delta \Delta \Delta ; bpi=pjcg: We call condition (2.1) the LL condition and both Eq.(2.2) and Eq.(2.3) the LSD condition. Although the LSD condition provides a higher processor utilization bound, it is much more complicated to check. EDF guarantees the feasibility of a task set if U = nX i=1 ei pi ^ 1 (2.4) is satisfied. 2.2 Concurrency Control Jobs share data objects in database systems. Many jobs may compete for data objects at any time. They cannot be randomly interleaved because some interleaving may cause erroneous results even if each job produces a correct result when it is executed by itself. A mechanism often used for concurrency control is the locking mechanism. Using the locking mechanism, a set of data objects is protected by a lock. We call the set of data objects a data item. The size of a data item, which is often called granularity, is defined by other criteria [BHG87]. Usually, a data item can be locked in different access types in database systems. For example, a data item can be locked in either read or write type. Whenever a job accesses a data item, the job must acquire the lock of the data item in the appropriate access type. Two accesses on the same data item may or may not be compatible. It depends on the type of the lock. For example, a read lock and a write lock are usually incompatible because the execution order of a read access and a write access makes a difference in the result database state. In this thesis, we assume 9 that there is only one access type for each lock and accesses on the same data item are always incompatible except in Chapter 4 where we discuss an algorithm for different access types. The job must release locks after using the data items so that other jobs can get the locks. The order of acquiring and releasing locks is defined by other criteria, such as the two-phase locking protocol [EGLT76], so that tasks produce "correct" results. We call acquiring and releasing a lock a lock operation and an unlock operation, respectively. If job Ji has the lock for data item aek when another job Jj requests the same lock, either Jj waits until the lock is released by Ji or Ji is aborted so that Jj can acquire the lock. The duration between a lock operation and the corresponding unlock operation defines a critical section. Serializability is a commonly used correctness criterion for interleaved job executions. An execution is serializable if it produces the same result and transforms the database to the same state as some serial execution does. Without knowing what each job does at each step of the execution, we can only guarantee serializability by ordering conflicting accesses in the same order as in a serial execution. An interleaved job execution is serializable if and only if the serialization graph of the execution schedule is acyclic [BHG87]. The serialization graph is a directed graph whose nodes are jobs. An edge Ji ! Jj indicates that a data access by Ji precedes a data access by Jj in the schedule, i.e., there exists a data item aek such that Ji accesses aek before Jj in the schedule. A commonly used protocol to maintain serializability is the two-phase locking protocol (2PL) [EGLT76]. 2PL requires that a job performs lock and unlock operations in two phases: a growing phase and a shrinking phase. During the growing phase, the job can acquire new locks. At the first unlock operation, the job enters into the shrinking phase and cannot receive any more new locks. 2.3 Priority Ceiling Protocols 2PL guarantees serializability. However, deadlocks may occur [BHG87]. Deadlocks are fatal in real-time systems since no jobs involved in the deadlocks can complete. The other problem is that priority inversions [SRL90] may occur. Priority inversion is a state of a system in which a higher priority job is forced to wait while a lower priority job is executing. We say that job Jh is blocked by job Jl when there are a higher priority ready job Jh and a lower priority 10 12345678901234567891234567890123456789 123456789012345678912345671234567 123 123 1234512345 12345 rk rj Ji Li(rj) Ui(rj) Ui(rk) Li(rk) 1234567812345678 1234567812345678 12345678rl 1234567812345678 1234567812345678 1234567812345678 1234567812345678 Li(rl) Ui(rl) Figure 2.2: Lock Operation Ordering Policy of PCP ready job Jl in a system, and Jh is waiting while Jl is executing. Jobs are usually assigned priorities in real-time database systems. A low priority job may lock a data item before a high priority job needs it and cause a priority inversion. Uncontrolled priority inversion causes unpredictable delay for jobs. There is a class of algorithms that integrates scheduling and concurrency control algorithms and utilizes the idea of priority ceiling [SRL90, CL90, Bak90]. We call this class of algorithms priority ceiling protocols. With the priority ceiling protocols, deadlocks are avoided and the priority inversion delay is limited to at most the duration of a single critical section. Priority ceiling protocols are applicable not only to databases but also to systems with exclusively accessed resources. Therefore, we use "resource" instead of "data item" in the following discussions. We discuss priority ceiling protocols in detail in this section because the algorithms we are proposing are based on these algorithms. 2.3.1 The Original Priority Ceiling Protocol The original priority ceiling protocol (PCP) [SRL90] prevents deadlocks and has a predictable temporal behavior. PCP is made of three parts: a scheduling algorithm, a concurrency control algorithm, and an interaction protocol between the scheduling algorithm and the concurrency control algorithm. PCP uses a priority-driven approach as its scheduling algorithm and is designed for systems with static task priorities. Similar algorithms using dynamic priority schemes have also been proposed [CL90, Bak90]. They will be discussed later. PCP assumes that critical sections are properly nested (Figure 2.2). Let Li(aej) and Ui(aej) be the lock and unlock operations by job Ji for resource aej. Properly nested critical sections mean that, for all j and k, Ui(aek) is performed between Li(aej) and Ui(aej) if and only if Li(aek) is 11 performed between Li(aej) and Ui(aej). Also, a job can have critical sections that do not overlap. PCP cannot handle partially overlapped critical sections. In PCP, a priority ceiling is defined for each resource. The priority ceiling of a resource is the highest task priority among the tasks that may access the resource. For example, if there are three tasks T1, T2, and T3 that will lock resource aej and T1 has the highest priority, the priority ceiling of aej is defined to be the priority of T1. We express the priority ceiling of aej by OEj. Note that the priority of a job is equal to the priority of a task which the job is an execution of, since PCP assumes a static priority assignment. Let CP CPi (t) denote the highest priority ceiling among resources whose locks are held by jobs other than Ji in the system at time t. Assuming all priorities are positive numbers, we define CP CPi (t) = 0 while no lock is held by any job other than Ji. In PCP, when Ji requests a lock on any resource at time t, Ji's priority ssi and CP CPi (t) are compared. If ssi is less than or equal to CP CPi (t), the lock request is rejected and Ji is suspended. When a lock request of a job is rejected, the system suspends the job and starts executing the highest priority job among other active jobs. When job Jh's blocking condition is resolved, the system checks whether Jh has the highest priority. If it has, the system suspends the currently running job and resumes the execution of Jh. In real-time systems, the concurrency control algorithm must interact with the scheduling algorithm. PCP adopts priority inheritance for this purpose. According to priority inheritance, when a job Ji is suspended, the job that holds the lock of a resource with the priority ceiling CP CPi (t) inherits job Ji's priority ssi. We call the execution priority of Ji, which is the maximum of Ji's original priority and its inherited priorities, augmented priority. We express the augmented priority of Ji with ss\Lambda i . PCP is summarized as follows. Priority Ceiling Protocol (PCP) 1. (Scheduling Algorithm) A static priority assignment priority-driven algorithm. 2. (Lock Operation Ordering) Critical sections are properly nested. 3. (Concurrency Control Algorithm) (a) A request for Li(aej) is granted if ssi ? CP CPi (t). Otherwise, it is rejected. (b) A request for Ui(aej) is always granted. 12 T1 T3 T2 12341234 12341234 1234 12345671234567 123123 123 0 2 4 time T1 T3 T2 12341234 12341234 0 2 4 123456123456 6 8 10 1234512345 123123 r1 r2 1234567812345678 1234567812345678 6 12341234 12341234 1234 1234512345 1234512345 12345 123123 123 14 12 (a) Task Set (b) Schedule 1234567812345678 1234567812345678 1234567812345678 1234567812345678 123456789123456789 123456789123456789 123456789r3 1234567812345678 1234567812345678 1234567812345678 1234567812345678 8 time 1234512345 12341234 1234 16 critical sections 12341234 12341234 18 20 123123 123 10 f1=p1 f2=p2 f3=p3 Figure 2.3: Example 1 4. (Interaction Protocol) (a) If the request for Li(aej) is rejected, job Ji is blocked. Job Jk that holds the lock for resource aeh whose priority ceiling is CP CPi (t) inherits Ji's priority. (b) When Ji performs Ui(aej), Ji's augmented priority becomes the highest among Ji's original priority ssi and the priorities of jobs which Ji still blocks after releasing the lock for aej. 2 Example 1 In Figure 2.3, there are three tasks T1, T2, and T3. T1, T2, and T3 have priorities ss1, ss2, and ss3 respectively. ss1 is the highest priority and ss3 is the lowest. Periods of T1, T2, and T3 are 8, 26, and 65, respectively. A job of T3 arrives at time 0, a job of T2 arrives at 2, and a job of T1 arrives at 5. Resource ae1 is used by T1, T2, and T3. ae2 is used by T2 and T3. ae3 is accessed by T3. Then, the priority ceilings of ae1, ae2, and ae3 are ss1, ss2, and ss3, respectively. The start and end of the critical sections and execution times of the tasks are shown in Figure 2.3 (a). 13 Using PCP, the example task set is scheduled as in Figure 2.3 (b). J3 starts to execute at time 0. J3 enters into the critical section for ae1 at time 1 since CP CP3 (1) = 0. At time 2, J2 arrives. Since ss\Lambda 2 = ss2 ? ss\Lambda 3 = ss3, J2 starts execution. When J2 requests the lock for ae2 at time 3, J2 is blocked because J2's priority ss2 is lower than CP CP2 (3) = ss1. J3 inherits J2's priority ss2 and now ss\Lambda 3 = ss2. J3 resumes execution. J3 releases ae1 at time 4. Since CP CP2 (4) = 0, J2's blocking is resolved and J2 is granted the lock for ae2. Also, ss\Lambda 3 returns to its original priority ss3. J2 preempts J3 and resumes execution. At time 5, J1 arrives and starts execution. J1 requests the lock for ae1 at time 6. At this time, CP CP1 (6) = ss2 is lower than J1's original priority ss1. J1 is allowed to enter into the critical section and continues the execution. J1 exits from its critical section at time 7 and completes execution at time 8. J2 resumes execution and requests the lock for ae1 at time 9. Since CP CP2 (9) = 0, the lock is granted. J2 releases the locks for ae1 and ae2 at time 10 and completes execution at time 11. J3 resumes execution. The second job J1 of T1 arrives at time 13 and starts execution. At time 14, J1 requests the lock for ae1. Since CP CP1 (14) = 0, J1 enters into the critical section. J1 exits from the critical section at time 15 and completes at time 16. J3 resumes execution and requests the lock for ae2 at time 16. Since CP CP3 (16) = 0, the lock is granted. J3 releases the lock for ae2 at time 17. J3 requests the lock for ae3 at time 19, releases the lock at time 20, and completes its execution at time 21. 2 PCP guarantees that a job is blocked at most once by one outermost critical section of a lower priority job (single-blocking property). Because of this property, we can determine whether a task set satisfies all deadlines when the RM priority assignment and PCP are used [SRL90]. If a task set satisfies the following condition for all tasks Ti in the task set, the task set is feasible by the combination of the RM priority assignment and PCP: e1 p1 + e2 p2 + \Delta \Delta \Delta + ei pi + Bi pi ^ i(2 1=i \Gamma 1); for all i, 1 ^ i ^ n; (2.5) where ei and pi are the execution time and the period of task Ti. Bi is the worst-case blocking length of task Ti, i.e., the duration of the longest critical section that can cause Ti to be blocked. Since we know that only the critical sections that are in a lower priority task and lock resources with higher priority ceilings than ssi can block Ti, we can decide Bi for each Ti before the system becomes operational. 14 Sha et al. [SRL90] also provide a tighter condition than Eq.(2.5). The condition is an extension of Eq.(2.2). If a task set satisfies the following condition for all tasks Ti in the task set, the task set is feasible by the combination of the RM priority assignment and PCP: min0!t^p i 0@ iX j=1 ej t & t pj ' + Bi t 1A ^ 1; for all i, 1 ^ i ^ n: (2.6) PCP avoids deadlock (no-deadlock property) and allows only one blocking by a single critical section in a lower priority job (single-blocking property). However, it does not guarantee serializability. For example, in Figure 2.3 (b), resource ae1 is first used by J3, then J1, and finally J2. Resource ae2 is first accessed by J2 and then J3. Therefore, there is a cycle J1 ! J2 ! J3 ! J1 in its serialization graph and the schedule is not serializable. In order to avoid the serializability violation, we can use 2PL along with PCP. However, 2PL tends to make blocking longer because it forces jobs to hold locks until all locks are acquired, even if some of the resources are only used much earlier than the job is allowed to release locks. With longer blocking, we can schedule fewer jobs in order to satisfy their timing constraints. In Section 3.5, we will compare the combination of PCP and 2PL with the algorithm we propose in the next chapter. 2.3.2 Priority Ceiling Protocols with Dynamic Priority Assignment PCP assumes that task priorities are static. With a static priority assignment, the processor utilization needs to be lower than the utilization with a dynamic priority assignment in order to satisfy timing constraints. Modified algorithms of PCP that can use a dynamic priority assignment and are applicable to both periodic and aperiodic tasks have been proposed by Chen and Lin [CL90], and Baker [Bak90]. The Dynamic priority ceiling protocol (DPCP), proposed by Chen and Lin [CL90], uses the EDF algorithm for the priority assignment of jobs. The priority of a job is defined by the deadline of the job. In addition to jobs, each task has a priority. The task priority is the priority of its current job if the job is active. Otherwise, it is the priority of the next job to arrive in the next period. Priority ceilings of resources are defined based on the task priority definition. Since the priority of a task changes when a job of the task completes, some priority ceilings may change as well. 15 T1 T2 1234512345 1234512345 12345671234567 0 2 4 time T1 T2 12341234 12341234 0 2 4 6 8 10 1234512345 12345 123123r1 r2 6 14 12 (a) Task Set (b) Schedule time 1234567812345678 12345678 12345671234567 123456789123456789 123456789 16 critical sections 1234512345 1234512345 12341234 12341234 123123 12341234 1234 12341234 1234512345 12345 Figure 2.4: Example 2 Example 2 There are two periodic tasks T1 and T2 (Figure 2.4). T1 uses resource ae1 and T2 uses resources ae1 and ae2. T1's period is 7. T2 has a period of 12. ffl At time 0, the priorities of T1 and T2 are 71 and 12. The priority ceilings of ae1 and ae2 are 7 and 12. ffl At time 3, T1 completes its first job and its priority becomes 14. The priority ceilings of ae1 and ae2 both become 12 because ae1's priority ceiling is now the maximum of T1's priority 14 and T2's priority 12. 2 The concurrency control algorithm is the same as PCP. A lock request is granted if the priority of the requesting job is higher than the highest priority ceiling of all locks currently locked by other jobs. DPCP also uses priority inheritance as the interaction protocol. Baker [Bak90] proposes another approach to dynamic priority assignment scheduling. The algorithm is called stack resource policy (SRP). There are three major differences between PCP and SRP in addition to using a dynamic priority assignment. The first difference is the time that blocking occurs. In SRP, blocking occurs at the beginning of job executions instead of on the resource requests. A job is blocked before it starts to execute if the priority of the job is not higher than the highest priority ceiling of all resources locked by other jobs. In this way, we can reduce the number of context switches. 1Although we use a larger number to express a higher priority in the rest of the thesis, we use a smaller number for a higher priority in this example for convenience. 16 Tj Ti higher priority late arrival Figure 2.5: High Priority with Late Arrival The second difference is that SRP uses a different criterion to assign the priority ceilings. The criterion is called preemption level. The preemption level is defined for each task. Task Ti has a higher preemption level than task Tj if and only if Ti can have a higher priority than Tj and can arrive after Tj. When the EDF algorithm is used, Ti can have a higher priority than Tj and can arrive after Tj if and only if Ti has a shorter relative deadline than Tj (Figure 2.5). The relative deadline of a job is defined by the duration between the ready time and the deadline. A task with a shorter relative deadline has a higher preemption level. Therefore, if every aperiodic task has a fixed relative deadline, we can assign a preemption level to each aperiodic task. The priority ceiling of a resource is defined by the maximum preemption level of tasks that use the resource. If we use the relative deadline to express the preemption level, the preemption levels of T1 and T2 in Figure 2.4 are the fixed values 7 and 12, respectively. Priority ceilings of ae1 and ae2 are 7 and 12. In SRP, a job is blocked from starting its execution if its preemption level is not higher than the maximum priority ceiling of all resources locked by other jobs. As the third contribution of SRP, Baker extends the algorithm to be applicable to multi-unit resources. The priority ceiling is assigned to each resource type with the number of available resources of the type as a parameter of the priority ceiling. When there are i units of resources available, the priority ceiling of the resource type is the maximum preemption level of the tasks that require more than i units of the resources. Suppose there are three tasks, T1, T2, and T3 and three units of resource ae. T1, T2, and T3 use 1 unit, 2 units, and 3 units of resource ae, respectively. Let OE(i) be the priority ceiling of resource ae when there are i units available. Assuming that T1 has the highest preemption level and T3 has the lowest, the priority ceiling OE(i) is defined as follows: OE(0) = (preemption level of T1), OE(1) = (preemption level of T2), OE(2) = (preemption level of T3), and OE(3) = 0. 17 Both DPCP and SRP have the same two properties that PCP has. They avoid deadlocks and each job is blocked at most once. They also share the same shortcoming of not guaranteeing serializability. From the no-deadlock and the single-blocking properties, we can determine the schedulability of a task set. Let T1, T2, \Delta \Delta \Delta , Tn be a task set in ascending order of relative deadline lengths, i.e., Tn has the longest relative deadline. Baker [Bak91] proved that a task set is feasible using SRP if the task set satisfies the following condition: iX j=1 ei pi + Bi pi ^ 1; for all i, 1 ^ i ^ n: (2.7) Chen [Che91] further analyzes the schedulability condition of both DPCP and SRP. 18 Chapter 3 The Convex Ceiling Protocol As we have explained in Chapter 2, PCP, DPCP, and SRP can only guarantee a predictable temporal behavior, not serializability. In other words, they can satisfy the timing requirements, but not the need for logical consistency. For some applications, however, logical consistency is necessary. In this chapter, we propose a concurrency control algorithm that ensures the serializability property in addition to the no-deadlock and the single-blocking properties. In this chapter, we assume that the system has a single processor and all tasks in the system are periodic tasks. Each job must complete its execution by the end of its period. Tasks are assigned fixed priorities and the priority-driven algorithm is used for scheduling. The periodic task assumption will be removed in a later chapter. 3.1 Priority Ceiling Function The priority ceiling is defined in PCP. Let ^Ri(t) be the highest priority ceiling of all data items locked by job Ji at time t. If no data item is locked by Ji, ^Ri(t) = 0. A lock operation Li(aex) and an unlock operation Ui(aey) may change ^Ri(t). The ^R3(t) of Example 1 in Chapter 2 is shown in Figure 3.1. Here an empty circle means an open end, while a filled square means a closed end. The reason for the serializability violation in Example 1 is that ^R3(t) has more than one peak. J2 can enter into a critical section when ^R3(t) drops below ss2 between time 3 and 5. Generally, if ^Ri(t) of a job Ji has more than one peak, serializability may not be guaranteed in PCP. Suppose there is a job Jj with priority ssj, ssj ? ssi. If Ji accesses data items aek and ael in two non-overlapped critical sections and ^Ri(t) drops below ssj between the two critical sections as shown in Figure 3.2, then there is a possibility that Jj enters its critical sections for both aek and ael in the ^Ri(t) valley. With this observation, if we control ^Ri(t) to have only one peak, serializability may be ensured. We define a function \Phi i(t) for each task such that \Phi i(t) * ^Ri(t) 19 p1 p2 p3 2 4 6 R3( t) time t T3 123123 123 1234567812345678 1234567812345678 1234567812345678 1234567812345678 1234567812345678 1234567812345678 0 0 8 12341234 1234 123123 r1 r2 1234567812345678 1234567812345678 12345678r3 f1=p1 f2=p2 f3=p3 ^ 10 Figure 3.1: ^R3(t) in Example 1 and \Phi i(t) has no valley. Our intention is to use \Phi i(t) instead of ^Ri(t) for concurrency control. We call \Phi i(t) the priority ceiling function of task Ti. In order to define \Phi i(t), we define the maximum remainder priority \Omega i(t) for each task Ti. \Omega i(t) is the maximum priority ceiling of data items that task Ti will use after time t. If no data item will be used after time t, \Omega i(t) = 0. For example, the maximum remainder priority of task T3 in Example 1 is shown in Figure 3.3. Note that \Omega i(t) is a monotonically decreasing function. Let tiniti (aej) and tfini (aej) be the times a job Ji of a task Ti accesses data item aej for the first time and the last time, respectively. We call tiniti (aej) and tfini (aej) the initial and final access instants of Ji on aej. When a job Ji accesses a data item aej, if the priority ceiling of aej is higher than \Phi i(t), \Phi i(t) is set to the priority ceiling of aej. Therefore, \Phi i(t) * ^Ri(t). When Ji accesses aej for the last time, \Phi i(t) is set to the highest priority ceiling of data items that Ji accesses after that time. \Phi i(t) is the function used in our concurrency control algorithm explained in the next section. \Phi i(t) is defined as follows. Priority Ceiling Function: \Phi i(t) (\Phi 1) When a job of Ti starts execution, \Phi i(t) = 0. (\Phi 2) At t = tiniti (aej), if \Phi i(t) is less than the priority ceiling of aej, \Phi i(t) is set to the priority ceiling of aej. 20 fkfl flflp jfl pifl 0fl Ri( t)fl^fl Li(rl)fl Ui(rl)flUi(rk)flLi(rk)fl time tfl Figure 3.2: ^Ri(t) with Two Peaks (\Phi 3) At t = tfini (aej), if \Phi i(t) ? \Omega i(t), \Phi i(t) is set to \Omega i(t). 2 \Phi i(t) increases only in the event of (\Phi 2) and decreases only in the event of (\Phi 3) in the above definition. \Phi i(t) = 0 before a job of Ti accesses any data item and after a job of Ti accesses all data items. The priority ceiling function of T3 in Example 1 is defined as in Figure 3.4. Theorem 3.1 \Phi i(t) has at most one peak. p1 p2 p3 \Omega 3(t) time t T3 123123 123 1234567812345678 1234567812345678 1234567812345678 1234567812345678 1234567812345678 1234567812345678 0 1234512345 12345 123123 r1 r2 1234567812345678 1234567812345678r3 f1=p1 f2=p2 f3=p3 0 2 4 6 8 10 Figure 3.3: Maximum Remainder Priority 21 \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma f1= p1fl r2fl r3fl f2= p2fl f3= p3fl time tfl8fl p1fl p2fl p3fl 0fl \Phi 3 (t)fl 6fl4fl2fl 10fl0fl r1fl T3fl Figure 3.4: Priority Ceiling Function Proof: Suppose \Phi i(t) has two peaks. Since \Phi i(t) decreases only when (\Phi 3) is applied and increases only when (\Phi 2) is applied, there are times t1 and t2, t1 ! t2, such that (\Phi 3) is applied at time t1 and (\Phi 2) is applied at time t2. Without loss of generality, let us assume that there is no event (\Phi 2) or (\Phi 3) between t1 and t2. From the definition of \Omega i(t), Ti does not access any data item with a priority ceiling higher than \Omega i(t1) = \Phi i(t1+) after time t1. By t1+, we mean immediately after \Phi i(t) is set to \Omega i(t1). At time t2, Ti accesses a data item aej such that \Phi i(t2) is less than the priority ceiling of aej. Since neither (\Phi 2) nor (\Phi 3) happens between t1 and t2, \Phi i(t1+) = \Phi i(t2). Since \Phi i(t1+) = \Omega i(t1), \Omega i(t1) must be less than the priority ceiling of aej which is used after t1. This is a contradiction. Therefore, \Phi i(t) has only one peak. 2 3.2 The Convex Ceiling Protocol The Convex ceiling protocol (CCP) is an integrated scheduling and concurrency control algorithm. CCP avoids any deadlock and guarantees a single blocking by one lower priority job. In addition, CCP guarantees serializability. When a job Ji requests its first access to a data item, the system checks if job Ji's priority ssi is higher than all other jobs' priority ceiling function values. We express the maximum \Phi j(t) of all jobs Jj , j 6= i, by CCCPi (t). If ssi ? CCCPi (t), Ji is allowed to proceed. Otherwise, Ji is suspended. The priority of the suspended job is inherited by the job Jj that has the maximum \Phi j(t). Therefore, Jj 's execution priority may be different from its original priority ssj. We call the execution priority the augmented priority and denote it by ss\Lambda j . 22 CCP is defined as follows. Note that \Phi i(t), and consequently CCCPi (t), changes according to the definition of \Phi i(t) as the execution progresses. Convex Ceiling Protocol (CCP) 1. (Scheduling Algorithm) A static priority assignment priority-driven algorithm. 2. (Concurrency Control Algorithm) When a job Ji requests the first data access to a data item, (a) if Ji's original priority ssi is higher than CCCPi (t), Ji is allowed to proceed to the initial access instant on the data item. (b) Otherwise, Ji is suspended until ssi ? CCCPi (t) is satisfied. 3. (Interaction Protocol) If Ji is suspended, the augmented priority ss\Lambda i of Ji is assigned to job Jk which has the maximum \Phi k(t). 2 CCP does not use locks. However, since the scheduler needs to know the first access and the last access to each data item in a job execution in order to control \Phi i(t) and CCCPi (t), it is convenient to assume that each job sends messages to the scheduler before the first access and after the last access. We call a message of the first access an initial access notice and a message of the last access a final access notice. Between an initial access notice and the corresponding initial access instant of a job, a lower priority job may execute if the former job is blocked. Between a final access instant and the corresponding final access notice of a job, no lower priority job can execute. We call the duration between an initial access instant and the corresponding final access instant a resource demand section. For example, the tasks of the task set in Example 1 (Figure 3.5(a)) are assigned priority ceiling functions as shown in Figure 3.5(b). T3's priority ceiling function is equal to 0 for the first time unit, then it goes up to ss1 until the third time unit. Then, \Phi 3(t) goes down to ss2, and so on. If a job J3 of T3 arrives at time 0 and both a job J1 of T1 and a job J2 of T2 arrive at time 2, the task set is scheduled as in Figure 3.5(c). When J1 arrives at time 2, C1(2) is ss1. However, since J1 does not request any data item for the first time unit of the execution, J1 executes. At time 3, J1 sends an initial access notice for ae1. Since C1(3) = ss1, J1 is blocked. J3 sends its final access notice for ae1 at time 4. Since C1(4) is equal to 0 now, J1 resumes its execution. Time 4 is the initial access instant of J1 on ae1. 23 \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma T1fl 8fl 10fl4fl0fl 2fl T2fl T3fl r1fl r2fl r3fl 6fl p3flp2flp1fl f2=p2fl f3=p3fl f1=p1fl p1fl p1fl p1fl p1flp2fl p3flp3flp2flp1flp1fl timefl timefl 0fl 2fl 4fl 6fl 8fl 10fl 12fl 10fl 16fl 18fl 20fl 22fl 24fl p1fl p1flp2fl (a) Task Setfl (b) Priority Ceiling Functionfl (c) Schedulefl initial access notice for r1fl initial access instant on r1fl final access notice for r1fl 0fl 0fl 0fl 0fl 0fl 0flT3fl T2fl T1fl T1fl T2fl T3fl 8fl 10fl4fl0fl 2fl 6fl timefl Figure 3.5: Example of Convex Ceiling Protocol 24 3.3 Properties of CCP Since CCP is an extension of PCP, it inherits and preserves the desirable properties of PCP. In addition, CCP guarantees the serializability of application processes. In this section, we prove the properties of CCP. 3.3.1 Single-Blocking and No-Deadlock Properties We assume tasks T1, T2, \Delta \Delta \Delta , Tn, are sorted in descending order of priority, i.e., ss1 ? ss2 ? \Delta \Delta \Delta ? ssn. Also, we assume, for simplicity of proofs, that there is no tie in priorities. Modification of the proofs to include ties in priorities is straightforward. The blocking condition is summarized in the next lemma. From the blocking condition, we prove that a job cannot block other jobs before entering any of its resource demand sections. Also, we prove that a job can be blocked only before entering any of its resource demand sections. From these two properties, we see that a job cannot wait for a data item while holding another data item, i.e., CCP avoids deadlocks. We prove the single-blocking property by contradiction. If a job Ji is blocked by two lower priority jobs, Jj and Jk, the lower priority jobs' priority ceiling functions must have higher values than the priority of Ji. Then, one of the lower priority jobs, for example Jj , must enter its resource demand section while Jk's priority ceiling function is higher than the priority of Ji. This contradicts CCP. The formal proofs are described below. Lemma 3.2 A job Jh is blocked by a lower priority job Jl at time tb if and only if the following four conditions are satisfied: 1. a job Ji with its augmented priority ss\Lambda i higher than or equal to ss\Lambda h sends an initial access notice at some time ta, ta ! tb, 2. ssi ^ CCCPi (t), for all t, ta ^ t ^ tb, 3. CCCPi (tb) = \Phi l(tb), and 4. ss\Lambda l is the highest priority in the system at tb. Proof: From the definition of the blocking, job Jh is blocked by job Jl when Jh is waiting while Jl is executing. With CCP, this situation occurs if and only if Jl inherits an augmented priority higher than or equal to the augmented priority ss\Lambda h of Jh and it is the highest augmented 25 priority in the system. Jl inherits an augmented priority higher than or equal to ss\Lambda h at time tb if and only if a job Ji with its augmented priority ss\Lambda i higher than or equal to ss\Lambda h sends an initial access notice at time ta, ta ! tb, ssi ? CCCPi (t) is not satisfied between ta and tb, and CCCPi (tb) = \Phi l(tb): Therefore, the lemma holds. 2 Lemma 3.3 If the execution of a job is at an instant either before the first initial access instant or after the last final access instant, the job cannot block any other jobs with higher original priorities. Proof: From Lemma 3.2, a job Jh is blocked by a lower priority job Jl at time tb only if there is a job Ji such that ssi ^ CCCPi (t); for all t, ta ^ t ^ tb, and CCCPi (tb) = \Phi l(tb): From the definition of \Phi l(t), if the execution of Jl is at an instant either before the first initial access instant or after the last final access instant, \Phi l(t) = 0. ssi ^ \Phi l(tb) = 0 cannot be true because ssi ? 0 for every job Ji from the definition of the priority. Therefore, Jl cannot block Jh. 2 Let us call the duration between the first initial access instant and the last final access instant the outermost demand section. Lemma 3.4 If time t1 is the first initial access instant of a job Ji, then CCCPi (t1) = maxi!k^n \Phi k(t1); i.e., no higher priority job than Ji is in its outermost demand section at time t1. Proof: Since Ji is not in its outermost demand section before t1, Ji is not blocking any other job from Lemma 3.3. The augmented priority ss\Lambda i of Ji is equal to ssi. Because ss\Lambda i = ssi is the highest priority among priorities of active jobs at t1, all jobs with higher priorities than Ji are not ready and cannot be in their outermost demand sections, i.e., max1^k!i \Phi k(t1) = 0. Therefore, CCCPi (t1) = max 1^k^n k6=i \Phi k(t1) = maxi!k^n \Phi k(t1): 2 26 Theorem 3.5 A job may be blocked only before the first initial access instant. Proof: Let t1 be the first initial access instant of job Ji. Since Ji is allowed to proceed at t1, we know ssi ? CCCPi (t1): (3.1) From Lemma 3.4 and Eq.(3.1), ssi ? maxi!k^n \Phi k(t1): (3.2) Suppose Ji is blocked by a lower priority job Jl at t2 for the first time after t1. From Lemma 3.2, Ji is blocked by a lower priority job Jl at time t2 if and only if there is a job Jj with its augmented priority ss\Lambda j higher than or equal to ss\Lambda i , Jj sends an initial access notice just before time t2, and ssj ^ CCCPj (t2) = max 1^k^n k6=j \Phi k(t2) = \Phi l(t2): (3.3) No job with a lower original priority than ssi is executed between t1 and t2 because Ji is blocked for the first time at t2 after t1. Therefore, the original priority ssj of Jj must be higher than or equal to ssi, i.e., ssj * ssi: (3.4) Since we assume ss1 ? ss2 ? \Delta \Delta \Delta ? ssn, ssi ? ssl means i ! l. Therefore, max 1^k^n k6=j \Phi k(t2) = \Phi l(t2) = maxi!k^n \Phi k(t2): (3.5) Since no job with its priority lower than ssi is executed from t1 to t2, maxi!k^n \Phi k(t2) = maxi!k^n \Phi k(t1): (3.6) From Eq.(3.3), (3.4), (3.5), and (3.6), we have ssi ^ maxi!k^n \Phi k(t1): This inequality contradicts Eq.(3.2). 2 Corollary 3.6 Suppose t1 is the first initial access instant of Ji. Then, maxi!k^n \Phi k(t) = maxi!k^n \Phi k(t1); for all t, t1 ! t ^ tc; where tc is the completion time of Ji. 27 Proof: From Theorem 3.5, Ji is not blocked for t1 ^ t ^ tc. This means that a job Jk whose original priority is less than ssi has no chance to be performed in this interval. Hence, the corollary holds. 2 Corollary 3.7 When job Ji is blocked, Ji's augmented priority ss\Lambda i is equal to its original priority ssi. Proof: From Theorem 3.5, Ji can be blocked only before its first initial access instant. Since Ji is not in its outermost demand section, Ji cannot block jobs from Lemma 3.3. Hence, ss\Lambda i = ssi. 2 When there is a deadlock, there must be a circular wait condition [CES71]. In the circular wait condition, each job in the cycle is in at least one resource demand section while waiting to enter into another resource demand section of a data item and the data item is being accessed by another job in its resource demand section. Theorem 3.8 CCP prevents deadlocks. Proof: Suppose there is a deadlock. Then, there must be a circular wait condition. Let Ji be the highest priority job in the cycle. From Theorem 3.5, Ji will not be blocked once it enters into a resource demand section. However, since Ji is the highest priority job in the cycle, no blocking means that Ji need not wait in order to enter into additional resource demand sections. In other words, there is no circular wait condition. Therefore, there is no deadlock. 2 Theorem 3.9 A job Jl blocks a higher priority job Jh at time tb if and only if the following four conditions are satisfied: 1. a job Ji with its original priority ssi higher than or equal to ssh sends an initial access notice at some time ta, ta ! tb, 2. ssi ^ CCCPi (t); for all t, ta ^ t ^ tb; 3. CCCPi (tb) = \Phi l(tb), and 4. ss\Lambda l is the highest priority in the system at tb. Proof: From Lemma 3.2, a job Jh is blocked by a lower priority job Jl at time tb if and only if a job Ji with its augmented priority ss\Lambda i higher than or equal to ss\Lambda h sends an initial access 28 notice at time ta, ta ! tb, and the conditions 2, 3, and 4 are satisfied. Since both Jh and Ji are being blocked between ta and tb, ss\Lambda i = ssi and ss\Lambda h = ssh at ta from Corollary 3.7. Therefore, ss\Lambda h = ssh ^ ss\Lambda i = ssi at ta. 2 Theorem 3.9 will be used in the next section to discuss the schedulability condition. Lemma 3.10 If a job Jl blocks a higher priority job Jh, \Phi l(t) * ssh holds when Jh arrives at time t. Proof: Suppose \Phi l(t1) ! ssh where t1 is the arrival time of Jh and Jl blocks Jh at time t2, t2 ? t1. Let t2 be the first time Jl blocks Jh. Then, from Theorem 3.9, a job Ji such that ssh ^ ssi ^ CCCPi (t2) = \Phi l(t2) is sending its first initial access notice just before t2. Because \Phi l(t1) ! ssh ^ \Phi l(t2), Jl must be executed sometime between t1 and t2 in order to have \Phi l(t) increased. However, since t2 is the first time Jh is blocked by Jl, Jl cannot be executed in the duration between t1 and t2. This is a contradiction. 2 Lemma 3.11 If we assume tiniti (aex) ! tfini (aey), it must be true that \Phi i(t) * min(OEx; OEy) for tiniti (aex) ! t ^ tfini (aey) in any schedule produced by CCP. Proof: From the definition of \Phi i(t), \Phi i(t) is at least as high as OEx at time t just after tiniti (aex) and is at least as high as OEy at t = tfini (aey). Since \Phi i(t) has only one peak from Theorem 3.1, the lemma holds. 2 Theorem 3.12 A job is blocked by at most one lower priority job. Proof: Suppose Jh is blocked by lower priority jobs Jl1 and Jl2. Then, from Lemma 3.10, \Phi l1(th) * ssh and \Phi l2(th) * ssh where th is the time Jh arrives. Both Jl1 and Jl2 are in their outermost demand sections when they block Jh from Lemma 3.3. Without loss of generality, we can assume that Jl1 enters into its outermost demand section before Jl2 does. Let tl1 and tl2 be the times Jl1 and Jl2 enter into their outermost demand sections. While Jl1 is in its outermost demand section, \Phi l1(t) is at least as high as ssl1 from Lemma 3.11. Therefore, CCCPl2 (t) is also at least as high as ssl1. Since Jl2 enters into its outermost demand section after Jl1 has entered its outermost demand section, ssl2 ? CCCPl2 (tl2) * ssl1. From Lemma 3.4, ssl2 ? CCCPl2 (tl2) = maxl2!k^n \Phi k(tl2): (3.7) 29 From Corollary 3.6, maxl2!k^n \Phi k(t) = maxl2!k^n \Phi k(tl2) for all t, tl2 ! t ^ (completion time of Jl2). Since Jl2 blocks Jh, Jl2 has not completed when Jh arrives. Therefore, maxl2!k^n \Phi k(th) = maxl2!k^n \Phi k(tl2): (3.8) Since l1 ? l2 from the assumption ss1 ? ss2 ? \Delta \Delta \Delta ? ssn and ssl1 ^ ssl2, maxl2!k^n \Phi k(th) * \Phi l1(th) (3.9) From Eq.(3.7), (3.8), and (3.9), and the assumption \Phi l1(th) * ssh, ssl2 ? \Phi l1(th) * ssh: However, Jl2 must have a lower priority than Jh. This is a contradiction. 2 3.3.2 Serializability If CCP does not guarantee serializability, there may be a cycle Jn ! J1 ! \Delta \Delta \Delta ! Jn\Gamma 1 ! Jn in the serialization graph of a schedule generated by CCP. Suppose Jn is the lowest priority job in the cycle. Then, J1, \Delta \Delta \Delta , Jn\Gamma 1 must be executed between two data accesses of Jn. Since the priority ceiling function of Jn has at most one peak, CCP cannot generate this schedule. The detailed proof follows. Lemma 3.13 When two jobs Ji and Jj use the same data item aek, we must have either tfini (aek) ! tinitj (aek) or tfinj (aek) ! tiniti (aek) in any schedule produced by CCP. Proof: Suppose Ji first uses aek and Jj requests aek before the final access instant of Ji on aek. From Lemma 3.11, \Phi i(t) * OEk for tiniti (aek) ! t ^ tfini (aek). Since Jj uses aek, the priority ceiling of aek is at least as high as the priority of Jj , i.e., OEk * ssj. Therefore, \Phi i(t) * ssj for tiniti (aek) ! t ^ tfini (aek), and Jj cannot access aek at least until tfini (aek). Therefore, tfini (aek) ! tinitj (aek) must hold. Similarly, if Jj enters into the resource demand section for aek first, tfinj (aek) ! tiniti (aek) must hold. 2 Theorem 3.14 Any schedule produced by CCP is serializable. 30 Proof: Suppose the serialization graph of a schedule produced by CCP contains a cycle Jn ! J1 ! \Delta \Delta \Delta ! Jn\Gamma 1 ! Jn, where n ? 1 and Jn has the lowest priority. This means there must exist data items aex and aey such that tfinn (aex) ! tinit1 (aex) and tfinn\Gamma 1(aey) ! tinitn (aey) (3.10) from Lemma 3.13. We first show that, for any job Ji, i = 1; 2; \Delta \Delta \Delta ; n \Gamma 1, either none of its accesses to any of the data items is performed before tinitn (aex) or all of its accesses are performed before tinitn (aex). Suppose otherwise, i.e., the first access to data item aej by Ji is performed before tinitn (aex) and the last access to data item aek by Ji is performed after tinitn (aex), i.e., tiniti (aej) ! tinitn (aex) ! tfini (aek): (3.11) From Lemma 3.11, we know that \Phi i(t) * min(OEj; OEk) for tiniti (aej) ! t ^ tfini (aek). min(OEj; OEk) is greater than or equal to the priority of Jn because both aej and aek are used by other jobs in the cycle and Jn has the lowest priority among jobs in the cycle. In other words, Jn cannot access aex in the duration between tiniti (aej) and tfini (aek) since ssn is not high enough. This is a contradiction of Eq.(3.11). Using the same argument and replacing tinitn (aex) by tinitn (aey), we can show that either no access or all accesses of job Ji for i = 1; 2; \Delta \Delta \Delta ; n \Gamma 1 are performed before tinitn (aey). Since tinitn (aex) must be before tfinn (aex) and Eq.(3.10) holds, tinitn (aex) ! tfinn (aex) ! tinit1 (aex). That is, all data accesses by J1 must be after tinitn (aex). Some data accesses of Ji+1 must come after some data accesses of Ji when there is a relation Ji ! Ji+1 in the serialization graph. If all data accesses of Ji come after tinitn (aex), all data accesses of Ji+1 must be after tinitn (aex) as well. Using induction, we can conclude that all data accesses of Jn\Gamma 1 must be performed after tinitn (aex), i.e. tinitn (aex) ! tinitn\Gamma 1(aey) ! tfinn\Gamma 1(aey). Using Eq.(3.10), tinitn (aex) ! tinitn\Gamma 1(aey) ! tfinn\Gamma 1(aey) ! tinitn (aey) ! tfinn (aey): (3.12) Similarly, from tfinn\Gamma 1(aey) ! tinitn (aey), all data accesses of Jn\Gamma 1 must be performed before tinitn (aey). Continuing the same argument, we can conclude that all data accesses of J1 must be before tinitn (aey), i.e., tinit1 (aex) ! tfin1 (aex) ! tinitn (aey). From Eq.(3.10), tinitn (aex) ! tfinn (aex) ! tinit1 (aex) ! tfin1 (aex) ! tinitn (aey) ! tfinn (aey): (3.13) 31 fflHlfl pflhfl pfllfl\Phi fl lfl(fltfl) fl timefltfl bfllhfl 0fl Figure 3.6: Blocking Task However, from Lemma 3.11, \Phi n(t) * min(OEx; OEy) for tinitn (aex) ! t ^ tfinn (aey). If OEx * OEy, \Phi n(t) * OEy for tinitn (aex) ! t ^ tfinn (aey). Since aey is accessed by Jn\Gamma 1, OEy is at least as high as ssn\Gamma 1. Therefore, tinitn\Gamma 1(aey) cannot happen between tinitn (aex) and tfinn (aey), and Eq.(3.12) is not possible. If OEx ! OEy, \Phi n(t) * OEx for tinitn (aex) ! t ^ tfinn (aey). Since aex is accessed by J1, OEx is at least as high as ss1. tinitn (aex) ! tinit1 (aex) ! tfinn (aey) in Eq.(3.13) is not possible. Eq.(3.12) and Eq.(3.13) cannot be satisfied simultaneously. Therefore, no cycle can exist in the serialization graph. 2 3.4 The Schedulability Condition In real-time systems, predictable temporal behaviors of the systems are the most important system design goal. In particular, the system must satisfy all deadlines of tasks if the tasks have hard deadlines. In this section, we discuss the condition for a task set to satisfy all deadlines when the task set is scheduled with CCP. We assume the deadline of a periodic task is at the end of each period. 3.4.1 The Schedulability Condition for the RM Priority Assignment As we have proved in Theorem 3.9, Jh can be blocked by Jl if ssh ^ \Phi l(tb) and Jh itself sends the initial access notice at ta. Jl may block Jh at most for the duration during which ssh ^ \Phi l(t) holds. Let us call the duration the effective blocking unit of Jl against Jh. The effective blocking unit of Tl against Th is shown as the duration specified by filh in Figure 3.6. Since \Phi l(t) has only one peak, each lower priority job Jl has at most one effective blocking unit with length jfilhj against a higher priority job Jh. We call the set of tasks that may block jobs 32 of task Th a blocking task set of task Th, denoted by ih. Task Tl is in ih if ssl ^ ssh ^ OEHl where OEHl is the maximum priority ceiling of data items that Tl uses. Then, the worst case blocking length of Th is given by Bh = 8?!?: maxTl2ih jfi lhj; if ih 6= OE; 0; if ih = OE: A job is blocked at most once by a lower priority job when CCP is used. Therefore, when a task set T1, T2, \Delta \Delta \Delta , Tn is scheduled by CCP with the RM priority assignment, and if we know all Bis, the schedulability condition of the task set is given by iX k=1 ek pk + Bi pi ^ i(2 1=i \Gamma 1); for all i, 1 ^ i ^ n; with similar reasoning as in [SRL90]. Here ek and pk are the execution time and the period of task Tk. This condition has the same form as the condition Eq.(2.5) for PCP with the RM priority assignment. Similarly, we can use Eq.(2.6) as the schedulability condition for CCP with the RM priority assignment. In either condition, shorter worst case blocking lengths are desirable in order to utilize the processor more efficiently. As we will see in Section 3.5, CCP may shorten the worst case blocking lengths. 3.4.2 The Schedulability Condition for General Static Priority Assignment When there is no blocking, a periodic task set is feasible for all task phasings with the RM algorithm if and only if Eq.(2.2): min0!t^p i 0@ iX j=1 ej t & t pj '1A ^ 1; for all i, 1 ^ i ^ n; is satisfied. However, we claim that this condition is applicable for periodic task scheduling with any static priority assignment. Theorem 3.15 Given periodic tasks T1, T2, \Delta \Delta \Delta , Tn with fixed priorities ss1, ss2, \Delta \Delta \Delta , ssn where ss1 ? ss2 ? \Delta \Delta \Delta ? ssn, we can schedule Ti, 1 ^ i ^ n, without missing its deadline for all task phasings if and only if min0!t^p i 0@ iX j=1 ej t & t pj '1A ^ 1: (3.14) 33 Proof: Liu and Layland [LL73] prove that a task has the longest response time when a job of the task becomes active simultaneously with requests of all higher priority tasks. If the longest response time of Ti is within its period pi, Ti is schedulable. Since only higher priority jobs can preempt Ti, we only need to consider T1, T2, \Delta \Delta \Delta , Ti in order to determine the schedulability of Ti. Suppose all T1, T2, \Delta \Delta \Delta , Ti become active at time 0 and Ti completes at time t. Since a job of Ti is executing just before t, all requests by T1, T2, \Delta \Delta \Delta , Ti\Gamma 1 are completed by t. Since a job of Ti also completes at t, the total computation time by T1, T2, \Delta \Delta \Delta , Ti between time 0 and time t is iX j=1 ej & tp j ' : The total computation time is less than or equal to t. In other words, iX j=1 ej t & t pj ' ^ 1: Ti meets its deadline if and only if 0 ! t ^ pi. The theorem follows. 2 If all tasks satisfy the condition of Eq.(3.14), the task set is feasible. Corollary 3.16 Let T1, T2, \Delta \Delta \Delta , Tn be a set of tasks in descending order of fixed priorities. The task set is feasible for all task phasings if and only if min0!t^p i 0@ iX j=1 ej t & t pj '1A ^ 1; for all i, 1 ^ i ^ n: (3.15) 2 Note that the task phasings for the longest response time for all tasks coincide, i.e., a phasing which causes all tasks to experience their longest response time exists. If the worst case task phasing does not occur with a particular task phasing, a task set may be schedulable with that particular task phasing even if the task set does not satisfy Eq.(3.15). The above theorem can be extended to a schedulability condition of a task set scheduled by CCP. Theorem 3.17 Let T1, T2, \Delta \Delta \Delta , Tn be periodic tasks with fixed priorities ss1, ss2, \Delta \Delta \Delta , ssn where ss1 ? ss2 ? \Delta \Delta \Delta ? ssn. Ti has the worst case blocking length Bi. Then, CCP can schedule Ti, 34 \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma Tjfl Tifl Ti-1fl T1fl Figure 3.7: Worst Case Task Phasing for CCP 1 ^ i ^ n, without missing its deadline for all task phasings if and only if min0!t^p i 0@ iX j=1 ej t & t pj ' + Bi t 1A ^ 1: (3.16) Proof: Let Ti and Tj be two tasks. Ti has a higher priority than Tj and Tj's effective blocking unit causes the worst case blocking length of Ti. Ti's execution is preempted only by all higher priority jobs and at most one lower priority job using CCP. Therefore, we only need to consider T1, T2, \Delta \Delta \Delta , Ti, and Tj in order to determine the schedulability of Ti. Without blocking, Ti experiences the longest response time when Ti arrives simultaneously with all higher priority tasks. When there is a blocking, we only need to consider the effect of the blocking additionally. Since at most one effective blocking unit of a lower priority job is possible before the completion of a job of Ti, Ti experiences the longest response time when a job of Ti becomes active simultaneously with all higher priority tasks and Tj has just entered its effective blocking unit as shown in Figure 3.7. Suppose all T1, T2, \Delta \Delta \Delta , Ti become active at time 0 and a job of Tj enters into its effective blocking unit against Ti just before time 0. Let t be the time the job of Ti completes. Since all requests by T1, T2, \Delta \Delta \Delta , Ti\Gamma 1 must be completed before Ti, the total computation time by jobs of T1, T2, \Delta \Delta \Delta , Ti between time 0 and time t is iX j=1 ej & tp j ' : In addition, since the job of Ti needs to access data items before it can complete, the job of Tj must exit from the effective blocking unit before the job of Ti completes. The length of the effective blocking unit of Tj against Ti is Bi. Therefore, in order to complete execution of the 35 job of Ti, at most the computation time iX j=1 ej & tp j ' + Bi; is required. Since this much computation is completed at time t, the total computation time is less than or equal to t. If t is less than the period of Ti, Ti satisfies its deadline. In other words, Ti meets its deadline for all task phasings if and only if iX j=1 ej t & t pj ' + Bi t ^ 1; for some t, 0 ! t ^ pi: 2 Since a task set is feasible if all the tasks satisfy their deadlines, we have the following corollary. Corollary 3.18 Let T1, T2, \Delta \Delta \Delta , Tn be a periodic task set in descending order of fixed priorities. Ti has the worst case blocking length Bi. Then, the task set is feasible for all task phasings using CCP if and only if min0!t^p i 0@ iX j=1 ej t & t pj ' + Bi t 1A ^ 1; for all i, 1 ^ i ^ n: (3.17) 2 Note that the worst case task phasing for each task may not be the same. Suppose Ti's effective blocking unit causes the worst case blocking for Ti\Gamma 1 and Ti's worst case blocking is caused by the effective blocking unit of Tj. Then, the worst case task phasing for Ti\Gamma 1 occurs as in Figure 3.8(a). On the other hand, Ti's worst case task phasing occurs as in Figure 3.8(b). These two situations cannot occur simultaneously. Therefore, a task phasing which causes all tasks to experience their worst case blocking does not exist for a task set in general. By satisfying Eq.(3.17), all tasks are schedulable for all task phasings even if the worst case task phasing exists. However, if the worst case task phasing does not exist for a task set or the task set is executed in a certain task phasing in which the worst case task phasing does not occur, the task set may be schedulable without satisfying Eq.(3.17). We can similarly prove that Eq.(3.17) is the schedulability condition for PCP. 36 \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma Tjfl Tifl Ti-1fl T1fl \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma Tifl Ti-1fl T1fl (b) Worst Case for Tifl (a) Worst Case for Ti-1fl Figure 3.8: Worst Case Task Phasing 3.5 Comparison with PCP+2PL PCP has no-deadlock and single-blocking properties. Since it cannot guarantee serializability, we can combine it with 2PL. In this section, we compare the performance of CCP with the combination of 2PL and PCP. PCP assumes that critical sections are properly nested. However, we can prove that no-deadlock and single-blocking properties still hold in PCP without that assumption if we consider one continuous blocking by one lower priority job as a single blocking. For example, in Figure 3.9, job Jj uses three data items, ae1, ae2, and ae3, and their critical sections 1234567812345678123456789012345678901234561234567890123456789012345612345678901234567890123456 12345678901234567890123456 123456789012123456789012 1234512345 123123 1234567812345678 1234567812345678r1 r3 r2 Jj bji Figure 3.9: Overlapped Critical Sections 37 are overlapped. If ae2 and ae3 have higher priority ceilings than the priority of a job Ji, Jj may block Ji for the duration specified as fiji if PCP is used. For our discussion, we consider blockings of Ti in fiji as one blocking. The proof for the no-deadlock and the single-blocking properties is similar to the one in [CL90]. In this section, we assume jobs do not have properly nested critical sections. Example 3 Figure 3.10 (a) is the same task set as we used previously. The shaded rectangles in the figure are resource demand sections, i.e., at least the shaded part of the execution must be protected by locks if a locking mechanism is used. Figure 3.10 (b) shows the critical sections defined by the combination of 2PL and PCP and effective blocking units defined by CCP. Note that ae1 and ae2 are unlocked only after locking ae3 by 2PL. With the combination algorithm, the worst case blocking lengths of T1 and T2 are both 7 time units. In the worst case, T1 and T2 may be blocked after T3 finishes 1 time unit of execution until T3 finishes 8 time units of execution. For CCP, the worst case blocking lengths of T1 and T2 are 2 time units and 5 time units respectively. This is because T1 may be blocked after T3 finishes 1 time unit of execution until T3 completes 3 time units of execution, while T2 may be blocked after T3 completes 1 time unit of execution until T3 finishes 6 time units of execution. CCP provides shorter worst case blocking and, consequently, a better schedulability condition. Schedules for both algorithms are shown in Figure 3.11 assuming T3 arrives at time 0 and both T1 and T2 arrive at time 2. With the combination algorithm, the first job J1 of the highest priority task T1 is blocked by job J3 of low priority task T3 from time 3 to 9 and J1 misses its deadline. This blocking is shortened using CCP so that J1 meets the deadline. 2 According to 2PL, a job can request locks whenever they are necessary. On the other hand, the job cannot release locks until all locks are acquired. Assume a job accesses a data item with a high priority ceiling early in the execution of the job and does not access it later. Then, the data item with a high priority ceiling is locked early in the execution and the job must hold the lock unnecessarily until all necessary locks are acquired. CCP alleviates this problem. With CCP, a job can request a data access whenever it is necessary, as with 2PL. However, CCP allows other jobs with higher priorities to use some data items even before the rest of the data items are completely locked by the job. CCP allows more flexible scheduling and achieves a higher processor utilization. 38 1234512345 1234512345 12345 12345671234567 1234567812345678 1234567812345678 123123 123 T1 T2 T3 12341234 12341234 1234 123123 123 1234567812345678 1234567812345678 1234567812345678 1234567812345678 r1 r2 r3 0 2 4 6 1234567890123456789012345612345678901234567890123456 12345678901234567890123456 1234567812345678 1234567812345678 1234567812345678 1234567812345678 1234567812345678 1234567812345678 1234567812345678 12345678T3 T3 2PL +PCP CCP (a) Task Set Blocking Length (b) resource demand sections time t 1234512345 8 p1 p3 p1 p2 (critical sections) p3 10 T3 R3(t) ^ \Phi 3(t) Figure 3.10: Example 3 1234512345 1234512345 12345 12345671234567 T1 T2 T3 0 2 4 6 1234567890123456789012312345678901234567890123 12345678901234567890123 8 10 12 T1 T2 T3 0 2 4 6 8 10 12 14 14 (a) 2PL+PCP (b) CCP time time 12341234 1234 16 16 p1 p1 p2 p1 p2 p1 1234512345 1234512345 18 20 12341234 12341234 p1 18 20 1234512345 1234512345 p1 12345671234567 22 22 p3 critical section effective blocking unit p3 missed deadline 1234567812345678 1234567812345678 1234567812345678 1234567812345678 Figure 3.11: Schedule (Example 3) 39 \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma Resouce DemandflSectionfl Critical Sectionfl modifyflT 3fl \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma r2fl f2= p2fl \Gamma \Gamma \Gamma \Gamma r3fl f3= p3fl \Gamma \Gamma f1= p1flr1fl p1fl p2fl p3flp3flp1fl0fl 0fl 0fl 0flR3(t)fl^fl Figure 3.12: Minimum Blocking Lock Request In fact, the combination of 2PL and PCP can achieve the same worst case blocking as CCP if we are allowed to modify the resource demand sections, i.e., the times that lock operations and unlock operations are requested. For example, if we modify the resource demand sections of T3 in Example 3 as shown in Figure 3.12, the worst case blocking lengths of T1 and T2 will be the same as the worst case blocking lengths by CCP shown in Figure 3.10. However, in some real-time systems with mode change, database systems may require dynamic change of their task sets and periods of the tasks. For example, in an air traffic control system, we may be required to shorten the tracking period of an airplane as the airplane is approaching an airport. In a space shuttle, the set of tasks running in the landing process should be different from the set of tasks running while the shuttle is traveling in space. If we use the combination of 2PL and PCP and reduce the worst case blocking length by changing the placement of lock requests and unlock requests, the programs must be modified or different versions of the programs must be used in order to shorten the worst case blocking when the systems change their task sets. CCP can alleviate this difficulty. If a scheduler is given a list of data items that each task accesses and the order of the initial access and final access notices, the scheduler can dynamically construct the priority ceiling function for each task. Based on the priority ceiling functions, the system can use CCP. In this way, CCP can adapt to the dynamic change of the task set and still produce shorter worst case blocking lengths than the combination of 2PL and PCP. 40 Chapter 4 Optimization and Extension of CCP In CCP, tasks have fixed priorities. There are many ways the priorities can be assigned to tasks. Some of them will make the tasks more schedulable. In this chapter, we prove that the RM priority assignment is optimal among static priority assignments. Also, we extend CCP to handle access types on data items such as "read" and "write" in this chapter. 4.1 Optimality of the RM Priority Assignment Liu and Layland [LL73] have proved that the RM priority assignment is optimal among static priority assignments when there are no shared data. The optimality is in the sense that a task set that is not feasible with the RM priority assignment cannot be feasible with any other static priority assignment. In this section, we prove that the RM priority assignment is also optimal when there are shared data and CCP is used. When there are no shared data, we can easily see the optimality from Theorem 3.15. Let T1, T2, \Delta \Delta \Delta , Tn be a feasible task set for all task phasings with priorities ss1, ss2, \Delta \Delta \Delta , ssn where ss1 ? ss2 ? \Delta \Delta \Delta ? ssn. Suppose pi ? pi+1, i.e., Ti and Ti+1 do not follow the RM priority assignment. Since Ti+1 is schedulable for all task phasings, the following condition must hold: min0!t^p i+1 0@ i+1X j=1 ej t & t pj '1A ^ 1: (4.1) Suppose we exchange the priorities of Ti and Ti+1 so that the priority assignment is RM. We must prove that min0!t^p i 0@ i+1X j=1 ej t & t pj '1A ^ 1 (4.2) is satisfied if Ti is to be schedulable for all task phasings after the priority exchange. Since a value derived by taking the minimum over a shorter interval (0; pi+1] is greater than or equal to the minimum value over a longer interval (0; pi] as shown in Figure 4.1, we know, min0!t^p i 0@ i+1X j=1 ej t & t pj '1A ^ min0!t^pi+1 0@ i+1X j=1 ej t & t pj '1A : (4.3) 41 pi+1fl pifl0fl val uefl durationfl Figure 4.1: Minimum Value Function From Eq.(4.1) and (4.3), Eq.(4.2) holds. If condition min0!t^p i+1 0@ i\Gamma 1X j=1 ej t & t pj ' + ei+1 t , t pi+1 ss1A ^ 1 (4.4) is satisfied, Ti+1 is schedulable for all task phasings after the priority exchange. Eq.(4.4) holds because min0!t^p i+1 0@ i\Gamma 1X j=1 ej t & t pj ' + ei+1 t , t pi+1 ss1A ^ min0!t^pi+1 0@ i+1X j=1 ej t & t pj '1A : The schedulability condition of task Tk, k 6= i; i + 1, does not change before and after the priority exchange. Since Tk is schedulable before the priority exchange, it is also schedulable after the exchange. Therefore, all tasks are schedulable after the priority exchange, i.e., the task set is schedulable with the RM priority assignment. However, the optimality of the RM priority assignment is not obvious when blockings caused by CCP exist. When there are blockings, schedulability conditions of Ti and Ti+1 become as follows: min0!t^p i 0@ iX j=1 ej t & t pj ' + Bi t 1A ^ 1 and min0!t^pi+1 0@ i+1X j=1 ej t & t pj ' + Bi+1 t 1A ^ 1: Exchanging priorities of Ti and Ti+1 changes Bi and Bi+1 to B0i and B0i+1. Hence, it is not obvious if schedulability conditions of Ti and Ti+1 after exchanging the priorities can be satisfied without further investigation on how exchanging priorities affects the value of Bi in general. For the rest of this section, we investigate how priority exchange may affect Bi, and thus the schedulability condition. 42 4.1.1 Worst Case Blocking Lengths of CCP Let Ri be the set of data items that Ti accesses. Let RUi be the set of data items accessed by tasks whose priorities are higher than or equal to task Ti and RLi be the set of data items accessed by lower priority tasks than Ti. Thus, Ri `RUi . We call RUi and RLi the upper resource set and the lower resource set of Ti, respectively. Let RBi be the blocking resource set of Ti which is the set of data items that may cause jobs of Ti to be blocked. In other words, RBi is the set of data items that are accessed by lower priority tasks than Ti and have priority ceilings higher than or equal to the priority of Ti. Thus, RBi = RUi " RLi : For example, suppose T1, T2, and T3 are three tasks with priorities ss1 ? ss2 ? ss3. T1 uses data item ae1, T2 uses ae2 and ae3, and T3 uses ae3. Then, RU1 = fae1g, RL1 = fae2; ae3g, RB1 = OE, RU2 = fae1; ae2; ae3g, RL2 = fae3g, RB2 = fae3g, RU3 = fae1; ae2; ae3g, RL3 = OE, and RB3 = OE. Using RBi , we can define the blocking task set as follows: the blocking task set ii of Ti is a set of lower priority tasks that use data items in RBi . Let T Li be the set of tasks whose priorities are lower than Ti's. Then, ii = fTkjTk is in T Li and accesses aej 2 RBi g = fTkjTk is in T Li and accesses aej 2 (RUi " RLi )g (4.5) Since aej is in RLi if aej is accessed by Tk 2 T Li , we can rewrite Eq.(4.5) as follows: ii = fTkjTk is in T Li and accesses aej 2 RUi g: (4.6) Lemma 4.1 The effective blocking unit fiji of Tj 2 ii against Ti depends only on RUi and the resource demand sections of Tj. fiji depends on neither Ti's resource demand sections nor Ri. Proof: From the definition of the effective blocking unit, fiji is defined by the duration that \Phi j(t) * ssi. From the definition of \Phi j(t), the beginning of fiji is defined by the first initial access instant on all data items in RUi and the end of fiji is defined by the last final access instant on all data items in RUi . Therefore, fiji is defined only by RUi and the resource demand sections of Tj. Ti's resource demand sections have nothing to do with fiji. Ri may affect fiji since Ri `RUi . However, as long as RUi has the same elements, fiji does not change even if the elements of Ri change. 2 43 Lemma 4.2 Let T1, T2, \Delta \Delta \Delta , Tn be tasks in descending order of priorities and T1 have the highest priority. If Tj is in both ii and ii+1, then jfijij ^ jfij(i+1)j: Proof: From Lemma 4.1, given a task Tj 2 ik, jfijkj depends only on RUk and the resource demand sections of Tj. Since RUi+1 = RUi [ Ri+1 ' RUi from the definition of RUi , the set of resource demand sections of Tj for data items in RUi+1 contains the resource demand sections for data items in RUi . This means that the duration in which \Phi j (t) is greater than or equal to ssi+1 is not shorter than the duration in which \Phi j(t) is greater than or equal to ssi, i.e., jfijij ^ jfij(i+1)j. 2 Lemma 4.3 Let T1, T2, \Delta \Delta \Delta , Tn be tasks in descending order of fixed priorities. If the worst case blocking lengths Bi and Bi+1 of tasks Ti and Ti+1 have a relation Bi ? Bi+1, then Bi = jfi(i+1)ij. Proof: Since T Li = T Li+1 [ fTi+1g, ii can be decomposed as ii = fTkjTk is in T Li and accesses aej 2 RUi g = fTkjTk is in T Li+1 and accesses aej 2 RUi g [ fTkjTk = Ti+1 and accesses aej 2 RUi g: Since RUi+1 = RUi [ Ri+1, ii+1 can be decomposed as ii+1 = fTkjTk is in T Li+1 and accesses aej 2 RUi+1g = fTkjTk is in T Li+1 and accesses aej 2 RUi g [fTkjTk is in T Li+1 and accesses aej 2 Ri+1g: Let iC, i0, and i00 be the sets fTkjTk is in T Li+1 and accesses aej 2 RUi g, fTkjTk = Ti+1 and accesses aej 2 RUi g, and fTkjTk is in T Li+1 and accesses aej 2 Ri+1g, respectively. Then, the worst case blocking length of Ti is Bi = 8??!??: maxTj2ii jfi jij = max maxT j2iC jfijij, maxTj2i0 jfijij! ; if ii 6= OE; 0; if ii = OE: 44 Also, the worst case blocking length of Ti+1 is Bi+1 = 8??!??: maxTj2ii+1 jfi j(i+1)j = max maxT j2iC jfij(i+1)j, maxTj2i00 jfij(i+1)j! ; if ii+1 6= OE; 0; if ii+1 = OE: Since iC ` ii and iC ` ii+1, Tj 2 iC is in both ii and ii+1. From Lemma 4.2, jfijij ^ jfij(i+1)j. Since maxT j2iC jfijij and maxTj2iC jfij(i+1)j are taking the maximum over the same set of tasks, max Tj2iC jfijij ^ maxTj2iC jfij(i+1)j: Case 1: Suppose ii+1 = OE. Then, Bi must be equal to maxT j2i0 jfijij and i 0 6= OE in order to have Bi ? Bi+1 since iC = OE. Case 2: Suppose Bi+1 = max Tj2iC jfij(i+1)j. Since Bi ? Bi+1 = max Tj2iC jfij(i+1)j * maxTj2iC jfijij; Bi must be equal to maxT j2i0 jfijij and i 0 6= OE. Case 3: Suppose Bi+1 = maxT j2i00 jfij(i+1)j. Then, maxTj2iC jfij(i+1)j ^ maxTj2i00 jfij(i+1)j. Since maxT j2iC jfijij ^ maxTj2iC jfij(i+1)j ^ maxTj2i00 jfij(i+1)j = Bi+1; Bi must be equal to maxT j2i0 jfijij and i 0 6= OE in order to have Bi ? Bi+1. In all cases, Bi must be equal to maxT j2i0 jfijij and i 0 cannot be empty. Therefore, i0 must be fTi+1g and Bi = jfi(i+1)ij. 2 4.1.2 Optimality of the RM Priority Assignment in CCP Lemma 4.4 Let T1, T2, \Delta \Delta \Delta , Tn be tasks in descending order of fixed priorities and Ti and Ti+1 be two tasks among them. Suppose we exchange the priorities of Ti and Ti+1. Using CCP, the worst case blocking length Bi+1 of Ti+1 before exchanging priorities is equal to the worst case blocking length B0i of Ti after the priority exchange. Proof: RUi+1 is the upper resource set of Ti+1. Ti has a lower priority than Ti+1 after exchanging priorities of the two tasks. Let RU0i be the upper resource set of Ti after exchanging the priorities. Since RUi+1 includes data items used by both Ti+1 and Ti, it contains the same data items as RU0i . Also, the set T Li+1 of tasks with lower priorities than Ti+1 before the priority exchange is equal 45 to the set T L0i of tasks with lower priorities than Ti after the priority exchange. Therefore, the blocking task set ii+1 of Ti+1 before the priority exchange is equal to the blocking task set i0i of Ti after the priority exchange: ii+1 = fTkjTk is in T Li+1 and accesses aej 2 RUi+1g = fTkjTk is in T L0i and accesses aej 2 RU0i g = i0i: From Lemma 4.1, jfij(i+1)j for all Tj 2 ii+1 only depends on RUi+1 and the resource demand sections of Tj. Similarly, jfi0jij for all Tj 2 i0i only depends on RU0i and the resource demand sections of Tj where fi0ji is the effective blocking unit of Tj against Ti after the priority exchange. Since RUi+1 = RU0i , jfij(i+1)j = jfi0jij for all Tj 2 ii+1 = i0i. Therefore, Bi+1 = maxT j2ii+1 jfij(i+1)j = maxTj2i0i jfi 0jij = B0i: 2 Theorem 4.5 Let T1, T2, \Delta \Delta \Delta , Tn be a periodic task set in descending order of priorities. The task set can be feasibly scheduled using CCP for all task phasings. Suppose Ti for some i, 1 ^ i ! n, has a longer period than Ti+1, i.e., pi ? pi+1. This task set is still feasible using CCP after exchanging the priorities of Ti and Ti+1. Proof: Since Ti+1 is schedulable for all task phasings before the priority exchange, min0!t^p i+1 0@ i+1X j=1 ej t & t pj ' + Bi+1 t 1A ^ 1 (4.7) from Theorem 3.17. After exchanging the priorities of Ti and Ti+1, if both min0!t^p i 0@ i+1X j=1 ej t & t pj ' + B0i t 1A ^ 1, and (4.8) min0!t^p i+1 0@ i\Gamma 1X j=1 ej t & t pj ' + ei+1 t , t pi+1 ss + B0i+1 t 1A ^ 1 (4.9) are satisfied, both Ti and Ti+1 are schedulable. Here, B0i and B0i+1 are the worst case blocking lengths of Ti and Ti+1 after the priority exchange. 46 From Lemma 4.4, Bi+1 = B0i. Therefore, min0!t^p i 0@ i+1X j=1 ej t & t pj ' + B0i t 1A = min0!t^pi 0@ i+1X j=1 ej t & t pj ' + Bi+1 t 1A : (4.10) Since pi ? pi+1, a value derived by taking minimum over the interval 0 ! t ^ pi+1 is greater than or equal to a value derived by taking minimum over the interval 0 ! t ^ pi, i.e., min0!t^p i 0@ i+1X j=1 ej t & t pj ' + Bi+1 t 1A ^ min0!t^pi+1 0@ i+1X j=1 ej t & t pj ' + Bi+1 t 1A : (4.11) From Eq.(4.7), (4.10), and (4.11), min0!t^p i 0@ i+1X j=1 ej t & t pj ' + B0i t 1A ^ 1; i.e., Eq.(4.8) holds. By comparing Eq.(4.7) and (4.9), if ei t , t pi ss + Bi+1 t * B0i+1 t (4.12) is satisfied, Eq.(4.9) holds. Suppose B0i+1 ^ B0i. Then, B0i+1 ^ Bi+1 from Lemma 4.4. Therefore, B0i+1 t ^ Bi+1 t ! ei t , t pi ss + Bi+1 t : Eq.(4.12) holds. Suppose B0i+1 ? B0i. From Lemma 4.3, Ti's effective blocking unit against Ti+1 defines B0i+1, i.e., B0i+1 = jfi0i(i+1)j where fi0i(i+1) is the effective blocking unit of Ti against Ti+1 after the priority exchange. Since the effective blocking unit of Ti must be less than or equal to the execution time of Ti, ei * jfi0i(i+1)j = B0i+1. Therefore, B0i+1 t ^ ei t ! ei t , t pi ss + Bi+1 t : Eq.(4.12) holds. Therefore, both Eq.(4.8) and (4.9) are satisfied. Both Ti and Ti+1 are schedulable after the priority exchange. Since task Tk, k 6= i; i + 1, is schedulable before the priority exchange, min0!t^p k 0@ kX j=1 ej t & t pj ' + Bk t 1A ^ 1 (4.13) 47 is satisfied. Let B0k and i0k be the worst case blocking length and the blocking task set of Tk after the priority exchange, respectively. Since T Lk and RUk do not change after the priority exchange, i0k = fTxjTx is in T L0k and accesses aey 2 RU0k g = fTxjTx is in T Lk and accesses aey 2 RUk g = ik where T L0k is the set of tasks whose priorities are lower than Tk and RU0k is the upper resource set of Tk after the priority exchange. jfijkj only depends on Tj and RUk and jfi0jkj only depends on Tj and RU0k . Since RU0k = RUk , jfi0jkj = jfijkj for all Tj 2 i0k = ik. Therefore, B0k = maxT j2i0k jfi 0jkj = max Tj2ik jfijkj = Bk: Tk is schedulable after the priority exchange because min0!t^p k 0@ kX j=1 ej t & t pj ' + B0k t 1A = min0!t^pk 0@ kX j=1 ej t & t pj ' + Bk t 1A ^ 1: (4.14) Since Eq.(4.8) and (4.9) are satisfied and Eq.(4.14) holds, the task set is feasible after the priority exchange. 2 Theorem 4.5 tells us that if a task set is feasible with a certain static priority assignment, the task set is also feasible after exchanging priorities which do not follow the RM priority assignment. Therefore, the RM priority assignment is optimal among static priority assignments when CCP is used. A similar argument is also applicable to PCP. 4.2 Type-Specific CCP In database systems, read accesses and write accesses may be distinguished in order to increase concurrency. A read access to a data item does not conflict with other read accesses to the same data item. On the other hand, a write access conflicts with both read and write accesses. If two tasks only access data items with the read access type, they can be freely interleaved without a serializability violation. In general, a data item can be accessed by several different access types. For example, a data item may be accessed by write, read, or increment type. The 48 Table 4.1: Compatibility Matrix Read Write Increment Read y n n Write n n n Increment n n y conflict of access types can be described by a compatibility matrix. Table 4.1 shows an example of the compatibility matrix. In this chapter, we describe an extension of CCP for a system that has several access types for data accesses. We assume in this section that all tasks are periodic and are assigned fixed priorities. 4.2.1 Algorithm We define a demand class as a class of data accesses on the same data item with the same access type. It serves as a handle for concurrency control since we need to know both the data item and the access type in order to decide if an access is in conflict with another access. A demand class can be expressed as a pair (data item identifier, access type). (ae1, write) expresses a demand class that accesses data item ae1 with write access type. Also, we use Di to specify i-th demand class. A demand class (aex, Ax) conflicts with another demand class (aey, Ay) if aex is the same data item as aey and the access types Ax and Ay conflict. We define the initial and the final access instants of a job for each demand class. A resource demand section is the duration between an initial access instant and the corresponding final access instant. The resource demand sections are classified into the demand classes. For each demand class, we define a priority ceiling as follows: Definition: The priority ceiling OEi of a demand class Di is defined by the maximum priority of tasks that have resource demand sections in conflict with Di. 2 For example, a task set in Figure 4.2 has three tasks and T1 has the highest priority and T3 has the lowest priority. Each task has resource demand sections as shown in the figure. Suppose the compatibility matrix in Table 4.1 is applicable among the access types. Demand class Da which accesses data item ae1 with "read" access type only conflicts with demand class Db which accesses ae1 with "write" access type. Since T3 has the highest priority among tasks 49 \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma Da: (r1, read)fl Db: (r1, write)fl Dc: (r2, read)fl Dd: (r2, write)fl De: (r3, read)fl T1fl T2fl T3fl 0fl 2fl 4fl 6fl timefl fa=p3fl fb=p1fl fc=p1fl fd=p1fl fe=0fl resourcefldemandfl sectionfl Figure 4.2: Example Task Set with resource demand sections of Db, the priority ceiling of Da is ss3. Db conflicts with both Da and Db itself. Since T1 has the highest priority among tasks with resource demand sections of Da and Db, the priority ceiling of Db is ss1. The priority ceilings of Dc, Dd, and De are ss1, ss1, and 0 respectively. We call the convex ceiling protocol which differentiates access types the type-specific convex ceiling protocol (TCCP). Based on the definition of priority ceilings given above, we can define the maximum remainder priority in a similar way to CCP as follows: Definition: The maximum remainder priority \Omega i(t) of task Ti is the maximum priority ceiling of all demand classes of which task Ti has resource demand sections after time t. 2 Let tiniti (Dj) and tfini (Dj) be the initial and the final access instants of Ji on Dj. Then, the priority ceiling function of a task Ti for TCCP is defined as follows. Priority Ceiling Function of TCCP: \Phi i(t) (\Phi 1) When a job of task Ti starts execution, \Phi i(t) = 0. (\Phi 2) At t = tiniti (Dj), if \Phi i(t) is less than the priority ceiling of Dj, \Phi i(t) is set to the priority ceiling of Dj. (\Phi 3) At t = tfini (Dj), if \Phi i(t) ? \Omega i(t), \Phi i(t) is set to \Omega i(t). 2 For example, the maximum remainder priority and the priority ceiling function of T3 in Figure 4.2 are as shown in Figure 4.3. We can prove that \Phi i(t) has at most one peak in the same way as in Theorem 3.1. We use CT CCPi (t) to express the maximum \Phi j(t) of all job Jjs in the system except Ji itself. We can define TCCP as follows. 50 12341234 1234 123456123456 123456123456 123456123456T3 1234567812345678 1234567812345678 12345678 p1 p2 p3 \Omega 3(t) time t 0 12341234 1234 123456123456 123456 Da Db 12341234De fa=p3 fb=p1 fe= 0 0 2 4 6 12341234 1234 123456123456 123456123456 123456123456T3 1234567812345678 1234567812345678 12345678 p1 p2 p3 \Phi 3(t) time t 0 0 2 4 6 Maximum Remainder Priority Priority Ceiling Function Figure 4.3: Maximum Remainder Priority and Priority Ceiling Function Type-Specific Convex Ceiling Protocol (TCCP) 1. (Scheduling Algorithm) A static priority assignment priority-driven algorithm. 2. (Concurrency Control Algorithm) When a job Ji requests the first data access to a data item, (a) if Ji's original priority ssi is higher than CT CCPi (t), Ji is allowed to proceed to the initial access instant on the data item. (b) Otherwise, Ji is suspended until ssi ? CT CCPi (t) is satisfied. 3. (Interaction Protocol) If Ji is suspended, the augmented priority ss\Lambda i of Ji is assigned to job Jk that has the maximum \Phi k(t). 2 With TCCP, the example task set in Figure 4.2 may be scheduled as in Figure 4.4. T3 becomes active at time 0, and enters into a resource demand section of Da at time 1. At time 2, T2 becomes active and preempts T1. At time 3, T2 requests to enter a resource demand section of Dc. Since C2(3) is ss3, T2 enters into the resource demand section. At time 4, T1 becomes active and preempts T2. T1 requests to enter a resource demand section of Dc at time 5, but since C1(5) = ss1, it is blocked. T2 resumes execution and exits from the resource demand section of Dc at time 6. Now, C1(6) = ss3 and T1 resumes execution. T1 enters into its resource demand section of Dc at time 6. T1 also requests to enter into a resource demand section of Da 51 12341234 12341234 1234 123123 123 123123 123 123123 123 1234512345 1234512345 123456789123456789 123456789123456789 123456789123456789 123456789123456789 1234567812345678 12345678 123456123456 123456123456 123456123456 123456123456 1234512345 1234512345 12345 123123 123123 0 2 4 6 8 10 12 14 16 18 time T1 T3 T2 1234512345 12345Da: (r1,read) 123456123456 123456Db: (r1,write) 123123Dc: (r2,read) 1234567812345678 1234567812345678 12345678Dd: (r2,write) 12341234De: (r3,read) Figure 4.4: Example Schedule of TCCP at time 7. Although T3 is in the resource demand section of Da, i.e., T3 is using resource ae1 in "read" access type, T1 is allowed to "read" as well. 4.2.2 Properties of TCCP We can prove the no-deadlock and the single-blocking properties of TCCP in the same way as in Chapter 3. Also, the schedulability condition for CCP is applicable for TCCP. The only difference in the schedulability condition is the length of the worst case blocking. In the rest of this section, we prove the serializability property of TCCP with a similar proof strategy to the one for CCP. Lemma 4.6 Let Dx and Dy be any demand classes that Ji has resource demand sections of. If we assume tiniti (Dx) ! tfini (Dy), it must be true that \Phi i(t) * min(OEx; OEy) for tiniti (Dx) ! t ^ tfini (Dy) in any schedule produced by TCCP. Proof: From the definition of \Phi i(t), \Phi i(t) is at least as high as OEx at time t just after tiniti (Dx) and is at least as high as OEy at t = tfini (Dy). Since \Phi i(t) has only one peak, the lemma holds. 2 Lemma 4.7 If job Ji has a resource demand section of demand class Dx, job Jj has a resource demand section of demand class Dy, and Dx and Dy conflict, we must have either tfini (Dx) ! tinitj (Dy) or tfinj (Dy) ! tiniti (Dx) in any schedule produced by TCCP. 52 Proof: Suppose Jj requests to enter into the resource demand section of demand class Dy after Ji enters into the resource demand section of demand class Dx but before the final access instant of Ji on Dx. From Lemma 4.6, \Phi i(t) * OEx for tiniti (Dx) ! t ^ tfini (Dx). Since Jj has a resource demand section of demand class Dy which conflicts with Dx, the priority ceiling of Dx is at least as high as the priority of Jj , i.e., OEx * ssj. Therefore, \Phi i(t) * ssj for tiniti (Dx) ! t ^ tfini (Dx), and Jj cannot enter the resource demand section of Dy in the duration. Therefore, tfini (Dx) ! tinitj (Dy) must hold. Similarly, if Jj enters the resource demand section of demand class Dy first, tfinj (Dy) ! tiniti (Dx) must hold. 2 Theorem 4.8 Any schedule produced by TCCP is serializable. Proof: Suppose the serialization graph contains a cycle Jn ! J1 ! \Delta \Delta \Delta ! Jn\Gamma 1 ! Jn where n ? 1 and Jn has the lowest priority. It means that Jn has a resource demand section of demand class Da, J1 has a resource demand section of demand class Db, and Da and Db conflict. Also, Jn\Gamma 1 has a resource demand section of demand class Dc, Jn has a resource demand section of demand class Dd, and Dc and Dd conflict. From Lemma 4.7, accesses on those resource demand sections have the following precedence: tfinn (Da) ! tinit1 (Db) and tfinn\Gamma 1(Dc) ! tinitn (Dd): (4.15) We first show that either none or all of the resource demand sections in each job Ji, i = 1; 2; \Delta \Delta \Delta ; n \Gamma 1 that conflict with other resource demand sections in job Jj , j = 1; 2; \Delta \Delta \Delta ; n and j 6= i, must be before tinitn (Da). Suppose otherwise. A job Ji, i = 1; 2; \Delta \Delta \Delta ; n \Gamma 1, has two resource demand sections that conflict with other resource demand sections in job Ji, j = 1; 2; \Delta \Delta \Delta ; n and j 6= i. One of them starts before tinitn (Da) and belongs to demand class Dx. The other ends after tinitn (Da) and belongs to demand class Dy. Then, tiniti (Dx) ! tinitn (Da) ! tfini (Dy): (4.16) Since both Dx and Dy conflict with resource demand sections in job Jj, j = 1; 2; \Delta \Delta \Delta ; n and j 6= i, and Jn has the lowest priority, min(OEx; OEy) * ssn. Therefore, \Phi i(t) * ssn for tiniti (Dx) ! t ^ tfini (Dy) from Lemma 4.6. In other words, Jn cannot enter the resource demand section of Da after tiniti (Dx) and before tfini (Dy) since ssn is not high enough. This is a contradiction to Eq.(4.16). 53 Using the same argument and replacing tinitn (Da) by tinitn (Dd), we can show that either no or all resource demand sections of each job Ji, i = 1; 2; \Delta \Delta \Delta ; n \Gamma 1, that conflict with other resource demand sections in job Jj , j = 1; 2; \Delta \Delta \Delta ; n and j 6= i, must be before tinitn (Dd). From Eq.(4.15), we know that tinitn (Da) ! tfinn (Da) ! tinit1 (Db). That is, all resource demand sections in J1 that conflict with other resource demand sections in Jj, i = 2; 3; \Delta \Delta \Delta ; n must be after tinitn (Da). A resource demand section of Ji+1 that conflicts with a resource demand section of job Ji must come after the resource demand section of Ji when there is a relation Ji ! Ji+1 in the serialization graph. If all resource demand sections of Ji that conflict with resource demand sections of job Jj , j = 1; 2; \Delta \Delta \Delta ; n and i 6= j, come after tinitn (Da), all resource demand sections of Ji+1 that conflict with resource demand sections of job Jj , j = 1; 2; \Delta \Delta \Delta ; n and j 6= i + 1, must be after tinitn (Da) as well. Using induction, we can conclude that all resource demand sections of Jn\Gamma 1 that conflict with resource demand sections of job Jj, j = 1; 2; \Delta \Delta \Delta ; n \Gamma 2; n must be after tinitn (Da), i.e. tinitn (Da) ! tinitn\Gamma 1(Dc) ! tfinn\Gamma 1(Dc). Using Eq.(4.15), tinitn (Da) ! tinitn\Gamma 1(Dc) ! tfinn\Gamma 1(Dc) ! tinitn (Dd) ! tfinn (Dd): (4.17) Similarly, from tfinn\Gamma 1(Dc) ! tinitn (Dd), all resource demand sections of Jn\Gamma 1 that conflict with resource demand sections of job Jj , j = 1; 2; \Delta \Delta \Delta ; n \Gamma 2; n, must be before tinitn (Dd). Continuing the same argument, we can conclude that all resource demand sections of J1 that conflict with resource demand sections of job Jj, j = 2; 3; \Delta \Delta \Delta ; n, must be before tinitn (Dd), i.e., tinit1 (Db) ! tfin1 (Db) ! tinitn (Dd). From Eq.(4.15), tinitn (Da) ! tfinn (Da) ! tinit1 (Db) ! tfin1 (Db) ! tinitn (Dd) ! tfinn (Dd): (4.18) However, from Lemma 4.6, \Phi n(t) * min(OEa; OEd) for tinitn (Da) ! t ^ tfinn (Dd). Suppose OEa * OEd. Dd conflicts with Dc. Since Jn\Gamma 1 has a resource demand section of Dc, OEd * ssn\Gamma 1. Therefore, \Phi n(t) * ssn\Gamma 1 for tinitn (Da) ! t ^ tfinn (Dd). Eq.(4.17) is not possible since tinitn\Gamma 1(Dc) cannot happen between tinitn (Da) and tfinn (Dd). Suppose OEa ! OEd. Since Da conflicts with Db and J1 has a resource demand section of Db, OEa * ss1. Therefore, \Phi n(t) * ss1 for tinitn (Da) ! t ^ tfinn (Dd). Then, Eq.(4.18) is not possible since tinit1 (Db) cannot happen between tinitn (Da) and tfinn (Dd). Eq.(4.17) and (4.18) cannot be satisfied simultaneously. Therefore, no cycle can exist in the serialization graph. 2 54 Chapter 5 Dynamic Priority Convex Ceiling Protocol CCP as defined in Chapter 3 uses static priority assignment for scheduling. However, dynamic priority assignment schemes may achieve higher processor utilization without missing deadlines. In this chapter, we extend CCP in order to use a dynamic priority scheduling algorithm, the EDF algorithm. Also, we include aperiodic tasks into consideration in this algorithm. 5.1 Algorithm The EDF algorithm assigns priorities to jobs according to their deadlines. A job with an earlier deadline is assigned a higher priority. Hence, jobs from the same periodic task have different priorities. For aperiodic tasks, we do not know their ready times and deadlines, and consequently priorities, before the arrival of the tasks. Therefore, we cannot assign priority ceilings to data items statically. This dynamically changing priority ceiling imposes difficulties on CCP. For example, suppose a new job Jj becomes active while job Ji is accessing data item aek. If Jj also uses aek and Jj has an earlier deadline, the priority ceiling of aek, and consequently \Phi i(t), suddenly increases to Jj 's priority. Also, every time a higher priority job arrives, priority ceilings, maximum remainder priorities, and priority ceiling functions must be re-calculated. However, as pointed out by Chen and Lin [CL90] and by Baker [Bak90], a job can be blocked only by jobs with longer relative deadlines when EDF is used. In general, we may know the relative deadlines of all tasks from the system specification even though we may not know exactly when jobs become active. In this chapter, we assume all jobs of a task have the same relative deadlines. By using the relative deadlines to define the priority ceilings, and consequently the maximum remainder priorities and the priority ceiling functions, we can modify CCP to accommodate EDF. A task with a shorter relative deadline is assigned a higher preemption level. The concept of preemption level was proposed in [Bak90]. We express the preemption level of task Ti by *i. 55 A task Ti with a shorter relative deadline is assigned a higher preemption level *i. We use positive numbers as the preemption levels, i.e., *i ? 0 for all i. All jobs of a task have the same preemption level. The priority ceiling and the maximum remainder priority are defined as follows. Definition: The priority ceiling of a data item aej is the maximum preemption level of tasks that access aej. 2 Definition: The maximum remainder priority \Omega i(t) of a task Ti is the maximum priority ceiling of all data items that Ti will access after time t. If no data item will be accessed after time t, \Omega i(t) = 0. 2 The priority ceiling function \Phi i(t) of a task Ti can be defined in the same way as in CCP. We use tiniti (aej) and tfini (aej) to express the initial access instant and the final access instant of a job Ji of Ti on aej as before. The duration between an initial access instant and the corresponding final access instant is a resource demand section. Also, the duration between the first initial access instant and the last final access instant of a job is the outermost demand section of the job. Priority Ceiling Function: \Phi i(t) (\Phi 1) When a job of Ti starts execution, \Phi i(t) = 0. (\Phi 2) At t = tiniti (aej), if \Phi i(t) is less than the priority ceiling of aej, \Phi i(t) is set to the priority ceiling of aej. (\Phi 3) At t = tfini (aej), if \Phi i(t) ? \Omega i(t), \Phi i(t) is set to \Omega i(t). 2 We can prove that \Phi i(t) has at most one peak as in Theorem 3.1. Let CECCPi (t) be the maximum \Phi j(t) of all jobs except Ji itself at time t. With this modification, we can define the EDF based CCP as follows. EDF Based Convex Ceiling Protocol (ECCP) 1. (Scheduling Algorithm) The EDF algorithm. 2. (Concurrency Control Algorithm) When job Ji requests the first data access to a data item, 56 \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma timefl 4fl0fl 2fl T1fl T2fl T3fl r2fl r1fl (a) Task Setfl f1= l1fl f2= l2fl timefl 4fl0fl 2fl T1fl T2fl T3fl (b) Priority Ceiling Functionfl 0fl 0fl 0fl 0fl l1fl l1fl l2fl l2fl timefl 4fl0fl 2fl T1fl T2fl T3fl (c) Schedulefl 0fl 0fl l1fl l1fl l2fl 0fl 0fll2fl l1fl 6fl 8fl 10fl 12fl 14fl 16fl Figure 5.1: Example 4 (a) if Ji's preemption level *i is higher than CECCPi (t), Ji is allowed to proceed to the initial access instant on the data item. (b) Otherwise, Ji is suspended until *i ? CECCPi (t) is satisfied. 3. (Interaction Protocol) If Ji is suspended, the augmented priority ss\Lambda i of Ji is assigned to job Jk that has the maximum \Phi k(t). 2 As in Chapter 3, we assume that each job sends the scheduler a message indicating the first access to a data item before the first access and a message indicating the last access to the data item after the last access. We call those messages initial access notices and final access notices as before. Example 4 In Figure 5.1, there are three tasks T1, T2, and T3. As shown in Figure 5.1(a), T1 accesses data item ae1, T2 accesses data item ae2, and T3 accesses both ae1 and ae2. T1 arrives at 57 time 1 and its deadline is time 12. T2 arrives at time 3 and its deadline is time 17. T3 arrives at time 0 and its deadline is time 16. The preemption levels of T1, T2, and T3 are *1, *2, and *3 respectively. Since the relative deadlines of T1, T2, and T3 are 11, 14, and 16 respectively, T1 has the highest preemption level and T3 has the lowest, i.e., *1 ? *2 ? *3. The priority ceiling OE1 of ae1 is *1 and the priority ceiling OE2 of ae2 is *2. The priority ceiling functions of tasks are as shown in Figure 5.1(b). In the priority ceiling function of T3, the value of the function is kept at *2 between times 2 and 3 although no data item is used at that time. The schedule is as shown in Figure 5.1(c). When T1 arrives at time 1, T1 preempts T3 because the deadline of T1 is earlier than the deadline of T3. When T1 sends the initial access notice for ae1 at time 2, T1 is blocked since CECCP1 (2) = *1 and T1's preemption level is *1. T3 inherits the priority of T1 and resumes execution. T3 sends the final access notice for ae1 at time 3 and CECCP1 (3) is now *2. The priority of T3 returns to its original priority. T1 enters into its resource demand section for ae1 and exits from it at time 4. T2 is ready at time 4. However, T2 cannot execute because T2 has a later deadline than T3, i.e., T2 has a lower priority than T3, although T2 has a higher preemption level. After T3 completes, T2 starts to execute. 2 5.2 Properties of ECCP ECCP has the same three properties as CCP. ECCP guarantees no deadlock. A job is blocked by at most one lower priority job. ECCP produces serializable schedules. In this section, we prove these properties. 5.2.1 Single-Blocking and No-Deadlock Properties In the following discussion, we assume tasks, T1, T2, \Delta \Delta \Delta , Tn, are sorted in descending order of preemption levels and T1 has the highest preemption level. Also, we assume, for simplicity of proofs, that there is no tie in preemption levels. The strategies to prove the single-blocking and the no-deadlock properties are the same as the proofs for CCP, but the preemption levels require additional consideration. Lemma 5.1 After a job Ji starts its execution and before Ji completes, every active job Jl with a lower preemption level than Ji has a lower original priority than Ji. 58 Jifl Jlfl longer relative deadlinefl earlier deadlinefl timefl Figure 5.2: Deadline and Relative Deadline Proof: When Ji starts to execute, Ji has the highest augmented priority ss\Lambda i = ssi in the system. Suppose Jl with *l ! *i has a higher original priority after Ji starts execution and before Ji completes. It means Jl has an earlier deadline. Since *l ! *i, Jl has a longer relative deadline than Ji. Therefore, Jl must be active when Ji arrives, as shown in Figure 5.2. Since the augmented priority is at least as high as the original priority of the job, it means there is at least one job Jl with a higher augmented priority than Ji when Ji starts execution. This is a contradiction and the lemma holds. 2 Lemma 5.2 A job Jh is blocked by a lower priority job Jl at time tb if and only if the following four conditions are satisfied: 1. a job Ji with its augmented priority ss\Lambda i higher than or equal to ss\Lambda h sends an initial access notice at some time ta, ta ! tb, 2. *i ^ CECCPi (t), for all t, ta ^ t ^ tb, 3. CECCPi (tb) = \Phi l(tb), and 4. ss\Lambda l is the highest priority in the system at tb. Proof: From the definition of the blocking, job Jh is blocked by job Jl when Jh is waiting while Jl is executing. With ECCP, this situation occurs if and only if Jl inherits an augmented priority higher than or equal to the augmented priority ss\Lambda h of Jh and it is the highest augmented priority in the system. From the definition of ECCP, Jl inherits an augmented priority higher than or equal to ss\Lambda h at time tb if and only if a job Ji with its augmented priority ss\Lambda i higher than or equal to ss\Lambda h sends an initial access notice at some time ta, ta ! tb, *i ? CECCPi (t) is not 59 satisfied between ta and tb, and CECCPi (tb) = \Phi l(tb): Therefore, the lemma holds. 2 Lemma 5.3 If a job has not entered its outermost demand section, the job cannot block any other jobs with higher original priorities. Proof: From Lemma 5.2, a job Jh is blocked by a lower priority job Jl at time tb only if there is a job Ji such that *i ^ CECCPi (t); for all t, ta ^ t ^ tb, and CECCPi (tb) = \Phi l(tb): From the definition of \Phi l(t), if the execution of Jl has not entered its outermost demand section, \Phi l(t) = 0. *i ^ \Phi l(tb) = 0 cannot be true because *i ? 0 for every job Ji from the definition of the preemption level. Therefore, Jl cannot block Jh. 2 Lemma 5.4 If time t1 is the first initial access instant of a job Ji, then CECCPi (t1) = maxi!k^n \Phi k(t1): Proof: Since Ji has not entered its outermost demand section before t1, Ji is not blocking any other jobs and ss\Lambda i = ssi from Lemma 5.3. Let Jj be an active job with a higher preemption level than Ji, i.e., *j ? *i, at time t1. This means that Jj has a shorter relative deadline than Ji. Since ss\Lambda i = ssi and Ji is executing, ssi is the highest priority in the system at time t1. Jj must have a lower priority than Ji. Jj 's deadline must be later than Ji's. Therefore, Jj arrives after Ji. This means that Jj has not been executed at all at the first initial access instant of Ji. Jj cannot be in its outermost demand section. Since this is true for every Jj that has a higher preemption level than Ji and is active at t1, max1^k!i \Phi k(t1) = 0. Therefore, CECCPi (t1) = max 1^k^n k6=i \Phi k(t1) = maxi!k^n \Phi k(t1): 2 Theorem 5.5 A job can be blocked only before the first initial access instant. 60 Proof: Suppose time t1 is the first initial access instant for job Ji. Since Ji is allowed to proceed at t1, we know *i ? CECCPi (t1): (5.1) From Lemma 5.4 and Eq.(5.1), *i ? maxi!k^n \Phi k(t1): (5.2) Suppose Ji is blocked at time t2 for the first time after t1 by a lower priority job Jl. From Lemma 5.2, Ji is blocked by a lower priority job Jl at time t2 if and only if a job Jj with its augmented priority ss\Lambda j higher than or equal to ss\Lambda i sends an initial access notice just before time t2 and *j ^ CECCPj (t2) = max 1^k^n k6=j \Phi k(t2) = \Phi l(t2): (5.3) Jj's original priority ssj must be greater than or equal to ssi because no job with a lower priority than ssi is executed after t1 and before t2. Since job Jj has an original priority higher than or equal to job Ji at t2, the preemption level of Jj must be higher than or equal to the preemption level of Ji, i.e., *j * *i from Lemma 5.1. From Eq.(5.3), *i ^ \Phi l(t2): (5.4) Suppose the blocking job Jl has a shorter relative deadline than Ji, i.e., *l ? *i. Since Jl has a later deadline than Ji from ssi ? ssl, Jl's arrival is later than Ji's. Therefore, Jl does not have any chance to be executed until Ji completes. Since Jl cannot be in its outermost demand section, Jl cannot block Ji. Therefore, in order for Jl to block Ji, Jl must satisfy *l ^ *i. Since jobs are sorted in descending order of preemption levels, *l ^ *i means l * i. Therefore, max 1^k^n k6=j \Phi k(t2) = \Phi l(t2) = maxi!k^n \Phi k(t2): (5.5) Since every job Jk, i ! k ^ n, has a lower preemption level than Ji, every active job Jk, i ! k ^ n, has a lower original priority than Ji after t1 and before Ji completes from Lemma 5.1. Also, no job with an original priority lower than ssi is executed from t1 to t2 because Ji is blocked for the first time at t2. Therefore, no job Jk, i ! k ^ n, executes between t1 and t2. Hence, \Phi k(t) for every k, i ! k ^ n, does not change between t1 and t2, i.e., maxi!k^n \Phi k(t2) = maxi!k^n \Phi k(t1): (5.6) 61 From Eq.(5.4), (5.5), and (5.6), we have *i ^ maxi!k^n \Phi k(t1): This contradicts Eq.(5.2). 2 Corollary 5.6 Suppose time t1 is the first initial access instant of a job Ji. Then, maxi!k^n \Phi k(t) = maxi!k^n \Phi k(t1); for all t, t1 ^ t ^ tc; where tc is the completion time of Ji. Proof: Since tasks are sorted, every job Jl, i ! l ^ n, has a lower preemption level than Ji. Therefore, from Lemma 5.1, all active job Jls, i ! l ^ n, have lower original priorities than Ji after Ji starts execution and before Ji completes. From Theorem 5.5, Ji is not blocked by lower priority jobs during t1 ^ t ^ tc. Having an original priority less than ssi, Jl, i ! l ^ n, has no chance to execute between t1 and tc. The corollary follows. 2 Corollary 5.7 When job Ji is blocked, Ji's augmented priority ss\Lambda i is equal to its original priority ssi. Proof: From Theorem 5.5, Ji can be blocked only before the first initial access instant. Since Ji is not in its outermost demand section, Ji cannot block any job from Lemma 5.3. Hence, ss\Lambda i = ssi. 2 Theorem 5.8 ECCP prevents deadlocks. Proof: Suppose there is a deadlock. Then, there must be a circular wait condition. Let Ji be the highest priority job in the cycle. From Theorem 5.5, Ji will not be blocked once it enters into a resource demand section. However, since Ji is the highest priority job in the cycle, no blocking means that Ji need not wait in order to enter into additional resource demand sections. In other words, there is no circular wait condition. This is a contradiction. 2 Lemma 5.9 If a job Jl blocks a higher priority job Jh, there is a job Ji such that ssi * ssh and \Phi l(t) * *i when Jh arrives at time t. 62 Proof: Let t1 be the arrival time of Jh and t2 be the first time Jl blocks Jh. Then, from Lemma 5.2, a job Ji with ss\Lambda i * ss\Lambda h sends an initial access notice at ta, ta ! t2, *i ^ CECCPi (t); for all t, ta ^ t ^ t2, and *i ^ CECCPi (t2) = \Phi l(t2): (5.7) Since both Ji and Jh are blocked at t2, ss\Lambda i = ssi * ss\Lambda h = ssh. Since t2 is the first time Jh is blocked, Jl does not execute between t1 and t2. Therefore, \Phi l(t1) = \Phi l(t2): (5.8) From Eq.(5.7) and Eq.(5.8), \Phi l(t1) * *i: The lemma follows. 2 Lemma 5.10 If we assume tiniti (aex) ! tfini (aey), it must be true that \Phi i(t) * min(OEx; OEy) at any time t, tiniti (aex) ! t ^ tfini (aey), in any schedule produced by ECCP. Proof: From the definition of \Phi i(t), \Phi i(t) is at least as high as OEx at time t just after tiniti (aex) and is at least as high as OEy at t = tfini (aey). Since \Phi i(t) has only one peak, the lemma holds. 2 Theorem 5.11 A job is blocked by at most one lower priority job. Proof: Suppose Jh is blocked by lower priority jobs Jl1 and Jl2. Then, from Lemma 5.9, there is a job Ji such that ssi * ssh, \Phi l1(th) * *i, and \Phi l2(th) * *i where th is the time Jh arrives. Both Jl1 and Jl2 are in their outermost demand sections when they block Jh from Lemma 5.3. Without loss of generality, we can assume that Jl1 enters into its outermost demand section before Jl2 does. Let tl1 and tl2 be the times Jl1 and Jl2 enter into their outermost demand sections. While Jl1 is in its outermost demand section, \Phi l1(t) is at least as high as *l1 from Lemma 5.10. Therefore, CECCPl2 (t) is also at least as high as *l1. Since Jl2 enters into its outermost demand section after Jl1 entered its outermost demand section, *l2 ? CECCPl2 (tl2) * *l1. From Lemma 5.4, *l2 ? CECCPl2 (tl2) = maxl2!k^n \Phi k(tl2): (5.9) 63 From Corollary 5.6, maxl2!k^n \Phi k(t) = maxl2!k^n \Phi k(tl2) for tl2 ^ t ^ (completion time of Jl2). Since Jl2 blocks Jh, Jl2 has not completed when Jh arrives. Therefore, maxl2!k^n \Phi k(th) = maxl2!k^n \Phi k(tl2): (5.10) Since l2 ! l1 from *1 ? *2 ? \Delta \Delta \Delta ? *n and *l1 ! *l2, maxl2!k^n \Phi k(th) * \Phi l1(th): (5.11) From Eq.(5.9), Eq.(5.10), and Eq.(5.11) and the assumption \Phi l1(th) * *i, *l2 ? \Phi l1(th) * *i: (5.12) Since Jl2 has started its execution but has not completed at th, from Lemma 5.1, *l2 ? *i means ssl2 ? ssi * ssh. However, Jl2 must have a lower priority than Jh. This is a contradiction. 2 5.2.2 Serializability To prove the serializability property of ECCP, we assume that there is a cycle in the serialization graph of a schedule generated by ECCP. We draw a contradiction from this assumption with the one-peak property of the priority ceiling function. Lemma 5.12 When two jobs Ji and Jj use the same data item aek, we must have either tfini (aek) ! tinitj (aek) or tfinj (aek) ! tiniti (aek) in any schedule produced by ECCP. Proof: Suppose Ji first uses aek and Jj requests aek before the final access instant of Ji on aek. From Lemma 5.10, \Phi i(t) * OEk for tiniti (aek) ! t ^ tfini (aek). Since Jj uses aek, the priority ceiling of aek is at least as high as the preemption level of Jj , i.e., OEk * *j. \Phi i(t) * *j must hold for tiniti (aek) ! t ^ tfini (aek). Since Jj cannot access aek at least until tfini (aek), tfini (aek) ! tinitj (aek) must hold. Similarly, if Jj enters into the resource demand section for aek first, tfinj (aek) ! tiniti (aek) must hold. 2 Theorem 5.13 Any schedule produced by ECCP is serializable. 64 Proof: Suppose the serialization graph of a schedule produced by ECCP contains a cycle Jn ! J1 ! \Delta \Delta \Delta ! Jn\Gamma 1 ! Jn, where n ? 1 and Jn has the lowest preemption level. This means there must exist data items aex and aey such that tfinn (aex) ! tinit1 (aex) and tfinn\Gamma 1(aey) ! tinitn (aey) (5.13) from Lemma 5.12. We first show that, for any job Ji, i = 1; 2; \Delta \Delta \Delta ; n \Gamma 1, either none of its accesses to any of the data items is performed before tinitn (aex) or all of its accesses are performed before tinitn (aex). Suppose otherwise, i.e., the first access to data item aej by Ji is performed before tinitn (aex) and the last access to data item aek by Ji is performed after tinitn (aex), i.e., tiniti (aej) ! tinitn (aex) ! tfini (aek): (5.14) From Lemma 5.10, we know that \Phi i(t) * min(OEj; OEk) for tiniti (aej) ! t ^ tfini (aek). min(OEj; OEk) is greater than or equal to the preemption level of Jn because both aej and aek are used by other jobs in the cycle and Jn has the lowest preemption level among jobs in the cycle. In other words, Jn cannot access aex in the duration between tiniti (aej) and tfini (aek) since *n is not high enough. This is a contradiction to Eq.(5.14). Using the same argument and replacing tinitn (aex) by tinitn (aey), we can show that either no access or all accesses of job Ji for i = 1; 2; \Delta \Delta \Delta ; n \Gamma 1 are performed before tinitn (aey). Since tinitn (aex) must be before tfinn (aex) and Eq.(5.13) holds, tinitn (aex) ! tfinn (aex) ! tinit1 (aex). That is, all data accesses by J1 must be after tinitn (aex). When there is a relation Ji ! Ji+1 in the serialization graph, some data accesses of Ji+1 must come after some data accesses of Ji. If all data accesses of Ji come after tinitn (aex), all data accesses of Ji+1 must be after tinitn (aex) as well. Using induction, we can conclude that all data accesses of Jn\Gamma 1 must be performed after tinitn (aex), i.e. tinitn (aex) ! tinitn\Gamma 1(aey) ! tfinn\Gamma 1(aey). Using Eq.(5.13), tinitn (aex) ! tinitn\Gamma 1(aey) ! tfinn\Gamma 1(aey) ! tinitn (aey) ! tfinn (aey): (5.15) Similarly, from tfinn\Gamma 1(aey) ! tinitn (aey), all data accesses of Jn must be performed before tinitn (aey). Continuing the same argument, we can conclude that all data accesses of J1 must be before tinitn (aey), i.e., tinit1 (aex) ! tfin1 (aex) ! tinitn (aey). From Eq.(5.13), tinitn (aex) ! tfinn (aex) ! tinit1 (aex) ! tfin1 (aex) ! tinitn (aey) ! tfinn (aey): (5.16) 65 \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma J3fl J2fl J1fl 0fl 50fl 100fl blocking by J3fl missed deadlinefl 125fl75fl25fl timefl b32fl Figure 5.3: Worst Case Blocking of ECCP However, from Lemma 5.10, \Phi n(t) * min(OEx; OEy) for tinitn (aex) ! t ^ tfinn (aey). If OEx * OEy, \Phi n(t) * OEy for tinitn (aex) ! t ^ tfinn (aey). Since aey is accessed by Jn\Gamma 1, OEy is at least as high as *n\Gamma 1. Therefore, tinitn (aex) ! tinitn\Gamma 1(aey) ! tfinn (aey) in Eq.(5.15) is not possible. If OEx ! OEy, \Phi n(t) * OEx for tinitn (aex) ! t ^ tfinn (aey). Since aex is accessed by J1, OEx is at least as high as *1. tinitn (aex) ! tinit1 (aex) ! tfinn (aey) in Eq.(5.16) is not possible. Eq.(5.15) and Eq.(5.16) cannot be satisfied simultaneously. Therefore, no cycle can exist in the serialization graph. 2 5.3 The Schedulability Condition Lemma 5.1 says that after Jh starts its execution, every active job Jl with a lower preemption level than job Jh must have a lower original priority. However, before Jh starts, there may exist a Jm which has a lower preemption level and a higher original priority than Jh. If such a Jm is blocked, the blocking can also be a blocking for Jh. Figure 5.3 shows an example. There are three jobs J1, J2, and J3. Their relative deadlines are 45, 50, and 140 respectively. J3 arrives at time 0. J3 starts its execution and sends the initial access notice on a data item at time 10 and J3 proceeds. At time 11, J2 arrives. J2 immediately sends the initial access notice on the same data item but it is blocked. J3 inherits the priority of J2 and continues its execution. At time 18, J1 arrives. Although J1 has a higher preemption level, J2 has a higher priority than J1. Therefore, J3 continues execution. Both J1 and J2 are blocked by J3 at this moment. J1 is blocked by J3 although J1 could not be blocked by J3 if there were not the intermediate priority job J2. Let us call the blocking of a higher preemption level job Jh caused by an intermediate preemption level job Jm that has a higher priority than Jh and is blocked 66 by a lower preemption level job Jl an indirect blocking. Other blockings are direct blockings. In the example, the blocking of J1 by J3 is an indirect blocking while the blocking of J2 by J3 is a direct blocking. The indirect blocking of J1 by J3 must be taken into account when the schedulability condition of J1 is considered. Tasks that may indirectly block Th must be included in the blocking task set of Th. Therefore, the blocking task set of task Th contains all tasks that may have a lower priority than Th at the arrival of Th instead of only tasks that access data items with a priority ceiling higher than or equal to the priority of Th as in CCP. Since every task with a priority lower than Th at the arrival of Th has a lower preemption level than Th, the blocking task set of task Th contains all tasks with lower preemption levels. In Figure 5.3, the blocking length of the intermediate preemption level job J2 by J3 affects the schedulability condition of the high preemption level job J1. In other words, the worst case blocking length Bh of a task Th is affected by all possible blockings that may be experienced by tasks with preemption levels lower than Th. Therefore, Bh is the longest effective blocking unit of all tasks with preemption levels lower than Th against any other task. With Bi known, we can use Eq.(2.7): iX j=1 ei pi + Bi pi ^ 1; for all i, 1 ^ i ^ n: as the schedulability condition for ECCP. Since DPCP and SRP do not maintain serializability, we need to use them with a concurrency control algorithm such as 2PL in order to maintain serializability. On the other hand, ECCP can maintain serializability in addition to predictable temporal behavior. Since the worst case blocking length is the longest of the blocking lengths for which a low preemption level task may block other tasks, ECCP does not have any advantage over the combination of 2PL and DPCP or the combination of 2PL and SRP in terms of the schedulability condition. However, ECCP provides shorter direct blocking lengths. In Figure 5.3, ECCP makes fi32 shorter. Consequently, ECCP reduces the chance for a task to encounter the worst case blocking length. 67 Chapter 6 Other Consistency Requirements As stated in Chapter 1, three types of consistencies may be required on real-time database systems in addition to meeting timing constraints. They are logical consistency, external consistency, and temporal consistency. We have already discussed logical consistency in detail in earlier chapters. We consider the impact of the other two consistencies on the timing design of a real-time database system in this chapter. A typical real-time database system is a monitoring and controlling system. Tasks in a monitoring and controlling system read sensor readings, perform computations using the sensor data, store data in the database, and send control signals to the controllers. The sensor readings are read periodically so that the information used in the system is up-to-date. The control signals are sent both periodically and aperiodically in order to control the external devices properly. In addition to storing the current values of data, a real-time database may record the histories of data values. For example, temperature may be monitored every five minutes and the values for the past 24 hours may be kept in a database. There are constant flows of information from sensors to controllers or to data histories in a real-time database system as shown in Figure 6.1. We call the flows information flows. Some tasks in the information flows take readings from the sensors, perform computations, and record data in the database. Other tasks may read data from the database, perform computations, and send out control signals. We call the tasks that implement a part of the information flow flow tasks. There are a few ways to implement information flows. An information flow may be implemented by one periodic task with several other tasks with precedence constraints. In this implementation, the periodic task reads sensor readouts and the information is passed to the next task along the precedence graph. A task becomes ready when all the preceding tasks complete. Information passing is usually synchronous in this type of implementation. Another way of implementation uses only periodic tasks and no precedence constraints. In this type of implementation, each task executes independently with its own period. A task uses any available data produced by other tasks. Information passing is asynchronous in this case. The 68 Information Flow Sensor Sensor Controller Real-Time System Real-Time Database Display Operator Physical World Real-Time Data Non-Real-Time Data Figure 6.1: Information Flow former implementation has more constraints than the latter. Therefore, scheduling algorithms for the former implementation are more complicated and deciding whether a task set is feasible is difficult. For this reason, we assume the latter implementation. Information flows are a part of a data dependency graph as shown in Figure 6.1. A node in a data dependency graph expresses a data object. If there is an edge from data 1 to data 2, data 2 receives some information from data 1. An information flow is implemented by several tasks passing information among them. If a data value is passed between two tasks, there usually is a time lag between the time one task writes a value on a data object and the time another task reads the value. In order to produce timely control signals, it is ideal for one task to handle all processing from the sensor input to the control signal output. However, this is not always possible. Some data values required to produce the control signals are not available directly from the sensors and are computed from values of other data objects. Such data values need to be stored in the database for other tasks to use. Also, the period of reading the sensor data may not be the same as the period of sending out the control signals. Therefore, passing information between tasks is unavoidable. In order to produce timely control signals, the flow tasks are executed periodically and are assigned deadlines. We assume the flow tasks have deadlines at the end of the periods. In addition to the flow tasks, there are real-time tasks that are executed aperiodically. On the 69 other hand, there exist non-real-time tasks in the system that have no strict timing constraints. The non-real-time tasks are executed either aperiodically or periodically with low priorities. 6.1 External Consistency A monitoring and controlling system contains many smaller (or lower level) monitoring and controlling sub-systems. For example, to maintain a certain altitude, an airplane uses a feedback control system for the elevators of the airplane. The feedback control uses an altitude value from a higher level control system as the reference value and an output value from the altimeter as the controlled variable. Such low level systems like the feedback control are required to control physical devices with small error and a short response time, and are likely to be implemented by hardware and/or firmware. The low level control systems do not likely involve database accesses in them. However, high level control systems need to maintain data object in the system to reflect the values of the corresponding data objects in the physical world since the values in the physical world constantly evolve and the control signals are produced using the stored values. We call the data objects existing in the physical world external data objects and the data objects stored in the database internal data objects. The value of an external data object is the actual value of the data in the physical world at that instant. We can consider that every internal data object has its corresponding external data object. The external data object has the true value of the internal data object. Since the update of an internal data object takes place some time after the corresponding external data object is read, the internal data object usually has an old value of the data. However, if the change of the value in the external data object is small enough, we may be able to ignore the discrepancy and use the value of the corresponding internal data object without updating it. The threshold values that define the acceptable value difference are usually derived from the system design specification. For instance, suppose we want to control room temperature. The sensor may be able to measure temperature with 0:01ffiF accuracy. If the change in the temperature is less than 0:01ffiF, we cannot detect any change. If we want to control a heater so that the room temperature is within 2ffiF above or below a set value, we may want to design the system so that at least every 2ffiF change can be detected and a control signal is sent to the heater. 70 6.1.1 System Thresholds Since an external data object, and consequently an internal data object, can take a value from a continuous variable, it is reasonable to assume that there is a threshold value such that we can ignore the change in the external and internal data values completely within the threshold. We call such a threshold value accuracy threshold `A of the data object. If two values of a data object are within the range of the accuracy threshold of the data object, we can regard the values as exactly the same. The accuracy threshold may be defined by the sensitivity of the sensor measuring the external data value. Some external data values need to be controlled by control devices. Such external data objects have threshold values. If the value change in an external data object is beyond its threshold value, the control device must receive another control signal. We call the threshold value tolerance threshold `T . A tolerance threshold corresponds to a significant change in an external data value that requires an action. For a data history, the evolution of the value more than the tolerance threshold means the data need to be recorded. If the value change is within the tolerance threshold, responding to the change may not be necessary. Some internal data object may need to be updated if the value of its corresponding external data object evolves more than a certain value in order to indirectly satisfy the tolerance thresholds of other related internal data objects. We call the value effect threshold `E. In the above example, if the temperature is used to compute another control signal and 0:1ffiF change of the temperature is enough to cause the external data value controlled by the control signal to change more than the tolerance threshold, then 0:1ffiF is an effect threshold. Suppose there are data I1 and I2 in the database. A task reads I1 and updates I2. Let E1 and E2 be the corresponding external data objects of I1 and I2. If a change in E1 more than \Delta should cause change in E2 more than the minimum of its tolerance and effect threshold, \Delta is the effect threshold of E1. If the change in E1 after the last update of I1 is less than its effect threshold, the value of I1 without update may be used to compute the next I2. Since the designers of a real-time system know the environment that the system should handle, it may not be too difficult to find the maximum evolution rate for each external data object. We call the rate maximum evolution rate v. Then, the value will not show any change at least for the duration of `A=v (Figure 6.2). The value does not have a significant effect on 71 timefl val uefl qTfl qEfl qAfl ___flvflqTfl___flvflqEfl___flvflqAfl vfl Figure 6.2: Thresholds other values at least for `E=v. We may be able to ignore the change of the value for at least `T =v. For an internal data object to be externally consistent, the value of an internal data object should only exist for the duration of less than the minimum of `E=v and `T =v after its source sensor readings are taken. Note that reading a sensor readout with a period less than `A=v is a waste of resource. 6.1.2 Maintaining External Consistency In order to maintain external consistency without consuming too much resource, there should be as few data passing delay among tasks as possible in an information flow from sensors to controllers or to data histories. Figure 6.3 shows an example. Suppose a control signal is generated through three tasks and data are passed from one task to another. Task 1 reads the value from the sensor at time t0 and writes data 1. Task 2 reads the value from data 1 and writes data 2 at time t1. Task 3 reads the value from data 2 and sends the control signal at time t2. The control signal is kept by the controller until the next control signal is received at time t3. Therefore, for the control signal to be externally consistent, the evolution of the external data value must be ignorable for the duration between t0 and t3. Since each task executes once in its period, the maximum distance between two consecutive task executions is twice as long as its period [HL92]. Let p1, p2, and p3 be the periods of tasks 1, 2, and 3. Then, the maximum of t1 \Gamma t0 is 2p2. Similarly, the maximum of t2 \Gamma t1 is 2p3. However, the value 72 Controllerfl Task 3fl Data 1fl Data 2fl Task 1fl Task 2fl Sensorfl (a) Task Setfl t0fl t1fl t2fl Task 1fl Task 2fl timefl 3p3fl valid intervalfl (b) Information Propagation fl t3fl Task 3fl Data 1fl Data 2fl 2p2fl next updatefl ExternalflDatafl ControlflSignalfl p1fl Figure 6.3: External Consistency of an Information Flow written by task 3 must be valid until t3. The maximum distance between t1 and t3 is defined by three consecutive executions of task 3. Since the maximum distance between every other task execution is three times as long as the period, the maximum of t3 \Gamma t1 is 3p3. Therefore, the maximum value of t3 \Gamma t0 is 2p2 + 3p3. If the tolerance threshold of the control signal is `T , 2p2 + 3p3 ^ `T =v must be satisfied in order for the control signal to be externally consistent . On the other hand, if we can implement the information flow from the sensor read to the control signal output by only two tasks, tasks 1 and 2, the condition is 3p2 ^ `T =v: 73 Since `T =v is fixed from the specification, accommodating more data passing among tasks between a sensor input and a control signal output in an information flow means shortening the periods of tasks, and consequently, consuming more resources. We can define the condition to maintain external consistency as follows. Suppose an internal data object or a control signal has an effect threshold `E and a tolerance threshold `T . Also, suppose the internal data object or the control signal receives information from a sensor through tasks T1, T2, \Delta \Delta \Delta , Tn. There may be several possibilities of the task set T1, T2, \Delta \Delta \Delta , Tn because there are several routes to reach the internal data object or the control signal from a sensor in the data dependency graph. Then, the internal data object or the control signal updated by Tn is externally consistent if max(2(p2 + p3 + \Delta \Delta \Delta + pn\Gamma 1) + 3pn) ^ min(`E; `T )=v; where pi is the period of Ti and the maximum is taken over all the task sets T1, T2, \Delta \Delta \Delta , Tn. When periods are assigned to tasks, this condition should be considered. 6.2 Temporal Consistency If values of two data objects simultaneously exist in the physical world, we say that the value of the two data objects are temporally consistent. If an internal data object is externally consistent, we can consider that the corresponding external data object has the same value as the internal data object. Since all externally consistent internal data objects have the same values as the corresponding external data objects at that moment, we can consider that any two externally consistent internal data objects are temporally consistent with each other. We call the set of internal data objects that maintain external consistency, and consequently temporal consistency, a temporally consistent data set. We use the temporally consistent data set to maintain temporal consistency. Some internal data objects need not be externally consistent as long as tasks updating the data objects use temporally consistent data. Data histories are such examples. Data histories are not required to be up-to-date. However, they are required to correctly reflect the state of the physical world. For the data to be meaningful, the data must be derived from a temporally consistent data set. 74 All values of data objects in a temporally consistent data set are temporally consistent at one instant. However, a task may need to read all necessary data values one by one. In that case, the values read may not be temporally consistent because time elapses between read operations. Sometimes, a task may update more than one externally consistent internal data objects. Other tasks may read some data objects already updated by the task and others yet to be updated. We call this problem transitional update (vs. atomic update). For some applications, a visible transitional update is a serious problem. We need a mechanism to make all intermediate update states unobservable to other tasks. 6.2.1 Using Time Stamps One way to guarantee reading temporally consistent data values is to use time stamps. Whenever a value is written into a data item1, a new version of the data item is created and a time stamp is assigned to the new version. If the system needs to avoid the transitional update problem, all data items updated by one task must have the same time stamp. In addition, all data items updated by one task need to become visible to other tasks (or to be validated ) at the same time. If the updated data items are validated one by one, a task may read both an already updated data item and a data item yet to be updated. Since implementing simultaneous validation is unrealistic, we must use some sort of mutually exclusive executions while all new versions are validated in order to validate all updated data items virtually simultaneously. Let us call the duration of that mutually exclusive execution a validation period. Suppose a job J1 updating a data item ae1 is preempted after the time stamp is assigned and before the validation period ends. While J1 is preempted, the other data item ae2 is updated and validated. The other job J2 reads both ae1 and ae2 and completes. Now J1 resumes its execution and validates ae1. The value of ae1 read by J2 is an older version and its new version has an older time stamp than ae2 read by J2. That is, J2 read temporally inconsistent data. In order to avoid this temporal consistency violation, the task execution must be non-preemptive during the validation period. The value of the time stamp can be any time during the validation period because no other tasks can execute during 1Assuming a time stamp is assigned to a set of data objects, we use "data item" to describe the set. 75 \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma shrinking phaseflgrowing phasefl \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma All data items to befl read are locked.fl data items to be readflTaskfl critical section fl \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma data items to be updated onlyfl Figure 6.4: 2PLfRg that time. Before the validation period, the other tasks cannot see the new versions while they can read old ones. When a task reads its first data item, the task can read the most recent version of the data item. If the data item is in its validation period, the task must wait. The task may be preempted after reading the first data item. Data items that the task needs to read later may be updated while the task is preempted. However, this update is not a problem since there are multiple versions. The task can read appropriate versions of the data items so that the read values are temporally consistent with the first data item. 6.2.2 Using Locks Alternatively, we can use the locking mechanism to avoid reading older data values. In order to read temporally consistent data values, there must be a time when all locks of data items to be read are held in the execution of a task as shown in Figure 6.4. In order to have the time to hold the locks of all data items to be read, such locks must be acquired and released in two phases: a growing phase and a shrinking phase, as in 2PL. Let us use 2PLfRg to specify this protocol. The difference between 2PL and 2PLfRg is that 2PL requires all data items to be locked in two phases while 2PLfRg is only concerned with locks on the data items to be read. 2PLfRg is less restrictive than 2PL. Using 2PLfRg, a task effectively reads values at the time when all data items to be read are locked. Therefore, the read data values are temporally consistent. 76 If the system needs to avoid the transitional update problem, there must be a time all locks of data items to be updated must be held. We use 2PLfWg to denote this protocol. 2PLfWg also has two phases like 2PL and the difference is that 2PLfWg is only concerned with locks on the data items to be updated. Theorem 6.1 If all tasks confirm 2PLfRg and 2PLfW g, a task Ti cannot read both data items already updated by another task Tj and some yet to be updated by Tj. Proof: Let ae1 and ae2 be a data item already updated and a data item yet to be updated by Tj, respectively. Suppose Ti reads both data items. Since there is a time that all locks on data items to be read by Ti are held by Ti, there is a time t1 when Ti holds the locks for both ae1 and ae2. Then, Tj must hold the lock for ae1 before t1 and the lock for ae2 after t1. Since Ti holds locks for both ae1 and ae2 at t1, Tj cannot hold the locks at t1. Therefore, Tj cannot hold locks for both ae1 and ae2 simultaneously. This is a contradiction. 2 Since any task Ti cannot read both a data item already updated by another task Tj and a data item yet to be updated by Tj, the effect of an update by Tj using 2PLfWg is identical with performing all updates when all data items to be updated are held by Tj. The combined protocol of 2PLfRg and 2PLfWg is still weaker than the restriction of 2PL. With the combined protocol, a task can be in the growing phase in 2PLfWg while it is in the shrinking phase in 2PLfRg as shown in Figure 6.5(a). 2PL does not allow this state (Figure 6.5(b)). However, if a task both reads and updates any of the data items, the effect of the combined protocol is the same as 2PL. In typical database systems, most of the data items are read before they are updated but it may not be the case in a real-time database system. As shown in Figure 6.1, tasks form information flows. Tasks in an information flow may update data items without reading the data items because the values to update the data items are taken from the physical world and may not be relative values to the old values of the data items. Therefore, the combined protocol is effective for real-time databases. Also, if the system is not required to avoid the visible transitional update problem, 2PLfRg provides less restriction than 2PL. 77 \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma shrinking phase in 2PL{R}fl and growing phase in 2PL{W}fl \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma Taskfl data items to be read onlyfldata items to be updated onlyfl critical section fl \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma Taskfl data items to be read onlyfldata items to be updated onlyfl \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma \Gamma growing phasefl shrinking phasefl (a) 2PL{R}+2PL{W}fl (b) 2PLfl Figure 6.5: Difference between 2PL and 2PLfRg+2PLfWg 6.2.3 Using Priority Ceiling Functions If we use the simple locking mechanism to maintain temporal consistency, we must use either 2PLfRg or the combination of 2PLfRg and 2PLfWg as shown above. Since uncontrolled locking causes deadlocks, we must use additional mechanisms to avoid the deadlocks when we use locks to maintain temporal consistency in real-time database systems. Since PCP avoids the deadlocks, we can use PCP in addition to 2PLfRg or the combination of 2PLfRg and 2PLfWg. If a data item with a high priority ceiling is used at the beginning of the execution of a task and the data item is not used later, the task may unnecessarily hold the lock for the rest of the growing phase using 2PLfRg or the combination of 2PLfRg and 2PLfWg. Using the priority ceiling functions instead of locks alleviates this problem. The application of the priority ceiling functions to maintain temporal consistency is outlined as follows. For each task Ti, we construct a read maximum remainder priority \Omega Ri (t) and a write maximum remainder priority \Omega Wi (t). \Omega Ri (t) is the maximum priority ceiling of data items to be read by Ti after time t. \Omega Wi (t) is the maximum priority ceiling of data items to be updated by Ti after time t. From \Omega Ri (t) and \Omega Wi (t), we define two priority ceiling functions: a read priority ceiling function \Phi Ri (t) and a write priority ceiling function \Phi Wi (t) in a similar way as we defined \Phi i(t) 78 in CCP. Both the read priority ceiling function and the write priority ceiling function have only one peak. In short, we handle read accesses and update accesses separately. An overall priority ceiling function \Phi i(t) is defined by taking the maximum of \Phi Ri (t) and \Phi Wi (t). Scheduling and concurrency control are handled in the same way as in CCP using the overall priority ceiling function \Phi i(t). The difference of this algorithm from CCP is that now the priority ceiling function may have two peaks. The set of data values read by a task is effectively the data values existing in the database when the read priority ceiling function has the highest value. Similarly, the set of data items updated by a task is effectively updated simultaneously when the write priority ceiling function has the highest value. 6.2.4 Example Suppose two data items ae1 and ae2 are updated by T1 and T2 with periods 6 and 7 respectively as in Figure 6.6. T1 and T2 have priorities ss1 and ss2 respectively and ss1 ? ss2. T3 reads ae1 and ae2 and writes ae3 and has the lowest priority. ae1 has a priority ceiling OE1 = ss1 and ae2 has a priority ceiling OE2 = ss2. T3 reads ae1 at time 5 and ae2 at time 15 if we only use the simple locking mechanism as shown in Figure 6.6(a). However, ae1 read by T3 is written at time 0 by T1. ae2 read by T3 is written at time 8 by T2. Therefore, the data values read by T3 are temporally inconsistent. Using only PCP is no different from using the simple locking mechanism in terms of maintaining temporal consistency. By using 2PLfRg (and PCP in order to have a predictable temporal behavior) or the algorithm based on the priority ceiling function explained above, we can avoid reading temporally inconsistent data as in Figures 6.6(b) and (c). In Figure 6.6(b), T3 effectively reads ae1 and ae2 at time 11. Although 2PLfRg can avoid temporal inconsistency, the second job of T1 misses its deadline. In Figure 6.6(c), T3 effectively reads ae1 and ae2 at time 5. The algorithm based on the priority ceiling function can avoid missing the deadlines while satisfying the temporal consistency. 6.3 Summary In a real-time environment, we not only need to satisfy the deadline requirements of jobs, but also must have computations that are based on temporally correct data values. In this chapter, we have discussed two new consistency constraints--external consistency and temporal 79 consistency--that define temporal correctness. The external consistency requirements impose constraints on task periods. If temporal consistency is required, the three strategies--time stamp based strategy, lock based 2PLfRg or 2PLfRg+2PLfWg, and the strategy using the priority ceiling function--explained in this chapter are the candidates for the concurrency control algorithm of the system. 1234567890123456789012312345678901234567890123 1234567890123456789012312345678901234567890123 12345678901234567890123 123123 123123 123123 123123 123123 123 123123 123 123123 123 1234512345 1234512345 12345 12341234 12341234 1234 1234512345 1234512345 12345T1 T2 T3 1234512345 1234512345 12345 123123 123 1234567812345678 1234567812345678 1234567812345678 1234567812345678 1234512345 12345 123123 123456789123456789 123456789123456789 r1 r2 r3 0 2 4 6 8 10 14 12 time 16 18 123123 123123 1234512345 1234512345 1234512345 1234512345T1 T2 T3 123123 123 1234567812345678 1234567812345678 1234567812345678 1234567812345678 0 2 4 6 8 10 14 12 time 16 18 123123 123 1234512345 12345 123123 123 1234512345 1234512345 12341234 12341234 1234512345 1234512345T1 T2 T3 0 2 4 6 8 10 14 12 time 16 18 p1 p3 p2 (a) Locking (PCP) (b) 2PL{R} (+PCP) (c) Algorithm Based on the Priority Ceiling Function missed deadline p2 Figure 6.6: Maintaining Temporal Consistency 80 Chapter 7 The Period Assignment Problem External consistency defines a range of periods on how often a task should be executed. Moreover, system design specifications may assign a required range of periods for a task. From the ranges of possible periods, the next problem is how to assign a specific period to each task. In this chapter, we study strategies for period assignment when the RM algorithm is used for scheduling, given a range of periods for each task. In this chapter, we ignore the effect of blocking to simplify the discussion. This is acceptable, since blocking becomes a single term in the schedulability condition as shown in Chapter 3 and does not make a significant difference in the discussion of period assignment. In this way, we can concentrate on timing constraints and external consistency and ignore logical consistency and temporal consistency in the whole view of real-time database system requirements. 7.1 System Model As we have discussed in Chapter 6, a database stores information about the physical world. When the value of an external data object is recorded in a database, we have an internal data object. An internal data object is a raw data object if it has a value copied directly from the physical world without any manipulation. Another type of internal data object that has a value computed from other internal data objects is called a derived data object. An example of a derived data object is the averaged balance of an account. The averaged balance is derived from all previous balances. A raw data object should be updated whenever its corresponding external data object changes its value. However, since the values of external data objects evolve continuously and raw data objects are updated at discrete intervals, there will be inconsistency between an external data object and its raw data object from time to time. It is up to the database to correct the inconsistency. In our model, a real-time database system is connected to a set of physical devices. Figure 7.1 shows the relationship between a database and its surrounding physical world. Each physical 81 ExternalflDevicefl r: Raw Data Objectfl d: Derived Data Objectfl Physical Worldfl rfl rfl rfl dfl dfl Databasefl Figure 7.1: Real-Time Database System Model device produces the value of an external data object periodically and sends it to the database. Each device has a corresponding raw data object in the database. Since a database must have a correct picture of the physical world, the raw data objects must be kept up-to-date with the external data objects and the derived data objects must be kept consistent with the raw data objects. The concept of maintaining consistency between internal and external data objects is the external consistency discussed in the previous chapter. In order to maintain external consistency, periodic updates on raw data objects are conducted. All periodic tasks are defined with hard deadlines, which means we must guarantee that the tasks are completed by the deadlines. The deadline of a task in each period is assumed to be the end of that period. In this chapter, we study only single processor systems and use the RM algorithm for scheduling. Derived data objects are computed from raw data objects. We should therefore maintain good external consistency on raw data objects before improving external consistency on derived data objects. However, even among tasks updating raw data objects, some may be more important than others. For example, the readings of the temperatures of airplane engines are usually more important than the readings of the cabin temperature. In our model, we assign different weights to different tasks. A larger weight means the task is more important, and should be executed with a higher priority. In this way, we can maintain a better external consistency for more important internal data objects at the expense of the consistency for less important internal data objects. 82 SystemflProcessorfl ExternalflInterfacefl Processorfl Databasefl ExternalflDevicesfl Figure 7.2: Possible System Architecture 7.2 Period Assignment with External Tasks Only Using the above model, we study how we can assign a period to each periodic task in real-time database systems. We call tasks updating raw data objects external tasks and tasks updating derived data objects internal tasks. In this section, we assume systems have only external tasks. The model is extended to include both internal and external tasks in the next section. Many real-time database systems have two types of processors as shown in Figure 7.2. One type of processor, the external interface processor, monitors and controls external devices, and enters the values of raw data objects into the database. For this type of processor, there are only external tasks to be executed. Hereafter in this section, by a "task" we mean an external task unless specified otherwise. Task Ti is responsible for recording external data object O/i. For our study, we assume O/i has a new value in every fixed oi. oi is called the external period for O/i and is related to the maximum evolution rate of the data object. In our model, oi is defined by the data object itself, not by the physical device or the instrument used to sense the value. We assume the execution time ei of Ti is fixed and known. Given ois and eis, we want to decide pi, the period of task Ti. pi usually should be greater than oi. For example, a system may record the room temperature once in thirty minutes, even though the temperature may have a noticeable difference every five minutes. Making pi smaller than oi is meaningless since the system will record duplicate values in different periods. For a data object which is very critical to the system's correctness, pi should be assigned to be as close to oi as possible. We therefore can use ri = oi=pi as a normalized measure of external consistency, such that 0 ^ ri ^ 1. When ri = 1, it means 83 the raw data object always has the same value as the external data object. We define the importance of data object O/i, thus task Ti, by a weight wi. A larger wi means the data object requires a better external consistency. Two objective functions are defined in this research. The first is to maximize the weighted sum of external consistency, E = X i wiri: The second objective function is to provide a uniform level of external consistency to all tasks. Since tasks have different weights, external consistency is assigned proportionally to their weights, or D = riw i ; for all i: Again, we want to maximize D for a set of tasks to maximize processor utilization. 7.2.1 Maximizing the Weighted Sum of External Consistencies In our algorithm, the tasks are first sorted and reindexed in the order of w1o1=e1 ? w2o2=e2 ? \Delta \Delta \Delta ? wnon=en. We assume that each task Ti has a maximum acceptable period ai for the system to work correctly. We thus have a minimum degree of external consistency bi = oi=ai for Ti. Our first objective is to maximize E = Pi wiri with a condition bi ^ ri ^ 1. We first use the schedulability condition Eq.(2.1) to find the periods for the tasks with different bi and oi values. The problem is one of the continuous knapsack problems [LG75] and is studied in Section 7.2.1.1. To use the LSD condition in Eq.(2.3) for the general case is impractical since the size of the set Si can be very large. We thus define a special case in Section 7.2.1.2 in which all bis are the same and all ois are the same. We show how to use the LSD condition to find the intervals for the special case. 7.2.1.1 The General Case We first assign all pis to their maximum values: pi = ai = oib i : If this assignment satisfies the LL condition in Eq.(2.1), i.e., U = nX i=1 ei pi = nX i=1 eibi oi ^ n(2 1=n \Gamma 1); 84 the task set is feasible. From this initial assignment, we can increase some ri as long as U is smaller than the LL bound. Suppose we increase ri by \Delta ri. U will be increased by \Delta Ui j eio i \Delta ri: Also E will be increased by \Delta Ei j wi\Delta ri: Hence, \Delta Ei and \Delta Ui are related by \Delta Ei = wioie i \Delta Ui: (7.1) We want to choose i such that \Delta ri maximizes \Delta Ei but brings a minimum \Delta Ui. We therefore should choose i with the maximum wioi=ei. Since tasks are sorted by wioi=ei, we can always increase r1 to its maximum value 1; and then r2, r3, and so on. Eventually we get the task periods as follows: pi = 8????! ????: oi; for 1 ^ i ^ m \Gamma 1; com; for i = m; fioi; for m + 1 ^ i ^ n; (7.2) where fi = 1=bi = ai=oi and 1 ! c ^ fm. The problem is to find m and c. It is clear that m is the maximum value which satisfies m\Gamma 1X i=1 ei oi + nX i=m ei fioi ^ n(2 1=n \Gamma 1); and c is the minimum value which satisfies U = m\Gamma 1X k=1 ek ok + em com + nX k=m+1 ek fkok ^ n(2 1=n \Gamma 1): 7.2.1.2 A Restricted Case Let us now assume all external data objects monitored by an external interface processor are the same type and have similar external intervals. For example, a special processor is used to monitor the four brakes of an ABS system. In this case, tasks have the same minimum consistency requirement bi = 1=F for all i, 1 ^ i ^ n, and also oi = O for all i, 1 ^ i ^ n. We also assume F is an integer. Suppose we use the LL condition as in Section 7.2.1.1. pi will be assigned as follows: pi = 8?!?: O; for 1 ^ i ^ m \Gamma 1; F O; for m ^ i ^ n: (7.3) 85 If we use the LSD condition with this pi assignment, the schedulability condition of the task set can be derived as follows: Li = 8?????! ?????: mint2S i 1 t , t O ss iX k=1 ek! ^ 1; for 1 ^ i ^ m \Gamma 1; mint2S i 1 t , t O ss m\Gamma 1X k=1 ek + 1t , tF O ss iX k=m ek! ^ 1; for m ^ i ^ n; where Si = 8?!?: fOg; for 1 ^ i ^ m \Gamma 1; fO; 2O; \Delta \Delta \Delta ; F Og; for m ^ i ^ n: Hence, Li = 8?????! ?????: 1 O iX k=1 ek ^ 1; for 1 ^ i ^ m \Gamma 1; 1 O m\Gamma 1X k=1 ek + 1F O iX k=m ek ^ 1; for m ^ i ^ n: The feasibility condition of the task set is L = 1O m\Gamma 1X k=1 ek + 1F O nX k=m ek ^ 1: (7.4) Note L = U in Eq.(7.4). Regarding the derivation of L, L = U holds as long as pis are expressed as Eq.(7.3). Since the LSD condition is a superset of the LL condition, we can increase E from the original pi assignment. As long as pis are expressed as Eq.(7.3), we can follow the same argument as in Section 7.2.1.1 and get Eq.(7.1). Therefore, we can increase the weighted sum of external consistency, E, by increasing m one by one. The maximum E under the LSD condition and Eq.(7.3) can be found by the maximum m which satisfies Eq.(7.4). After we have decided on m, we can further increase the processor utilization by decreasing pm as much as possible. Let pi = 8????! ????: O; for 1 ^ i ^ m \Gamma 1; cO; for i = m; F O; for m + 1 ^ i ^ n; 86 be the assigned periods when the processor utilization has the maximum value and still satisfies the LSD condition. From Eq.(2.3), we have Li = 8???????? ??!??? ???????: mint2S i 1 t , t O ss iX k=1 ek! ^ 1; for 1 ^ i ^ m \Gamma 1; mint2S i 1 t , t O ss m\Gamma 1X k=1 ek + 1t , tcO ss em! ^ 1; for i = m; mint2S i 0@ 1 t , t O ss m\Gamma 1X k=1 ek + 1t , tcO ss em + 1t , tF O ss iX k=m+1 ek1A ^ 1; for m + 1 ^ i ^ n; (7.5) where Si = 8????! ????: fOg; for 1 ^ i ^ m \Gamma 1; fO; 2O; \Delta \Delta \Delta ; bccO; cOg for i = m; fO; 2O; \Delta \Delta \Delta ; F O; cO; 2cO; \Delta \Delta \Delta ; bF=cc cOg; for m + 1 ^ i ^ n: Hence, Li = 8???????? ????????! ??????????? ?????: 1 O iX k=1 ek ^ 1; for 1 ^ i ^ m \Gamma 1; min ( 1O m\Gamma 1X k=1 ek + embccO ; dcecO m\Gamma 1X k=1 ek + emcO ) ^ 1; for i = m; min 8!: 1O m\Gamma 1X k=1 ek + emcO + 1bF=cc cO iX k=m+1 ek; 1 O m\Gamma 1X k=1 ek + , Fc ss emF O + 1F O iX k=m+1 ek9=; ^ 1; for m + 1 ^ i ^ n: (7.6) The following two relations hold with Li: Li ! Lm; for 1 ^ i ^ m \Gamma 1; and Li ! Lk; for 1 ^ i ^ m \Gamma 1 and m + 1 ^ k ^ n: However, Lm ! Lk; for m + 1 ^ k ^ n; 87 does not necessarily hold. Hence, the task set is schedulable if8???? ?!???? ?: min ( 1O m\Gamma 1X k=1 ek + embccO ; dcecO m\Gamma 1X k=1 ek + emcO ) ^ 1; min 8!: 1O m\Gamma 1X k=1 ek + emcO + 1bF=cccO nX k=m+1 ek; 1O m\Gamma 1X k=1 ek + , Fc ss emF O + 1F O nX k=m+1 ek9=; ^ 1 (7.7) is satisfied. We must use an iterative method to derive the exact minimum c from this condition. However, the processor utilization may be good enough at this stage without deriving the exact minimum c and deriving it may not be worth the effort. If that is the case, we can analyze Eq.(7.7) further and derive a loose but easy to compute condition of c. Let us substitute some of the terms in Eq.(7.7) as follows: A0 = 1O m\Gamma 1X k=1 ek; A1 = emO ; and A2 = 1F O nX k=m+1 ek: Eq.(7.7) can be rewritten as8??! ??: min aeA0 + A1bcc , dcec A0 + A1c oe ^ 1; min aeA0 + A1c + F A2bF=ccc, A0 + , Fc ss A1F + A2oe ^ 1: (7.8) If 8??! ??: A0 + A1bcc ^ 1; A0 + A1c + F A2bF=ccc ^ 1; (7.9) is satisfied, Eq.(7.8) is also satisfied. From Eq.(7.9), we can derive the conditions for c,8??! ??: c * , A11 \Gamma A 0 ss ; c * Fb(1 \Gamma A 0 \Gamma A2)F=A1c : We use the minimum c which satisfies the above condition. 7.2.2 Maximizing the Uniform External Consistency In some applications, it is desirable to let all raw data objects have a comparable degree of external consistency. If weights are assigned in such a way that the consistency requirement between an external data object and a raw data object is proportional to the weight, then our objective is to satisfy r1 w1 = r2 w2 = \Delta \Delta \Delta = rn wn ; 88 while achieving maximum processor utilization. If we use the LL condition, this problem can be regarded as a knapsack sharing problem [Bro79]. We first sort and reindex Tis so that w1 * w2 * \Delta \Delta \Delta * wn: Assuming that r1 w1 = r2 w2 = \Delta \Delta \Delta = rn wn = D; (7.10) we thus assign pi = oiDw i ; for 1 ^ i ^ n: Since wi may be different for different tasks and the size of Si may be very large, it is not practical to use the LSD condition. With the LL condition, the feasibility condition is U = nX i=1 ei pi = D nX i=1 eiwi oi ^ n(2 1=n \Gamma 1): A larger D will always result in a larger U . With the derived maximum D, there may be an m, 1 ^ m ^ n, such that ri = oip i = Dwi * 1; for i ^ m: Since assigning pi to be smaller than oi is useless, we should assign pi = oi for such tasks. With this adjustment, our next objective is to find a new D satisfying ri wi = D; for m ! i ^ n: The utilization then becomes U = mX i=1 ei oi + nX i=m+1 eiwi oi D ^ n(2 1=n \Gamma 1): The condition of D is D ^ n(21=n \Gamma 1) \Gamma mX i=1 ei oi nX i=m+1 eiwi oi : (7.11) Again, with the new maximum D, we may need to adjust ri so that ri ^ 1 by increasing m. We repeat the adjustments until no more change in D and m occurs. With the algorithm, we can find the D and m which satisfy D = n(21=n \Gamma 1) \Gamma mX i=1 ei oi nX i=m+1 eiwi oi ; 89 and Dwj ! 1; for m + 1 ^ j ^ n: With the D and m, we can assign the periods as pi = 8?!?: o i; for 1 ^ i ^ m; oi Dwi ; for m + 1 ^ i ^ n: 7.3 Period Assignment with Both External and Internal Tasks In this section, we consider systems with both external and internal tasks. We assume that a derived data object is derived from a set of raw data objects only. The case in which a derived data object depends on both raw and derived data objects is left for future research. In our study, the degree of external consistency is defined to be the ratio between the external period and the assigned period. For a derived data object aei, there is no corresponding external period. We decide to use the derived external period, oi = minT j2Xi oj ; where Xi is the set of external tasks that the derived data object aei depends on. In this way, we can use the same ri = oi=pi to measure the consistency level of a period assignment. Note that there is no point in letting an internal task Ti have a shorter pi than any of the assigned pj of Tj in Xi, i.e., we should have pi * minT j2Xi pj : 7.3.1 Maximizing the Weighted Sum of External Consistencies For each internal task Ti, we can define the bound function s(i), such that if Tx 2 Xi and ox is the minimum among oj, Tj 2 Xi, then s(i) = x and Tx is an external task, i.e., os(i) = minT j2Xi oj. The tasks are first sorted and reindexed so that w1 o1e 1 ? w2 o2 e2 ? \Delta \Delta \Delta ? wn on en : We first assign the minimum requirement periods to all tasks, i.e., pi = ai = oib i ; for 1 ^ i ^ n: 90 We then start to decrease some pi in order to increase the processor utilization as in Section 7.2.1. The only difference is that pi should not be less than minT j2Xi pj for an internal task Ti. 7.3.1.1 The General Case As in Section 7.2.1.1, we use the LL condition for the general case. If there exist i and s(i) that satisfy wi oie i ? ws(i) os(i) es(i) ; we will decrease pi before ps(i). If that is the case, we need to change pi when we change pb(i). Since we use the LL condition, we can follow the same argument as in Section 7.2.1.1 and pi will be expressed as in Eq.(7.2). To derive pi, we first determine m and compute pis ignoring their lower limits. Then, if pi is less than its lower limit, we set pi to its lower limit. Next, we reduce pm to achieve maximum processor utilization. If Tm is internal, we can find pm by solving an equation, since shortening pm does not affect the lower limit of other pi. Otherwise, we may need to change several pis, including pm, simultaneously. The algorithm for finding pi is defined below. In the algorithm, Qm is the set of internal tasks such that, for any Ti 2 Qm, pi will change if we decrease pm from fmom to om. Qm can be found by including every task Ti such that Tm 2 Xi, wioi=ei ? wmom=em, and pi ? om. 1. Remove the limit pi * minT j2Xi pj and assign pi as: pi = 8?!?: oi; for 1 ^ i ^ m \Gamma 1; fioi; for m ^ i ^ n; such that m is the maximum value which satisfies m\Gamma 1X i=1 ei oi + nX i=m ei fioi ^ n(2 1=n \Gamma 1): 2. Find all internal tasks Tis which have pi = oi but minT j2Xi pj ? oi, and set pi = minTj2Xi pj. 3. If Tm is an internal task and U 0 j m\Gamma 1X i=1 ei pi + em minT j2Xm pj + nX i=m+1 ei fioi ^ n(2 1=n \Gamma 1); assign pm = minT j2Xm pj. Set m = m + 1 and repeat 3. 91 fmomom G4 G1 G2 G3 gm (x) x Figure 7.3: Values of gm(x) 4. If Tm is an internal task and U 0 ? n(21=n \Gamma 1); find pm such that m\Gamma 1X i=1 ei pi + em pm + nX i=m+1 ei fioi = n(2 1=n \Gamma 1); and stop. 5. If Tm is an external task, find the set of internal tasks Qm. Then define gm(x) j X Tj2Qm ej min(pj; x) + em x : If G j n(21=n \Gamma 1) \Gamma mX i=1T i62Qm ei pi \Gamma nX i=m+1 ei fioi ? gm(om); assign pm = om and for all Tj 2 Qm, pj = minT k2Xj pk. Set m = m + 1 and go back to 3. 6. If G ^ gm(om); find x which satisfies gm(x) = G. Set pm = x and for all Ti 2 Qm, pi = x and stop. gm(x) is a monotonically decreasing function and consists of connected hyperbolas as shown in Figure 7.3. To solve gm(x) = G, we compute the connection points of hyperbolas in gm(x) which are shown as G1; G2; \Delta \Delta \Delta in Figure 7.3. By comparing Gi and G, we can find a section in which G falls. Since we know the function equation of each section, we can derive x which satisfies gm(x) = G. 92 7.3.1.2 A Restricted Case Assume oi = O and bi = 1=F for all i, 1 ^ i ^ n. We first assign all pi to be F O. We then use the LL condition to derive the initial pi values. To maximize processor utilization, we can decrease p1 to its minimum, then p2 and so on. The minimum pi is O for external tasks and minT j2Xi pj for internal tasks. That is, we will assign pi as follows: pi = 8?????! ?????: O; 1 ^ i ^ m \Gamma 1 and Ti is external; minT j2Xi pj; 1 ^ i ^ m \Gamma 1 and Ti is internal; F O; m ^ i ^ n: (7.12) Suppose minT j2Xi pj = F O for an internal task Ti. It is useless for us to decrease pi. Therefore, we can modify wi such that wioi=ei ! maxT j2Xi(wjoj=ej ). Then, Eq.(7.12) becomes pi = 8?!?: O; 1 ^ i ^ m \Gamma 1; F O; m ^ i ^ n: (7.13) This is easier for our search of pis since we will never try to decrease pi of internal task Ti before minT j2Xi pj becomes equal to O. We therefore may follow the same method as in Section 7.2.1.2 to find pis. 7.3.2 Maximizing the Uniform External Consistency We will sort Tis so that w1 * w2 * \Delta \Delta \Delta * wn. Since we always assign higher priorities to external tasks than to internal tasks, wi ! wj holds for all internal tasks Ti and external tasks Tj. Assume task Ti is an external task if 1 ^ i ^ l and an internal task if l + 1 ^ i ^ n. Our objective is to find the maximum D such that r1 w1 = r2 w2 = \Delta \Delta \Delta = rn wn = D: (7.14) For each internal task Ti, oi = minT j2Xi oj = os(i): (7.15) From Eq.(7.14) and Eq.(7.15), we can assign pi = 8??!??: oi Dwi 1 ^ i ^ l;o i Dwi = os(i) Dwi ; l + 1 ^ i ^ n: 93 Hence, U = D nX i=1 eiwi oi = D lX i=1 eiwi oi + D nX i=l+1 eiwi os(i) : (7.16) With the same argument as in Section 7.2.2, we use the LL condition. With the constraint U ^ n(21=n \Gamma 1) in Eq.(7.16), we can find D and all pis. However, for an external task Ti, if pi ! oi we should change pi to oi. For an internal task Ti, if pi ! minT j2Xi pj we should set pi = minT j2Xi pj. We can find pi ! oi by testing Dwi ? 1. Testing pi ! minT j2Xi pj is not simple, since minTj2Xi pj may not be equal to os(i). However, since all external tasks have larger weights than all internal tasks, minT j2Xi pj is equal to os(i) when we adjust the period of an internal task. The reason is as follows. Suppose that after we have adjusted pjs for external tasks, an internal task Ti has oi = os(i) = minT j2Xi oj , and pi ! minT j2Xi pj : (7.17) ws(i) ? wi because Ts(i) is an external task and Ti is an internal task. If Dws(i) ! 1, ps(i) = os(i)Dw s(i) ! os(i) Dwi = pi: Since minT j2Xi pj ^ ps(i) ! pi, Eq.(7.17) cannot be satisfied. If Dws(i) * 1, ps(i) is already adjusted to os(i). Since os(i) ^ minT j2Xi pj, minTj2Xi pj must be equal to ps(i) = os(i). Thus, for both external and internal tasks, we only need to test Dwi ? 1 to adjust pi to oi. We therefore can follow the same algorithm as in Section 7.2.2. 7.4 Examples In this section, we show some examples using the algorithms discussed earlier. The first example task set is shown in Table 7.1. We assume all tasks have the same minimum external consistency requirement, i.e., all bi = 1=F . This satisfies all assumptions for the restricted case in Section 7.2.1.2. Since we want to maximize E = Pi wiri, an intuitive solution is to strictly assign a larger ri to a task with a larger wi. To do this, we first assign the minimum ris, i.e., the longest possible pi--F oi in this example--to all tasks. We then shorten the pis with the largest wis one at a time to their minimum oi. The process is repeated as long as the task set is still schedulable using the LSD condition. The results of such assignments are shown in the rows with label (i) in Table 7.2. We show the results for different F values. 94 Table 7.1: Example 1 Transaction (Ti) T1 T2 T3 oi 1 1 1 Execution Time (ei) 0.8 0.6 0.25 Weight (wi) 6 5 2 Table 7.2: Interval Assignment of Example 1 Algo- F Interval (pi) E U D rithm T1 T2 T3 (i) 3 1.23 3 3 7.23 0.936 -- (ii) 3 1 2.63 7.76 0.969 -- (i) 4 1.08 4 4 7.29 0.951 -- (ii) 4 1 1.29 8.05 0.994 -- (i) 5 1 3.75 5 7.73 0.983 -- (ii) 5 1 1.13 7.98 0.982 -- (iii) -- 1.78 2.13 5.32 6.09 0.778 0.094 Note: (i): Using weight only. (ii): Using the algorithm in Section 7.2.1.2. (iii): Using the algorithm in Section 7.2.2. The rows with label (ii) use the algorithm in Section 7.2.1.2. In these cases, we assign a longer interval to T1 than to T2 and T3 because w2o2 e2 = 8:33 ? w3o3 e3 = 8 ? w1o1 e1 = 7:5: We can see that the weighted sums E of external consistencies using this algorithm are consistently better than those in label (i). Also in Table 7.2, we show pi assigned using the algorithm in Section 7.2.2 in the row labeled (iii). As expected, E for (iii) is much lower than that in rows (i) and (ii). This is mostly due to the low processor utilization bound U since we do not use the LSD condition. The task set in Table 7.3 presents a more general example in which tasks have different fi values. Again all tasks are external. The results are shown in Table 7.4. When we use the algorithm in Section 7.2.2, the first value of D is about 0.126. With this D, Dw1 = 1:258 ? 1. We therefore set p1 = 2 and recalculate D. The new D is 0.150. With that, both Dw2 and Dw3 are less than 1. The example in Table 7.5 involves both external and internal tasks. T1 and T3 are external and T2 is internal. T2 uses raw data objects updated by T1 and T3. The result is shown in 95 Table 7.3: Example 2 Transaction (Ti) T1 T2 T3 oi 2 1 1 Execution Time (ei) 0.6 0.5 0.4 Weight (wi) 10 4 3 fi 5 4 6 Table 7.4: Interval Assignment of Example 2 Algo- Interval (pi) E U D rithm T1 T2 T3 (iv) 2 1.22 6 13.78 0.777 -- (iii) 2 1.67 2.22 13.75 0.780 0.150 Note: (iv): Using the algorithm in Section 7.2.1.1. (iii): Using the algorithm in Section 7.2.2. Table 7.5: Example 3 Transaction (Ti) T1 T2 T3 oi 2.3 1 1 Execution Time (ei) 0.55 0.5 0.7 Weight (wi) 5 3 4 fi 5 6 4 Ext/Int Ext Int Ext Table 7.6: Interval Assignment of Example 3 Algo- pi E U D rithm T1 T2 T3 (iv0) 2.3 2.22 2.22 8.15 0.780 -- (iv) 2.3 1.37 4 8.19 0.779 -- (iii0) 3.25 2.35 1.76 7.09 0.780 0.142 Note: (iv0): Using the algorithm in Section 7.3.1.1. (iv): Assuming all tasks are external. (iii0): Using the algorithm in Section 7.3.2. Table 7.6. For comparison, we include the result using the algorithm in Section 7.2.1.1 assuming all tasks are external. 96 Chapter 8 Conclusions A real-time system has timing requirements on the services that the system provides. In order for the real-time system to operate properly, the timing requirements must be satisfied. A database system requires task serializability in order to guarantee correct results when many tasks execute concurrently in the system and compete for data. A real-time database system therefore requires the functionalities of both a real-time system and a database system. In other words, a real-time database system must satisfy timing constraints while maintaining serializability. Maintaining serializability may cause the suspension of some tasks. A real-time system usually adopts a priority-driven scheduling algorithm. Tasks are assigned priorities and they are executed according to the priorities. The suspension of tasks causes blockings when a lower priority task executes while higher priority tasks are suspended. The suspension of higher priority tasks causes an unpredictable delay on the tasks that is not desirable for a real-time system. We need mechanisms to handle these two conflicting requirements. Two other requirements for real-time database systems must also be satisfied in addition to timing requirements and serializability; namely, external consistency and temporal consistency. Since a real-time database system responds to events in the physical world, the data used to produce the responses must be up-to-date. External consistency is defined to ensure the currentness of data in the database. On the other hand, several input data may be taken from the physical world to produce the responses. The set of input data taken from the physical world must present a simultaneous snapshot of the physical world. This simultaneity defines temporal consistency. The various consistency requirements and other system design specifications provide a range of legitimate periods for a task. A database designer must decide how to assign a specific period to each task. By using a scheduling algorithm that has a predictable temporal behavior and which is able to satisfy logical and temporal consistency, the designer guarantees that tasks are schedulable after period assignment and satisfies logical, external, and temporal consistency. 97 SystemflSpecificationfl Timing Constraintsfl External Consistencyfl Temporal Consistencyfl Logical Consistencyfl Scheduling Algorithmfl Ranges offlPeriodsfl ConcurrencyflControlfl Algorithmfl Schedulabilityfl Conditionfl PeriodflAssignmentfl Figure 8.1: Temporal Design Flow 8.1 Summary of Results We have studied the scheduling and concurrency issues for real-time databases. Using the scheduling and concurrency control algorithms and strategies to maintain temporal consistency and external consistency already discussed, we can summarize the timing design process of realtime database systems as shown in Figure 8.1. Before we start to design a real-time database system, a specification of the system is defined. The specification defines the timing requirements of the system such as the maximum period of a control signal output and an interval of a sensor data update. The timing requirements define ranges of some task periods. The specification may also define the maximum delay from a sensor input to a control signal output. This is external consistency. External consistency also defines ranges of task periods. The requirements of logical consistency and temporal consistency are also described in the specification. A concurrency control algorithm is chosen from the requirements of temporal consistency and logical consistency. The specification may determine the scheduling algorithm, such as RM and EDF. Otherwise, a scheduling algorithm is chosen in the design process. From the concurrency control algorithm and the scheduling algorithm, the schedulability condition is determined. Using the schedulability condition and the ranges of task periods, we can assign periods to tasks. By pursuing this timing design process, we can guarantee that timing constraints and external consistency are satisfied. Also, temporal consistency and logical consistency are satisfied if they are required. Our concurrency control algorithms are based on and extend the priority ceiling protocol [SRL90]. The priority ceiling protocol is proposed for a system using a static priority-driven 98 scheduling algorithm. The priority ceiling protocol has the single-blocking property and the no-deadlock property that are desirable in a real-time system. However, the protocol is not appropriate for real-time database systems because the protocol does not guarantee serializability. In order to maintain serializability, one solution is to use the two-phase locking protocol in conjunction with the priority ceiling protocol. However, this combination is an overprotection. In Chapter 3, we propose an alternative algorithm, the convex ceiling protocol, that can guarantee a predictable temporal behavior and serializability of schedules on systems using a static priority assignment. We prove that the algorithm has the single-blocking property and is deadlock free. The schedulability condition of the algorithm has also been derived. Compared with the combination of the priority ceiling protocol and the two-phase locking protocol, the convex ceiling protocol has an effect of reducing the worst case blocking lengths. Consequently, the convex ceiling protocol can achieve a higher processor utilization than the combination algorithm without violating timing requirements and serializability. The convex ceiling protocol assumes static task priorities. In Chapter 4, we prove that the rate monotonic priority assignment is optimal for the convex ceiling protocol. It is optimal in the sense that if a task set is not schedulable with the rate monotonic priority assignment the task set is not schedulable with any other static priority assignment using the convex ceiling protocol. This is the same sense of optimality as proved by Liu and Layland [LL73] for the rate monotonic algorithm on systems without shared resources. Also in Chapter 4, we extend the priority ceiling protocol to handle access types on data such as read and write. By distinguishing access types, the system can execute more tasks concurrently and achieve a better response time. Our algorithm is not limited to read and write access types but is applicable to more general access types. The convex ceiling protocols discussed in Chapter 3 and 4 assume a static priority assignment to tasks. We modify the convex ceiling protocol to handle a dynamic priority assignment, the earliest deadline first assignment, in Chapter 5. We prove that the modified protocol guarantees the single-blocking property, the no-deadlock property, and serializability. Also, the schedulability condition of the algorithm is discussed. Although our algorithm does not improve the schedulability condition over either the combination of the two-phase locking protocol and the dynamic priority ceiling protocol or the combination of the two-phase locking protocol and 99 the stack resource policy, our algorithm has a better average case behavior because of generally shorter blocking lengths. Chapter 3, 4, and 5 concern only timing requirements and serializability. In Chapter 6, we consider the other constraints: external consistency and temporal consistency. We show how to impose a bound on a task period in order to maintain external consistency. To maintain temporal consistency, we propose three strategies. One is the time stamp based strategy. The other is based on locking. The last one is an extension of the convex ceiling protocol. Between the last two strategies, the extension of the convex ceiling protocol gives shorter blocking lengths. External consistency gives a legitimate range of task periods. Also, other system design specifications restrict the task periods. Based on the range of task periods, we propose algorithms to assign a specific period to each task in Chapter 7. We use two objective functions. One is to maximize the weighted sum of external consistency. With this approach, external consistency of important data is maintained as much as possible. The other objective is to assign the periods so that every task has external consistency proportional to its weight. The criterion for choosing between these two strategies is the interpretation or the semantics of the weight assignment. 8.2 Directions for Future Research This thesis provides an initial investigation into designing real-time databases. Many other issues are still to be resolved before we can build practical and useful real-time databases. In our research, we assume the system has a single processor. In the real world, distributed systems and multiprocessors are increasingly popular and are sometimes a must. An obvious extension of our research is to incorporate the convex ceiling protocol into distributed or multiprocessor systems. Rajkumar et al. [RSL88] have proposed an extension of the priority ceiling protocol for multiprocessor systems. We need to consider the applicability of their strategy and possibly different strategies for the convex ceiling protocol. Recently, Kuo and Mok [KM91] have reported an interesting behavior of the rate monotonic algorithm. Even if a task set is not schedulable with a certain period assignment, the task set may be schedulable by shortening some of the periods. This phenomenon is counter-intuitive because shortening periods means increasing processor utilization. However, modifying certain 100 periods allow systems to avoid worst case scheduling combinations. Thus, increasing processor utilization may make the task set more schedulable in some cases. We need to extend our period assignment algorithms to exploit this behavior of the rate monotonic algorithm. The design methodology of real-time database systems is an important research area, but one which is not well explored. Therefore, general real-time database design methodology is a large future research topic. The materials we have presented in this thesis are a starting point and only a part of this research topic. We need an appropriate database model and an access structure for real-time database systems. Also, a language different from conventional query languages to describe queries in conjunction with timing constraints is necessary. Hardware architecture suitable for real-time database systems may be another fruitful direction of the research. 101 Bibliography [AGM88] R. Abbot and H. Garcia-Molina. Scheduling real-time transactions: a performance evaluation. In Proc. of the 14th International Conference on Very Large Data Bases, pages 1-12, 1988. [Bak90] T. P. Baker. Stack-based scheduling of realtime processes. In Proc. of Real-Time Systems Symposium, pages 191-200, December 1990. [Bak91] T. P. Baker. Stack-based scheduling of realtime processes. The Journal of RealTime Systems, 3(1):67-99, March 1991. [BHG87] P. A. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency Control and Recovery in Database Systems. Addison-Wesley Publishing Company, Reading, MA, 1987. [BMHD89] A. P. Buchmann, D. R. McCarthy, M. Hsu, and U. Dayal. Time-critical database scheduling: A framework for integrating real-time scheduling and concurrency control. In Proc. of 5th Data Engineering Conference, pages 470-480, February 1989. [Bro79] J. R. Brown. The knapsack sharing problem. Operations Research, 27(2):341-355, March-April 1979. [CES71] E. G. Coffman, Jr., M. J. Elphick, and A. Shoshani. System deadlocks. Computing Survey, 3(2):67-78, jun 1971. [Che91] M.-I. Chen. Schedulability Analysis of Resource Access Control Protocols in RealTime Systems. PhD thesis, University of Illinois at Urbana-Champaign, Urbana, Illinois, 1991. [CJL89] M. J. Carey, R. Jauhari, and M. Livny. Priority in DBMS resource scheduling. In Proc. of the Fifteenth International Conference on Very Large Data Bases, pages 397-410, August 1989. [CL90] M.-I. Chen and K.-J. Lin. Dynamic priority ceilings: A concurrency control protocol for real-time systems. The Journal of Real-Time Systems, 2(4):325-346, 1990. 102 [EGLT76] K. P. Eswaran, J. N. Gray, R. A. Lorie, and I. L. Traiger. The notions of consistency and predicate locks in a database system. CACM, 19(11):624-633, November 1976. [Gla80] R. L. Glass. The "lost world" of software debugging and testing. CACM, 23(5):264- 271, May 1980. [HCL90] J. R. Haritsa, M. J. Carey, and M. Livny. On being optimistic about real-time constraints. In Proc. of 1990 ACM Principle of Database Systems Symposium, pages 331-343, 1990. [HL92] C.-C. Han and K.-J. Lin. Scheduling distance-constrained real-time tasks. In Proc. Real-Time Systems Symposium. IEEE, December 1992. [HSTR89] J. Huang, J. A. Stankovic, D. Towsley, and K. Ramamritham. Experimental evaluation of real-time transaction processing. In Proc. of Real-Time Systems Symposium, pages 144-153, December 1989. [KM91] T.-W. Kuo and A. K. Mok. Load adjustment in adaptive real-time systems. In Proc. Real-Time Systems Symposium, pages 160-170. IEEE, December 1991. [LG75] H. Luss and S. K. Gupta. Allocation of effort resources among competing activities. Operations Research, 23(2):360-366, March-April 1975. [Lin89] K.-J. Lin. Consistency issues in real-time database systems. In Proc. of 22nd Hawaii International Conference on System Sciences, Software Track, pages 654-661, 1989. [LL73] C. L. Liu and J. W. Layland. Scheduling algorithms for multiprogramming in a hard-real-time environment. JACM, 20(1):46-61, 1973. [LLS88] J. W. S. Liu, K.-J. Lin, and X. Song. A position paper on scheduling hard realtime transactions. In Proc. 1988 Workshop on Real-Time Operating Systems and Software, May 1988. [LS90] Y. Lin and S. H. Son. Concurrency control in real-time databases by dynamic adjustment of serialization order. In Proc. Real-Time Systems Symposium, pages 104-112. IEEE, December 1990. [LSD89] J. Lehoczky, L. Sha, and Y. Ding. The rate monotonic scheduling algorithm: Exact characterization and average case behavior. In Proc. of Tenth Real-Time Systems Symposium, pages 166-171, 1989. 103 [LSS87] J. P. Lehoczky, L. Sha, and J. K. Stronsnider. Enhanced aperiodic responsiveness in hard real-time environments. In Proc of Eighth Real-Time Systems Symposium, pages 261-270, 1987. [MC70] R. R. Muntz and E. G. Coffman, Jr. Preemptive scheduling of real-time tasks on multiprocessor systems. JACM, 17(2):324-338, 1970. [Mok83] A. K. Mok. Fundamental Design problems of Distributed Systems for the Hard RealTime Environment. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, May 1983. [NL91] H. Nakazato and K.-J. Lin. Interval assignment for periodic transactions in real-time database systems. In Proc. Seventh International Conference on Data Engineering, pages 378-385, Kobe, Japan, April 1991. IEEE. [RSL88] R. Rajkumar, L. Sha, and J. P. Lehoczky. Real-time synchronization protocol for multiprocessors. In Proc. of the Real-Time Systems Symposium, pages 259-269. IEEE, December 1988. [SG90] L. Sha and J. B. Goodenough. Real-time scheduling theory and ada. IEEE Computer, 23(4):53-62, April 1990. [SL90] X. Song and J. W. S. Liu. Performance of multiversion concurrency control algorithms. In Proc. 14th COMPSAC, pages 132-139. IEEE, October 1990. [SL92] X. Song and J. W. S. Liu. How well can data temporal consistency be maintained? In Proc. Symposium on Computer-Aided Control System Design, pages 275-284. IEEE, March 1992. [SRL90] L. Sha, R. Rajkumar, and J. P. Lehoczky. Priority inheritance protocols: An approach to real-time synchronization. IEEE Trans. on Computers, 39(9):1175- 1185, September 1990. [Sta88] J. A. Stankovic. Misconceptions about real-time computing: A serious problem for next-generation systems. IEEE Computer, 21(10):10-19, October 1988. 104 Vita Hidenori Nakazato was born on July 15, 1958 in Tokyo, Japan. He attended Waseda University, Tokyo, where he received the B.Engr. degree in Electronics and Communication Engineering in 1982. From 1982 to 1987, he worked for Oki Electric Industry as a hardware engineer. He developed digital signal senders and receivers for public telephone switching systems. He also joined the ISDN switching system project in Oki. He started his M.S. work at the University of Illinois at Urbana-Champaign in 1987 with a scholarship from Oki Electric Industry. He received his M.S. degree in Computer Science in August, 1989 and continued to work on the Ph.D. His Ph.D. degree in Computer Science from the University of Illinois at UrbanaChampaign was conferred in January, 1993. At the University of Illinois, he performed research on real-time systems under the direction of Professor Kwei-jay Lin. He is currently a researcher at Oki Electric Industry. His research interests include real-time systems, database systems, distributed computer systems, and algorithms. 105