Real-Time Operating Systems                         12/06/93









                      Important Aspects

                             of

                 Real-Time Operating Systems

                              

                        Ramah Muralid
                         Lee Patton
                         Nathan Ward
                              

                      6 December, 1993

                   George Mason University

                           CS 571

Introduction.  Real-time operating systems (RTOS) play an

important role in real-time systems. The timing constraints

in real-time systems affect both the hardware and software

architecture. Several aspects of a RTOS are especially

important given the nature of real-time systems. These

aspects include:



  flexibility in size and functionality

  speed

  predictability

  resource management



The specific features that are provided by a RTOS and how

they are implemented affect these aspects. This paper

describes several RTOS features and how they affect these

aspects of RTOS.



Flexibility.  A wide variety of hardware configurations are

used in real-time applications. As a result, flexibility of

a RTOS in size and functionality is important so that the

operating system can be adapted to the needs of a variety of

applications. Some RTOS are designed to run on specific

platforms while others are more flexible. Venix is a real-

time version of UNIX that runs only on 80x86 processors

[Small 41]. QNX, pSOS, VRTX32 and many other RTOS are

designed to be adaptable to many hardware configurations.

These OS are structured into components that can be added or

removed to fit the application's needs.

     QNX is a microkernel operating system whose kernel

provides services for inter-process communication, low-level

network communication, process scheduling, and interrupt

dispatching [Hildebrand 2]. Examples of subsystems provided

by QNX include a file system manager and a device manager.

QNX implements these components as separate processes. This

allows the subsystems to be dynamically started, stopped or

replaced while the system is running [Hildebrand 4-6].

VRTX32 and pSOS can also be scaled to run with or without a

file system and other components. However, these RTOSs do

not implement their components as processes. To use these

operating systems, the operating system object code must be

linked with the application object code to produce an

executable program. The application code must set up some

operating system parameters and start the operating system

when the program is started. If a file system is not needed

for a particular application, it can be removed by relinking

the executable without the object code for the file system.

     A microkernel operating system can add to the

flexibility of a system. A microkernel architecture is

different from a

monolithic kernel in that the higher level operating system

services are treated like application processes. To access

such a service, the application passes a request to the

service through the kernel similar to how a message would be

passed between application processes. As a result, the

service can be dynamically started, stopped, or replaced. QNX

is an example of a microkernel. VRTX32 and pSOS are

monolithic operating systems where the operating system

services are part of the kernel [Marrin 78]. While

microkernels may provide additional flexibility, it may take

longer to access a service with a microkernel because of the

message passing overhead. QNX overcomes this by directly

copying messages to a process's memory instead of

implementing system queues [Hildebrand 3].
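The difference between queued and direct-copy delivery can be sketched in a few lines (a simplified Python illustration with invented names; QNX's actual interfaces differ):

```python
# Contrast: a queued kernel copies twice (sender -> system queue ->
# receiver, on a later pass); direct copy, as described for QNX
# above, moves the message into the receiver's buffer in one copy.

class Process:
    def __init__(self):
        self.inbox = b""            # receive buffer known to the kernel

class QueuedKernel:
    def __init__(self):
        self.queue = []             # kernel-owned message queue
    def send(self, receiver, payload):
        self.queue.append((receiver, bytes(payload)))   # copy 1
    def deliver(self):
        receiver, msg = self.queue.pop(0)
        receiver.inbox = msg        # copy 2, at delivery time

def direct_send(receiver, payload):
    receiver.inbox = bytes(payload)  # single copy, no queue at all

fs_manager = Process()
direct_send(fs_manager, b"open /dev/tty")
assert fs_manager.inbox == b"open /dev/tty"
```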

     Real-time systems need to be adaptable to changes in

system state including overloads and failures, changes in

system configuration and changes in task specifications.

Adaptability is important for real-time systems if a task's

deadlines can be met only under a restricted system

state/configuration. Reliability and performance may be

compromised in this situation. If a system is adaptive, it

is not necessary to redefine the system or recompute the

resources and task allocation for every small change. This

reduces development and maintenance costs. When the system

is adaptive it becomes easy to maintain and expand the

system.

     Real-time systems often consist of multiple

processors. There are several ways that a RTOS can be

adapted to a multiprocessor system. Multiprocessor systems

can be categorized as centralized systems and distributed

systems. The processors in centralized systems are connected

in such a way that the cost of inter-processor communication

is negligible. Inter-processor communication in distributed

systems is often through a slower medium that makes its cost

significant

[Cheng 151].

     In centralized multiprocessor systems, a separate

operating system kernel may execute on each processor or the

operating system processing may be separated from the

application processing by dedicating some processors to the

operating system and others to application processing. The

pSOS operating system is designed to execute on each

processor in a system. pSOS makes the inter-process

communication between application tasks on separate

processors transparent to the application software [Williams

96]. Another approach is to separate the operating system

processing from the application processing. The Spring

operating system and the Power Series operating system are

examples of operating systems that run on dedicated

processors and dynamically schedule application processing

on other processors [Williams 99][Stankovic 64 1991].



Speed.  A RTOS needs to perform its functions quickly for

the system to be fast enough to meet the system's timing

constraints. Many RTOS available today help ensure that

deadlines will be met by reducing operating system overhead.

Operating system overhead consists of (1) interrupt latency,

(2) preemption latency, and (3) context switching overhead.

     Interrupt latency is the amount of time that the OS

disables interrupts to perform critical functions. Most

embedded RTOS provide an interrupt latency on the order of

tens of microseconds. The MTOS operating system's interrupt

latency is only five to six instructions on an 80386

processor. Interrupts are disabled only when the kernel is

accessing a linked list required for work scheduling [Marrin

80].
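The effect of a short masking window can be shown in a toy simulation (entirely our own construction, not real kernel code):

```python
# Simulation sketch: interrupts arriving while the kernel briefly
# masks them - here, only around an update to its scheduling list -
# are held pending, so the worst-case interrupt latency equals the
# length of that short critical section.

class Kernel:
    def __init__(self):
        self.masked = False
        self.pending = []       # interrupts deferred by masking
        self.handled = []
        self.ready_list = []

    def irq(self, n):
        (self.pending if self.masked else self.handled).append(n)

    def insert_ready(self, task, irq_during=None):
        self.masked = True              # disable interrupts
        if irq_during is not None:
            self.irq(irq_during)        # interrupt fires in the window
        self.ready_list.append(task)    # the protected list update
        self.masked = False             # re-enable interrupts
        while self.pending:             # deferred interrupts run now
            self.handled.append(self.pending.pop(0))

k = Kernel()
k.irq(1)                                # interrupts enabled: handled at once
k.insert_ready("taskA", irq_during=2)   # arrives inside the window
assert k.handled == [1, 2]
assert k.ready_list == ["taskA"]
```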

     Preemption latency is the amount of time that task

switching is disabled by a task during an OS service call.

This can produce priority inversion where a higher priority

task is ready to execute, but is delayed because a lower

priority task is executing and preemption is temporarily

disabled. Preemption latency can be reduced by making OS

services reentrant. This allows a higher priority task to

preempt a lower priority task without waiting for the lower

priority task to finish the OS service. The context for the

lower priority task can be saved and the lower priority task

can be immediately suspended. Operating systems that reduce

preemption latency in this way are referred to as

multithreaded operating systems because they allow multiple

threads of execution to concurrently access the same OS service

[Marrin 78].
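A tiny decision function (our simplification, not any vendor's scheduler) captures the inversion described above:

```python
# Sketch: which task runs when a high-priority task becomes ready
# while a low-priority task is inside an OS service call. With a
# non-reentrant service, preemption stays disabled until the call
# finishes (priority inversion); a reentrant service lets the kernel
# save the low-priority task's context and switch immediately.

def next_to_run(ready_tasks, in_service, reentrant):
    """ready_tasks: list of (name, priority); in_service: name of
    the task currently inside an OS service call, or None."""
    if in_service is not None and not reentrant:
        return in_service        # preemption temporarily disabled
    return max(ready_tasks, key=lambda t: t[1])[0]

ready = [("low", 1), ("high", 9)]
assert next_to_run(ready, in_service="low", reentrant=False) == "low"
assert next_to_run(ready, in_service="low", reentrant=True) == "high"
```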

     Context switching overhead can be reduced by allowing

applications to be implemented as threads instead of

processes. A process has a rather large amount of context

information that must be saved during a context switch while

a thread shares most of its context with other threads. As

a result, context switching and inter-process communication

can be more efficient between threads. However, threads

share the same virtual address space and can write over

information that belongs to a separate thread. Furthermore,

context switching and inter-process communication may occur

more often between threads if processes have been broken

into too many threads [Marrin 78]. Applications that use

threads are referred to as multithreaded applications. Note

that this is different from a multithreaded operating system

that allows several threads of execution (either an

application process or an application thread) to

concurrently access an operating system service. The Chorus

operating system uses application threads. Chorus also

defines "actors," virtual memory areas in which several

threads can execute. The actors can be on

multiple processors [Williams 100].



Predictability.  To ensure that the deadlines in a real-time

system can be met, it is important to be able to predict the

execution time of software in a real-time system.

Predictability involves foreseeing resource needs and

availability before a task executes, so that timing

constraints can be guaranteed. The

execution time under various situations for a system can be

determined if the worst case interrupt latency and

preemption latency are known [Marrin 80]. Many RTOS on the

market today publish these times.

     Real-time systems are being applied to more complex

applications where it is not possible to make accurate

predictions. Some new concepts in real-time operating

systems are developing as a result. Many RTOS available

today allow arbitrary waits for resources or events, and

treat task operations as random. The article

entitled "The Spring Kernel: A New Paradigm for Real-Time

Systems" points out:



next-generation systems will include autonomous land rovers,
teams of robots operating in hazardous environments like
chemical plants and undersea exploration, intelligent
manufacturing systems, and the proposed space station. These
next-generation real-time systems will be large, complex,
distributed, and adaptive [Stankovic 62 1991].


     This article and others argue that new ways of looking

at RTOSs are necessary to meet these needs. A driving factor

is that a real-time system has a dual responsibility: it

must produce correct results and meet deadlines while doing

so.

Timing correctness (or the ability to meet deadlines)

depends directly on two factors: the application's time and

resource requirements and the availability of resources on

the execution platform in the given time. During execution,

these are overlapping issues (an application's requirements

and the availability of resources) which must be addressed

together when designing a system so that guarantees can be

made to ensure timing correctness is achieved. But before

guarantees can be made, you must be able to accurately

predict resource requirements. If predictions are

inaccurate, insufficient resources will be available when

they are needed and the computation won't complete within

the deadline. Additionally, resource availability is

affected by variations in system load (which may cause

overloads) and faults, both of which can cause exception-

handling software to be invoked. All of this requires some

form of dynamic scheduling to be used. [Natarajan 16]

     Because of the requirement to meet deadlines under

imposed constraints, many RTOSs fail to significantly aid

the programmer in developing a system. For example, while

they stress fast mechanisms like rapid context switching and

the ability to

respond to external interrupts quickly (as discussed under

speed above), they retain the main abstractions of time-

sharing operating systems. These include:



  Viewing the execution of a task as a random process where

  a task could be blocked at arbitrary points during its

  execution for an indefinite period of time. This, of

  course, can contribute significantly to missing a deadline

  because you lose the ability to predict when the task will

  finish.



  Assuming that little is known about the tasks beforehand

  so little (or no) semantic information about the tasks is

  used at runtime. Of course this assumption is false

  because each task in a real-time system is well defined

  and can be analyzed beforehand as to what its resource

  requirements are and the importance of the task in

  completing before a deadline.



  Trying to maximize throughput or minimize average response

  time. Throughput and average response time are not the

  primary metrics for real-time systems -- a system could

  have good average response time and miss every deadline,

  resulting in a useless system. [Stankovic 62 1991]



     All of these abstractions contribute to making the RTOS

nondeterministic. And the more unpredictable an operating

system, the more difficult it is to ensure deadlines will be

met because of the problems a system designer has in

factoring in all possible events that may occur that would

impact a process's ability to meet its deadline.

     However, the Spring kernel attempts to remedy some of

these abstractions for RTOSs so that it relieves the system

designer of many of the scheduling problems involved in a

process meeting its deadline. One way it does this is by

classifying each task that is executed by its importance and

timing requirements. The importance of a task signifies the

value imparted to the system when the task satisfies its

timing constraint. A task's timing requirements may range

over a wide spectrum, including hard deadlines, soft

deadlines, and periodic execution requirements, while still

others may have no explicit timing requirements at all. So

based on each task's value and timing requirements, Spring

groups them into one of three classifications.



Critical tasks -- those tasks that must meet their deadline,

otherwise a catastrophic result might occur.



Essential tasks -- those tasks that are necessary to the

operation of the system, have specific timing constraints,

and will degrade the system's performance if their timing

constraints are not met. However, these tasks will not cause

catastrophe if they do not finish on time.



Unessential tasks -- those tasks that may or may not have

deadlines and that execute only when they do not affect

critical or essential tasks.

These include background tasks such as long range planning

tasks or maintenance tasks. [Stankovic 63 1991]
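The three classes above can be written down as a small sketch (the type and function names here are ours for illustration, not Spring's actual interface):

```python
# Sketch of the three Spring task classes described above.
from enum import Enum

class TaskClass(Enum):
    CRITICAL = 1      # must meet its deadline, else catastrophe
    ESSENTIAL = 2     # a missed deadline degrades performance
    UNESSENTIAL = 3   # runs only when it does not disturb the others

def classify(catastrophic_on_miss, needed_for_operation):
    if catastrophic_on_miss:
        return TaskClass.CRITICAL
    if needed_for_operation:
        return TaskClass.ESSENTIAL
    return TaskClass.UNESSENTIAL

assert classify(True, True) is TaskClass.CRITICAL
assert classify(False, True) is TaskClass.ESSENTIAL
assert classify(False, False) is TaskClass.UNESSENTIAL  # e.g. maintenance
```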



     In most cases, the ratio between critical tasks and

essential tasks is small and therefore a majority of

Spring's scheduling algorithm involves ensuring that

essential tasks meet their deadlines. To do this, Spring is

designed to run on a system consisting of five

microprocessors which constitute a node. One processor is

dedicated to running the kernel, thereby offloading the

scheduling algorithm and other operating system overhead.

Three other processors are dedicated to applications.

This setup removes many uncertainties and speeds up the

execution of the critical tasks within a system because

applications processors do not need to respond to external

interrupts or execute operating system overhead. The last

processor is dedicated to managing the I/O of the system

which is a significant problem in real-time systems because

of its asynchronous nature and because, in many cases,

processing must take place immediately after data has been

received. [Stankovic 64 1991] However, this particular

problem is beyond the scope of this paper.

     Tasks within the system are defined prior to execution

time when the compiler decomposes the real-time programs

into schedulable entities with precedence relationships,

resource requirements, fault-tolerance requirements,

importance levels, and timing constraints. [Stankovic 63

1991] System designers impart this information about a

process through a System Description Language (SDL). Then

the compiler merges the source code with the SDL into

runtime data structures for the kernel to execute. With this

information, the kernel is then able to make guarantees

about whether a task will meet its deadline or not. It does

this by separating guarantees into two main parts: an a

priori guarantee for critical tasks and an on-line guarantee

for essential tasks.

     All critical tasks are guaranteed beforehand (an a

priori guarantee). Resources are reserved for them either in

one of the dedicated processors or as a dedicated collection

of resource slices on the application processors. [Stankovic

67 1991] With this approach, the kernel is directly

addressing the issue of meeting a task's timing requirements

and ensuring the availability of resources at the correct

time for those tasks which always must meet their timing

requirements. In other words, since all information about

the execution of these tasks is known beforehand, the actual

execution of these tasks is predictable.

     However, because of the many essential tasks and their

many possible invocation orders, preallocating resources to

these types of tasks costs too much and is too inflexible.

This is where Spring uses a unique approach of an integrated

CPU and resource allocation scheduling policy to provide the

on-line guarantees for essential tasks.  Most other RTOSs

view these two items separately and this can cause problems

when trying to meet deadlines. For instance, using an

earliest deadline first algorithm may result in scheduling a

task which will later become blocked by a resource not being

available. With the Spring kernel's approach, this never

occurs because CPU allocation and resource allocation are

integrated. The heuristic scheduling algorithm tries to

determine a full feasible schedule using all knowledge about

a task as it arrives to be guaranteed and scheduled.

Additionally, when a new task is invoked, the scheduler

tries to plan a schedule for it so all tasks currently

running can also meet their deadlines. This way, the Spring

kernel "understands" the total system load and is able to

make intelligent decisions when a guarantee can't be made.

[Stankovic 68 1991]
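The flavor of an integrated guarantee test can be conveyed with a deliberately simplified model of our own (Spring's actual heuristic is far more sophisticated):

```python
# Sketch of an integrated CPU + resource admission test: a new task
# is guaranteed only if a full schedule exists in which every
# guaranteed task gets both its CPU time and its resource before
# its deadline.

def try_guarantee(guaranteed, new_task, cpus=2):
    """Each task: (name, wcet, deadline, resource). Tasks are placed
    in earliest-deadline order; each starts on the first free CPU,
    but not before its resource frees up - so CPU and resource
    allocation are decided together."""
    tasks = sorted(guaranteed + [new_task], key=lambda t: t[2])
    cpu_free = [0] * cpus       # time at which each CPU becomes idle
    res_free = {}               # resource -> time it becomes free
    for name, wcet, deadline, res in tasks:
        c = cpu_free.index(min(cpu_free))
        start = max(cpu_free[c], res_free.get(res, 0))
        finish = start + wcet
        if finish > deadline:
            return None         # no feasible schedule: task rejected
        cpu_free[c] = finish
        res_free[res] = finish
    return tasks                # guarantee: all deadlines met

assert try_guarantee([("a", 2, 4, "bus")], ("b", 3, 4, "disk")) is not None
assert try_guarantee([("a", 2, 4, "bus")], ("c", 3, 4, "bus")) is None
```

Note that a CPU-only check would admit the second task (a second processor is idle), but the shared resource delays its start past its deadline; checking CPU time and resources together catches this.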

     The system goes further by using an essential task's

importance when determining a schedule. Each task is

assigned a level of importance that may vary as system

conditions change. To maximize the "value" of executed

tasks, all critical tasks should meet their deadlines, and

as many essential tasks as possible should also meet their

deadlines. Those tasks that do not meet their deadlines

should be the ones that are designated as the least

important tasks in the system at that instant in time. To

meet this need, Spring's on-line guarantee algorithm has the

following characteristics:



  At any time, the operating system knows exactly which

  tasks have been guaranteed to meet their deadlines; what,

  where, and when spare resources exist or will exist; a

  complete schedule for the guaranteed tasks; and which

  tasks are running under unguaranteed assumptions. This

  allows the system to make intelligent decisions as to

  whether to guarantee a task or not.



  Conflicts over resources are avoided. This eliminates the

  random nature of waiting for resources found in

  traditional time-sharing operating systems.



  Dispatching and guarantees are separated, letting these

  system functions run in parallel. Therefore, when a

  process is being dispatched, the scheduling algorithm

  continues to run and make guarantees for other processes

  coming into the system; the scheduler does not have to

  wait for a guaranteed process to be dispatched from the

  system queue.



  By deciding whether a task can be guaranteed when it first

  arrives, there may be time to reallocate the task to

  another node via a distributed scheduling module that is

  contained within the scheduler. Therefore, a task may

  still possibly meet its deadline even if it is not

  guaranteed (e.g., if a task is not guaranteed, it could

  receive idle cycles within the node that it is executing

  on).



  Even though a task is guaranteed for its worst-case time

  and resource requirements, the kernel reclaims unused time

  and resources if the task finishes early. Therefore, it

  doesn't waste any resources unnecessarily. [Stankovic 67

  1991]



     Spring's approach should not be confused with the

static priority scheduling mechanism typically found in real-

time systems. For example, with static priority scheduling,

a designer may have a task with a short deadline and low

importance and another task with a long deadline and high

importance. It is logical to assume that many designers

would give priority to the task with the shorter deadline

(i.e., use the Shortest Deadline First algorithm). But if

many external

interrupts occur or if the system load is increased

significantly (i.e., an overload condition occurs) then the

higher priority task may miss its deadline. By incorporating

a task's importance into the scheduling algorithm, in most

cases the task with the shortest deadline is still executed

first. However, when an overload

condition exists, the algorithm is flexible enough to ensure

that the lower priority task does not conflict with a higher

priority task and cause it to miss a deadline. [Stankovic 69

1991]
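The overload behavior described above can be sketched with a toy model (our own; "capacity" here simply stands for the processor time available before deadlines):

```python
# Sketch: schedule shortest deadline first, but under overload shed
# the least-important tasks rather than let an important task with a
# long deadline miss its deadline.

def select(tasks, capacity):
    """tasks: (name, wcet, deadline, importance). Keep dropping the
    least important task until the earliest-deadline order fits in
    the available capacity."""
    kept = sorted(tasks, key=lambda t: t[2])       # shortest deadline first
    while sum(t[1] for t in kept) > capacity:
        kept.remove(min(kept, key=lambda t: t[3])) # shed least important
    return [t[0] for t in kept]

tasks = [("short_low", 2, 3, 1), ("long_high", 4, 10, 9)]
assert select(tasks, capacity=10) == ["short_low", "long_high"]  # no overload
assert select(tasks, capacity=5) == ["long_high"]                # overload
```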

     Overall, the Spring kernel uses new approaches to

ensuring that an operating system does not contribute to the

missing of deadlines. It lends itself to providing the

designer with capabilities they can use to ensure tasks meet

their timing constraints. It does this through new

approaches to how tasks are scheduled. This type of

scheduling algorithm is typically not found in many of

today's production RTOSs. These new approaches make the

operating system more deterministic in its approach to

executing processes. This allows designers to better gauge

how their applications will perform when implemented on the

target platform and helps ensure that those tasks specified

as critical will always meet their deadlines. This also

helps avoid the trial and error that must

sometimes be done when using current RTOSs because of

various operating system features which get in the way of a

task meeting a deadline or the inflexibility of the system's

scheduling policy. The Spring kernel considers all task

times and resource requirements and has the capability of

ensuring the resources are available when they are needed so

effective task guarantees can be made.

     Another issue which impacts the predictability of a

real-time system is how an operating system deals with

notifying the application software when deadlines cannot be

guaranteed and what facilities are available to the

application to respond to this notification. There are a

number of ways for an operating system to handle this, and

the level of interaction between the operating system and

the application varies between operating systems. For

instance, Real-Time Concurrent C provides an else clause for

exception-handling that is invoked by the operating system

if a guarantee cannot be provided. There are other useful

approaches. But the particular approach is not important as

long as there is a mechanism for the operating system to

know the resource requirements and priority of tasks as well

as a way for the application to know if faults occur.

[Natarajan 18-20]

Resource Management.  One other issue that RTOSs are facing

is Time Driven Resource Management. When many tasks are

waiting for access to a shared resource, the allocation

policy is typically to provide access in first-in-first-out

order, without regard to time constraints. Time-driven

policies need to be developed to make resources available in

time for each action of the sequence. In static systems,

timing requirements can be derived analytically as the

application has predefined requirements. In dynamic systems,

requirements and resource availability vary dynamically. So

the responsibility of matching resource availability and

requirements is typically shared by the operating system and

the application.

     Resource reclaiming is another issue arising as a

result of dynamic scheduling of tasks with respect to their

worst case computation times. In order to guarantee that

real-time tasks will meet their deadlines in the worst case,

most real-time scheduling algorithms schedule tasks with

respect to their worst case computation times. Resource

reclaiming refers to the problem of correctly and

efficiently utilizing resources left unused by a task when a

task executes in less than its worst case computation time, or

when a task is deleted from the current schedule.

     One straightforward approach to resource reclaiming is

to reschedule the remaining tasks upon each task completion.

To maintain predictability, resource reclaiming can be

incorporated into the worst case computation time of a task.

Most scheduling algorithms have time complexities that

depend on the number of tasks to be scheduled. Thus they are

not suitable for rescheduling tasks to reclaim unused time.

Therefore, one of the challenging issues in designing

resource reclaiming algorithms is to reclaim resources with

bounded overhead, in particular, overhead that is not a

function of the number of tasks in the schedule.

     As stated above, resource reclaiming is implemented in

the Spring Kernel. When tasks arrive in a dynamic real-time

environment, the operating system guides the tasks through

two identifiable phases; the results produced by one are

used by the other.



1. Scheduling phase - determines the feasibility of a set of

tasks given their worst case constraints and the current

system state, and the system then generates a feasible task

schedule.



2. Execution phase - given a feasible schedule, the system

can reclaim resources and dispatch tasks. The resource and

the processor time reclaimed in the execution phase can be

used in the scheduling phase to produce a new schedule when

new tasks arrive. However, the task and system

characteristics impose certain restrictions on the way

resource reclaiming is done.



     Resource reclaiming in multiprocessor systems for tasks

with resource constraints is much more complicated, due to

the potential parallelism provided by these systems and the

potential resource conflicts among tasks. When the actual

computation time of a task differs from its worst case

computation time in a non-preemptive multiprocessor schedule

with resource constraints, the resulting run time

abnormality may cause some of the already guaranteed tasks

to miss their deadlines [Ramamritham 42]. There are resource

reclaiming algorithms like Basic Reclaiming and Reclaiming

with Early Start which take care of these issues and are

implemented in the Spring Kernel.
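The payoff of reclaiming can be seen in a small single-processor sketch (our illustration only; the published Basic Reclaiming and Early Start algorithms handle the multiprocessor resource constraints this model ignores):

```python
# Sketch of bounded-overhead reclaiming: when a task finishes early,
# only the dispatch time of the next task is advanced - O(1) work per
# completion, independent of how many tasks remain in the schedule.

def dispatch(schedule, actual_times):
    """schedule: list of (name, worst_case_time); actual_times: the
    times the tasks really took. Returns (finish_times, reclaimed)."""
    clock, finishes, reclaimed = 0, {}, 0
    for (name, wcet), actual in zip(schedule, actual_times):
        clock += actual               # next task starts immediately
        finishes[name] = clock
        reclaimed += wcet - actual    # unused worst-case time reclaimed
    return finishes, reclaimed

finishes, reclaimed = dispatch([("t1", 5), ("t2", 5)], [2, 5])
assert finishes == {"t1": 2, "t2": 7}   # t2 starts at t=2, not t=5
assert reclaimed == 3
```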

Conclusion.  Many of the aspects of RTOS that have been

discussed here can be applied to non real-time operating

systems as well. However, improvements in these areas are

often driven by the necessity for improvements in RTOS. For

instance, the modular design of RTOS that is necessary for

RTOS to be flexible enough to execute on a variety of

hardware configurations is a desirable design quality in any

operating system. Yet many older operating systems were not

modular. UNIX is one such operating system, and it is still

very popular in non real-time systems. UNIX provides several

features that are sometimes useful in RTOS. However, UNIX is

not modular enough to easily adapt to many hardware

configurations used in real-time systems. UNIX does not

provide other features necessary to be fast enough to meet

timing constraints in real-time systems and is not very

predictable. On the other hand, some features of RTOS may

not be useful in general purpose operating systems since the

goals of a general purpose system differ from those of a

real-time system.

      Real-time versions of UNIX have been developed

nevertheless. Part of the motivation for the development of

the POSIX standard may be due to this need for sharing of

ideas between general purpose and real-time operating

systems. POSIX defines standards for UNIX including

standards for real-time versions of UNIX. Standards such as

POSIX may encourage the sharing of ideas between real-time

and non real-time operating systems.

     Major innovations in RTOS are developing because real-

time systems are being applied to more complex problems

where it may be very difficult to guarantee deadlines. The

concepts discussed earlier about the Spring kernel represent

some of the innovations in this area. The Spring kernel is

part of a research project and is not commercially

available. Many commercially available RTOSs have not

incorporated these innovations yet.



References:


S. Cheng. Scheduling Algorithms for Hard Real-Time Systems --
A Brief Survey, Spring project conducted at the University
of Massachusetts, pp. 150-167, July 5, 1987.

D. Hildebrand. An Architectural Overview of QNX. In
Proceedings of the Usenix Workshop on Micro-Kernels & Other
Kernel Architectures, Seattle, April 1992.

K. Marrin. Multithreading Support Grows Among Realtime
Operating Systems, Computer Design, pp. 77-88, March 1993.

S. Natarajan and W. Zhao. Issues in Building Dynamic Real-
Time Systems, IEEE Software, pp. 16-21, September 1992.

C. Shen, K. Ramamritham and J. Stankovic. Resource
Reclaiming in Real-Time, Proceedings of the Real-Time
Systems Symposium, IEEE, pp. 41-49, December 1990.

C. Small. Small real-time systems coordinate tasks over tiny
local nets, EDN, pp. 41-44, November 25, 1993.

J. Stankovic and K. Ramamritham. Hard Real-Time Systems, In
Tutorial: Hard Real-Time Systems, IEEE, pp. 1-10, 1988.

J. Stankovic and K. Ramamritham. The Spring Kernel: A New
Paradigm for Real-Time Systems, IEEE Software, pp. 62-72,
May 1991.

T. Williams. Real-time Operating Systems Struggle with
Multiple Tasks, Computer Design, pp. 92-108, October 1,
1990.


