Original file date: 8-28-89
Published in Computer Language in November (cover feature).

BEGIN HEADER
A C++ Pre-emptive Multitasking Kernel
by John M. Dlugosz
rough draft
END HEADER



Last October, I presented the source code for a simple multitasking kernel 
in C.  The article has proved enormously popular, and has prompted this 
sequel.  The source I present now is different in two major respects, as you 
may have gathered from the title:  it supports pre-emption, and it is 
written in C++.

Those of you who read last year's article saw how simple a non-pre-emptive 
system could be made.  The task switching function had only to save the 
stack pointer and load a different one-- the normal function setup 
and return sequence made sure the proper registers were saved, and not all 
the registers needed to be saved, since the function call operated as a "choke 
point" where the compiler assumes that the general purpose registers will be 
scrambled.  Furthermore, because processes only gave up control when they 
were good and ready, there were no race conditions (more on this later).

Allowing pre-emptive tasks opens up a whole new can of worms.  Taking care of
the problems of pre-emption is the bulk of the work in the library.  This 
article explains the pitfalls of pre-emptive systems and demonstrates
solutions for them.

First, some background and preliminary design work.  Multitasking systems can 
improve the efficiency of the computer.  This is because the CPU can busy 
itself with other tasks while waiting on IO devices.  This requires a 
multitasking operating system, and cannot reasonably be done as part of an 
ordinary program (serial IO is a notable exception).  The other purpose of 
multitasking, the one addressed by this article, is to simplify writing
programs.  Many programs are best thought of as concurrent processes, so why 
not code them as such?


HEADINGB The Core

C++ is a great language to write this library in.  Some of the thorny 
constructs can be neatly encapsulated into a class.  The constructors and 
destructors can be used to full advantage to make sure features are used 
properly.  But there are some things that need to be done that cannot be 
expressed in C++:


 - Switching processes.
 - Assigning functions to be called on a hardware interrupt.
 - Allowing mutual exclusion in the face of pre-emptive multitasking.

All three, taken separately, are simple to do in assembly language.  Once 
these primitives are available, the entire library can be written in C++.  
Some additional functions are written in assembly language for practical 
reasons.

Although the code presented is for MS-DOS, it is easily adapted to other 
systems.  Once the hardware dependencies are hidden, the classes in the 
library can be used without knowledge of the underlying hardware.  Even on 
systems with multitasking supported by the operating system, you could write 
these classes in terms of the available system calls.  Then you would be 
able to write multitasking programs that are portable across platforms-- and 
programs that rely on multitasking are among the most difficult to port.  
Most of the library is, in effect, a set of abstract multitasking classes.

Obviously, the main part of multitasking is being able to switch between
tasks.  A task switch entails saving the state (everything that is unique to 
that thread) and restoring another.  This is done in two distinct ways: as 
part of a pre-emptive task switch, and as a voluntary giving up of control.  
Because the stack layout is different in these cases, two different functions
are used.

Because the pre-emptive switch function runs as a result of a hardware 
interrupt, the environment at the time of the call is completely unknown.  
No part of the 
function can be written in C++ because the environment assumed by the 
compiler (values in the segment registers, etc.) may be incorrect.  In fact, 
we don't even know if the stack is big enough, as the interrupted code may 
have been near the limit of its stack.  This makes writing such a function a 
particular challenge.

However, I want to write the complicated scheduler functions in C++, not in 
assembly language.  There is a simple, but slightly sneaky way to do this.

The hardware timer tick interrupt saves the state, and restores a different 
state.  But it does not decide which task to return to.  It always returns 
to the same task-- the one running the scheduler.  The scheduler, written in 
C++, decides what to run next and calls another assembly language function to 
transfer control.

All the assembly language functions are in PRIM.ASM.  This is how the task 
switch works:  The hardware timer interrupt occurs.  All registers except 
the stack pointer are pushed on the stack.  The stack pointer is placed in 
a static variable.  The stack pointer is then loaded with the stack of the 
scheduler, and the necessary registers restored.  The stack is set up so the 
old timer tick function can be called, and its IRET will return to a 
function called to_scheduler().  So the old interrupt code is jumped to and 
it pops out in to_scheduler(), which cleans up the stack and returns-- 
returns to the scheduler, not the interrupted code.  It returns the value of 
the suspended process' saved stack pointer, so the scheduler can resume 
it later.  The hardware timer tick has suspended the code in progress and 
run the scheduler.

When the scheduler finds a new task to run, it passes the saved stack pointer
to from_scheduler().  It switches to that stack, pops the saved registers, 
and returns (with an IRET) to the point where the timer tick interrupt that 
suspended it had occurred.

For non-pre-emptive switches, a process calls task_yield().  It is similar 
to what the interrupt handler does, but is simpler.  It saves the registers 
and jumps to to_scheduler().

HEADINGB Tasks

The machine language code to handle switching tasks is in place.  How do you 
start up a task in the first place, and how do you get rid of it when it is 
finished?

A task is abstracted in C++ with `class task'.  A global variable 
`current_task' points to the currently running task.  The constructor creates
a new task, the destructor destroys a task.  Other functions to deal with 
tasks can be members of class task.

Let's take a close look at the structure of a task.  First, class `task_head' 
is defined.  This is nothing but a pair of pointers, and methods to insert 
and delete such a structure in a linked list.  The class task is derived from 
class task_head.  This way the head of a list of tasks can be dealt with 
without a lot of special cases.

There are several other classes all related to the task_head.  This shows 
some good C++ techniques for making sure the compiler enforces how I want 
an object to be used.  The listing of TASK.HPP explains it in some detail.

The task class itself contains storage for the saved stack pointer, which is 
what is needed to resume a task.  It also contains information needed by 
other parts of the system that need to be associated with a task.

In October's article, a task was started with a detached function call.  
What was novel about it was that the function could take a parameter, and 
returning from that function was a well-defined condition.  Both of these 
are present in this new system as well.

The constructor of task takes parameters stating what the function to start 
up is, what its parameter is, where its stack is, and how big it is.  The 
constructor needs to set up the saved registers with the proper SS,SP,CS, 
and IP so the function will be called when this task is switched to, and the 
DS and Flags registers need to hold the correct values too.  The task's 
stack needs to be set up as if the function had just been called, with its 
parameter and a return address.  When that function returns, the function 
pointed to by the return address is called.

In October's code, allowing the detached function to return was a 
convenience and a debugging aid, and rarely done.  In C++, though, there are 
reasons to take more advantage of this.  If the task has any variables that 
require destructors, killing the process will not execute them.  This can 
lead to lost memory or worse.  In order to fully clean up, all functions must
complete normally and return to the caller.  Finally, the function that
started it all is reached.  What if it has destructors?  The natural thing 
is to have that function return too, and make that the well-defined way to 
terminate a process.  So, the return address is set up to a piece of code 
that accomplishes this.

There is one way to create a process-- the constructor for class task.  There
are two ways to destroy a process-- internally, when the root function of 
the process returns, or externally, by calling the destructor.  This is 
convenient because it allows auto variables of type task, and all child 
tasks will be terminated when the parent function returns.  You can also use
the `new' and `delete' operators for tasks.

If a task is destroyed by another task, it still needs to be shut down 
properly.  A flag in the task structure states that a task must be allowed 
to shut itself down rather than just being scrapped and never run again.  
If that is the case (which it will be unless you specify otherwise), then 
killing a task sets a 'killed' flag, disables pre-emption, and runs that 
task next.  That task should take note that the kill flag is set, and proceed
to shut itself down.  It can return all the way out, or it can call 
task_yield() when it is finished.  A task in its "kill run" will not be 
interrupted until it is finished.

Let's give this thing a spin.  The TESTS listing shows a simple program that 
uses the multitasking ability.  Notice how easy it is to set up.  Two 
functions to be executed at the same time are defined.  One prints 
"I'm Here!" over and over again.  The other one just counts.  You see, DOS 
is not re-entrant: if task 1 is pre-empted while in DOS, task 2 could not 
safely output anything.  So task 2 just kills time.  When the loop is over, 
task 1 tells task 2 to shut down using a shared global.  Returning from the 
function shuts down the tasks.

The main program defines the two tasks, and calls scheduler().  This is how 
the system is used-- after at least one task is defined (it could be static) 
the program calls scheduler() which shares time among all tasks.  When there 
are no ready tasks, scheduler() returns.
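The shape of that loop can be modeled without any real stack switching.  In this toy version (the names and the round-robin policy are my assumptions, and each "task" is just a function run once per slice rather than a resumed stack), the scheduler's contract is easy to see:

```cpp
#include <cassert>
#include <deque>
#include <functional>

// Toy model of the scheduling policy only.  Each "task" is a function
// that runs one time slice and reports whether it wants to keep going;
// the real scheduler() resumes saved stacks instead of calling functions.
struct toy_task {
    std::function<bool()> slice;   // run one slice; false == finished
};

inline void toy_scheduler(std::deque<toy_task>& ready) {
    while (!ready.empty()) {       // returns when no ready tasks remain
        toy_task t = ready.front();
        ready.pop_front();
        if (t.slice())             // still alive: back of the line
            ready.push_back(t);
    }
}
```

The point of the model is only the control flow: tasks rotate through the ready list, and the call returns to the caller once every task has finished.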

HEADINGB Semaphores

A multitasking program needs some way to coordinate efforts between its tasks.
Imagine a program that stores data in a binary tree.  One task is inserting a node.
It is in the middle of the operation, and its time slice is up.  Another task 
now reads data from the tree.  But, the tree has dangling pointers because of
the uncompleted insertion.  This illustrates a big problem with pre-emption, 
and it pops up all over the place.

As a tentative solution, try having a flag variable.  If the flag is TRUE, it 
means some process is using the tree.  When a process wants to use the tree, 
it checks the flag.  If it is FALSE, it sets it and proceeds.  If it is 
already TRUE, it waits.  Looks easy, but there are two problems with this 
solution.

First, it will not work.  Pre-emption throws a monkeywrench in the plan.  Say 
a process wants to use the tree.  It reads the flag and finds it FALSE.  Now 
it is pre-empted.  Another process wants to use the tree, checks the flag, 
finds it FALSE.  Because this process just started its time slice, it can go 
quite a while before it is pre-empted, and it continues.  It sets the flag 
to TRUE and proceeds to use the tree.  After a while, it is pre-empted and 
the first process is resumed.  It sets the flag to TRUE and proceeds to use 
the tree, even though the second process is already using it.

This is known as a "race condition" and illustrates the main problem with 
pre-emption:  between a test and the next statement, something may happen to 
change the state of the system.  The flag variable is subject to the same 
problems as the tree itself was.

The entire read-modify-write cycle of the flag needs to be done in one 
unbroken operation.  This needs the cooperation of the multitasking system 
itself (actually, there are ways this can be done without special code, but 
they are very lengthy and clumsy).

Even if this were taken care of, the flag method is undesirable because of 
"busy waiting".  Say a process was waiting to use the tree, and had a 
statement similar to 
     while (tree_in_use) {/* wait*/} tree_in_use= TRUE;  
The process using the tree and this process waiting for the tree are 
executed in a round-robin fashion.  So periodically the process using the 
tree is pre-empted in favor of the process waiting for it to finish!  This 
is an absurd waste of time.

Because a flag variable was not good enough, the semaphore was invented.  
The semaphore overcomes the two points mentioned above.  In addition, it is 
generalized a bit more in that it uses an integer value instead of a boolean.  
This way the semaphore can be used to let n processes have access, instead of 
just one, where n is set when the semaphore is created.

Let's try the example again using a semaphore instead of a flag.  A process 
wants to use the tree, so it sends a `wait' message to the semaphore.  The 
`wait' method sees that the value is 1, so it decrements it to 0 and lets 
the process continue.  Later, another process wants to use the tree.  When 
it sends a `wait' to the same semaphore, it is already 0.  The process is 
suspended, and no further time is wasted on it until the semaphore is ready.
When the first process is done with the tree, it sends a `signal' message to 
the semaphore.  The `signal' method increments the value, and if there is a 
waiting process it is resumed.

When the process is resumed, it decrements the semaphore value and claims it 
for itself.  This means that in order to work properly the signal message 
must transfer control immediately to the newly unblocked process.  If it 
simply made it ready, some third process that got control first could see 
the value of 1 and claim it for itself.  This is avoided in a slightly 
different way:  incrementing the value just so the restarted process can 
decrement it again is not very efficient anyway.  So the `wait' message 
always decrements the value, and negative values show how many waiting 
tasks there are.  A `signal' will increment the value, but will not bring 
it up to 1.  When the other task gets control later, it can assume that 
the semaphore now belongs to it-- since that is the only thing that could 
have restarted it.
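That counting rule is easier to see in isolation.  This sketch models only the bookkeeping-- actual blocking and unblocking of tasks is stubbed out, and the names are mine, not the library's:

```cpp
#include <cassert>

// Bookkeeping sketch of the counting rule: wait() always decrements, so
// a negative count records how many tasks are blocked; signal() increments
// but never hands the unit back above what the restarted task will use.
class counting_rule {
    int count;
public:
    explicit counting_rule(int initial = 1) : count(initial) {}
    // returns true if the caller may proceed, false if it must block
    bool wait() { return --count >= 0; }
    // returns true if a blocked task should be made ready
    bool signal() { return ++count <= 0; }
    int value() const { return count; }
};
```

Note that after a signal wakes a waiter the count sits at 0, never 1, so no third task can sneak in and claim the semaphore between the signal and the restart.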

So signaling a semaphore does not transfer control directly to the newly 
unblocked task.  Rather, a task is unblocked and placed at the end of the 
ready list.  The first process continues until its time is up.  Some time 
later, the process that was waiting is run again, and proceeds.  This allows 
both sender and receiver to keep their proper priorities.

The semaphore is implemented as `class semaphore'.  The constructor takes an 
argument that sets up the initial value of the counter.  Rather than using a 
default argument (of 1), a second constructor with no parameters is provided.  
This is to allow arrays of semaphores.  The constructor is rather simple, and 
it does not depend on anything else in the system, so you can have static
semaphores without worrying about what order the constructors are called in.

When a semaphore is destroyed, what happens to the processes waiting on it?  
One solution is to simply kill all waiting tasks.  But in C++ you want the 
task to shut itself down properly, which is why the "kill run" was introduced.  
This paves the way for a different solution.  There are cases where a waiting 
task needs to be restarted even though no `signal' was sent.  This is 
known as a `fault'.  The wait() method returns a boolean: TRUE for a normal 
condition, and FALSE for an abnormal restart.  Rather than killing the 
task, a `faulted' flag is set and the task itself can decide whether that 
means it should shut down or whatever.  The semaphore wait() will return 
FALSE if the task is killed and started on a kill run, if the semaphore was 
destroyed, or if some joker called unblock() directly.

The signal and wait methods require immunity to pre-emption.  The way to do 
this is to use a semaphore, right?  So how do you write the semaphore code?  
Now we get to the heart of the problem.

The task-switcher uses a flag variable.  If the flag is non-zero, pre-emption 
is disabled.  Now you can't just set it at the beginning of the function and 
clear it at the end of the function, since the flag may have already been set.  
Rather, the flag is incremented at the beginning of the function and 
decremented (restoring its original value) at the end.  Now, if 
incrementing the value is a read-modify-write operation, race conditions 
will crop up.  Imagine what would happen if the process were pre-empted 
after the read and before the write.  It looks like we are back in the same boat.

The solution is to use a CPU instruction that can increment the value in one 
unbroken operation (decrementing does not have the same problem.  Think about
it.).  On the PC, the INC instruction is available.  On RISC machines, there 
may be no such instruction and more elaborate schemes are needed, such as 
disabling interrupts or using a special instruction supplied for this
purpose.  Simply writing `preempt_inhibit++;' will probably generate the 
proper instruction.  But just to make sure the "optimizing compiler" did not
do something strange, I coded a little bitty function in assembly language 
to do this.  If you don't want the calling overhead for such a simple 
function, you can replace the call to increment_preempt_inhibit(); with 
the statement preempt_inhibit++; and check the generated object code to make
sure it generated suitable code.
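The nesting behavior, minus the atomicity concern, comes down to this sketch (preemption_allowed() is added here only for illustration; the real increment must compile to a single INC instruction, as just described):

```cpp
#include <cassert>

// Sketch of the nesting rule for pre-emption control.  Only the
// bookkeeping is modeled; in the real library the increment must be
// one indivisible instruction, and the variable lives where the
// interrupt handler can reach it.
static int preempt_inhibit = 0;

inline void preempt_off() { ++preempt_inhibit; }  // must be atomic for real
inline void preempt_on()  { --preempt_inhibit; }

inline bool preemption_allowed() { return preempt_inhibit == 0; }
```

The counter, rather than a boolean, is what lets regions nest: an inner preempt_on() restores the flag's previous value instead of clobbering an outer region's protection.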

Another reason for making a little shell function is because I want the 
variable to be in the Code segment so the interrupt handler can access it 
without changing any registers, and Zortech cannot handle `far' data items.

So a pair of functions, preempt_off() and preempt_on(), is used by the system 
on a primitive level.  They can also be used in your program for 
time-critical tasks that should not be interrupted once they start.  But you 
should avoid this in favor of using semaphores or non-pre-emptive tasks 
whenever possible.

Notice that most places that require pre-emption immunity do not use these 
functions directly.  Rather, a dummy class is used that defines a region free
from race conditions.  A similar class is used for semaphores, as described 
below.

HEADINGB Structured Semaphores

A semaphore has more general-purpose use than in the simple example about 
the tree.  Semaphores are quite useful.  But you can see what would happen if 
they were misused.  If a wait were left out, race conditions could occur.  
Worse, when the matching signal was issued the count would be off.  If a 
signal were left out, processes could remain blocked forever.  Using 
semaphore operations in loops or conditional statements, or putting the 
wait and the signal in lexically distant locations, can cause very 
hard-to-find bugs.

There have been many approaches to replacing the primitive semaphore with a 
structured statement.  One interesting device is called a "monitor" (a term 
I think is vastly overused-- a monitor can mean anything now).  A monitor 
is a section of code containing procedures and data (hmm, sounds like a 
class).  The data can only be accessed by the member functions.  The 
interesting part is that only one thread of execution can exist inside the 
monitor.  The compiler takes care to assure this is the case, by generating 
semaphore calls at the entry and exit of each function.

Returning to the tree example, if a monitor were defined that contained the 
tree, all functions that access the tree must be members of the same monitor.
But only one line of execution is allowed in a monitor, so if task one is 
inserting into the tree and is suspended, task two will be blocked when it 
calls a function (found in the same monitor) to read from the tree.  The
details are hidden from the programmer.

The monitor is not very popular, since it requires extending the semantics 
of the language.  But, C++ is not just any language.  Let's take a look at 
the crucial aspect of the monitor:  the semaphore always has balanced 
wait/signal calls, and a shared data item cannot be accessed without 
automatically requiring mutual exclusion.

What can be done in C++ to bring mutual exclusion to a higher conceptual 
level?  Enter the automatic semaphore, class `resource_lock'.

The purpose of class resource_lock is to assure balanced wait/signal calls.
It brings scoping rules to the use of locking semaphores.  Boiling this down 
to its essence, you get the simple class in listing n.
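In outline, the class amounts to something like the following sketch.  The stand-in semaphore here is only enough to compile against, and the details may differ from the actual listing:

```cpp
#include <cassert>

// Stand-in for the library's semaphore: just enough for the sketch.
struct semaphore {
    int count;
    explicit semaphore(int n = 1) : count(n) {}
    void wait()   { --count; }   // the real version blocks when count < 0
    void signal() { ++count; }
};

// The essence of resource_lock: the constructor waits on the named
// semaphore, the destructor signals it, so the lock is held exactly
// while the object is in scope.
class resource_lock {
    semaphore& sem;
public:
    explicit resource_lock(semaphore& s) : sem(s) { sem.wait(); }
    ~resource_lock() { sem.signal(); }
    // copying a lock would double-release the semaphore; forbid it
    resource_lock(const resource_lock&) = delete;
    resource_lock& operator=(const resource_lock&) = delete;
};
```

Because the release lives in the destructor, the wait/signal pair is balanced by the compiler's scope rules rather than by the programmer's discipline.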

The actual semaphore is defined elsewhere.  A semaphore is defined to go 
with each resource.  Creating an object of type resource_lock as an auto 
variable causes the resource named as the parameter to be locked (granted 
exclusive access to) while the variable is in scope.  The constructor and 
destructor are responsible for this action.

This is an extremely useful device.  It succeeds in bringing a higher level 
of abstraction to locking.  Rather than having to know to use a semaphore in a 
certain way, a high-level resource lock structure is used.  It can only be 
used in a structured way-- even a `break' or `return' or `goto' will still get 
the resource unlocked.

The resource lock is similar to the monitor structure in principle.  But, 
the lock does not have to be around an entire function.  The lock can be 
placed to affect only the critical section of code.  C++ programmers are 
used to a distributed scope, as member functions do not have to be 
grouped together lexically.  But this mechanism is deficient in one respect.
The compiler does not enforce that the lock must be used before the data is 
accessed.  A function, for example, could monkey with the tree without 
locking it first.  Even so, it does make programming clearer and reduce
errors.

The second point-- ensuring mutual exclusion before access-- can still be 
done.  In a well structured C++ program, data is abstracted with the 
operations performed on it into classes.  Rather than just any old function 
inserting something into the tree, an `insert' member function would be used.
Only the code of the insert method would be doing this.  When you create 
these classes, you can include a semaphore as part of the class, and put 
locking calls in the methods that manipulate the class's data.  Now, any 
function calling the `insert' member of a tree will be assured mutual 
exclusion from another process calling scan() on the same tree.  Since only 
member functions have direct access to data, only member functions should 
have to worry about race conditions.
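Here is a sketch of that embedding, with a trivial stand-in for the tree and for the semaphore calls (the instrumentation counter exists only to make the locking visible; none of these names come from the listings):

```cpp
#include <cassert>
#include <vector>

// Sketch: a shared structure that carries its own lock, so every
// member function that touches the data claims exclusion first.
// lock()/unlock() stand in for semaphore wait()/signal().
class shared_tree {
    std::vector<int> data;      // stand-in for the real tree
    int lock_count = 0;         // stand-in for the embedded semaphore
    void lock()   { ++lock_count; ++times_locked; }
    void unlock() { --lock_count; }
public:
    int times_locked = 0;       // instrumentation for the sketch only
    void insert(int v) {
        lock();
        data.push_back(v);      // the critical section
        unlock();
    }
    bool contains(int v) {
        lock();
        bool found = false;
        for (int x : data) if (x == v) found = true;
        unlock();
        return found;
    }
};
```

Callers never see the semaphore at all; exclusion rides along with the only functions that can reach the data.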

The second example in TESTS shows the use of semaphores.  This time, two tasks 
are both printing.  They use a semaphore to get exclusive access to DOS.  
The first uses semaphores directly, and the second uses a resource_lock.

HEADINGB new and delete

Memory is a resource too, and subject to race conditions.  What happens when 
a task is in the middle of a malloc() and is pre-empted, and another task 
calls malloc() or free()?  The memory management function must assure mutual 
exclusion on the heap.  

This is easy to do, as you can define your own `operator new' and
`operator delete'.  In Zortech C++, `operator new' just calls `malloc()' 
and has some extra logic to take care of _new_handler(), and `operator delete'
just calls free().  The replacement functions are simple to write, and just 
add a resource_lock to gain exclusive access to the heap.  You need to 
include listing n in your program if any pre-emptive tasks use dynamic memory,
and those problems are solved.
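The replacement operators might look like the following sketch.  It is written against today's global operator new/delete signatures, and a bare counter stands in for the heap's resource_lock (which needs the rest of the kernel to mean anything):

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Sketch of heap protection.  In the real library each operator would
// hold a resource_lock on a heap semaphore; heap_ops is instrumentation
// standing in for that, so the guarded entry points are visible.
static int heap_ops = 0;    // counts entries into the guarded heap code

void* operator new(std::size_t size) {
    ++heap_ops;                       // stand-in for locking the heap
    void* p = std::malloc(size ? size : 1);
    if (!p) throw std::bad_alloc();
    return p;                         // lock would be released here
}

void operator delete(void* p) noexcept {
    ++heap_ops;                       // stand-in for the same lock
    std::free(p);
}
```

Every new-expression and delete-expression in the program now funnels through these two guarded functions, which is the whole point: the heap's bookkeeping can never be caught half-updated by a pre-emption.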

HEADINGB Messages

Now that processes can coordinate with each other for purposes of sharing
data, they can communicate with each other.  For example, one process can 
write data in a buffer (after getting exclusive access via a resource_lock) 
and another task can read this data and act on it.  A good multitasking 
system will have a direct way of doing this that handles worst-case conditions.

There are many ways to implement messages.  Messaging systems can have a 
variety of features.  Since pipelines are also available (described in the 
next section), I want something rather simple.  The message class in MESS.HPP is 
simply a shared variable with access controlled by a pair of semaphores.  
The message buffer is a pointer.  One task places a value in this pointer, 
and another task reads it.  This is a coordinated process, as the reader and
writer meet at a point to exchange data.  This is sometimes called a 
rendezvous.

Since the memory is owned by the sender, the sending task cannot continue 
until the message is received.  After all, the memory may be an auto 
variable or even a compiler generated temporary, or a value that will be 
reused, and it is not safe to continue until the reader has copied it or 
finished with it.

So the sender must be blocked until the receiver is done with it, and the 
sender must be blocked before sending if there is already a message pending.  
The reader must be blocked if there is no message being sent.

Two semaphores are needed.  One prevents the sender from continuing until the
reader gives it the go-ahead:  The writer does a wait() and the reader does 
signal() when he is done with the message.  You also need another semaphore 
to block readers until a message has been sent.  Another issue is that I 
don't want a write to go through if there is already another message pending.
This can be done with the same semaphore as the writer is already using.

The implementation is in MESS.CPP.  The sender uses put() and is blocked 
until the message is read.  The receiver does a get() and uses the value, 
then does a done() to unblock the writer.  It is very important to remember 
to put in the call to done(), or you can leave a task blocked forever and 
the message variable locked up.  Besides put() and get(), you can use 
operator<< and operator>> as well.
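The put/get/done protocol can be sketched with standard threads standing in for the kernel's tasks.  This is my reading of the description above, not the MESS.CPP listing itself: a condition-variable semaphore substitutes for the library's class, and a mutex stands in for the article's trick of reusing the writer's semaphore to serialize senders:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Condition-variable semaphore standing in for the library's class.
class sem {
    std::mutex m;
    std::condition_variable cv;
    int count;
public:
    explicit sem(int n) : count(n) {}
    void wait() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this]{ return count > 0; });
        --count;
    }
    void signal() {
        { std::lock_guard<std::mutex> lk(m); ++count; }
        cv.notify_one();
    }
};

// Rendezvous slot: one pointer, guarded so the sender stays blocked
// inside put() until the reader calls done().
class message {
    void*      slot = nullptr;
    sem        sent{0};     // a message is pending
    sem        acked{0};    // the reader is finished with it
    std::mutex write_gate;  // one sender at a time
public:
    void put(void* p) {
        std::lock_guard<std::mutex> g(write_gate);
        slot = p;
        sent.signal();
        acked.wait();       // blocked until the reader's done()
    }
    void* get()  { sent.wait(); return slot; }
    void  done() { acked.signal(); }  // forget this and the sender hangs
};
```

Because put() does not return until done() is called, the sender's pointer (even a pointer to an auto variable) stays valid for as long as the reader needs it.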

Test 3 shows how several tasks can share information using the message class.  
Two tasks are sending messages and one task is receiving them.


HEADINGB Pipelines

A pipeline is a bit different from a simple message.  A pipeline can be 
treated as a message queue.  Each send operation is buffered, and the sender 
is not blocked until the buffer is full.  Likewise the receiver can read 
from the buffer, and is not blocked until the buffer is empty.  This is a 
useful device for producer/consumer type problems, and as a buffered message 
system for servers and clients when no rendezvous is needed.
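The queueing contract, minus the blocking, can be sketched as a small ring buffer.  The real pipe blocks the sender and reader with semaphores instead of returning a failure code, and the names and buffer size here are invented:

```cpp
#include <cassert>
#include <cstddef>

// Queueing logic of a pipe, without the blocking: a fixed ring buffer.
// Where this sketch returns false, the real pipe suspends the task on
// a semaphore until the other end makes room or supplies data.
class pipe_buffer {
    enum { SIZE = 4 };
    char buf[SIZE];
    std::size_t head = 0, count = 0;
public:
    bool put(char c) {                 // false == sender would block
        if (count == SIZE) return false;
        buf[(head + count++) % SIZE] = c;
        return true;
    }
    bool get(char* c) {                // false == reader would block
        if (count == 0) return false;
        *c = buf[head];
        head = (head + 1) % SIZE;
        --count;
        return true;
    }
};
```

The buffer is what decouples the two ends: unlike the message rendezvous, the sender runs ahead freely until the buffer fills, and the reader drains it at its own pace.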

A pipeline has two ends, one for input, one for output.  In October's 
article, a pipeline could be written to or read from by anyone who knew of 
the pipe variable.  An alternative is to separate the two ends into 
different variables.  Then, a function may know about the input end but not 
be able to access the output end, or vice versa.  A larger advantage is that 
the pipe can be cut into two pipes, and two pipes can be joined together 
(cut() and splice() are not implemented here for the sake of brevity).

The two ends of the pipe must somehow refer to the same pipe.  The obvious 
way is to declare two variables, one for each end, and make a call to create 
the pipe that initializes both variables.  But this is against the spirit 
of C++, in which it is desirable to initialize a variable when it is defined.
Another solution is the one used in October's article, where pipes are opened
by name by two different processes.  Both ends that were opened with the 
same name referred to the same pipe.  In that implementation, the two ends 
so opened were interchangeable-- either could read or write.

A similar system could be used here.  But there is another way to manage a 
pipe.  If each process opens one end of a pipe, who closes it?  If one side 
closes it when it is done, what happens to the other end?  With separate 
types for each end, you could enforce a rule that the write end closes a 
pipe, and implement it so the read end still works until the end of the data,
at which point the pipe is actually destroyed.  Opening by name is a viable 
approach.

There are complications though.  What happens if the read end is destroyed 
first (or goes out of scope)?  Another approach that bypasses many problems 
is to create the pipe as a whole.  Then, each end is given to the process 
that uses it.  The pipe and the processes can all be created by the same 
function.  When both processes are complete, the parent function regains 
control and the pipe is destroyed by it.

This is the approach used.  A whole-pipe variable contains both ends, so a 
pipe can be created with a single definition.  The constructor creates the 
pipe, and the destructor gets rid of it.  The two ends can be separated, if 
desired, and sent their own ways.  Notice that this system can coexist with 
opening each end with a name.  If you are interested in this, see October's
article.

Four semaphores are used.  One is used to allow only one reader at a time, 
another to allow one writer at a time.  A third is used by those two tasks
to get exclusive access when data is being transferred.  The fourth is used 
when a read (or write) is only partly complete and needs to be restarted 
after the other end has supplied data (or made room).  The listing of 
PIPE.CPP explains their use in detail.

You can have more than one task trying to read from or write to a pipe at 
the same time.  One request will go through and then the next task is given a 
turn.  The semaphores take care of all the details.  Notice that you can transfer 
larger blocks than fit in the pipe's buffer.

The last example shows pipes, and is also arranged differently than the 
others.  One task creates a pipe and the tasks that use the pipe.  When both 
writers are done, the tasks are destroyed.  If the reader is destroyed 
before the pipe, it would finish up printing anything left in the pipe on 
its kill run.  If the pipe were destroyed first, a fault occurs when 
receive() is called and the task quits.  Likewise, if the pipe is destroyed 
when the task is waiting for a writer to put something in the pipe, a fault
occurs in the middle of the read.  This is what actually happens in this case.

HEADINGB What Else?

This library is certainly not exhaustive.  First, priority levels are not 
implemented.  The pick_task() function is used by the scheduler to decide 
what to run next, and you can do plenty of experimentation with it.

Second, event flags are not covered.  Imagine you have a task editing a 
field, and other tasks doing other things.  The editing task gets a time 
slice, sees that there was no key pressed, and yields.  You can eliminate 
this busy waiting and improve response time by making the scheduler 
recognize that a key was pressed and running that task next, and not running 
it until then.  I hope to cover both of these topics (as well as some real 
programs) in a future article.

The library is water tight.  It is not air tight, and certainly not 
liquid-hellium tight.  There are still plenty of places where race 
conditions can occur.  In particular are constructors.  An object should be 
fully constructed before any task tries to use it.  This can be done by 
having one task create all the shared objects and then all the tasks that use 
them, or by using semaphores.

For destructors, similar problems apply.  The pipe destructor is interesting,
though.  You can follow the example and have the destructor get an exclusive
lock on an object before destroying it.  That does not help any task that 
tries to access the object after it has been destroyed, though.

Like I said, pre-emptive multitasking can open a whole new can of worms.  
When misused, it can cause more problems than it solves.  This library 
does have many practical uses, as well as demonstrating some nice C++ 
programming techniques.

This text concentrates mostly on design, and the listings explain 
implementation in some detail.  They were meant to be read, and a 
lot of detail not in the article is in comments in the listings.

