Originally Published May 2000
The correct choice of a software architecture and its supporting operating system facilitates development and clears a path to the marketplace.
Today's software developers face a serious challenge: how to produce a safe and reliable product in the shortest possible time frame. This is not a new problem; it has simply been exacerbated in recent years by pressures from the marketplace, and the medical device manufacturing industry certainly is not immune to those pressures. Manufacturers have tried many solutions, including throwing large budgets at software development tools and manpower. In some cases these approaches have worked, but often they have not.
This article shows that the software system architecture, coupled with the proper choice of an operating system (OS) that supports this architecture, is the key to faster development and more-reliable system operation in the field. Throughout the article, the word process generally refers to a software process (an instance of an executing program) rather than a medical process or treatment.
In addition, the term system will refer to a medium or large embedded system; typical examples might be a patient-monitoring system, a dialysis machine, or an MRI scanner. While many of the techniques presented here lend themselves well to systems of any size, several could not be applied to a smaller system such as might be implemented on an 8051 microcontroller. The hypothetical software example presented is designed with the idea that it will likely change in the future—few products today are only released once and never revised. Thus, the architecture chosen for the system lends itself well to future modifications.
To achieve faster integration, manufacturers must break their system problem into small components that lend themselves to integration. Among its many advantages, componentization allows each component to be developed and tested in isolation before the pieces are brought together.
While the advantages of breaking a system into components may be obvious, the proper software architecture to represent these components, and the communication mechanism between them, are not.
When designing a system, consideration must first be given to the medical task the system is intended to perform and the processes required to accomplish it. This information will dictate what types of hardware are required (e.g., what types of sensors), and what safety requirements will be placed on the system. This portion of the development forms the overall specification phase of the project and, though important, is beyond the scope of this paper. At some point during this decision-making process, however, design of the software system must begin. This software design should begin as early as possible so that the software's impact on the development of the overall system can be understood. For example, a manufacturer often selects hardware from among similar devices that will not have a bearing on the medical process or system cost. If the hardware is chosen blindly (i.e., without software consideration), however, equipment might be purchased that could considerably lengthen the software development time.
The first element to be considered in designing a software system is the choice of a proper software architecture; this choice has serious ramifications for how easily, and how quickly, the system can be integrated later. While the decision should be based on factors such as reducing system complexity and maximizing system responsiveness, it is often based solely on the prior experiences of the designers and the assumptions they hold regarding the weaknesses of various architectures. To design a system properly, however, each type of architecture must be examined fully to understand its real strengths and weaknesses, and how it might fit the system being designed.
Three basic software architectures exist for a medical device manufacturer's consideration: single-process, control-loop architecture; single-process, threaded architecture; and multiple-process architecture. Each is presented below and will be examined for its advantages and disadvantages. All of the architecture explanations assume the presence of an underlying OS and its associated system services.
Single-Process, Control-Loop. The single-process, control-loop architecture is probably the most well known method of system design. In this approach, the system consists of one large process that is broken down into logical subroutines and coordinated using one large control loop. During the execution of the control loop, events that have occurred since the last loop cycle are examined and processed by calling the appropriate set of subroutines. In addition to providing a logical breakdown of the system, this architecture offers several other advantages:
Given these advantages, this method would appear to be the ideal architectural choice, and in the case of a simple, small system, this is often true. Unfortunately, few systems today are either simple or small. When a system becomes even moderately complex, many problems can occur when attempting to integrate the components from this architecture model, even though each component may have worked perfectly during isolated development and testing. Difficulties include the following:
Single-Process, Threaded. Under the single-process, threaded architectural model, the system is coded as one large process with multiple threaded paths of execution throughout its code base. Threads execute independently of each other, and each thread has the ability to block while waiting for an event to occur without interfering with the execution of other threads. Thread execution scheduling is the concern of the OS, not the programmer.
In addition to the advantages enjoyed by the single-process control loop, threaded processes offer the following advantages:
On the other hand, threads do present their own problems. Like single-process control loops, thread-based architectures often fail during integration even though each component may have tested perfectly. These problems can often be attributed to the following factors:
Multiple-Process. Under the multiple-process model the system is coded as a series of separate, cooperating processes. Like threads, processes execute independently of each other, with each process having the ability to block while waiting for a specific event to occur without interfering with the execution of other processes in the system. In fact, a process is defined as being made up of one or more threads, though support for more than a single thread is OS dependent. Although the two models are similar in many respects, what distinguishes the multiple-process model from the single-process, multiple-thread architecture is the protection it provides: the memory space of each process is completely separate from that of all other processes.
In addition to the advantages enjoyed by single-process models, multiple-process models have the following advantages:
Of course, multiple-process models are not without potential problems:
GLOSSARY

Agent Process: A process that performs an action on behalf of another process.
AIO: Analog input/output; in this article the term refers specifically to an analog input/output interface board to measure or set analog device values.
Array Index: A variable indicating an offset into a data array. In this article the concern is that the value of this variable may exceed the actual number of elements in the array, pointing to some random memory space.
Blocked: Refers to a process that is not currently capable of executing; i.e., it is suspended, waiting for the completion of some external event.
Context Switch: The act of switching the processor from the execution of one process to the execution of another. This article is generally interested in the context switch time, or the time it takes to perform this switch; this is used as one measurement of OS performance.
DIO: Digital input/output; in this article the term refers specifically to a digital input/output interface board to read or set digital devices.
Dummy Code: Nonfunctional or partially functional code used for testing the correct operation of other processes or routines.
FIFO: First-in, first-out organization.
Heap space: Global data space of a process.
Kernel: The core component of the computer OS; this component is always loaded into the physical memory (RAM).
MMU: Memory management unit. This article refers to the MMU's ability to assign physical memory (RAM) to a particular process and prevent other processes from accessing this memory.
Mutex: A synchronization primitive that provides mutual exclusion to critical sections of memory, such as a data structure.
Pointer: A variable consisting of an address in the memory that contains a value or set of values.
Port: In this article the term is used to refer to a single analog or digital I/O channel.
Process: An instance of an executing program within the computer.
Semaphore: A synchronization mechanism using a counting integer, with the integer representing an abstraction such as a resource counter. A key to this mechanism is that the test of the variable and any modification of that variable must be done as an atomic operation; i.e., one that cannot be interrupted by another thread or process between the two operations.
Stack Space: Area in memory where the process stack is kept. This stack contains context information about each subroutine in a last-in, first-out (LIFO) organization; it includes the routine's return address, the arguments the subroutine was called with, and data local to the invoked subroutine.
State Machine: Formally a finite state machine, this is an abstract machine consisting of a set of states and a set of transition rules for moving between these states.
Subroutine Call Time: Overhead time required to enter subroutine code; i.e., the time it takes to place arguments on the stack and then jump to the code. The time is generally insignificant.
Thread: A portion of a process that can run independently of and concurrently with other portions of a process. Threads are very similar to processes except that they share certain mutual resources, such as the process global data space; processes, on the other hand, are completely independent of each other.
Thread Switch: The act of switching the processor from the execution of one thread to the execution of another. This article is generally interested in the thread switch time, or the time it takes to perform this switch; this is used as one measurement of OS performance.
Token Passing: A method of concurrency or resource control where a thread or process will not take an action (such as sending data) until it receives a "token" (i.e., some mutually agreed-upon signal) from another thread or process. Once the token is no longer needed, it is passed to another thread or process waiting for it, typically in a ring fashion.
INTERPROCESS COMMUNICATION

The next issue that must be considered in designing a software architecture model is how to communicate between the processes—or, more precisely, between threads in different processes, since a process comprises one or more threads. For purposes of clarity, however, this article limits processes to single-thread entities.
A common mistake is to assume that any OS will provide an adequate method of interprocess communication (IPC) for the task at hand, leading developers to choose their OS based on other factors, such as development tools. While these other factors are important to consider, designers need to remember that the IPC mechanisms provided by an OS must meet the requirements of the task at hand; the use of an inadequate mechanism may force the developer into design choices that can compromise system function or integrity during operation.
Signals. Signals are perhaps the simplest form of IPC. Limited to indicating only the occurrence of an event (signals carry no data payload), a signal acts as a software interrupt to the receiving process, causing it to execute special signal-handler code the next time that process is scheduled to execute. This final point is important: developers may assume that delivering a signal causes the target process to run next, but that is incorrect. Process execution depends on the scheduling algorithm of the OS, not on the delivery of the signal.
The primary problem with signals is that they work as designed—as asynchronous interrupts. This limits their usefulness for general IPC, since one never knows when they will execute and must take steps to prevent data concurrency problems. Additionally, an OS may have a window of vulnerability between the entry point of the signal handler and the rearming of the signal; if the same signal is triggered during this time period, the process will be killed. Signals are useful, however, when a software interrupt is necessary for proper system operation; typical examples of this are user-initiated interrupts (typically via the keyboard) and notification of a math error.
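As a rough sketch of these semantics, the following Python fragment (POSIX only; names are illustrative) installs a handler and delivers a signal to the current process. Note that the handler does minimal work, reflecting the constraint that signal handlers run asynchronously with respect to the rest of the program.

```python
import os
import signal

# Flag updated by the handler; handlers should do minimal work because
# they run asynchronously with respect to the main flow of the program.
event_seen = {"count": 0}

def on_usr1(signum, frame):
    # Only simple, safe actions belong in a handler; this sketch just
    # records that the event occurred.
    event_seen["count"] += 1

signal.signal(signal.SIGUSR1, on_usr1)

# Deliver the signal to ourselves. Delivery alone does not guarantee
# the handler has run -- it runs when the process is next scheduled
# and control returns to user code.
os.kill(os.getpid(), signal.SIGUSR1)

print(event_seen["count"])
```

The shared flag illustrates the concurrency concern raised above: because the handler can run between any two statements of the main program, any data it touches must be protected against partial updates.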
Pipes. A pipe is a one-way, byte-oriented, FIFO communication channel between two processes; multiple pipes can be used if bidirectional communication is required. The problem with pipes is primarily one of misconception: it is often assumed that pipes are completely asynchronous. In reality, pipes can block either the reader or the writer. Writing to a pipe is asynchronous until the pipe is filled; at that point the writing process will block until there is room for more data. Similarly, the reading process will block if there are no data in the pipe to read. If the developer remembers this and realizes that pipes require an intermediate copy of the data to be kept, pipes are a useful IPC mechanism.
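A short Python sketch of these pipe semantics follows; the comments mark where blocking would occur in a real producer-consumer arrangement.

```python
import os

# A pipe is a one-way FIFO byte stream: bytes come out in exactly the
# order they went in, with no message boundaries preserved.
read_fd, write_fd = os.pipe()

os.write(write_fd, b"alpha")
os.write(write_fd, b"beta")

# The two writes arrive as one undifferentiated byte stream; a reader
# that expects message boundaries must impose its own framing.
data = os.read(read_fd, 64)
print(data)  # b'alphabeta'

# Writing is asynchronous only until the pipe's buffer fills; a write
# beyond that point would block until the reader drains data, and a
# read on an empty pipe would block until data arrives.
os.close(write_fd)
os.close(read_fd)
```

The single read returning both writes fused together is the key point: unlike queues (discussed next), pipes carry bytes, not messages.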
Queues. The use of queues is probably one of the most well known methods of IPC today. The concept is simple: each process creates one or more queues into which other processes can asynchronously write messages. Unlike pipes, however, queues are not simply FIFO byte streams:
On the other hand, queues have three primary disadvantages:
Queues can be implemented either internal or external to the kernel. Implementing the queue as an external process, however, offers an interesting advantage to developers: it permits them to replace the queuing mechanism with one customized for a specific task. An example would be a developer who needs to simultaneously notify several processes of a particular event. While he or she could send individual messages to each queue, it would be faster to send one message to a queue mechanism that internally distributes the message to the multiple queues—keeping only one copy of the original message and creating links to the message in each queue.
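A minimal Python sketch of such a distribution mechanism is shown below (the class and method names are illustrative, not a real OS API). Each subscriber queue holds a reference to the single shared message object rather than a duplicate of its contents.

```python
from collections import deque

class MulticastQueue:
    """Sketch of a queuing process that fans one message out to many
    subscriber queues while keeping only a single copy of the message.
    Each subscriber queue stores a reference (a 'link') to the shared
    message rather than a copy of its bytes."""

    def __init__(self):
        self.subscribers = {}

    def subscribe(self, name):
        self.subscribers[name] = deque()

    def publish(self, message):
        # One message object, many references -- analogous to keeping
        # one copy of the message and linking it into each queue.
        for q in self.subscribers.values():
            q.append(message)

    def receive(self, name):
        return self.subscribers[name].popleft()

mq = MulticastQueue()
mq.subscribe("display")
mq.subscribe("logger")
mq.publish({"event": "alarm", "code": 7})

a = mq.receive("display")
b = mq.receive("logger")
print(a is b)  # True: both queues reference the same message object
```

In a real system the subscribers would be separate processes and the queue manager would copy the message out on delivery; the reference-sharing shown here models the internal storage optimization only.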
Shared Memory. Shared memory claims the distinction of being one of the fastest mechanisms for sharing data between processes. To use shared memory, a process defines a specific region of memory, specified by a starting location and a size. Other processes may then use this region either by directly receiving a pointer to the memory region or by using an indirect reference provided by the OS (such as a file system name) to eventually receive a pointer to the region. Shared memory provides the following advantages:
The disadvantages of shared memory include the following:
Synchronous Messaging. In addition to being one of the most powerful forms of IPC, synchronous messaging may be the most misunderstood. With synchronous messaging, processes communicate directly with each other via messages; accomplishing this task, however, requires that the processes synchronize with each other (i.e., both processes must be prepared for data transfer) before the transfer of data actually occurs. Only in an ideal case, however, would both processes be ready to transfer data at exactly the same moment, requiring either the sending or receiving process (depending on which is ready first) to block until the other process is also ready, at which point the data transfer occurs. This does not end the transaction, however, since the receiving process may wish to formulate a reply to the sender, so the sending process continues to remain blocked while the receiver processes the message. When that is complete, the receiving process replies to the sender (optionally with data), freeing the sending process to continue execution.
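The send-block-reply rendezvous described above can be sketched in a few lines of Python, using threads in place of processes (the Channel class and its method names are illustrative of the pattern, not an actual OS API).

```python
import queue
import threading

class Channel:
    """Minimal sketch of synchronous messaging: send() blocks the
    caller until the receiver has both picked up the message and
    issued a reply(). Illustrative names, not a real OS interface."""

    def __init__(self):
        self._msgs = queue.Queue()

    def send(self, data):
        done = threading.Event()
        box = {}
        self._msgs.put((data, done, box))
        done.wait()              # sender stays blocked until the reply
        return box["reply"]

    def receive(self):
        return self._msgs.get()  # blocks until a sender arrives

    def reply(self, transaction, reply_data):
        _, done, box = transaction
        box["reply"] = reply_data
        done.set()               # unblocks the sender

ch = Channel()

def server():
    txn = ch.receive()           # rendezvous: wait for a sender
    data = txn[0]
    ch.reply(txn, data.upper())  # process the message, then reply

t = threading.Thread(target=server)
t.start()
result = ch.send("pump status?")  # blocks until the server replies
t.join()
print(result)
```

Note that the sender remains blocked across the entire transaction, exactly as described above; the reply is what frees it to continue.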
Depending on the OS, data transfer can take one of two forms: either the data is copied directly to a buffer in the receiving process, or a pointer is passed to the data in the memory space of the sending process. The second approach is possible because the sending process is blocked and cannot access the data during the transfer. For the purposes of this article, however, only the data-copy approach will be considered, because it does not permit potentially dangerous random access to the memory space of another process.
When first evaluating synchronous messaging, many developers dismiss it out of hand because of perceived problems in its use. Arguments against synchronous messaging generally run along three lines: message passing incurs more overhead than other forms of IPC, bidirectional communication is impossible because of deadlock, and the blocking of the sending process presents insurmountable design problems. While these problems are possible given an inappropriate system architecture or an improper implementation of the IPC mechanism by an OS, none of these arguments should deter developers from at least evaluating synchronous messaging for their design. In many cases, the proper design techniques for synchronous messaging—in combination with other appropriate IPC mechanisms—can lead to easier implementation, better performance, and increased system reliability.
The first argument against synchronous messaging is that it is too slow: not only must the data be copied between processes, but the OS must also context switch between the processes. Both statements are true, though they fail to consider the performance of the OS itself. For example, at least one operating system, running on a 233-MHz Pentium processor, can perform a full context switch in 1 microsecond—faster than some operating systems can switch threads. Message transfer can be equally fast, moving a 20-byte message (a fairly typical message size) in an additional 1 microsecond. A single transaction between two processes consisting of a 20-byte message and a 20-byte reply thus incurs 4 microseconds of overhead (two context switches, from sender to receiver and back, plus two transfers)—generally an insignificant period in the overall scope of system timing requirements. For small messages, message copying can actually be more efficient than a shared memory operation, depending on the OS. Additionally, each process has its own copy of the data, eliminating the contention and delay that can occur when multiple processes work from a single shared copy.
Does this mean message passing is always the best approach? Certainly not in the case when large amounts of data are involved—for example, a megabyte of graphics data—since copying all of this data would consume too much time. In this case, shared memory makes more sense. While the passing of large amounts of data may be prohibitive in many processes, a valuable alternative may be sending a message containing an offset that points to the data location in shared memory.
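The hybrid approach suggested above can be sketched as follows, using Python's shared-memory module as a stand-in for OS shared memory (all names are illustrative): the bulk data stays in place, and the "message" is only a small descriptor naming its offset and size.

```python
from multiprocessing import shared_memory

# Producer side: write a large block into shared memory once, then
# pass only a small descriptor (offset, size) as the message.
shm = shared_memory.SharedMemory(create=True, size=4096)

payload = b"\x01" * 1024
offset = 256
shm.buf[offset:offset + len(payload)] = payload

# The "message" is just the descriptor -- cheap to copy between
# processes regardless of how large the payload is.
message = {"offset": offset, "size": len(payload)}

# Consumer side: use the descriptor to locate the data in place,
# avoiding a bulk copy through the messaging system.
view = bytes(shm.buf[message["offset"]:message["offset"] + message["size"]])
print(len(view))

shm.close()
shm.unlink()
```

In a real system the descriptor would travel as a synchronous message between two processes attached to the same shared-memory region; only the few bytes of the descriptor are ever copied.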
The second argument against synchronous messaging involves bidirectional communication: if two processes send a message to each other simultaneously, the result is deadlock. This does not mean that bidirectional communication is impossible; it simply means that it must be designed for. Solutions such as message timeouts and token-passing schemes are possible, although the problem can also be solved through the IPC architecture itself.
Figure 1. Using auxiliary and central processes to eliminate deadlock.
The simplest method of solving the problem is to involve additional processes, as shown in Figure 1. In the auxiliary process model in the left diagram, process B has a second process associated with it, permitting process A to send its messages to the auxiliary process, and process B to send its messages directly to A. When process B wishes to examine its messages, it sends a message to its auxiliary process to retrieve the messages. No possibility of deadlock exists, as no two processes send directly to each other. Note that the auxiliary process never sends directly to Process B; this could lead to a circular deadlock.
The above solution is cumbersome, however, since the designer must keep track of which processes have an auxiliary task and which do not. Every process could be given an auxiliary task, but there is an even simpler method: a single task through which all processes send and receive their messages. This method is the central model shown on the right side of Figure 1. If this type of functionality sounds familiar, it should—it is a queuing process. While this solution (involving the use of a third process) does incur extra overhead, it is versatile and very fast.
The last common argument against synchronous messaging involves the difficulty of designing a system around a mechanism that blocks the sending process. More accurately, though, the argument should be stated as the difficulty of applying traditional designs in a synchronous messaging environment. By looking at the problem differently, possible solutions readily become apparent.
Figure 2. Traditional and modified views of a processing pipeline.
Figure 3. Send- and reply-driven messaging.
A useful example is a traditional process pipeline, as shown in Figure 2. Data are passed from module to module, each module processing the data in turn. A synchronous messaging IPC would be problematic in this design: at some point, each process would block on the process ahead of it and be unable to receive an incoming message, backing up the pipeline. Redesigning the system around agent processes addresses these issues. The pipeline becomes a set of server processes, and an agent carries each element through the logical, rather than the physical, pipeline. A controlling task also simplifies situations the original design handled poorly, such as a processing error that requires an element to skip some or all of the modules in the pipeline.
In fact, designers can use blocking to their advantage by intentionally blocking a process. Designers tend to envision messaging as being send driven—i.e., a client-server type relationship. It is often more useful to think of processes as being reply driven—blocked until the need for activation of the process arises. Both types of messaging systems are shown in Figure 3.
Reply-driven messaging has many uses: the examples from Figures 2 and 3 can now be combined to form a queue-and-dispatch system. In this case, a message would be sent to the master process requesting that a new element be entered into the process pipeline. If an agent process is available, it would immediately be dispatched with this information and begin processing that element. At the completion of all processing, the agent simply sends a message to the master process indicating that its task is complete and that it is again ready for work.
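A compressed Python sketch of this queue-and-dispatch pattern follows, using threads in place of processes. In a true synchronous-messaging OS each agent would send a "ready" message to the master and remain reply-blocked until dispatched; here a Python queue stands in for the master's reply path, and all names are illustrative.

```python
import queue
import threading

work_requests = queue.Queue()   # elements waiting to enter the pipeline
results = queue.Queue()

def agent(agent_id):
    while True:
        element = work_requests.get()   # agent blocks until dispatched
        if element is None:             # master's shutdown notice
            break
        # ... carry the element through the logical pipeline ...
        results.put((agent_id, element * 2))

agents = [threading.Thread(target=agent, args=(i,)) for i in range(3)]
for t in agents:
    t.start()

# Master side: dispatch each new element to whichever agent is free.
for element in [1, 2, 3, 4, 5]:
    work_requests.put(element)

for _ in agents:
    work_requests.put(None)             # release any still-blocked agents
for t in agents:
    t.join()

values = sorted(results.get()[1] for _ in range(5))
print(values)
```

The essential property is that idle agents cost nothing: they sit blocked until the master has work for them, which is precisely the reply-driven relationship described above.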
Is it better to use a send-driven or a reply-driven system? Most systems are a combination of the two, permitting designers to use the best mechanism to solve particular parts of the problem.
A final note about a message-passing system: since processes using message passing maintain private copies of a message, nothing prevents the message from being sent across a network as easily as it is sent within a single machine. This feature, of course, would require support from the OS.
Asynchronous Messaging. Some OSs provide an asynchronous messaging mechanism in addition to queues. For example, the QNX system uses a mechanism known as a pulse to permit the originating process to asynchronously send up to one code byte and four data bytes; the receiving process then reads the pulse synchronously as a normal message.
Figure 4. Bidirectional communication via pulses.
When combined with synchronous messaging, pulses provide a powerful mechanism for deadlock-free bidirectional communication. As shown in Figure 4, systems can then be designed so that messages flow in a single direction and pulses are used when an event that must be propagated in the opposite direction is generated. If the message is small enough, as in the case of an analog input/output (I/O) driver that notifies a process when a new reading is available, the value may be included directly in the pulse itself. If the message is larger than can be contained in the pulse, however, then the pulse acts as a notification mechanism to let the receiving process know that the pulse originator has data for it. Upon receipt of the pulse, this process sends a message using normal synchronous messaging asking the pulse originator for the data. This type of notification mechanism is also useful for processes that cannot afford to block on a process for an arbitrary length of time, yet still must communicate with these less-trusted processes.
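Both uses of a pulse, carrying a small value directly and acting as a doorbell for larger data, can be sketched in Python as follows (the constants, functions, and buffer are all illustrative stand-ins for driver-side machinery).

```python
import queue

PULSE_NEW_READING = 1      # value small enough to ride in the pulse
PULSE_DATA_READY = 2       # value too big; receiver must fetch it

pulse_queue = queue.Queue()   # receiver's pulse delivery point
driver_buffer = {}            # data held by the pulse originator

def driver_new_reading(value):
    # Small payload: include the value directly in the pulse.
    pulse_queue.put((PULSE_NEW_READING, value))

def driver_bulk_ready(key, data):
    driver_buffer[key] = data
    # Large payload: the pulse is only a notification; the receiver
    # must come and ask for the data.
    pulse_queue.put((PULSE_DATA_READY, key))

def fetch_from_driver(key):
    # Stand-in for a synchronous message sent to the pulse originator.
    return driver_buffer.pop(key)

driver_new_reading(742)                 # e.g., an analog input value
driver_bulk_ready("waveform", bytes(512))

code, value = pulse_queue.get()
reading = value if code == PULSE_NEW_READING else None

code, key = pulse_queue.get()
data = fetch_from_driver(key) if code == PULSE_DATA_READY else None
print(reading, len(data))
```

Because the pulse originator never blocks on the receiver, messages continue to flow in one direction only, preserving the deadlock-free property of the design in Figure 4.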
SELECTING AN OPERATING SYSTEM

The final piece to be considered in software system design is the selection of the OS itself. When considering which OS to select for a project, designers often assume that if it supports the selected software architecture, offers appropriate scheduling algorithms, and provides the necessary system services (timers, drivers, etc.), then it is sufficient for use in the product. This belief fails to consider the key elements of easy integration and reliable system operation, which are determined by the architecture of the OS itself.
Flat Architecture. While not all CPUs provide an MMU, it is rare today to find hardware for a medium or large embedded system that does not contain one. The reason is straightforward: an MMU, when used properly, can protect each process from typical programming errors, such as a stray pointer. Any attempt to write into the memory space of another process is caught by the MMU and prevented. This type of protection, however, requires support from the OS. Many OSs provide no support for this MMU feature, though because they run on a processor with an MMU it is often assumed that they do. These OSs provide a flat memory architecture.
Figure 5. A flat architecture provides no memory protection.
In a flat architecture, all of the software modules, including the kernel, are folded into the same address space (see Figure 5). This gives any process the ability to write randomly into the memory space of any other process, or even the kernel. When this occurs, the best outcome is an immediate system crash; the worst is a terminally corrupted system that continues to run.
In addition to the potential harm resulting from operating a system with random and unknown damage, it becomes extremely difficult to debug the problem, since it is nearly impossible to determine where the damage came from or when it occurred. Even the best development tools have difficulty when presented with this type of situation; as a result, it can take days, weeks, or even months to identify and fix the problem.
Additional problems occur when the modified modules are reintegrated into the system. Whenever a module is introduced, even a slightly modified one, the entire system must be relinked, producing a unique version of the system with a distinct memory map. If the relinked system suddenly fails during testing, however, where should blame be assigned? Using Figure 5 as an example, assume a one-line code change is made in application 2 and reintegrated into the system. If the system were to fail during testing, it would be logical to blame the failure on the change in application 2. Unfortunately, this may not be accurate—the bug could easily be a stray pointer in application 1. The stray pointer might not have been recognized earlier if it were writing into an unused area of memory, since this would have no effect on system operation. The problem would only occur when the software was relinked and the memory map was changed—application 1 would then inadvertently be writing into a critical memory area. This type of problem is extremely difficult to find because designers generally do not assume that previously working code is at fault.
The real risk in a flat-architecture model is that the entire system, not just a small piece of it, becomes subject to the errors of the worst programmer who ever worked on the project. This person does not even have to work for the same company—third-party code is just as suspect since it can interact anywhere within the system. All of this takes a severe toll on product integration time (as well as on the sanity of the developers).
Figure 6. Monolithic architecture provides memory protection for applications.
Monolithic Architecture. In an attempt to address the problems of flat architecture, some OS vendors have adopted the monolithic architecture shown in Figure 6. A distinct improvement in this architecture is that every application module runs in its own memory-protected address space. If an application tries to overwrite memory used by another module, the MMU will trap the fault, thus allowing the developer to identify where the error occurred.
At first glance, this looks enticing: the developer is immediately notified of memory access violations and no longer has to follow blind alleys looking for subtle bugs in the code. Unfortunately, this architecture addresses only half the problem—the application modules. All of the low-level modules—file systems, protocol stacks, drivers, etc.—remain linked into the same address space as the kernel. As in a flat architecture, a single memory violation in any of these system routines will crash the system. While this is certainly an improvement over flat architecture, it fails to protect the developer from significant problems that delay integration and cause future reliability problems.
Figure 7. UPM architecture provides memory protection for all software components, including OS modules and drivers.
Universal Process Model (UPM) Architecture. UPM architecture, as shown in Figure 7, implements only core services (interprocess communications, interrupt handling, scheduling) in the kernel. With this model, optional user-level processes provide all other system services, such as file systems, device I/O, and networking. Thus, very little code that could cause the kernel to fail is running in kernel mode. This architecture also allows user-written extensions, such as new drivers, to be added to the OS without compromising kernel reliability. This type of architecture leads to faster integration in several ways:
DESIGNING A SYSTEM
Having examined the design choices, it is time to design a hypothetical system—or, more accurately, a subsystem, as the detailed design of a system for an entire medical device would be too extensive to present in a single article. This example will consider a fluid-pumping system such as might be found in dialysis equipment. The example provided is not intended to describe an actual system; certain liberties have been taken to provide a simpler operational model. For example, redundant independent sensors are not employed as they normally would be where patient safety is concerned. This, along with other safety and functional factors, must be considered when designing an actual medical device, but they are not strictly necessary to demonstrate the general design principles.
For this example, the following functional requirements will be assumed:
Figure 8. Design of the fluid-pumping system.
Design Proposal. A designer's first concern should be to break the problem down into a reasonable number of processes: too many means the system spends needless time context switching, while too few forces the merger of components that do not naturally belong together. Figure 8 shows the chosen design for the fluid-pumping system, including the communication paths.
There are several reasons behind each process and its communication paths. Note that if multithreaded processes are used, each process could have a receiving thread to handle reverse-direction messages, as opposed to using pulses for this purpose; in this example, however, only single-threaded processes will be used.
Safety Layer. The safety watchdog process in the design's safety layer provides an overall health check on the system. It performs two major functions in this example: resetting the hardware watchdog timer and monitoring to ensure that other software processes are still running. While the watchdog could also perform other functions, such as comparing critical values to ensure sane system operation, in this example that will be left to other processes in the system.
The method used for checking on the health of the other processes is a simple one—regular check-in. At regular intervals, all processes are required to send a pulse to the safety watchdog process; failure to receive a pulse from any other process within the specified time period causes a corrective action to occur. Corrective actions vary depending on which process failed and the severity of that failure; the action could restart the process involved, notify the operator of an error, generate an entry in an error log, shut down the equipment, or perform any combination of the above.
The normal mode for the watchdog process would be blocked and waiting for a message or pulse to process; these would include the check-in pulses from the other processes and the periodic timer pulse that drives the resetting of the hardware watchdog timer.
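The check-in bookkeeping at the heart of such a watchdog can be sketched in a few lines of C. The table size, tick-based timestamps, and function names below are illustrative assumptions, not part of the original design:

```c
#include <assert.h>

#define NUM_PROCESSES 4   /* illustrative: one slot per monitored process */

/* Timestamp (in ticks) of the last check-in pulse from each process. */
static long last_checkin[NUM_PROCESSES];

/* Record a check-in pulse from process `id`. */
void watchdog_checkin(int id, long now)
{
    if (id >= 0 && id < NUM_PROCESSES)
        last_checkin[id] = now;
}

/* Scan the table; return the index of the first process that has not
 * checked in within `timeout` ticks, or -1 if all are healthy.  The
 * caller chooses the corrective action: restart the process, log an
 * error, notify the operator, or shut the equipment down. */
int watchdog_scan(long now, long timeout)
{
    for (int id = 0; id < NUM_PROCESSES; id++)
        if (now - last_checkin[id] > timeout)
            return id;
    return -1;
}
```

In the running system this scan would execute on each arrival of the watchdog's periodic timer pulse, immediately before the hardware watchdog timer is reset.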
Subsystem Control Layer. The pumping control process in the subsystem control layer is a state machine controlling the pumping subsystem. It brings the pump up to speed, monitors pump speed and flow rate, and determines how to handle out-of-boundary conditions (such as a low flow rate with a high pump speed). For this example, the pumping control process will also handle actions such as the pressing of an emergency off switch. Normally this type of control would be concentrated in a layer above the subsystem control known as the master control, which coordinates overall machine control and directs all subsystems to shut down properly. For this example, however, the pumping control subsystem will assume this responsibility. As with the safety layer, the normal mode for this process would be blocked and waiting for a message or pulse to process, with messages coming from the device interface processes (sensor notifications and switch events) and from its own timers.
Upon receiving a message, the pumping-control process determines the next action for the subsystem and sends out the appropriate messages to the devices (e.g., "speed up the pump," "stop the pump"). The process then waits for another message, or for a timeout pulse if an action is not completed in time.
It can be argued that the communication paths to the pumping control process are inverted, and that this process should never block on a message send. The reasoning here is that it is only sending to trusted processes—those known to respond quickly. More likely, however, it matters little in which direction the communication occurs as long as a single direction is chosen.
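The state-machine core of the pumping control process can be written as a pure transition function, which keeps it trivial to unit test in isolation. The state and event names below are invented for illustration:

```c
#include <assert.h>

/* Illustrative states and events for the pumping control process. */
typedef enum { PUMP_STOPPED, PUMP_STARTING, PUMP_RUNNING, PUMP_FAULT } pump_state_t;
typedef enum { EV_START, EV_AT_SPEED, EV_LOW_FLOW, EV_EMERGENCY_OFF, EV_STOP } pump_event_t;

/* Pure transition function: given the current state and an incoming
 * message (event), return the next state.  The real process would also
 * send the matching commands ("speed up the pump," "stop the pump")
 * to the device interface layer as side effects of each transition. */
pump_state_t pump_next_state(pump_state_t s, pump_event_t ev)
{
    if (ev == EV_EMERGENCY_OFF)
        return PUMP_FAULT;                        /* always honored */
    switch (s) {
    case PUMP_STOPPED:
        return (ev == EV_START) ? PUMP_STARTING : PUMP_STOPPED;
    case PUMP_STARTING:
        return (ev == EV_AT_SPEED) ? PUMP_RUNNING : PUMP_STARTING;
    case PUMP_RUNNING:
        if (ev == EV_LOW_FLOW) return PUMP_FAULT; /* high speed, low flow */
        if (ev == EV_STOP)     return PUMP_STOPPED;
        return PUMP_RUNNING;
    case PUMP_FAULT:
    default:
        return PUMP_FAULT;                        /* stay latched until cleared */
    }
}
```

Because the function has no hidden state, a test team can exercise every transition without any hardware present, which is exactly what makes parallel development of the subsystems practical.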
Device Interface Layer. The flow and speed sensor interface, pump interface, and light-and-switch interface—all part of the device interface layer—provide an abstraction layer for the physical hardware. Though it may be tempting to let the pumping control process directly address the analog and digital I/O devices, there are good reasons to avoid doing so: with an abstraction layer, the hardware can be replaced without touching the control logic, and each interface can be developed and tested in isolation.
Why not create one giant program that abstracts all physical devices? This could be done, but there are two reasons why it is not a good idea. The first is simplification: if all device control were integrated into a single process, it would be huge, difficult to manage, and prone to missing its timing deadlines. The second is that separating functionality into multiple processes lets different programming teams implement and test each subsystem independently, a practice vital to fast system development and integration, and one that would be unavailable were the system part of one large program.
In the present example, some developers might question the pairing of the flow rate measurement and rotational speed sensor interface into a single process. While there are other methods that could be used here, this setup is beneficial: these two measurements are tightly linked, and thus it makes sense to perform routine checking (such as out-of-boundary condition checking) in a single process rather than in pumping control. The pumping control process could be notified either when an error occurred or when a sought condition is reached (e.g., "pump is at speed"). Both methods are shown in this example, and each has its advantages; it is left to the designer to determine the type of organization that is best for the medical device.
Again, the normal mode for these processes would be blocked and waiting for a message or pulse to process; these messages would come from the pumping control process (commands and queries), the poll process (requests for currently displayed values), and the I/O driver processes (data-ready pulses).
Upon receiving a message, these processes determine the appropriate action. This includes converting values to a type necessary for further operations (e.g., converting voltage to flow rate) and forwarding this value on to the next appropriate layer. Forwarding can take many forms, whether it involves responding to a message with a value or initiating a message to the proper I/O board to change or query a value.
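The conversion step can be made concrete with a small example: a flow sensor interface translating a raw ADC voltage into a flow rate, and flagging out-of-boundary readings itself so that pumping control only hears about exceptions. The scale factor and limits below are invented for illustration:

```c
#include <assert.h>

/* Illustrative sensor characteristics: 0-5 V maps linearly to 0-1000 mL/min. */
#define FLOW_ML_PER_VOLT  200.0
#define FLOW_MAX_ML_MIN   900.0   /* out-of-boundary threshold (invented) */

/* Convert a sensor voltage to a flow rate in mL/min. */
double voltage_to_flow(double volts)
{
    return volts * FLOW_ML_PER_VOLT;
}

/* Boundary check performed in the sensor interface process itself,
 * so the raw units never leak upward.  Returns nonzero if the
 * reading is out of bounds and pumping control must be notified. */
int flow_out_of_bounds(double flow_ml_min)
{
    return flow_ml_min < 0.0 || flow_ml_min > FLOW_MAX_ML_MIN;
}
```

Keeping the conversion and the boundary check together in the interface process means a hardware change (say, a sensor with a different transfer curve) touches only these two functions.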
Physical Hardware Layer. The AIO and DIO processes of the physical hardware layer, which control the actual I/O boards, constitute the physical hardware driver layer. A separate process is used for each type of board being used, permitting hardware to be easily replaced since the interface to external programs remains the same. Multiple boards of identical type can be driven by one driver process.
As with the previous process formats, the normal mode for these processes would be blocked and waiting for a message or pulse to process; these messages would come from the device interface processes (scan commands and port queries) and from interrupt events signaling the completion of a data acquisition.
Upon receiving a message, the process would determine the appropriate action; this would include beginning a data scan, replying with the current port value, or initiating a pulse to another process indicating that data is ready.
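The body of such a driver process reduces to a dispatch on the incoming message type. The message codes and the eight-port table below are illustrative assumptions; a real driver would manipulate the board's registers and pulse its client when a scan completes:

```c
#include <assert.h>

#define DIO_NUM_PORTS 8

/* Illustrative message codes for a digital I/O driver process. */
enum dio_msg { DIO_START_SCAN, DIO_QUERY_PORT, DIO_SET_PORT };

static int dio_port[DIO_NUM_PORTS];   /* last-known port values */

/* Handle one request; returns the reply value for queries, 0 for
 * accepted commands, or -1 for a malformed request. */
int dio_handle(enum dio_msg msg, int port, int value)
{
    if (port < 0 || port >= DIO_NUM_PORTS)
        return -1;
    switch (msg) {
    case DIO_SET_PORT:
        dio_port[port] = value;   /* write to the board, cache the value */
        return 0;
    case DIO_QUERY_PORT:
        return dio_port[port];    /* reply with the current port value */
    case DIO_START_SCAN:
        return 0;                 /* begin a scan; data-ready pulse follows */
    default:
        return -1;
    }
}
```

Because the external interface is just this message protocol, a board from a different vendor can be supported by swapping the driver process while every client remains unchanged.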
Display Layer. The display and poll processes, which are part of the display layer, are responsible for displaying information about the system on the user interface. Why are two processes necessary? Normally, one would expect that information to be displayed would be sent to the display process by the appropriate responsible process—e.g., the pumping control process would send information about whether it has begun pumping, or that the pump is up to speed, etc.
The problem is with the physical devices themselves—these are constantly changing values (at least in the case of analog I/O values) that need to be displayed twice per second, as specified by the pump's functional requirements. While the physical device interfaces could constantly send this information to the display process, that would be wasteful if the screen currently being displayed does not show those values. To solve this problem, the information is only polled for when it is being displayed on the screen. If the display process does this directly, however, it will not be ready to receive a message from another process (potentially in another subsystem) that needs to update the display—possibly causing that process to miss a critical time deadline (display is considered a fast, trusted process). The solution is to introduce the poll process, which acts as an agent for the display process.
When the display system starts operation, the display process blocks, waiting for a message to process. The poll process then starts, sending a message to the display process indicating that it is ready for operation. By not replying to the poll process, the display process prevents the poll process from further execution and retains control of when polling occurs. Display then sets a repeating timer that triggers at 0.5-second intervals. When the timer triggers, the display process replies to the poll process with specific instructions as to which processes to query for the real-time data currently being displayed on the screen. The poll process then queries those specific processes for the real-time data, forwards this information back to the display process, and waits for the next poll cycle.
The pumping control and safety watchdog processes need not be polled; information in these processes rarely changes and it is wasteful to constantly poll them for the same data. Instead, these processes can send changes directly to the display process when the data changes. The only concern is with the safety watchdog: if the display process takes too much time to process the message, the resetting of the hardware watchdog timer might get delayed, causing an inadvertent system shutdown. This is probably not a serious consideration since the display process is a fast, trusted process, but this possibility can nevertheless be prevented with the introduction of another pulse. Using this additional pulse, the display process is notified of a value change by a pulse sent by the safety watchdog; upon receipt, the display process simply directs the poll process to seek the new value on its next data-gathering pass.
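The reply-blocking relationship between display and poll can be mimicked on any POSIX system with a pair of pipes: the poll side blocks reading its instruction pipe, exactly as a reply-blocked process waits on its server. Everything below (the simulated sensor value, the single-cycle demo) is an invented stand-in for the real message passing:

```c
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for querying a device interface process for real-time data. */
static int query_sensor(void) { return 42; }

/* One poll cycle: the parent plays "display," the child plays "poll."
 * The child blocks reading its instruction pipe (the analog of being
 * reply-blocked); the parent "replies" by writing an instruction when
 * its timer fires, and the child gathers the data and forwards it
 * back.  Returns the value the display side received. */
int run_poll_demo(void)
{
    int to_poll[2], to_display[2];
    if (pipe(to_poll) < 0 || pipe(to_display) < 0)
        return -1;

    if (fork() == 0) {                      /* poll process */
        char instr;
        read(to_poll[0], &instr, 1);        /* blocked until "replied" to */
        int value = query_sensor();         /* query the device process */
        write(to_display[1], &value, sizeof value);
        _exit(0);
    }

    /* display process: the 0.5-s timer would fire here, so release
     * poll with an instruction naming the values to gather */
    char instr = 'q';
    write(to_poll[1], &instr, 1);

    int value = 0;
    read(to_display[0], &value, sizeof value);
    wait(NULL);
    return value;                           /* data ready to draw */
}
```

The demo performs a single cycle; the real display process would repeat the release-gather-forward sequence on every timer pulse, and would remain free to receive unrelated messages while poll does the waiting.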
It should be noted that this is not the only design method that could be used for this system—it is merely a valid possibility. Other IPC mechanisms such as a queue could be used, but with the trade-off of additional system overhead. Shared memory would also work in some areas. For example, device values read from the I/O boards and those calculated by the device processes could have been written to shared memory since these are write-once, read-multiple items. Of course, this only applies if they are single-word values (or less); anything larger would require the use of a data-synchronization mechanism to prevent possible concurrency problems. Shared memory might improve performance somewhat, but probably not by much: messages would still have to be generated to notify processes when data is available or has changed; otherwise, the processes would have to block on a semaphore (effectively the same as blocking on a message) or poll for changes (tying up the CPU).
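The single-word shared-memory alternative can be sketched with C11 atomics: a device process publishes a write-once, read-multiple value, and readers need no lock because the store and load of a single word are each indivisible. In the real system the object would live in a shared-memory region; here it is an ordinary global for illustration:

```c
#include <stdatomic.h>

/* A single-word value published by the flow sensor interface.
 * (Placed in a shared-memory region in the real system; a plain
 * global here so the sketch is self-contained.) */
static _Atomic int shared_flow_rate = 0;

/* Writer side: the device process stores the latest reading. */
void publish_flow(int ml_per_min)
{
    atomic_store(&shared_flow_rate, ml_per_min);
}

/* Reader side: any process may load the value without a lock. */
int read_flow(void)
{
    return atomic_load(&shared_flow_rate);
}
```

As the text notes, this removes the data copy but not the notification problem: a reader still needs a pulse or message to learn that the value has changed, or it must poll.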
Architecture, both in the software design and in the underlying operating system, is a differentiating factor for faster system development. Through the use of a process-oriented architecture coupled with a messaging IPC, programming teams working in parallel can develop a system where each component is tested in isolation and then quickly integrated into the completed product. This system also provides for a more flexible architecture, as individual components can be easily replaced as requirements change during system development. Furthermore, by coupling this type of architecture with an OS that provides full MMU support, the developer is assured that unintended data interactions cannot occur—problems such as a stray pointer will immediately be caught by the MMU and easily debugged.
The result is a faster time to market because manufacturers will not be fighting the typical system integration nightmares. Later, when the system is deployed in the field, these same mechanisms will provide for extremely reliable system operation, particularly when coupled with a software watchdog. Designing with this type of architecture also benefits future projects: each component need only be developed once, since any process can be reused in another project with the knowledge that it will act in exactly the same manner.
With development time being such a limited commodity today, wasting it using the wrong software architecture is an error that can be avoided. The proper design choices make this possible and yield benefits throughout the lifetime of the project.
Jeffrey Schaffer is a senior applications engineer at the Westlake Village, CA, office of QNX Software Systems Inc., an operating systems vendor headquartered in the Ottawa, Canada area. He has more than 20 years of experience working with operating systems, database internals, and system design.