Originally published December 1994
Mitchel H. Krause
Aside from their mandate to provide a safe and reliable product, manufacturers of computerized medical devices may have three very practical reasons for automating their software testing program: their product is too complicated to test manually, the time devoted to manual testing is cutting into potential profits, and current FDA requirements will be easier to satisfy with automated testing and documentation. If any of these factors motivates your company, this article will help you to sort out the issues to be considered and options available. Then, when the automated test program is in place, safer and more reliable products will follow.1 The sorting instrument presented is a maturity model that plots four levels of testing maturity in terms of the resources required to move from one level to the next. The model can be used to determine the level that best fits your company and its products.
THE SOFTWARE TESTING MATURITY MODEL
The software testing maturity model, shown in Figure 1, is similar to a software process maturity model that is familiar to many software engineers. The process model has been described by Watts S. Humphrey in his book Managing the Software Process,2 and has been cited by Frank Houston, a former FDA staffer, and Steven Rakitin in presentations to the Health Industry Manufacturers Association.3,4 The version shown here as Figure 2 is adapted from Rakitin's presentation. The process model adapts well to automated software testing because effective software verification and validation programs grow out of development programs that are well planned, executed, managed, and monitored. A good software test program cannot stand alone; it must be an integral part of the software development process.
Level 1: Accidental Automation. The first level of the software testing model--like level 1 in the software process model--is characterized by ad hoc, individualistic, chaotic attempts to get the job done. Important information (for example, what to test) is not documented and must be extracted from in-house experts. Test plans are sketchy. Test results are not documented consistently. Schedules slip. Either products are delayed or testing becomes a cursory, poorly documented exercise. Management is uninvolved or uninformed.
This level has been designated Accidental Automation because the use of any automated tools or techniques comes about almost as if by accident and is not supported by process, planning, or management functions. Products released on the basis of such testing may well be accidents waiting to happen. Testing at this level may be appropriate only for a product that has no potential for harming the patient or user; it is never appropriate for a computerized medical device.
Level 2: Beginning Automation. The second testing level corresponds directly to Level 2--Repeatable in the software process maturity model (see Figure 2). There are hundreds of capture-and-replay test tools on the market today that simply repeat the responses of a system under test.5 As in the process model, however, these tools have limited capabilities and lose their economic usefulness quickly as a product changes.
Level 2 testing is still dependent on information locked in the minds of in-house experts, although documentation is beginning to appear in the form of software requirements specifications (SRSs) and test requirements specifications (TRSs). However, in most cases, large portions of these documents are written after the fact and used to meet regulatory requirements rather than to direct the development and test processes. Writing them does, however, provide good practice for moving to level 3.
Level 3: Intentional Automation. At the third level, automated testing becomes both well defined and well managed. The TRSs and the test scripts themselves proceed logically from the SRSs and design documents. Furthermore, because the test team is now part of the development process, these documents are written before the product is delivered for testing. Consequently, schedules become more reliable. Level 3 is appropriate for many medical device manufacturers.
Level 4: Advanced Automation. The highest testing maturity level is a practiced and perfected version of level 3 with one major addition: postrelease defect tracking. Defects are trapped and sent directly back through the fix, test creation, and regression test processes. The software test team is now an integral part of product development, and testers and developers work together to build a product that will meet test requirements. Any software bugs that do occur are caught early, when they are much less expensive to fix. When testing is performed at this level, an FDA inspector can pick up any piece of product documentation and trace the development process all the way from the SRS that describes the feature to the test results that validate it.
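The traceability described for level 4 can be pictured as a simple lookup from each requirement to the tests and results that cover it. The following is a minimal sketch only; the record layout, the identifiers (SRS-001, TRS-010, and so on), and the function names are hypothetical and do not come from any particular tool.

```python
# Hypothetical records; the IDs and field names are illustrative only.
REQUIREMENTS = ["SRS-001", "SRS-002", "SRS-003"]
TESTS = [
    {"id": "TRS-010", "requirement": "SRS-001"},
    {"id": "TRS-011", "requirement": "SRS-002"},
]
RESULTS = {"TRS-010": "pass", "TRS-011": "fail"}


def trace(requirements, tests, results):
    """Map each requirement ID to a list of (test ID, result) pairs."""
    matrix = {req: [] for req in requirements}
    for test in tests:
        result = results.get(test["id"], "not run")
        # setdefault keeps tests that point at an unlisted requirement visible.
        matrix.setdefault(test["requirement"], []).append((test["id"], result))
    return matrix


def untraced(matrix):
    """Return requirements that have no passing test result."""
    return [req for req, runs in matrix.items()
            if not any(result == "pass" for _, result in runs)]


matrix = trace(REQUIREMENTS, TESTS, RESULTS)
gaps = untraced(matrix)
print(gaps)
```

In this sample data, SRS-002 has only a failing test and SRS-003 has no test at all, so both would be reported as gaps in the chain from requirement to validated result.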
A Checklist of Issues. How can these software testing maturity levels help a company to plan and implement an automated software test program? The answer to that question comes from careful consideration of four issues:
* What is the profile of your company and its products?
* What processes do you need to implement as part of an automated testing program?
* What kind of people do you need in order to create and run a testing program?
* Which automated software test products fit your profile and process?
Significantly, price is not on the list. That is because the cost of any one component, especially the test tool, becomes insignificant when it is compared with the potential payback. A well-planned and well-executed software test automation process will pay for itself many times over by ensuring fewer bugs and field fixes, shortening product development cycles, and providing labor savings. And, if you keep your ultimate goal in mind when defining processes, choosing staff, and buying test tools, your testing program will continue to yield a good return as you advance from one level of maturity to the next.
PROFILE: RANKING YOUR COMPANY'S PRODUCTS
Most computerized medical devices can benefit from some type of automated testing. In fact, Boris Beizer, who is probably the best-known expert in the field of software testing, has said, "As far as I'm concerned, manual testing is ludicrous and self-contradictory. It's based upon a fallacy. Anybody who thinks they can test manually, doesn't take into account the error rate in manual test execution."5 However, knowing what level of automation is appropriate requires a good understanding of your company's products.
The exercise described below will help you to create a test-level profile of your company and its products. The profile is a guide to how your company may benefit from an investment in processes, people, and automated software test products. The point scores at the end of each section provide a rough estimate of the level of software testing maturity you should strive to meet.
How Large Are Your Software Projects? As software projects increase in size, the resulting products become harder to test and at some point manual testing can no longer cover enough functionalities to ensure safe and reliable products. There are many ways to measure the scope of a software project, but a simple line count is a start:
* Score 1 if your product has fewer than 10,000 lines of code.
* Score 2 if your product has between 10,000 and 30,000 lines of code.
* Score 3 if your product has between 30,000 and 70,000 lines of code.
* Score 4 if your product has more than 70,000 lines of code.
How Complex Is Your Product? Systems with multiple inputs and outputs, graphics screens or printers, embedded processors, or multiple microprocessors are all candidates for the controlled sophistication of automated testing. If two or more interactive processors are used, the product probably presents integration and timing issues that cannot be tested manually. Similarly, if the product has an embedded processor, it may have functionalities that cannot be tested manually. In other cases, it may simply be impractical to test the system manually. Printers are one example of a common peripheral that is hard to test by hand. They not only accept commands and data from a software system, they also send back status and error signals to which the system must respond correctly. It is slow, inconvenient, and sometimes impossible for a tester to follow test plans that try to duplicate all the combinations of acknowledgment, system-busy, paper-out, baud rate, error, select, sensor, and other signals the printer might return. The input simulation provided by a sophisticated automatic test system can both speed up this process and make it traceable and reproducible.
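To see why the printer example quickly outgrows manual testing, it helps to count the combinations. The sketch below is illustrative only: it assumes each status signal is simply on or off and picks three baud rates arbitrarily; a real printer interface would define its own signals and values.

```python
from itertools import product

# Assumed on/off status lines, taken from the signals named in the text.
SIGNALS = ("acknowledge", "busy", "paper_out", "error", "select")
# Hypothetical baud rates chosen only for illustration.
BAUD_RATES = (1200, 2400, 9600)


def printer_states():
    """Yield every combination of baud rate and signal levels as a dict."""
    levels = product((False, True), repeat=len(SIGNALS))
    for baud, combo in product(BAUD_RATES, levels):
        yield {"baud": baud, **dict(zip(SIGNALS, combo))}


cases = list(printer_states())
print(len(cases))  # 3 baud rates * 2**5 signal combinations = 96
```

Even this small model yields 96 distinct printer responses for every command the system sends, which is the kind of enumeration a scripted simulator can work through unattended.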
Even testing of that seemingly ubiquitous input device, a keyboard or keypad, can benefit from using an automated test tool with simulation capabilities. Timing issues, especially, are nearly impossible to test manually. The fatal accidents in the mid-1980s that involved a radiation therapy machine are a good example of the kinds of problems that can occur. This particular machine had both therapy and diagnosis modes, and operators entered a series of keystrokes to switch the system from a high-energy to a low-energy mode. If the keystrokes were typed in too fast, however, the high-energy mode would remain in effect even though the operator would assume the change had been made. Later, when the system was activated, it sent a damaging and sometimes lethal dose of radiation into the patient.6 An automated test tool with simulation capabilities could have detected this problem early, before any harm was done. Keyboard simulations could have been set up to test the effect of varying keyboard input speeds. (The actual resolution of the problem involved many factors in addition to keyboard input speed; the report cited gives a full account of these accidents and their outcome.)
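A keyboard simulation that varies input speed might look something like the sketch below. It is a toy model, not a reconstruction of the machine involved in the accidents: the ToyDevice class, its 80-ms settling time, and the key names are all assumptions made for illustration. Time is simulated rather than measured so that the sweep is repeatable.

```python
class ToyDevice:
    """Hypothetical device under test with a deliberate timing flaw."""

    SETTLE_MS = 80  # assumed time the device needs to process one key

    def __init__(self):
        self.mode = "high"
        self._busy_until = 0

    def key(self, name, t_ms):
        """Deliver a keystroke at simulated time t_ms."""
        if t_ms < self._busy_until:
            return  # flaw: a key that arrives while busy is silently dropped
        self._busy_until = t_ms + self.SETTLE_MS
        if name == "LOW":
            self.mode = "low"
        elif name == "HIGH":
            self.mode = "high"


def run_sequence(delay_ms):
    """Type EDIT, LOW, ENTER with a fixed delay between keys."""
    device = ToyDevice()
    t_ms = 0
    for name in ("EDIT", "LOW", "ENTER"):
        device.key(name, t_ms)
        t_ms += delay_ms
    return device.mode


def sweep(delays_ms):
    """Return the delays at which the device fails to reach low mode."""
    return [d for d in delays_ms if run_sequence(d) != "low"]


failing = sweep(range(10, 201, 10))
print(failing)
```

In this model every delay shorter than the settling time leaves the device in high mode, so the sweep flags the fast typing speeds that a human tester would be unlikely to reproduce consistently.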
System outputs may also be tested more efficiently using automated methods. After an 8-, 10-, or 12-hour day, even the most conscientious human tester will fail to notice some errors or forget to document them. Other outputs either cannot be monitored manually or require nonintegrated measurement devices that are difficult to set up and monitor. Finally, some potentially fatal software flaws may never show up during functional (black-box) testing. Detecting these problems requires an automated system that can use white-box test methods to look inside the system.1
* Score 1 if your product has a single processor and simple inputs and outputs.
* Score 2 if your product has a single processor and common inputs and outputs.
* Score 3 if your product has uncommon inputs and outputs or if it uses a graphics screen or printer.
* Score 4 if your product uses multiple or embedded processors that cannot be fully tested using black-box methods.
What Financial Risk Does Your Product Pose for the Company? Both loss of market share and exposure to liability claims can create substantial financial risks for medical device companies. Because all products have a life cycle, the more time a new product spends in the test-and-fix-and-retest cycle, the less time it will spend on the market. Also, when market entry is delayed, sales will be lost even if the product is better than its competition. Even greater losses can occur if a poorly tested product harms someone. The manufacturer will face costly FDA actions and product liability suits. In worst-case scenarios, the product may never return to the market and the company itself will fail.
* Score 1 if a malfunction or failure of your product poses no threat to the financial health of your company, from either liability claims or loss of market share.
* Score 2 if a malfunction or failure of your product presents a small but acceptable risk to the financial health of your company.
* Score 3 if a malfunction or failure of your product presents an unacceptable risk to the financial health of your company.
* Score 4 if a malfunction or failure of your product would cause irreparable harm to your company.
What Risk Does Your Product Pose for the Patient and Operator? Although concerns about size, complexity, and financial risk are important in all software projects, the bottom line for a medical device company is risk to patients and health-care providers. Medical products must be both safe and effective. That is, they must do what they are designed to do and, when something does go wrong, the malfunction or failure must cause no harm. The product's FDA classification and hazard analysis results may determine if automated testing should be implemented. If a computerized medical device is categorized as Class II or Class III, an automated software test program may be necessary to provide both the testing and documentation required. Similarly, if the product presents software-related hazards, an automated test program might help your company to verify, validate, and document the measures taken to mitigate those hazards.
* Score 1 if your product is FDA Class I and a hazard analysis has shown there is no possibility of its software causing harm to a patient or operator.
* Score 2 if your product is FDA Class I and a hazard analysis has shown there is a remote possibility of its software causing harm to a patient or operator.
* Score 3 if your product is FDA Class II.
* Score 4 if your product is FDA Class III.
Evaluating Your Scores. In its "Reviewer Guidance for Computer-Controlled Medical Devices," FDA supplies an approach to evaluating the scores assigned in this exercise: "When a level of concern is assigned for each functioning component of the software, the highest level of concern generated is that assigned to the software aspect of the device."7 Thus, if you want to ensure the long-term success of your company, aim for the level of automated software testing equal to your highest score in any category.
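The scoring exercise reduces to taking the highest of the four category scores. The sketch below shows one way to express it; the function and class names are hypothetical, and because the text does not say which score applies at exactly 10,000, 30,000, or 70,000 lines, the boundary handling here is an assumption.

```python
from dataclasses import dataclass


def size_score(lines_of_code: int) -> int:
    """Score project size using the line-count bands given above.

    Assumption: 10,000 falls in band 2, and 30,000 and 70,000 fall in
    the lower of their two adjacent bands.
    """
    if lines_of_code < 10_000:
        return 1
    if lines_of_code <= 30_000:
        return 2
    if lines_of_code <= 70_000:
        return 3
    return 4


@dataclass
class ProductProfile:
    """The four category scores, each from 1 to 4."""

    size: int
    complexity: int
    financial_risk: int
    patient_risk: int

    def target_level(self) -> int:
        """Recommended testing maturity level: the highest score."""
        return max(self.size, self.complexity,
                   self.financial_risk, self.patient_risk)


# Hypothetical product: 45,000 lines, common I/O, small financial risk,
# FDA Class III.
profile = ProductProfile(size=size_score(45_000), complexity=2,
                         financial_risk=2, patient_risk=4)
print(profile.target_level())
```

For this hypothetical product the Class III score of 4 dominates, so level 4 would be the target even though the other categories score lower.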
PROCESS: CONTROLLING TEST POLICIES AND PROCEDURES
If any one word sums up the regulatory demands being placed on medical device manufacturers, it is process. No matter how much effort goes into designing, testing, and manufacturing a product, an auditor will not be satisfied if the process is not written down, followed, and documented. Process-related expenses will be incurred regardless of the testing level achieved or whether or not the software test process is automated; however, they can vary significantly across the testing levels.
Level 1 Process Costs. When software testing is at level 1, process costs are hidden. They arise from not having a defined process and can be very high, indeed. Such costs can include those incurred by delayed product introductions, the need for frequent field fixes, and a generally ineffective product development effort.
Level 2 Process Costs. Surprisingly, process costs can be highest for a company that is testing at level 2, especially one that is contemplating a move to level 3 in the foreseeable future. The costs are high because at level 2 the company is probably just starting to evaluate its software testing needs and to put standardized procedures in place. It may have to experiment, hire consultants, and establish or expand job areas, such as regulatory affairs.
Process Costs at Levels 3 and 4. Although the two major forces behind process improvement--FDA regulation and the need for ISO 9000 certification--may affect any company, those testing at levels 3 or 4 almost certainly need to meet FDA software test requirements. Such compliance is expensive and time-consuming, but the good news is that creating and documenting procedures for an automated testing program is no more expensive than doing so for a manual one. In fact, use of an automated test tool with scripting, test identification, and automatic documentation capabilities can reduce costs by providing some of the framework and content required.
The FDA "Reviewer Guidance for Computer-Controlled Medical Devices Undergoing 510(k) Review" states that "FDA is focusing attention on the software development process to assure that potential hazardous failures have been addressed, effective performance has been defined, and means of verifying both safe and effective performance have been planned, carried out, and properly reviewed."8 In order to get marketing approval for any product, its manufacturer must prove to FDA that the product does what it is supposed to do and that it is safe. The way to do that is not only through clinical trials but also by documenting the process that was followed to make the product eligible for such trials.
In contrast, ISO 9000 certification is based on process alone. Because the products themselves are not certified, the certification authority is concerned solely with whether the process that created the product is traceable, repeatable, and documented. When the process is proven, the site responsible for making the product is certified. An ISO 9000 certification audit costs about $10,000 to $20,000, but that is only the barest tip of the iceberg. The total cost includes the resources required to evaluate the company's needs, get the appropriate procedures in place, have them audited and approved, and motivate personnel to use them.
If established procedures are being revised to accommodate automation, existing regulatory affairs and quality assurance personnel may need to devote two to four weeks each to the project. In addition, it may take a technical writer about a month to rewrite the policy and procedure manuals. Finally, occasional technical support will be required from software developers and test engineers.
PEOPLE: CHOOSING QUALIFIED TESTERS
No matter what type of testing a company does, manual or automated, experienced people are needed to create the test plans and write test scripts.
Level 1 People Costs. At test maturity level 1, testing is often limited to debugging. A programmer writes and debugs the product's software until everything seems to work correctly. Because only the programmer is involved, testing costs are hidden in the cost of development. Likewise, the potential benefits of better test practices are hidden in field-support and product-upgrade costs. Thus, level 1 people costs are essentially unknown.
Level 2 People Costs. In software testing programs at level 2, testing is recognized as a separate function. Test plans and scripts are generally written by an experienced product user or support person who may or may not have programming experience. In any case, the person performing this task must understand the SRSs and design specifications well enough to write a comprehensive test plan and test scripts. The scripts are then given to testers who run them and record the results. One option is to hire a group of low-paid, inexperienced users; another is to recruit testers in-house. Whoever the testers are, they must understand that their job is to try to break the system as well as to make sure it works right. Level 2 people costs may also include one or more high-level support people to coordinate test writing, supervise the testers, and edit the results. Also, since the labor that goes into setting up a capture-and-replay tool is not reusable, the cost of one test cycle must be multiplied by the number of test cycles expected.
People Costs at Levels 3 and 4. Automated testing plans are most often written by a software test engineer, who should also participate in product development meetings with design engineers to help build testability into the product. The test engineer's programming background combined with a familiarity with the product will ensure the creation of efficient tests that attack the weakest parts of the product. If the test tool has white-box test capabilities, the test engineer uses his or her knowledge of system internals to specify tests for functions that cannot be tested manually.
The test plan is then used to write the test script programs. This work can be done by the test engineer or given to application programmers. The level of programming experience required to write test scripts depends on the test tool used. Generally, the most versatile tools run on scripts written in some version of a common programming language, such as C. Other tools use simplified languages. In any case, at least one member of the test team must have some familiarity with writing a structured set of instructions. Because the automated testing tool runs the tests and creates the documentation, no costs are added for hiring testers or diverting in-house personnel to perform and document the tests.
PRODUCTS: CHOOSING THE RIGHT TESTING TOOL
The requirements of the product and process determine the selection of an automated testing tool. However, medical device manufacturers should beware of confusing development aids with automated software test tools. Companies can spend large sums on many kinds of debugging tools and in-circuit emulators and still not have an automated test program. A software development aid has done its job when the product, or product component, is debugged and seems to work. Automated test tools, on the other hand, are designed not only to verify the system, but also to stress it to the point that it will break in the lab before it can fail in the field and harm a patient or operator.
Level 1 Tool Costs. Although development aids such as debugging programs and in-circuit emulators may be used in level 1 test programs, no automated test tools are used. Therefore, there are no tool costs at this level.
Level 2 Tool Costs. Level 2 testing is the domain of simple capture-and-replay tools that employ rudimentary scripting capabilities and are often used to verify operator interfaces. Prices for such tools start at about $200 and can reach $5000 or more for the more-sophisticated models. The less-expensive, software-only versions are often intrusive; that is, they run on the same computer as the software application being tested. Because the tool and product occupy the same space, product timing and performance can undergo unpredictable changes. Even if no problems show up during testing, the product shipped is never exactly the same as the product tested. Capture-and-replay tools with integral capture hardware eliminate the problems associated with intrusiveness but retain another problem characteristic of such systems--inflexibility.
Because a capture-and-replay test suite for a graphic user interface (GUI) can contain thousands of captured screen images and consume megabytes of memory, the time it takes to gather these images is significant. Timing variations and the fact that GUI displays are seldom static can add even more time. Most significant, however, is the amount of time needed to recapture, integrate, and retest the inevitable changes caused by debugging and last-minute product upgrades. Thus, capture-and-replay tools should be used only for the simplest of products.
Tool Costs at Levels 3 and 4. High-level test tools can include several advanced capabilities in addition to capture and replay. The following are features to look for when purchasing tools:
* Scripting. The tool's test script language should be as functional as a high-level computer language, permitting the inclusion of files, libraries, loops, and conditional statements. It also should include aids to help debug the scripts themselves.
* Monitoring. A choice of intrusive software monitoring, such as that used in capture-and-replay tools, or nonintrusive hardware monitoring of system outputs may be available. An added high-level feature in the most sophisticated systems is direct-processor monitoring. With direct-processor monitoring, a connector similar to an in-circuit emulator pod is mounted on the processor and monitors the activity of the product under test. The test tool is nonintrusive because the connector never sends signals to the application being tested. It is also quite fast and accurate because it works at the processor level.
* Black-Box Simulation and Stimulation. A high-level tool should be able to emulate the actions of a human tester. Hardware is available that can simulate such product stimulations as keys being pressed, printers responding, tones being generated, relays opening and closing, and other analog or digital inputs. In short, advanced simulation capabilities should enable tests to run unattended.
* White-Box Simulation and Stimulation. The test tool should also be able to simulate and monitor the internal workings of the product tested. Such white-box testing capabilities permit testing of timing, integration, and resource issues that cannot be tested manually.
* Documentation. Automated test tools can log both test parameters and test results. If integrated into the software development process, a sophisticated system should be able to produce much of the documentation required by regulatory agencies.
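The scripting and documentation features in the list above can be illustrated with a short test script that loops over its cases and logs every parameter and result. This is a generic sketch, not the script language of any commercial tool; the toy_alarm function, its 40 and 180 beats-per-minute limits, and the test and requirement identifiers are hypothetical stand-ins for a real system under test.

```python
import csv
import io


def toy_alarm(heart_rate):
    """Hypothetical system under test: alarm outside 40-180 beats/min."""
    return "ALARM" if heart_rate < 40 or heart_rate > 180 else "OK"


# (test ID, requirement ID, stimulus, expected response); IDs are made up.
CASES = [
    ("T-001", "SRS-014", 39, "ALARM"),
    ("T-002", "SRS-014", 40, "OK"),
    ("T-003", "SRS-014", 180, "OK"),
    ("T-004", "SRS-014", 181, "ALARM"),
]


def run_suite(cases, system_under_test):
    """Apply each stimulus and record parameters, result, and verdict."""
    log = []
    for test_id, requirement, stimulus, expected in cases:
        actual = system_under_test(stimulus)
        log.append({
            "test_id": test_id,
            "requirement": requirement,
            "stimulus": stimulus,
            "expected": expected,
            "actual": actual,
            "verdict": "pass" if actual == expected else "FAIL",
        })
    return log


def to_csv(log):
    """Render the log as CSV text suitable for filing with test records."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(log[0]))
    writer.writeheader()
    writer.writerows(log)
    return buffer.getvalue()


log = run_suite(CASES, toy_alarm)
report = to_csv(log)
print(report)
```

Because the same loop that runs the tests also writes the record, the documentation cannot drift from what was actually executed, which is the main advantage over hand-written test logs.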
Test tools suitable for testing at levels 3 and 4 cost from $15,000 to $75,000.
As described above, once you determine your company profile, perfect your processes, establish test specialists, and give the team members appropriate testing tools, your company can realize the benefits of automated software testing. When compared with manual programs, automation properly applied will result in higher-quality products, lower risks to your company and the patients you serve, faster regulatory approvals, and decreased time to market. The higher the level you reach on the automated software testing maturity model, the more benefits you will realize. Whatever level you choose, however, keep in mind a major lesson of the last 30 years of computing: No matter what tools you buy, your largest investment by far will be in the processes and people you put in place to use those tools. Purchase automated software testing tools based on how they can maximize your investments in processes and people, not on the price of the tools themselves.
1. Weide P, "Improving Medical Device Safety with Automated Software Testing," Med Dev Diag Indust, 16(8):66-79, 1994.
2. Humphrey WS, Managing the Software Process, Reading, MA, Addison-Wesley, 1989.
3. Houston F, "Software Development and Quality Assurance: FDA Expectations," in Proceedings of the 1992 HIMA Conference, HIMA Publication 93-5, Washington, DC, Health Industry Manufacturers Association, pp 43-51, 1993.
4. Rakitin SR, "The Economics of Software Process Improvement," presented to the 1994 Medical Device Software Conference sponsored by the Health Industry Manufacturers Association, Washington, DC, May 1994.
5. Johnson M, "Dr. Boris Beizer on Software Testing: An Interview, Part I," The Software QA Quarterly, 1(2):7-13, 1994.
6. Leveson NG, and Turner CS, "An Investigation of the Therac-25 Accidents," Computer, July, p 23, 1993.
7. "Reviewer Guidance for Computer-Controlled Medical Devices Undergoing 510(k) Review," Section 2.0 Levels of Concern, Rockville, MD, FDA, Office of Device Evaluation, Center for Devices and Radiological Health, 1991.
8. "Reviewer Guidance for Computer-Controlled Medical Devices Undergoing 510(k) Review," Section 1.0 Introduction, Rockville, MD, FDA, Office of Device Evaluation, Center for Devices and Radiological Health, 1991.
Mitchel H. Krause is director of testing and quality control for B-Tree Verification Systems, Inc. (Minneapolis).