Diagnosing Medical Device Software Defects Using Static Analysis



Over the years, medical devices have become increasingly dependent on software. Device functionality has evolved from the simple metronome circuit of early cardiac pacemakers to software-driven capabilities that include electrocardiogram analysis, laser surgery, and intravenous delivery systems that adjust dosages based on patient feedback. The software used in these devices must be free of defects, because even a single undetected error could result in severe injury or even the death of a patient.

Generally speaking, the more complex the software, the more likely it is to contain latent errors. Modern infusion pumps can contain tens of thousands of lines of code, while proton beam therapy machines may contain in excess of a million lines of code. Checking for defects in code bases of this magnitude is a challenging task, to say the least. The issue is likely to be compounded in the future as stand-alone devices begin to interoperate across networks, transferring operational data and patient information on the fly. Clearly, a new paradigm for safety assurance is required to help ensure that the possibility of any software defect or malfunction in these systems is minimized.

FDA's Center for Devices and Radiological Health (CDRH) requires that manufacturers perform detailed verification and validation (V&V) for all software contained in their devices.1 Traditionally, the most common means for this V&V have been testing and code review. Unfortunately, these techniques, while effective in catching a large number of bugs, can never be guaranteed to uncover all possible defects in the software. Tests are generally carried out during run-time execution of the program. Because each run of the program exercises only one specific path, a finite number of tests can check only a limited set of the possible execution paths through the entire system. For the purposes of this article, we refer to execution paths at the system level as opposed to the unit level, where complete testing is much more feasible.

Usually these paths cover only a small fraction of the total possible paths in the program. Code reviews, on the other hand, rely solely on the expertise of the reviewer and may not be efficient for large code bases. The efficiency of code review degrades as the code base grows, because the number of execution paths grows with it. Regardless of the experience of a given team of developers, at some size every code base exceeds the team's ability to review it using conventional, manual methods.

To check all possible paths (or traces) in the program, a different approach, based on program flow analysis, is required. One way of accomplishing this is through the use of automated static analysis tools. This article presents static analysis as a technique to detect errors in medical device software. It examines the static analysis approach and its advantages and limitations. It also provides examples of specific defect types that static analysis can identify and discusses how the approach can be applied to verification of medical device software. It should be noted, however, that static analysis is intended to supplement and improve the effectiveness of existing best practices in testing. It should not be thought of as a substitute for device developers' current testing activities.

Validation versus Verification

Verification and validation are terms that are often used interchangeably in software development. However, it is important to understand the difference between these two distinct but complementary activities. Software verification provides objective evidence that the design outputs of a particular phase of the software development life cycle meet all of the specified requirements for that phase by checking for consistency, completeness, and correctness of the software and its supporting documentation.1 Validation, on the other hand, is the confirmation by examination and provision of objective evidence that software specifications conform to user needs and intended uses, and that the particular requirements implemented through software can be consistently fulfilled.2

What Requires Verification and Validation?

Four types of software require verification and validation:1

•Software used as a component, part, or accessory of a medical device.

•Software that is itself a medical device (e.g., blood establishment software).

•Software used in the production of a device (e.g., programmable logic controllers in manufacturing equipment).

•Software used in implementation of the device manufacturer's quality system (e.g., software that records and maintains the device history record).

Static Analysis

Static analysis is a technique used to explore a given software system without actually executing the software. There are many different types of static source code analysis with many different purposes. They range from rigorous, formal, methods-based state-space exploration techniques, such as theorem proving and model checking, to simple syntax parsers such as Lint.3–6 In this article, we restrict the definition of static analysis to refer to the detection of potential run-time errors in source code, a technique that is most helpful in software verification. This type of static analysis can discover complex defects in the code through symbolic path simulation, the process of analyzing every possible execution path of the program.7

Historically, the only way to check all execution paths in the program has been through a (manual) code review. However, for large code bases, it is impossible to manually analyze all paths in the software. For example, a device with 200,000 lines of code was found to contain more than 2⁴⁰ (about 10¹²) individual paths during automated analysis of the code. Even if the developers could verify a path every minute during inspection, it would take 20,000 developers more than 100 years to analyze every path. Formal methods-based program analysis and the availability of inexpensive computational power have made it possible to exhaustively explore software systems and detect problems in all of these possible paths in a matter of minutes. This is accomplished by aggressively identifying similar paths that can be combined, hence reducing the total number of scenarios that are analyzed to accomplish complete path coverage.

Early static analysis tools relied mainly on a pattern-matching approach, applying a database of heuristics to distinguish unsafe code patterns from safe ones (e.g., every call to strcpy would be flagged as unsafe because developers often introduce defects when calling strcpy). This approach could scale to large code bases, but provided an inherently shallow analysis that did not take into account the semantics of the code being analyzed.

Although useful for uncovering some coding issues or examples of poor programming style, pattern-matching static analysis tools rarely found critical, hard-to-find defects caused by the interaction of multiple components across an entire code base, and frequently suffered from high false-positive and false-negative rates. (A false positive is any result that a static analysis tool reports that is not actually a defect in the source code. A false negative, on the other hand, is any defect in the code that a static analysis tool does not report.)

In recent years, static analysis tools have evolved beyond simple pattern matching by focusing on path coverage, which allows them to uncover more defects with real run-time implications. By shifting focus from suspicious constructs to run-time defects, new static analysis technologies evaluate more of the intricate interactions within code bases (e.g., values of variables as they are manipulated down a path through the code, or the relationship between how parameters of functions are treated and the corresponding return values). To analyze code with this additional level of sophistication, these tools combine path-flow analysis with interprocedural analysis to evaluate what happens when the flow of control passes from one function to another within a given software system.

The combination of path-flow analysis and interprocedural analysis allows static tools to deliver 100% path coverage for multiple software defect types, something impossible with traditional testing, which explores only a particular path during execution. Without evaluating all possible paths, testing processes fail to achieve maximum code coverage—a measure that describes the degree to which the source code of a program has been checked.
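To make the idea concrete, the following is a minimal, hypothetical C sketch of a cross-function, path-dependent defect of the kind such analysis is designed to find; all of the names are invented for illustration.

#include <stddef.h>

static int *lookup_dose(int patient_id)
{
    static int stored_dose = 5;
    if (patient_id < 0)
        return NULL;          /* error path: no dose record exists        */
    return &stored_dose;      /* normal path: a valid pointer is returned */
}

int get_dose(int patient_id)
{
    int *dose = lookup_dose(patient_id);
    return *dose;             /* defect: on the patient_id < 0 path, dose is NULL;
                                 finding it requires tracking the returned value
                                 across the call and along each path            */
}

Neither function is wrong in isolation; the defect emerges only when the error path of one is combined with the unchecked use in the other, which is exactly what interprocedural, path-sensitive analysis is built to expose.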

By evaluating every possible path through the software, static analysis can identify defects early in the development process, when they are most cost-effective to eliminate. From a software life cycle perspective, defects become more costly to organizations the closer they are discovered to release time. Defects discovered later in the life cycle often require rework by developers and invariably delay software releases. The cost of fixing an error at this stage can be up to 100 times as high as it would have been during the development stage (the cost of fixing an error after a device is placed on the market can be 1000 times higher).8 Modern static analysis tools can help reduce these costs by providing the capability to detect a multitude of defects early in the software life cycle.

Defect Classes and Potential Consequences in Medical Devices

Typical static analysis tools rely on collections of individual checks that are designed and optimized to identify a particular type of defect. This section gives a sample of specific static analysis checks, along with the types of defects they identify and the potential consequences of these defects in typical medical device software. Most of the defect classes listed here will likely sound familiar to developers; the discussion also highlights some of the real ramifications of these defects in medical devices to emphasize their potential severity. Medical devices can be written in many languages, including C, assembly, C++, C#, Objective-C, and Java. However, the majority of device software is programmed almost exclusively in C and C++. The examples in this section, therefore, address defect classes most commonly found in C and C++ programs.

Buffer Overflow and Underflow

Figure 1. An example of buffer overflow.

Any statically or dynamically allocated memory buffer has a maximum number of data elements that it can store. If an attempt is made to write data into the buffer in a memory location that is beyond the scope of that buffer (either before its beginning or after its end), it can corrupt memory that may belong to another variable or instruction in the program. As an example, consider the code fragment in Figure 1. The function overrun_pointer defines an integer array buff, with 10 elements, indexed from 0 to 9. Next, it defines a pointer, x, pointing to the second location (buff[1]) in the array. The subsequent reference to x[9] is thus equivalent to buff[10] (a memory location beyond the defined scope of the array), causing a buffer overflow error.
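Based on that description, the following is a minimal C sketch of the code Figure 1 illustrates (the names overrun_pointer, buff, and x come from the text; the value written is arbitrary).

void overrun_pointer(void)
{
    int buff[10];        /* valid indices run from buff[0] to buff[9] */
    int *x = &buff[1];   /* x points to the second element of buff    */

    x[9] = 1;            /* equivalent to buff[10]: one element past the
                            end of the array, a buffer overflow        */
}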

Null Object Dereference

When developing medical device software in the C# language, an object reference is allowed to be null, meaning it refers to nothing. By itself, this convention is not a problem. However, when a null reference is dereferenced, a System.NullReferenceException is thrown. This will not necessarily lead to a system crash, because the exception can be caught and handled, but many applications cannot recover from these types of exceptions.

Figure 2. An example of null object dereference.

In Figure 2, if null is passed to dripA, it is assigned null and then subsequently dereferenced when getValues is called. In the case of dripB, the object is valid (i.e., not null) and the dereference succeeds.
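A minimal C# sketch of the scenario Figure 2 describes; only the names dripA, dripB, and getValues come from the text, and the surrounding class and method names are invented for illustration.

class DripMonitor
{
    public int[] getValues() { return new int[] { 42 }; }
}

class Program
{
    static void ReadDrip(DripMonitor drip)
    {
        // Throws System.NullReferenceException when drip is null.
        int[] values = drip.getValues();
        System.Console.WriteLine(values[0]);
    }

    static void Main()
    {
        DripMonitor dripA = null;                // null is passed to dripA
        DripMonitor dripB = new DripMonitor();   // dripB refers to a valid object

        ReadDrip(dripB);   // the object is valid; the dereference succeeds
        ReadDrip(dripA);   // null object dereference at run time
    }
}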

Uninitialized Variable

Figure 3. An example of an uninitialized variable.

In most programming languages, assigning an initial value to a variable when it is declared is optional. However, if the variable is not initialized at the point of declaration, it holds an indeterminate value, and if it is not initialized before its first use, that indeterminate value will be used in place of the intended one. As an example, consider the code fragment in Figure 3. The variable x is declared in the first statement of the function, but it is not given an initial value. Depending on the input parameter c, specifically when c is equal to 0, the uninitialized value of x will be read and returned to the caller of this function. That value could be anything, determined by whatever happened to be on the stack when the function was called.
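A minimal C sketch consistent with that description; the variable x and parameter c come from the text, while the function name and the assignment are assumptions.

int read_mode(int c)
{
    int x;              /* declared, but not initialized        */

    if (c != 0)
        x = c * 10;     /* x is assigned only when c is nonzero */

    return x;           /* when c == 0, the uninitialized value of x
                           is read and returned to the caller   */
}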

Inappropriate Cast

Figure 4. An example of an inappropriate cast.

Many languages allow variables of one type to be cast to another type (e.g., int to float). If the cast is not appropriate, it can alter the value of the variable in unexpected ways. As an example, consider the Java code in Figure 4. A Shape object was added to the collection, and yet in GetTriangle the object returned by c.get() is cast to a Triangle. A ClassCastException can be thrown as a result; programs rarely recover from this type of failure.
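A minimal Java sketch of the situation Figure 4 describes; the names Shape, Triangle, GetTriangle, and c come from the text, and the rest is invented scaffolding.

import java.util.ArrayList;
import java.util.List;

class Shape { }
class Triangle extends Shape { }

public class CastExample {
    static Triangle GetTriangle(List<Shape> c) {
        // c.get(0) returns a Shape reference; the cast throws
        // ClassCastException when the element is not actually a Triangle.
        return (Triangle) c.get(0);
    }

    public static void main(String[] args) {
        List<Shape> shapes = new ArrayList<>();
        shapes.add(new Shape());   // a plain Shape is added to the list
        GetTriangle(shapes);       // inappropriate cast: throws at run time
    }
}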

Division by Zero

Figure 5. An example of division by zero.

In most languages, the result of division by zero is undefined. Often, such an error halts execution of the program or yields unpredictable results. For example, consider the code fragment in Figure 5. In this simple example, the value of z may be zero if the input parameters x and y have the same value, which leads to a division by zero and can cause the program to halt.
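A minimal C sketch consistent with that description; x, y, and z come from the text, while the function name, the subtraction, and the dividend are assumptions.

int compute_rate(int x, int y, int total)
{
    int z = x - y;       /* z is zero whenever x and y have the same value */

    return total / z;    /* division by zero when x == y                   */
}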

Memory Leak

Figure 6. An example of a memory leak.

If a program allocates memory on the heap, it is responsible for deallocating that memory once it has finished using it. A memory leak occurs when the last pointer to an allocated block goes out of scope without a corresponding call to free. When too much memory is leaked, no more memory can be allocated, and the program cannot continue execution. For example, consider the code fragment in Figure 6. The problem with this piece of code is that some error checking is performed on the parameter c, but that checking occurs after memory has been allocated on the previous line. Unfortunately, in the error case (where the function returns –1), the memory that was allocated and stored in the local variable p is leaked because p goes out of scope. Notice that in the nonerror case, the memory is freed by calling the function free.
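A minimal C sketch of the pattern Figure 6 describes; the parameter c, the local pointer p, the –1 error return, and the call to free come from the text, and the remaining names and sizes are assumptions.

#include <stdlib.h>

int process_reading(int c)
{
    char *p = malloc(64);    /* memory is allocated first ...              */

    if (c < 0)               /* ... and the parameter is checked afterward */
        return -1;           /* error case: p goes out of scope and the
                                allocated memory is leaked                 */

    /* ... use p ... */

    free(p);                 /* nonerror case: the memory is freed         */
    return 0;
}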

The defect types listed here and their potential consequences for medical devices are not intended to be an exhaustive list. However, these examples should illustrate how common programming errors can lead to unexpected and possibly critical device failures. What makes these errors harder to detect is the fact that they are generally unpredictable and may be triggered only when a specific sequence of inputs occurs. However, any one of them could result in an unintended outcome, leading to a catastrophic consequence for the patient (such as a system reset, misdiagnosis, mistreatment, or an abnormal termination of treatment).

The regular use of static analysis has proven to help software developers identify and eliminate many of these and other types of potential software defects. Moreover, by identifying these defects early in the software development life cycle, static analysis can help lower the cost of maintenance and reduce time required to debug code.

Selecting a Static Analysis Tool

The success of any implementation of static analysis in the development process is contingent upon two key factors: developer adoption and integration into the existing development work flow. (See the sidebar, “Static Analysis and the Software Development Life Cycle.”)

The primary driver for developer adoption of static analysis tools is the accuracy and relevance of their results. First, the tool must be right most of the time. High false-positive and false-negative rates can compromise the deployment of a static analysis tool if developers stop trusting the analysis results or if they have to spend too much time correcting the results of the tool or configuring it to avoid its inaccuracies. Additionally, the tool must identify genuine defects that are relevant to the developer. Simply flagging a large number of coding-standard violations, while technically accurate, is functionally irrelevant for developers looking to verify the correctness of their code. Tools that lack the necessary combination of accuracy and relevance may disillusion developers and impede adoption.

To characterize the effectiveness of static analysis tools, this section lists a set of criteria to help medical device manufacturers select a tool for their organization. Using these criteria together with device-specific features, such as the size of the application, development platform used, etc., can help manufacturers identify and select the appropriate tools to verify their code.

The most important criteria for selecting a static analysis tool are precision, recall, and performance. These criteria can be thought of as quantitative factors, because they can be measured empirically for a given analysis. In addition, there are a number of qualitative, less tangible factors that are harder to measure but equally significant when evaluating a static analysis tool. Both the quantitative and qualitative factors are discussed below.

Quantitative Factors

Precision. Precision quantifies how well the tool excludes false positives. Formally, precision is defined as

Precision = TP / (TP + FP),

where TP is the total number of true positives, and FP is the total number of false positives generated by the tool.

A tool has 100% precision if it never generates a false positive result. False positives, unfortunately, are an unavoidable by-product of static analysis. Although commercial tools have improved in recent years, many still suffer from false-positive rates that can cloud the overall analysis results. A large number of such false positives could result in needless effort for the developer and may result in genuine errors (true positives) being overlooked.

Recall. Recall is a measure of the ability of the tool to find real defects. Formally, it is defined as

Recall = TP / (TP + FN),

where TP is the total number of true positives, and FN is the total number of false negatives generated by the tool. Of course, it is very difficult to have a true sense of the total number of false negatives because one never really knows how many defects are in a given piece of software. To make this measure practical, it is useful to look at the defects discovered in the code base for some period of time after the code has been released (e.g., 90 days) to estimate the false-negative figure.

A tool has a 100% recall rate if it can uncover all potential defects in the code (i.e., generate no false negatives). Just like false positives, a large number of false negatives can negate the effectiveness of the static analysis tool. A false negative in the result implies that there is a potential defect left undetected in the code that could manifest itself during execution. However, achieving a 100% recall rate is rare, if not impossible, and may only be possible at the cost of a very high number of false positives.
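As a purely hypothetical illustration of the two measures: if a tool reports 90 findings on a code base, of which 80 turn out to be genuine defects (TP = 80, FP = 10), and a further 20 defects surface in the field after release (FN = 20), then its precision is 80/90, or about 89%, while its recall is 80/100, or 80%.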

Performance. Performance is a measure of the amount of computing resources the tool needs to compute its results. By the nature of static analysis, the time required by any tool increases with the amount of code being analyzed. Because many developers run their static analysis tool in both a desktop and a central build environment, analyses involving large code bases (millions of lines) and very large code bases (tens of millions of lines or more) could become too time-consuming, and thus impractical. It is desirable, therefore, that a static analysis tool complete its analysis of a given code base in a time frame comparable to compilation.

When analyzing large-scale applications, however, considering the performance of the tool alone may not be enough. A more useful metric in this case would be the total amount of effort expended in the analysis process. This effort needs to take into account the performance of the static analysis tool, the effort spent in configuring the build, as well as the manual review of results to detect false positives.

In the ideal case, of course, static analysis tools should have no false positives, no false negatives, and run in approximately the same amount of time as is required for compilation. Practically speaking, however, completely eliminating false positives and false negatives is not possible given the current state of technology. Most effective static analysis tools instead try to find the elusive sweet spot between false positives, false negatives, and performance to make them practical in everyday software development.

Qualitative Factors

Configurability. To account for natural differences in code bases and varying development environments (caused by varying compilers or processor types), static analysis tools should offer customization and tuning capabilities that allow developers to modify their tools' settings to achieve greater accuracy.

Typical configuration parameters may include specifying the search depth, processor type, and various preprocessor directives required by the application being analyzed. In addition, the tool may also provide the ability to fine-tune the analysis by modifying either the number of checks deployed or the settings specific to an individual check, such as the threshold for null pointer dereferences. The ability to configure the tool for a particular software or application allows developers to select the level of performance most appropriate for their application and leads to more-accurate and reliable results.

Integration with Existing Development Processes. A static analysis tool should require no significant changes to existing build environments or source code to ensure smooth integration with established development processes and tool chains. Tools that disrupt existing processes are often not used by developers because the tools do not conform to established workplace behavior. To successfully integrate with existing development environments, static analysis tools should be able to support multiple platforms (such as SPARC or C167), compilers (such as GCC and Microsoft Visual C++, as well as the many compilers available for embedded development), and integrated development environments (IDEs). The more tightly information from static analysis tools can be integrated with existing processes, the more likely it is to yield positive results.

Persistent Tracking of Defects. One of the common deployment pitfalls that device manufacturers face with some static analysis tools is dealing with the defects as they are reported over time. For example, if a static analysis tool reports 100 defects on Monday, and after fixing a number of these, another 100 defects are discovered on Tuesday due to problems with newly introduced code, how do developers track which defects are new and which have already been diagnosed? As code churns, answering this question correctly is a challenge for some static analysis technologies. Static analysis tools should be able to merge defects from one analysis run to the next, even in the face of code churn. This way, when a developer marks a certain defect as “to be fixed” or “false positive,” their work is preserved in the next build so they can focus on addressing the new issues discovered in their code. This persistence of status over time avoids time-consuming rework where developers are forced to sift through familiar false-positive results to find new defects in their most recent build.

Developer Prioritization of Defects. To help development teams prioritize defects in the overall context of their specific software systems, they should look for static analysis tools that offer multiple and customizable severity settings. Such settings help developers focus on the defects that matter most, and they should be complemented by the ability to assign a status to each defect (e.g., False Positive, Ignore, Real Bug). To mirror the existing work flow, the developer should also be able to assign an action to be taken as a result of the defect discovered in the code.

User-Defined Defect Detection. A static analysis tool should have the ability to allow developers to create new checks designed for their code base, or modify existing checks to make them more effective at defect identification. Custom defect detection is an important feature when looking for domain-specific versions of common defects. Organizations can find it useful to create custom checks that are capable of identifying variants of known defect types. These checks can also help ensure compliance with corporate or industry coding standards.

Accountability and Ownership of Defects. An effective static analysis tool should do more than find potential defects. Tools that simply identify defects can create a significant amount of management overhead for their administrators and end-users. Effective static analysis tools should be able to determine which individual developer within an organization is responsible for introducing a defect into a given software system. Without this capability, defect assignment can become time-consuming and frustrating for developers who may be asked to correct defects in code for which they have little or no familiarity. Automatic defect assignment also ensures that development managers have a high-level view of the number of defects that each individual developer on their staff introduces. This can benefit teams by identifying opportunities for training or mentoring individuals prone to introducing a higher rate of problematic code.

Multithreaded Support. The changing landscape of hardware is creating the need for new sophistication in static analysis tools, because the emergence of multicore processors has introduced a new class of concurrency defects in software. To take advantage of multicore hardware, software developers are now required to create multithreaded applications. These applications result in an exponential increase in the number of possible run-time scenarios due to the concurrent execution of multiple operations.

Concurrent execution creates new complexities in the software development process that can introduce hard-to-find, crash-causing software defects such as deadlocks, thread blocks, and race conditions. The challenge of creating reliable multithreaded applications is often compounded because many developers are unfamiliar with creating these types of applications. Given the continuing evolution of multicore hardware, static analysis tools should be able to handle multithreaded applications and should be able to detect concurrency defects such as race conditions, deadlocks, and thread blocks.
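The following is a minimal, hypothetical C (POSIX threads) sketch of a race condition, the simplest of the concurrency defect types listed above; all of the names are invented for illustration.

#include <pthread.h>
#include <stddef.h>

static long dose_count = 0;                                     /* state shared between threads */
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;

static void *record_dose_racy(void *arg)
{
    (void)arg;
    dose_count++;          /* race condition: an unsynchronized read-modify-write;
                              concurrent updates can be lost                       */
    return NULL;
}

static void *record_dose_safe(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&count_lock);
    dose_count++;          /* the same update, protected by the mutex */
    pthread_mutex_unlock(&count_lock);
    return NULL;
}

A static concurrency checker can report the unprotected update in record_dose_racy by observing that dose_count is accessed both with and without the lock held.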

Conclusion

Medical device manufacturers should give serious consideration to using static analysis tools in their software development process. Automated static analysis tools make it possible to examine the software in medical devices with a thoroughness that has never been possible before. This more-comprehensive analysis can help uncover potential defects in the source code, while at the same time helping ensure that no new defects are introduced during code modification.

Used as part of the V&V process, static analysis can help form an effective argument in support of software safety when preparing a (safety) assurance case for a medical device. Reports generated from static analysis tools can be used to provide the evidence for these arguments.

The criteria listed in this article provide device manufacturers with a basic reference guide to selecting appropriate static analysis tools for their organization. However, given the relative strengths of different static analysis tools, it may not always be possible to use the same tool through all stages of the software life cycle. Often manufacturers may need to employ a combination of tools to provide complete source code analysis. For example, a manufacturer could use an IDE-based tool with a high recall rate to uncover defects at the developer desktop level, and a high-precision, scalable tool to check for anomalies during central build.

Finally, it should be noted that although static analysis offers a number of benefits to medical device developers, it may not address all of the needs or development concerns a manufacturer has regarding code quality. Manufacturers still need to investigate other development tools that can provide a complete tool chain for developers to use at different stages of the software development life cycle. These include architectural analysis, dynamic analysis, and software readiness analysis, among other technologies. Static analysis is most effective when used in combination with such development analysis tools and traditional V&V techniques. It must be viewed as a complement to, rather than a replacement for, traditional methodologies.

Raoul Jetley, PhD, is a research scientist at FDA's Center for Devices and Radiological Health/Office of Science and Engineering Laboratories. Ben Chelf is chief technology officer and cofounder of Coverity (San Francisco).

 

References

1.Guidance for the Content of Premarket Submissions for Software Contained in Medical Devices, U.S. Food and Drug Administration; available from Internet: www.fda.gov/cdrh/ode/guidance/337.html.

2.General Principles of Software Validation—Final Guidance for Industry and Staff, U.S. Food and Drug Administration; available from Internet: www.fda.gov/cdrh/comp/guidance/938.html.

3.Guidance for the Content of Premarket Submissions for Software Contained in Medical Devices, U.S. Food and Drug Administration Center for Devices and Radiological Health (2005); available from Internet: www.fda.gov/cdrh/ode/guidance/337.html.

4.J Rushby, “Theorem Proving for Verification,” in Modelling and Verification of Parallel Processes, Franck Cassez, ed. (Nantes, France: MoVEP 2k, 2000).

5.EM Clarke, O Grumberg, and D Peled, Model Checking (Cambridge, MA: MIT Press, 1999).

6.SC Johnson, “Lint, a C Program Checker,” Unix Programmer's Manual (Murray Hill, NJ: AT&T Bell Laboratories, 1978).

7.H Hampapuram, Y Yang, and M Das, “Symbolic Path Simulation in Path-Sensitive Dataflow Analysis,” SIGSOFT Software Engineering Notes, January 2006.

8.V Lakshmi Narasimhan, “A Risk Management Toolkit for Integrated Engineering Asset Maintenance,” in Proceedings of the World Congress on Engineering Asset Management (WCEAM), July 2006.

9.M Mantle and B Chelf, Gracenote and Coverity customer case study; available from Internet: www.coverity.com/html/research-library.html.

10.J Cooper and B Chelf, ip.access and Coverity customer case study; available from Internet: www.coverity.com/html/research-library.html.

11.M Ballou, Improving Software Quality to Drive Business Agility (Framingham, MA: International Data Corp., 2008).

Copyright ©2009 Medical Device & Diagnostic Industry

