March 1, 1999

Engineering Process Improvement through Error Analysis

Medical Device & Diagnostic Industry Magazine

An MD&DI March 1999 Column


DESIGN CONTROL

Conducting analyses of previous projects can improve the effectiveness of review and testing procedures.

FDA's emphasis on design controls has resulted in a renewed focus by manufacturers on evaluating design documentation, review, and testing efforts. While the goal is always improved quality of the finished product, manufacturers today are confronted with questions about how much review and testing is appropriate and what criteria should be applied to determine adequacy.

The default approach of reviewing all specifications and conducting exhaustive testing for all design and development phases is based on the assumption that the most comprehensive procedures will find the most errors and afford the greatest quality improvements. However, manufacturers have realized that this approach is not practical—or even possible—for large development projects.

Realistic project management requires trade-offs and a knowledge of when increased review and test emphasis will provide the greatest benefits. Unfortunately, most manufacturers have no objective criteria with which to evaluate the adequacy of review and test activities and define alternative approaches that are more effective and less burdensome. In accordance with the FDA quality system regulation, if reviews and testing are to be conducted "where appropriate," then adequate criteria are required to justify the level that is performed.1

Experienced quality engineers also realize that although removing errors is highly valuable in the design process, a far more efficient process would be to prevent them. This article discusses techniques that can be used to tailor review and test practices for maximum effectiveness and reduce the likelihood of errors being introduced during product design.

PROCESS-IMPROVEMENT STRATEGIES

Numerous models have defined approaches for assuring the quality of engineering design and development activities, including cleanroom inspection, program proofs, path coverage testing, and alpha and beta tests. No single technique has been a panacea for all companies and all applications. These generic solutions do not provide the process-improvement gains that can be achieved through internal analysis of project errors. Whereas generic models tend to suggest numerous quality practices without guidance on which technique is best suited for a specific technology or product, an internal process analysis is more focused.

R&D departments are under constant pressure to reduce a product's time to market and increase its functionality, which leaves little time for additional tasks such as an internal process analysis. However, failure to analyze the design and development process can carry substantial productivity penalties. Recent reviews of system validation testing have shown that new product releases often repeat the errors of previous releases. The emphasis on time to market frequently crowds out any assessment of historical errors and the definition of activities that might prevent those same errors from recurring.

Improvement in the design and development process is essential if process optimization gains are to be realized. Analysis of historical projects can identify selected areas in which reviews and testing can be performed earlier to detect likely errors. Areas in which reviews and tests find no errors should also be examined to determine the effectiveness of the tests, and ineffective tests should be eliminated.

ANALYTICAL RESULTS

A group of medical device manufacturers recently analyzed the errors found during design validation testing. Each error was assigned to one of four categories (a simple tallying sketch follows the list):

  • Errors caused by changes in requirements (40%).

  • Errors caused by design (0%).

  • Errors caused by coding or schematics implementation (40%).

  • Errors caused by incorrect test protocols or misinterpretation by the tester (20%).
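As a minimal sketch of how such a tally can be produced from a defect log, the following Python snippet counts errors by assigned category and reports each category's share. The category names and sample records are invented for illustration; they are not data from the study.

```python
from collections import Counter

# Illustrative defect records: (defect ID, assigned root-cause category).
# The records are made up; the categories mirror the four used above.
defects = [
    ("D-101", "requirements change"),
    ("D-102", "implementation"),
    ("D-103", "test protocol"),
    ("D-104", "requirements change"),
    ("D-105", "implementation"),
]

counts = Counter(category for _, category in defects)
total = len(defects)

# Report each category's count and percentage of the total.
for category in ("requirements change", "design", "implementation", "test protocol"):
    share = 100.0 * counts.get(category, 0) / total
    print(f"{category:20s} {counts.get(category, 0):3d}  {share:5.1f}%")
```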

Errors Caused by Changes in Requirements. In a typical example, the implementation was consistent with the requirements initially communicated to engineering, but the requirements had since been changed based on new demands from users. This category included changes that in some cases would be considered enhancements added after development started.

Errors Caused by Design. These are errors that required a change to the system's high-level architecture. (Detailed design errors that could be solved with code changes were classified as code errors.) In practice, this category was seldom identified as the source of an error. Because of critical schedule pressures, there was a reluctance to identify errors that required a modification of the design architecture and any significant amount of rework. Instead, the error was most often defined as an implementation error that could be corrected with a software patch, even though the optimal solution might have involved a redesign.

Errors Caused by Coding or Schematics Implementation. Defects in the system, including deviations from requirements, system hang-ups, failure to handle input ranges, inconsistencies in the user interface, and interface errors, were classified as implementation errors. These errors were caused by incorrectly implemented new functions, defects in system interfaces, and defects in supporting programs such as operating systems. Operating system errors required code modifications and sometimes minor upgrades.

Errors Caused by Incorrect Test Protocols or Misinterpretation by the Tester. Testing-related errors occurred when the test procedure was incorrect in defining the expected results or when a tester misunderstood the requirement.

Because the products examined in the study were software-intensive, the errors identified during validation testing were initially believed to be program coding errors. However, analysis showed that other factors were also significant. More errors were attributed to problems introduced during the requirements phase and the test process than during implementation. The major source of errors was not coding, as initially assumed.

PROCESS-IMPROVEMENT STEPS: REVIEW ACTIVITIES

The error analysis showed that many problems result from aspects of the design and development process other than implementation. Requirement changes and inadequate test procedures caused most of the errors in the study. Based on the analysis, a new emphasis was placed on review and test activities to detect and prevent the identified types of errors early in the development process.

Review of Requirements. Although requirements reviews are recognized as one of the most effective ways to remove errors, they were frequently neglected in the haste to rush products to market. Requirements errors found during the requirements phase can be easily addressed by changing a document. However, if these errors are not found until validation testing, a significantly greater effort is necessary to correct, debug, retest, and document updates.

Even when requirements reviews were conducted, errors were often introduced when later changes or additions to the requirements were made. New requirements were often not given the scrutiny or review necessary to ensure feasibility and correctness.

Another common problem with requirements was that the desired implementation was not clear—ambiguous statements were misinterpreted by the implementing engineer or the person writing the test procedures.

The large number of requirements errors suggests the need for continued requirements management throughout the development process. Simply conducting a requirements review at the start of the design process is insufficient to control the numerous changes that are proposed as development progresses.
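One lightweight way to keep changed requirements under the same scrutiny as the original set is to log each change and flag any that have not yet been re-reviewed. The sketch below assumes a simple in-memory change log; the record fields and example entries are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class RequirementChange:
    req_id: str
    description: str
    requested: date
    reviewed: bool = False                        # set True once the change passes review
    reviewers: list = field(default_factory=list)

# Hypothetical change log entries.
change_log = [
    RequirementChange("REQ-014", "Add audible alarm for low battery", date(1999, 1, 12)),
    RequirementChange("REQ-007", "Widen acceptable input voltage range", date(1999, 2, 3),
                      reviewed=True, reviewers=["JS", "MK"]),
]

# Changes that have not been re-reviewed are candidate sources of the
# requirements-related errors discussed above.
unreviewed = [c.req_id for c in change_log if not c.reviewed]
print("Changes awaiting review:", unreviewed)
```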

Design Review. In practice, few errors were attributed to design, and therefore design reviews were not emphasized. Most errors were fixable in code; only those that required hardware changes were considered to be design errors. However, this approach is not entirely accurate, since design complexity was obviously a source of errors, increasing the difficulty of implementation and testing and the likelihood of errors in subsequent releases.

Another factor that suggested the need for an increased emphasis on design reviews was the number of errors that could not be duplicated. These errors are believed to be the result of poor design, but inadequate data were collected to substantiate this conclusion. Although the sources of these nonrepeatable errors were not conclusively identified, they are thought to be related to functions such as timing architecture, internally developed communication routines, memory allocation, or circuit designs without adequate margins. Because the number of these errors was low, limited resources were devoted to further analysis.

Review of Code and Schematics. Although the manufacturers had used some level of informal code and schematics review, improvements could be realized, such as ensuring that reviews are conducted not only for the initial release but also for subsequent changes. Many new errors were introduced when changes were made to fix problems found during integration and validation testing, and these errors surfaced only during regression testing. The likelihood of introducing new errors was high because significant schedule pressure precluded the more comprehensive reviews conducted during initial development. This explanation is also supported by a study in which the error rate for changed software code was found to be 7.9 errors per 1000 instructions, versus only 4.8 errors for newly developed code.2

Code and schematics reviews were often ineffective in identifying errors that were later discovered during validation testing. Review quality was inconsistent, determined primarily by reviewer expertise and the amount of time devoted to the review. To avoid this pitfall, manufacturers should design code and schematics review checklists that emphasize areas that have historically been sources of errors and should provide training in effective review techniques.
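A review checklist can be kept in step with the error history by generating it from the areas that analysis has shown to be most error-prone. The following sketch is one possible representation; the checklist items and error counts are examples, not the checklist used by the manufacturers in the study.

```python
# Past error counts by area (hypothetical figures, e.g., from the same defect
# log used for the category analysis) and an example checklist item for each.
historical_error_counts = {
    "boundary handling": 14,
    "interface data consistency": 9,
    "error/exception paths": 7,
    "naming and style": 2,
}

checklist_items = {
    "boundary handling": "Verify minimum/maximum values and off-by-one conditions.",
    "interface data consistency": "Check units, ranges, and ordering of data passed between modules.",
    "error/exception paths": "Confirm every failure return is handled or reported.",
    "naming and style": "Confirm the module follows the agreed coding standard.",
}

# Present the checklist with the historically worst areas first, so limited
# review time is spent where errors have actually occurred.
for area, count in sorted(historical_error_counts.items(), key=lambda kv: -kv[1]):
    print(f"[{count:2d} past errors] {area}: {checklist_items[area]}")
```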

Review effectiveness was also hampered because the code or the schematic was difficult to read—particularly software code, as notations for schematics tend to be more standardized. In the absence of coding standards, different programmers use different styles. Adopting and enforcing a consistent coding style was recognized as one way to enhance review efficiency.

Most code modules and schematics were found to be error-free during review and validation testing. Although it would be desirable to determine in advance which components are likely to contain errors and to focus reviews on them, identifying those components is not straightforward and merits further analysis.
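Although the study did not settle on a method for predicting which components are error-prone, one simple first cut is to rank modules by historical defect density and flag those well above the average for closer review. The module names and figures below are invented for illustration.

```python
# Per-module history: (module, defects found in prior releases, size in KLOC).
# All figures are invented for illustration.
history = [
    ("comm_driver", 12, 3.5),
    ("ui_screens", 4, 6.0),
    ("pump_control", 9, 2.1),
    ("report_gen", 1, 4.2),
]

# Defect density = defects per unit size; modules well above the average
# are candidates for more intensive review in the next release.
densities = [(name, defects / size) for name, defects, size in history]
average = sum(d for _, d in densities) / len(densities)

for name, density in sorted(densities, key=lambda nd: -nd[1]):
    flag = "  <-- review candidate" if density > average else ""
    print(f"{name:15s} {density:4.1f} defects/KLOC{flag}")
```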

Review of Test Documentation. Most manufacturers were surprised by the number of test procedure errors. Related testing issues included the effectiveness of the test procedures—the volume of the procedures versus the number of errors found—and their level of detail. In many cases, formal validation testing was not as effective in finding errors as was testing conducted without formal procedures. This suggests a problem with the emphasis provided in the test procedures—an undue focus on showing how the system works as opposed to uncovering errors.

Ensuring that technical reviews are conducted should increase the effectiveness of test procedures. In the study, the test procedures often were not reviewed for technical correctness as carefully as were the requirements documents and code. To ensure that reviews are effective, they must be conducted by someone knowledgeable about the system's operations.

Another problem was poor training on use of the system. Problems could be reduced if the individuals writing test procedures were given better information about system use and contacts with whom to discuss implementation questions.

Developers should also receive copies of the test procedures as early as possible in the development process. When the developers in the study received copies in advance, they were able to incrementally test functions before integration. Such testing should yield a much higher success rate when the final validation tests are run, assuming that test-procedure preparation also occurs early in the development process and is not delayed until validation testing.

PROCESS IMPROVEMENT: TESTING ACTIVITIES

Process-improvement steps were proposed to support testing practices that could identify errors as early as possible. These procedures included unit, integration, and validation testing. Specific steps to improve these areas include:

  • Providing guidance on the type of testing to be performed and integrating review and unit test activities.

  • Ensuring that tests address external communication and interfacing systems.

  • Providing tests that address hard-to-find errors such as memory-management errors, boundary checks, or simulation of high-load conditions.

  • Defining procedures that emphasize tests likely to identify errors.

  • Changing the order of testing so that the most demanding procedures run first and the greatest number of errors are found as soon as possible.

Unit Testing. Unit testing is the testing of individual hardware or software units (such as software functions).3 If unit testing had been performed more effectively, several errors could have been identified earlier. A large variation existed among developers regarding unit-testing completeness and effectiveness. The first step toward improving the effectiveness of unit testing should be establishing clear guidelines on how such testing should be conducted, including examples and training.

Techniques that improve testing effectiveness include defining test cases for boundary conditions (minimum and maximum values) and error or exception conditions (such as hardware unavailable, divide by zero, or library routines not available). Increased emphasis should be placed on ensuring the completeness of unit testing for routines that implement safety-related requirements as defined by safety risk analysis.4
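The following Python example, using the standard unittest module, sketches the kind of boundary and exception test cases described above for a hypothetical scaling routine; the function, its 12-bit input range, and its error behavior are assumptions made for the example.

```python
import unittest

def scale_reading(raw, gain):
    """Hypothetical unit under test: convert a raw sensor count to engineering units."""
    if not 0 <= raw <= 4095:                # 12-bit converter range assumed for the example
        raise ValueError("raw reading out of range")
    if gain == 0:
        raise ZeroDivisionError("gain must be nonzero")
    return raw / gain

class ScaleReadingTests(unittest.TestCase):
    def test_minimum_boundary(self):
        self.assertEqual(scale_reading(0, 2.0), 0.0)

    def test_maximum_boundary(self):
        self.assertAlmostEqual(scale_reading(4095, 2.0), 2047.5)

    def test_out_of_range_rejected(self):
        with self.assertRaises(ValueError):
            scale_reading(4096, 2.0)

    def test_zero_gain_rejected(self):
        with self.assertRaises(ZeroDivisionError):
            scale_reading(100, 0)

if __name__ == "__main__":
    unittest.main()
```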

In some cases, errors could be more easily identified through review than through testing. For instance, it may be more effective to review the correct implementation of an algorithm than to try to exercise all data sets through testing. Guidance should be provided to help determine whether reviews, testing, or both should be conducted for a specific program unit.

Integration Testing. Integration testing evaluates the interaction between software and hardware units.5 In general, integration testing suffers from a lack of focus. The objectives for unit and validation testing are usually more clearly established than are those for integration testing. The first step toward improving the effectiveness of integration testing is to define which areas need emphasis. The error analysis determined that integration testing should especially stress external interfaces, interfaces between subsystems, sequences of operation, and timing requirements.

Integration testing is also the stage in which difficult-to-find errors are caught, such as memory-management errors or errors caused by high load conditions. Tools should be identified that support integration tests, such as utilities to analyze memory management and check the consistency of the interface data provided to subsystems and external interfaces.
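A small utility of the kind suggested here might simply compare the data one subsystem sends against the interface definition it shares with its neighbor. The sketch below checks field names, types, and ranges for a hypothetical message format; the interface definition is an assumption for the example.

```python
# Hypothetical interface definition shared by two subsystems:
# field name -> (expected type, allowed range). All values are assumptions.
INTERFACE_SPEC = {
    "pressure_kpa": (float, (0.0, 300.0)),
    "pump_state":   (int,   (0, 3)),
    "alarm_code":   (int,   (0, 255)),
}

def check_message(message):
    """Return a list of interface violations for one message (empty if consistent)."""
    problems = []
    for name, (expected_type, (lo, hi)) in INTERFACE_SPEC.items():
        if name not in message:
            problems.append(f"missing field: {name}")
            continue
        value = message[name]
        if not isinstance(value, expected_type):
            problems.append(f"{name}: expected {expected_type.__name__}, got {type(value).__name__}")
        elif not lo <= value <= hi:
            problems.append(f"{name}: value {value} outside [{lo}, {hi}]")
    return problems

# Example: one out-of-range field and one missing field are reported.
print(check_message({"pressure_kpa": 350.0, "pump_state": 2}))
```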

Validation Testing. Validation testing is the formal testing of a product before its release. The number of errors attributed to validation-test procedures was much larger than expected; such errors had been assumed to stem from implementation problems rather than from testing. However, the analysis showed that validation-test procedures must also be addressed in order to achieve the target productivity gains.

Of particular concern was the finding that the formally documented validation tests were less effective in identifying existing error conditions than were the ad hoc tests administered by experienced users. A redesign of the formal validation-test procedures was necessary to incorporate the type of testing performed by more-experienced users. The success of the redesign requires an understanding of which tests are most likely to identify errors. Proper training for those preparing the test procedures must also be provided.

The analysis also identified a problem with the order of the test procedures. Many errors were found by the most difficult test procedures, which were the last ones to be run. It is preferable to run the most demanding tests first, to find the greatest number of errors as quickly as possible. When errors are identified later in the test process, those test procedures that were done previously have to be reexecuted after the code or hardware is modified. Identifying errors early in the test process minimizes the amount of retesting necessary.
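Reordering the suite need not be elaborate. One possibility, sketched below, is to sort procedures by how many errors they (or comparable procedures) have found in past releases, so the most demanding tests run first; the procedure names and yield figures are illustrative.

```python
# (test procedure, errors found by this or a comparable procedure in past releases).
# The names and counts are illustrative.
procedures = [
    ("power-up self-test", 1),
    ("high-load infusion cycle", 11),
    ("alarm boundary sweep", 7),
    ("report formatting", 0),
]

# Run the historically highest-yield (most demanding) procedures first so that
# errors are found early and less retesting is needed after fixes.
schedule = sorted(procedures, key=lambda p: -p[1])
for order, (name, past_errors) in enumerate(schedule, start=1):
    print(f"{order}. {name} (found {past_errors} errors previously)")
```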

Another concern was poor management of the number of test procedures and their coverage of requirements. As requirements were added or modified, new test procedures were continually added. It was not common practice to evaluate whether existing procedures could be reduced, and new test procedures often overlapped existing ones. Streamlining and integrating new and old tests would eliminate much of this overlap and could reduce the duration of testing.
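A requirements-to-test traceability map makes both kinds of problem visible: requirements covered by several overlapping procedures and requirements covered by none. The sketch below uses invented identifiers to show the idea.

```python
from collections import defaultdict

# Map each test procedure to the requirement IDs it claims to verify
# (all identifiers are invented for the example).
coverage = {
    "VT-01": ["REQ-001", "REQ-002"],
    "VT-02": ["REQ-002", "REQ-003"],
    "VT-07": ["REQ-002"],            # overlaps VT-01 and VT-02 on REQ-002
    "VT-09": ["REQ-005"],
}
all_requirements = {"REQ-001", "REQ-002", "REQ-003", "REQ-004", "REQ-005"}

by_requirement = defaultdict(list)
for test, reqs in coverage.items():
    for req in reqs:
        by_requirement[req].append(test)

# Requirements verified by many procedures are candidates for consolidation;
# requirements verified by none are coverage gaps.
overlapping = {req: tests for req, tests in by_requirement.items() if len(tests) > 2}
uncovered = all_requirements - set(by_requirement)

print("Heavily overlapped requirements:", overlapping)
print("Requirements with no test procedure:", sorted(uncovered))
```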

CONCLUSION

Analysis of errors from previous development efforts can provide significant insights for process improvement. Error analysis is more effective in identifying process improvement opportunities than are generic models that are not tailored to a company's specific applications and development needs. It can provide data on the types of errors that should be emphasized during reviews, guidelines on when reviews should be conducted, feedback on which types of testing should be emphasized at what stages of the development process, insight on how to streamline the number of test procedures to focus on those most likely to find errors, and updated information on areas to improve as applications and technologies evolve.

Error analysis is an inexpensive and highly effective tool to increase the efficiency of a manufacturer's design and development process. The time invested in conducting such an analysis can yield tremendous payback in terms of overall process improvement and reduced time to market.

REFERENCES

1. Code of Federal Regulations, 21 CFR 820.1(a)(3).

2. NF Schneidewind and Heinz-Michael Hoffman, "An Experiment in Software Error Data Collection and Analysis," IEEE Transactions on Software Engineering SE-5, no. 3 (1979): 277.

3. IEEE Standard Glossary of Software Engineering Terminology, IEEE Standard 610.12-1990 (Piscataway, NJ: Institute of Electrical and Electronics Engineers, 1990), 79.

4. Department of Health and Human Services, FDA, ODE Guidance for the Content of Premarket Submission for Medical Devices Containing Software, draft, September 1996.

5. IEEE Standard Glossary of Software Engineering Terminology, IEEE Standard 610.12-1990 (Piscataway, NJ: Institute of Electrical and Electronics Engineers, 1990), 41.

Daniel P. Olivier is president of Certified Software Solutions Inc. (San Diego), an engineering services company that specializes in support for design controls, verification and validation documentation, independent testing, software development, safety risk analysis, and compliance audits.


Copyright ©1999 Medical Device & Diagnostic Industry

