Reconciling Gaps Between Premarket Testing and Postmarket Results

Postmarket results can carry real implications for premarket testing processes.

July 27, 2015


William A. Hyman

Understanding how a medical device will perform during its actual use is a fundamental engineering, patient-safety and risk-management challenge. Such understanding also supports regulatory filings; FDA expects to see evidence of appropriate testing when assessing proof of safety and efficacy or attempts to establish substantial equivalence. Testing may include nonbiological lab testing, nonhuman biological testing, and, in some cases, some level of human experimentation.

In each case, it is appropriate to consider the degree to which the test methods and conditions are meant to simulate or be otherwise relevant to the actual conditions of use. For test conditions that ostensibly simulate realistic use, the question is whether they represent a worst-case scenario, a more benign challenge, or a challenge that is too rigorous. Scope is also important: where multiple failure modes interact, testing that is meant to be realistic should study performance to a predetermined level under a combination of failure modes rather than one failure mode at a time.
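The scope point above can be illustrated with a minimal sketch that contrasts a one-factor-at-a-time test plan with a combined plan. The factor names and levels are hypothetical placeholders chosen for illustration, not conditions taken from any standard or guidance document, and each list is assumed to be ordered from least to most severe.

```python
from itertools import product

# Hypothetical challenge factors, each ordered from least to most severe.
FACTORS = {
    "load": ["nominal", "high"],
    "misalignment": ["none", "max_allowed"],
    "corrosive_medium": ["absent", "present"],
}


def one_at_a_time(factors):
    """Baseline run plus runs that raise exactly one factor above baseline."""
    baseline = {name: levels[0] for name, levels in factors.items()}
    runs = [dict(baseline)]
    for name, levels in factors.items():
        for level in levels[1:]:
            run = dict(baseline)
            run[name] = level
            runs.append(run)
    return runs


def full_combination(factors):
    """Every combination of factor levels, including all factors at their most severe."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*(factors[n] for n in names))]


def worst_case(factors):
    """The single run with every factor at its most severe level."""
    return {name: levels[-1] for name, levels in factors.items()}


if __name__ == "__main__":
    single = one_at_a_time(FACTORS)
    combined = full_combination(FACTORS)
    print(len(single), "one-at-a-time runs;", len(combined), "combined runs")
    print("worst case covered one-at-a-time:", worst_case(FACTORS) in single)
    print("worst case covered in combination:", worst_case(FACTORS) in combined)
```

In this toy plan the one-at-a-time approach never exercises the run in which all factors are severe at once, which is the kind of gap the paragraph above describes.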

In addition, failure may occur at different periods of time. Testing the ability of surgeons to implant a device is one component; what happens to the device and patient after implantation is another, as is what happens during the device’s removal, especially when that removal occurs because of failure. Furthermore, the range of skill and training of the testing surgeons may not reflect the skill and training of the intended broader market.

This article will explore the dichotomy between premarket testing and postmarket performance, focusing on fatigue, catheter/guidewire testing, and human factors as examples.

Fatigue

Fatigue testing of an implanted device presents a clear example of the challenge of reconciling the differences between test conditions and clinical performance. To be meaningful with respect to clinical performance, a fatigue test should endeavor to simulate the dynamic loading environment to which the device will be subjected by real patients.

Important parameters to consider include how the device is implanted, the resultant geometry, and the load profile. These parameters should incorporate a reasonable range of deviations from perfection. If the test conditions are too lenient, the device may pass lab testing only to fail in clinical use. When unanticipated clinical fatigue failures occur, the relevancy of the testing is automatically called into question. A proper investigation would then focus on the root of these failures; clearly, some condition of the device or how it was secured and loaded differed from the test conditions. If some causative particularity of the device and its use cannot be identified, it must be concluded that the test conditions were, in general, not an adequate simulation, and that conclusions about what the testing showed were therefore incorrect. If, on the other hand, such particularities can be identified, they must be further investigated, understood and dealt with. Consider one common conclusion that is often reached quickly: that the surgeon did something wrong. At a minimum, it is important to determine exactly what was done incorrectly. Beyond that, it is important to investigate how such mistakes can be prevented going forward.

The discovery of unanticipated clinical fatigue failures gives rise to an interesting issue when another device is being brought to market via the 510(k) process and substantial equivalence needs to be established. Should the fatigue test used for the earlier product (a test that is now known to be falsely predictive of nonfailure) be used again on the new product? The logical technical answer is no, but the what-can-we-get-away-with answer is yes, since the limited objective is to show that the new product is no worse than the old, and passing the same fatigue test might seem to establish that. In addition, using the same test may make it possible to avoid openly addressing the fact that the old device has had more fatigue failures than originally predicted. Predicate creep also occurs here. If Device B was supposed to be as good as Predicate Device A but turned out not to be, should the predicate for Device C be the more demanding Device A or the less demanding Device B?

Fatigue testing can be used for purposes other than predicting actual performance, but the predictive limitations of the testing must be remembered. Comparative testing is one such use, as reflected, for example, in ASTM standards such as F1800—“Standard Practice for Cyclic Fatigue Testing of Metal Tibial Tray Components of Total Knee Joint Replacements.” According to the standard:

The loading of tibial tray designs in vivo will, in general, differ from the loading defined in this practice. The results obtained here cannot be used to directly predict in vivo performance. However, this practice is designed to allow for comparisons between the fatigue performance of different metallic tibial tray designs, when tested under similar conditions.

Similarly, F2345—“Standard Test Methods for Determination of Static and Cyclic Fatigue Strength of Ceramic Modular Femoral Heads,” notes:

In the fatigue test methods, it is recognized that actual loading in vivo is quite varied, and that no one set of experimental conditions can encompass all possible variations. Thus, the test methods included here represent a simplified model for the purposes of comparisons between designs and materials.

In addition, “the test data may yield valuable information about the relative strengths of different head and cone designs.” The use of the word “may” is of interest, in that it includes the possibility that the data may not yield valuable information.

Guidewire Maneuverability

A story from the catheter and guidewire industry is instructive with respect to the dichotomy between bench testing and clinical use. Catheters with single central lumens had long been tested for guidewire clearance by running the wire through the catheter while the catheter was held in a straight fixture. Clinical use supported this test, in that there were no complaints about guidewire passage. The manufacturer then developed a multilumen catheter with a noncentral lumen used for the guidewire. This configuration passed the straight bench test, but when used clinically, with three-dimensional bends and twists, the guidewire could not be easily moved.

Investigation showed that the distortion of the off-center lumen adjacent to two other lumens was much greater than that of a single central lumen. The use of a new three-dimensional bench test showed that the guidewire resistance was not acceptable; therefore, the design had to be modified. The problem was that the original bench test was not sufficiently demanding. Of course, going too far in the other direction—using an excessively tortuous test fixture—could lead to the rejection of a design that was, in fact, clinically acceptable.

Human Factors

Human factors testing is another arena in which the results of testing and real-world performance may diverge. The recent attention to cleanability of duodenoscopes offers an example here. FDA’s guidance document, “Reprocessing Medical Devices in Health Care Settings: Validation Methods and Labeling,” addresses several key issues in making cleaning validation relevant to the clinical arena, including how dirty the scope should be before cleaning is attempted, who is doing the cleaning, and under what conditions [1]. According to the guidance document:

The manufacturer should select an artificial test soil, the composition of which accurately represents materials that the device would likely be exposed to during an actual clinical use, and would create the greatest (worst-case) challenge to the cleaning process.

Furthermore, the document notes that “soil inoculations should mimic worst-case clinical use conditions.”

An example given here is to test combined blood and mucus loading rather than testing each separately. The document also recommends multiple simulated uses of the same device and states that “the cleaning validation protocols should use the shortest times, lowest temperatures, weakest dilutions, etc., for each step of the cleaning instructions.” For actual testing, FDA recommends that “study participants should be representative of the professional staff that would perform these actual reprocessing procedures,” that they should be wearing the personal protective equipment that they would wear clinically, and that the environment of use be simulated.
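The "shortest times, lowest temperatures, weakest dilutions" recommendation quoted above amounts to picking the least favorable end of each range the cleaning instructions allow. A minimal sketch follows, assuming a hypothetical instruction step that specifies a permitted range for contact time, temperature, and detergent concentration; the field names and numeric values are invented for illustration and do not come from the guidance document or any real device labeling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepRange:
    """Permitted ranges for one step of hypothetical cleaning instructions."""
    min_time_s: float
    max_time_s: float
    min_temp_c: float
    max_temp_c: float
    min_detergent_pct: float
    max_detergent_pct: float


def worst_case_settings(step: StepRange) -> dict:
    """Return the least favorable allowed settings for a validation run.

    Assumes cleaning gets harder as time, temperature, and detergent
    concentration decrease, which is the premise of the quoted text.
    """
    return {
        "time_s": step.min_time_s,
        "temp_c": step.min_temp_c,
        "detergent_pct": step.min_detergent_pct,
    }


if __name__ == "__main__":
    # Illustrative values only.
    soak = StepRange(60, 120, 35, 45, 0.5, 1.0)
    print(worst_case_settings(soak))
```

A validation protocol built this way would run each step at the returned settings rather than at the midpoint or most favorable end of the allowed range.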

The testing and use of “safety” syringes provides another example of deviations between reported test results and actual clinical experience. It is well known that contaminated needle-sticks continue to occur clinically, despite the introduction of syringes designated as safety-engineered. The principal issue here is that testing typically focuses on showing that the safety feature can be successfully activated when the device is used in a nonclinical setting, without clinical pressures, and without patients. The applicable FDA guidance document says that simulated-use testing should mimic “actual clinical use by using patient substitutes (e.g., instructional models) rather than actual patients” [2].

The document then notes that fruit may be a suitable test vehicle, but I would suggest that injecting a piece of fruit and activating the safety feature will provide only a weak model of the clinical environment. Fruit does not cry, fidget, move, or otherwise object, and there is no rush to get to the next fruit. FDA might agree, since the document goes on to say that “there are no standardized, validated methods to simulate clinical use of sharps injury-prevention features.” Because at least some devices that were tested and cleared using this guidance continue to cause sticks, it has been demonstrated that, for those devices, the testing was not an adequate demonstration of real-world clinical performance. As with mechanical fatigue, clinical incidents must be carefully examined. Why did someone get stuck by a device whose marketed purpose is to prevent sticks, a purpose for which it was tested? Blaming the user is a popular response, despite the fact that effective safety-engineered devices are designed to overcome use-related errors in using nonsafety syringes.

In both of these examples, an additional question to ask is how much and what type of training the test subjects receive relative to how real clinical users are trained. One variable is whether the instructions for use are actually read by real users, especially given that clinical users are often happy to tell you they have never read (or do not know where to find) the instructions [3].

Further, clinical personnel will sometimes assert they should not be expected to read the instructions, in part because there are so many devices and they are busy people.

Conclusion

Testing is an integral part of the medical device design cycle, and in many cases it is necessary for FDA premarket review. However, the positive conclusions drawn from device testing are not always supported by what happens when devices are put into actual use. Taking a more rigorous approach to worst-case simulated testing, along with paying closer attention to why unanticipated failures occur, can reduce this discrepancy. Where simulated-use testing cannot be expected to provide a good prediction, meaningful clinical testing is needed, as is the gathering of rigorous postmarket data that goes beyond anecdotal incidents obtained from complaint handling. Furthermore, testing that has already been shown to be an inadequate predictor of clinical performance should not be acceptable in the ongoing clearance of new or modified devices.

References

1. FDA, “Reprocessing Medical Devices in Health Care Settings: Validation Methods and Labeling: Guidance for Industry and Food and Drug Administration Staff,” available from Internet: www.fda.gov/downloads/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/UCM253010.pdf.

2. FDA, “Guidance for Industry and FDA Staff Medical Devices with Sharps Injury Prevention Features,” available from Internet: www.fda.gov/downloads/medicaldevices/deviceregulationandguidance/guidancedocuments/ucm071755.pdf.

3. W.A. Hyman, “Medical Devices: Who Needs to Read Device Instructions?” in Patient Safety & Quality Healthcare [online] August 2014 [cited 24 July 2015]; available from Internet: psqh.com/july-august-2014/medical-devices-who-needs-to-read-device-instructions?highlight=WyJoeW1hbiJd.

William A. Hyman is professor emeritus of biomedical engineering at Texas A&M University (College Station, TX).
