Today, manufacturers routinely conduct summative (i.e., validation) usability tests to determine if their medical devices are safe to use “as is,” or if they require user interface design modification(s) to ensure safe use. Tests that go smoothly usually reflect a development team’s application of good human factors engineering. Conversely, tests that go poorly usually reflect at least a degree of neglect in applying human factors engineering fully, particularly in conducting formative usability tests ahead of the higher-stakes summative usability test.
Many manufacturers respond to poor summative usability test results by making superficial changes to their devices, comparable to applying a bandage to a wound that really needs sutures. For example, to reduce the chance that a device user will commit a safety-related use error, a developer might insert a warning into the device’s instructions for use (IFU), hoping the low-cost risk control measure works and eliminates the need for a more substantial and costly user interface design change.
Alternatively, or in addition to making IFU changes, a manufacturer might state in a given device’s IFU that all users should be trained properly before using the device. However, relying on training to serve as the primary risk control measure is a precarious strategy. First and foremost, stipulating that all users receive effective training might not accurately reflect reality. Consider, for example, the temporary nurse agency that sends a junior nurse into an understaffed hospital unit, resulting in the nurse doing his or her best to operate an unfamiliar device (e.g., infusion pump, hospital bed, glucose meter). In this scenario, the nurse will focus on helping an otherwise short-handed care unit, not spending his or her time learning more than basic device operations, perhaps with time-constrained support from an experienced colleague. Second, people are prone to forget at least some of what they might have learned in training.
Implementing superficial mitigations against user-interaction problems (use errors being the most troubling) often signals a manufacturer's underlying reluctance to make meaningful design changes that might eliminate the user-interaction problems altogether. Late-stage design changes, such as those that require revising and revalidating software code or changing hardware tooling, can be time consuming and costly, threatening development budgets and launch schedules. Moreover, late-stage changes can be anathema to project managers, engineers, and designers who worked long and hard to develop the regrettably underperforming device. Therefore, a “make it pass” attitude can develop, creating an obstacle to changing the user interface of what the development team assumed to be a production-equivalent device.
And so, kudos to development teams that view poor usability test results as an important signal to revisit their design and make their device safer by implementing meaningful user interface changes, rather than reaching for bandages and perhaps running an arguably easy-to-pass, overly lenient usability test. Such teams accept that it is better to eliminate a design shortcoming as soon as they discover it, recognizing the “inconvenient truth” of it, rather than press forward with a compromised (i.e., flawed) device. The good news for them is that they probably have chosen the most reliable path toward a successful design validation, device approval, and commercial launch.
Manufacturers that have not yet adopted the product development strategy of fixing rather than patching fundamental user interface design shortcomings should consider it. In fact, the change in strategy just might be essential to commercial success in a new era. The 2010s are a time when regulatory bodies routinely reject (or withhold approval of) devices, particularly those intended for home use, that appear vulnerable to potentially hazardous usability problems or whose use safety is indeterminate.
Regulatory guidance, standards, and texts (see the sidebar “Recommended Reading”) already discuss how manufacturers should establish human factors engineering programs and procedures to ensure that intended users will operate a medical device safely and effectively. This article focuses instead on lessons that manufacturers have learned (usually the hard way) about how to succeed at user interface validation. For simplicity, the lessons are based on manufacturers’ experiences in the United States and their dealings with FDA. However, the same lessons should apply to user interface validation efforts and regulatory processes in many other countries.
Test a Production-equivalent Device. Summative usability testing should involve a production-equivalent device, or at least the version that the manufacturer intends to use on humans after receiving regulatory approval to do so.1 Methodologically speaking, testing anything less than a production-equivalent device carries the risk that subsequent changes to the device will introduce usability problems that could reduce the device’s safety and effectiveness. For example, changing a symbol’s color from blue to green, reducing a button’s height and width by a quarter inch, or rewording an on-screen prompt could change how users perform tasks, perhaps inducing a potentially harmful use error. This is not hyperbole. Small user interface design changes can dramatically influence user interactions with a device, sometimes improving them, but sometimes degrading them substantially.
Involve Representatives of All Distinct User Groups. Good luck to the manufacturer that conducts a summative usability test that excludes, either by intent or neglect, what FDA terms a “distinct user group.” FDA will almost surely call for additional testing to complete the picture of how well the intended users (all significant types) fare when using the device. For example, a manufacturer that only includes adult participants in a test of a home-use device might be directed by FDA to conduct additional testing with adolescents and children if they are identified as intended users. Or, the agency might ask the manufacturer to perform a supplemental test involving laypersons (e.g., individuals who help a child, friend, or spouse use a device) and healthcare providers. The agency might even ask for a supplemental test involving device installers and maintainers if such workers perform safety-critical tasks with the given device. Another possibility is that FDA calls for further user group differentiation, such as a test involving nurses who have different training and responsibilities (e.g., critical care nurses trained in advanced cardiac life support versus licensed practical nurses who deliver care to lower-acuity patients living in their homes). Notably, the agency generally expects summative testing to involve at least 15 representatives of each distinct user group.2
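As an illustration only, a test planner could check a recruiting roster against the 15-per-group expectation mentioned above before scheduling sessions. The sketch below is not an FDA tool or an author-supplied method; the function name, the roster layout, and the group labels are all hypothetical, and the threshold should be confirmed against current guidance.

```python
from collections import Counter

# Minimum cited in the text; treat as an assumption to confirm against guidance.
MIN_PER_GROUP = 15


def undersized_groups(participants, intended_groups, minimum=MIN_PER_GROUP):
    """Return {group: shortfall} for each intended user group below the minimum.

    `participants` is a list of dicts with a "group" key (hypothetical layout).
    Groups with no recruits at all are reported with the full shortfall.
    """
    counts = Counter(p["group"] for p in participants)
    return {
        group: minimum - counts[group]
        for group in intended_groups
        if counts[group] < minimum
    }


# Hypothetical roster: 15 adults, 9 adolescents, no lay caregivers yet.
roster = (
    [{"id": f"A{i}", "group": "adult"} for i in range(15)]
    + [{"id": f"T{i}", "group": "adolescent"} for i in range(9)]
)
shortfalls = undersized_groups(roster, ["adult", "adolescent", "lay caregiver"])
print(shortfalls)
```

A check like this only counts heads. Deciding which groups are truly distinct remains a human factors judgment.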
Link User Tasks to Use-related Risk Analysis. Logically, a summative usability test focused on use-safety should require users to perform safety-related tasks. Extending the logic, the best way to develop a complete list of safety-related tasks is to consult the use-related risk analysis (e.g., failure modes and effects analysis). Therein, one should find a list of potential use errors and their associated risk ratings; the raw material from which human factors specialists can build a comprehensive set of realistic tasks for users to perform in a summative usability test. Subsequently, task performance during a usability test will assess whether the manufacturer has effectively controlled use-related risks. Lacking such a link between the risk analysis and user tasks, regulators would have no reliable basis for concluding that the test had focused on the tasks of greatest concern from a safety standpoint.
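One way to make the link between the risk analysis and the test tasks auditable is a simple traceability check. The sketch below is illustrative only: the record layout, the identifiers such as `UE-01`, and the numeric risk ratings are assumptions made for the example, not a format prescribed by FDA or by any standard.

```python
def untested_use_errors(risk_analysis, test_tasks):
    """List use errors from the risk analysis that no test task exercises.

    Results are sorted from highest to lowest risk rating so the most
    worrying gaps appear first. All field names are hypothetical.
    """
    covered = {
        error_id
        for task in test_tasks
        for error_id in task["use_error_ids"]
    }
    missing = [entry for entry in risk_analysis if entry["id"] not in covered]
    return sorted(missing, key=lambda entry: entry["risk"], reverse=True)


# Hypothetical excerpt of a use-related risk analysis.
risk_analysis = [
    {"id": "UE-01", "description": "Enters wrong parameter value", "risk": 9},
    {"id": "UE-02", "description": "Connects tube to wrong port", "risk": 7},
    {"id": "UE-03", "description": "Forgets to set alarm limit", "risk": 8},
]

# Hypothetical summative test tasks, each traced to the use errors it can reveal.
test_tasks = [
    {"name": "Program the device", "use_error_ids": ["UE-01"]},
    {"name": "Set up tubing", "use_error_ids": ["UE-02"]},
]

gaps = untested_use_errors(risk_analysis, test_tasks)
for entry in gaps:
    print(entry["id"], entry["description"])
```

Here the check would flag the alarm-limit use error as lacking a corresponding task, prompting the team to add one before the test plan is finalized.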
Determine Root Causes of Interaction Problems. It is imperative to determine, with reasonable certainty, the root cause of all use errors and other significant interaction problems that occur during a usability test. Accordingly, usability testing specialists should conduct the necessary observations and question test participants afterward to understand their perspective on what might have led them to err while performing tasks (e.g., press the wrong button, enter the wrong parameter value, connect a tube to the wrong port, forget to set an alarm limit, allow a fluid reservoir to run dry, or commit any other type of use error). Determining the root causes of interaction problems (e.g., an undersized button, small data input field, lack of connector color-coding, missing prompt, or hidden fluid reservoir and lack of fill-level display) is the first step toward fixing them.
Give Test Participants Representative Training. FDA expects that summative usability test participants will receive training if the intended device users, or at least some of them, will receive training before using a given medical device. For instance, the agency recognizes that a perfusionist might be trained to use a new heart-lung machine; that a surgeon might be trained to use a surgical robot; and that a layperson might be trained to use an insulin pump. In fact, one FDA staffer unofficially equated withholding training from certain users of certain types of devices with asking a pilot to fly a new aircraft without prior orientation.
Therefore, a summative usability test of certain devices might include only trained test participants, or perhaps some trained and some untrained participants if training is common but not assured. In either case, it is important to deliver representative training rather than improvising something at the last minute, perhaps because training development has lagged device development. Hearing that a company is ready to perform a summative usability test but has not yet developed the accompanying training is a red flag indicating that such testing is premature. Poor training can induce user-device interaction problems during a usability test just as well as user interface design flaws can, thereby sinking an otherwise well-designed device. That said, a manufacturer should provide representative training, meaning training that is no more extensive and no better than it expects actual users to receive. The underlying lesson is not only to provide test participants good training, but also to make sure that real users receive good training. Of course, some devices are meant for use without prior training and, therefore, training test participants would be inappropriate.
Give Test Participants Good Learning Tools. Just as it is important to give users good training, it is also important to give them effective learning tools (e.g., quick reference card, user manual, online help), rather than ones quickly put together just prior to the test. Firstly, learning tools are technically part of a device design, and so one should be testing the best and final set, which ideally has been evaluated in prior formative usability tests. Secondly, poor learning tools are just as capable of spoiling a usability test as poor training. For example, unclear wording in a procedural guide could lead a test participant to perform steps in the wrong and potentially hazardous order.
Do Not Assist Test Participants. What’s the quickest way to cause a medical device to fail its summative usability test? Give test participants assistance with a task. FDA requires human factors specialists to declare an assisted task a failure, and this posture is logical. A test participant who needs assistance is experiencing difficulty performing a task and could commit a potentially harmful use error. FDA does not want even well-intentioned, empathetic test administrators interfering with the test participant as he or she performs a task because the test administrator would not be available to offer assistance in an actual use scenario.
Write an Excellent Test Report. Suppose a manufacturer has performed a perfect test that produced sterling results. The manufacturer still faces the challenge of reporting the test approach and results coherently and succinctly. Lacking a good report, FDA and other regulators would struggle to find and interpret the test results. Keys to a good report include fully and clearly describing the basis for choosing the test participants and tasks, and describing the test participants’ task performance and safety-related impressions of the device with a strong emphasis on interaction problems and their root causes. One more key is to refrain from claiming that significant use errors pose an acceptable risk because other products on the market (e.g., the predicate device cited in a 510(k) application) have the same, unmitigated vulnerability. FDA is likely to reject such a claim, because a claim that a device is no less dangerous than other products already in use is not a safety claim but actually a danger claim.
Perform a Residual Risk Analysis. This lesson steps away from the subject of usability testing for a moment. It focuses on what a manufacturer needs to do with the test results, namely treat them as an input to a follow-up, use-related risk analysis. Indeed, FDA expects that manufacturers will perform a follow-up risk analysis of every use error as well as patterns of interaction difficulties, including close calls (cases in which a test participant almost makes a mistake). This is not to say that FDA has a zero-tolerance policy regarding use errors. The agency seems to accept that users will make mistakes.3 However, it expects a manufacturer to analyze all mistakes and patterns of interaction difficulties to determine if further risk control measures are necessary. Importantly, do not discount a problem that induces use errors simply because it only happened once or a few times during a test. FDA has made it clear that claims such as “95% of the test participants performed the task correctly” are unacceptable. Ideally, manufacturers will identify and mitigate all critical user interaction problems during formative usability testing, making the summative usability test a successful exercise, although this approach is not fail-safe. This is why FDA seems to strongly encourage manufacturers to conduct at least one and preferably several formative usability tests ahead of a summative usability test.
Do Not Force Users to Pay Attention to Learning Tools. FDA is critical of manufacturers that direct users to read an IFU before or during tasks performed during a usability test. The agency considers forced attention to learning tools to be artificial, distorting how users interact with a given medical device. Accordingly, it is best to make learning tools available to test participants in the same manner that they would be available in a real use environment and scenario.
Really Fix the User Interface. The last lesson learned is probably the most important one. That is, manufacturers should fix user interface shortcomings rather than put bandages on them, as discussed earlier. A bandage approach, such as adding warnings to an IFU or reformatting the IFU, rarely prevents use errors effectively. Sure, it is good to optimize an IFU and ensure it includes all appropriate warnings. But, there is no assurance that users will read it.
Recommended Reading
ANSI/AAMI HE75:2009, “Human Factors Engineering—Design of Medical Devices,” (Arlington, VA: Association for the Advancement of Medical Instrumentation, 2009).
IEC 62366:2007, “Medical Devices—Application of Usability Engineering to Medical Devices,” (Geneva, Switzerland: International Electrotechnical Commission, 2007).
M Wiklund, J Kendler, and A Strochlic, Usability Testing of Medical Devices, (Boca Raton, FL: CRC Press, 2011).
Since FDA changed the quality systems regulation in 1996 to address user needs, usability testing of medical devices has been transformed from an uncommon, value-add activity into a necessary and mission-critical one.4 Accordingly, when a medical device fails a summative usability test, CEOs get very concerned, and for good reason. A failed test can substantially delay a device's regulatory approval and commercial launch because of the need to revise and retest the device. In some cases, a failed summative usability test has contributed to a company's ultimate demise. Therefore, medical device manufacturers have to bring their “A” game to conducting usability tests.
This means building a strong, in-house, human factors engineering capability or at least retaining human factors consultants to run usability tests. Taking either approach, manufacturers should consider the lessons learned by others, dispensing with the flawed impression that past is prologue when it comes to regulatory approval of devices that require safety-critical user interactions. Just because a regulatory authority such as FDA approved a manufacturer’s current generation pump, monitor, or injection device without a supporting validation usability test does not mean it will happen again. It is best to assume that it will never happen again and invest accordingly, fixing the given device rather than putting patches on user interface design problems that arise during a preliminary evaluation (optimally a lower-stakes, formative usability test). And to personalize the matter, would you like a clinician to operate on you or a loved one with a life-preserving device that is "patched," perhaps by including important safety-related guidance in its IFU? Or, would you like the device to have safety engineered directly into it? Yes, that is a rhetorical question, as most people would prefer the latter.
1. FDA, draft guidance, “Applying Human Factors and Usability Engineering to Optimize Medical Device Design, Section 10: Human Factors Validation Testing” (Silver Spring, MD: FDA, 2011).
2. FDA, draft guidance, “Applying Human Factors and Usability Engineering to Optimize Medical Device Design, Section 10.1.2: Test Participants (Subjects)” (Silver Spring, MD: FDA, 2011).
Michael Wiklund is founder and president of Wiklund Research & Design Inc. (Concord, MA), a consulting firm offering user interface research, design, prototyping, and usability testing services. He has made substantial contributions to sections of AAMI HE75-200X addressing software user interface design, workstation design, and mobility. He has devoted his career to making a variety of devices more user-friendly, including heart monitors, dialysis machines, defibrillators, blood glucose meters, and wheelchairs. Wiklund is a member of MD+DI's Editorial Advisory Board. Contact him at firstname.lastname@example.org.