A group of studies would seem to paint conflicting pictures of the recall rates of medical devices approved via the 510(k) process. MD+DI takes a closer look at what might account for the differences.
In today’s world of fast-flying rhetoric, research studies are often presented as objective, scientific proof of a given argument’s validity. Emotional appeals can be dismissed and misleading statements put into proper context, but studies, with their dispassionate statistics and cold, hard fact-based conclusions, are often presented and accepted as the final word on a controversial subject. But what happens when multiple studies, each held up as something like scientific gospel, reach irreconcilable conclusions about the same subject? What happens when the numbers do not, so to speak, compute?
The device world saw just such a statistical collision come to a head in April, when the Senate Special Committee on Aging held a hearing examining the safety of medical devices, specifically regarding the ever-controversial 510(k) approval process. A variety of expert testimony was given, much of which centered on three studies that, when taken together, painted a confusing and conflicting picture. Two studies were referenced that deemed the 510(k) process to be safe, and one was referenced that deemed it to be dangerous.
The first study was written by Ralph Hall, a professor at the University of Minnesota Law School, and originally presented to the Institute of Medicine in July 2010. According to the written testimony he submitted to the Senate committee, Hall examined Class I recalls of medical devices between 2005 and 2009, finding that “over 99.5% of 510(k) submissions assessed during this study period did not result in a Class I safety recall.” He concluded that “510(k) regulated medical devices have an excellent safety profile.”
The second study, which was prepared by the Battelle Memorial Institute for AdvaMed in September 2010, reached a similar conclusion. The study found that, between January 2005 and May 2010, 77 devices that had been cleared via the 510(k) process were removed from the market via Class I recalls. Dividing this number by the total number of 510(k) clearances since 1998, the study authors concluded that the 510(k) process had a 99.8% safety rate.
The third study, which was written by Diana Zuckerman, PhD; Paul Brown; and Steven Nissen, MD, and published in the Archives of Internal Medicine in February 2011, reached a seemingly opposite conclusion. That study examined Class I recalls between 2005 and 2009 and determined that 71% (80 of 113) had been approved via the 510(k) process. “Our findings reveal critical flaws in the current FDA device review system and its implementation that will require either congressional action or major changes in regulatory policy,” the authors wrote.
If two studies assess recall rates and conclude that the 510(k) process is safe, and another study assesses recall rates and concludes that the 510(k) process is unsafe, only one conclusion can be right. Right?
Not necessarily, says Anthony Hayter, PhD, a statistics professor at the University of Denver.
“I think, in reality, each conclusion can be correct,” Hayter says. He points out that the Hall and Battelle studies are making claims about overall recall rates, while Zuckerman et al. are making claims about the group of recalled products. “All three of those conclusions can be true at the same time,” he says.
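Hayter's distinction can be made concrete with a short sketch. It uses only figures reported in this article (80 of 113 recalls from the Zuckerman study, and 89 recalls against roughly 19,873 estimated submissions from Hall's study); the variable names are illustrative, and the two studies counted recalls differently, so the numbers are not meant to be combined.

```python
# Zuckerman et al.: of the Class I recalls studied, what share were 510(k) devices?
class_i_recalls_all = 113
class_i_recalls_510k = 80
share_of_recalls_from_510k = class_i_recalls_510k / class_i_recalls_all

# Hall: of the estimated 510(k) submissions, what share led to a Class I recall?
est_510k_submissions = 19_873
recalled_510k_submissions = 89
recall_rate_among_510k = recalled_510k_submissions / est_510k_submissions

print(f"Share of Class I recalls that were 510(k) devices: {share_of_recalls_from_510k:.0%}")
print(f"Share of 510(k) submissions with a Class I recall: {recall_rate_among_510k:.2%}")
```

The first figure conditions on a device having been recalled; the second conditions on a device having gone through 510(k). A large value for one does not contradict a small value for the other.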
Hayter was part of a group of statisticians who agreed to examine the three studies for MD+DI. All are unaffiliated with the medical device industry.
At the Senate hearing, Zuckerman and Hall questioned each other’s studies.
To derive the 99.5% safety rate, Hall’s study divided the total number of 510(k) submissions between 2000 and 2009 (39,747) by 10 to get a yearly average. Hall then multiplied that number (3,974.7) by 5 to approximate the number of 510(k)s submitted in the five-year period being examined (19,873). Taking the total number of 510(k)s that resulted in Class I recalls between 2005 and 2009 (89) as the numerator and the five-year estimate of 510(k) submissions as the denominator, the study concluded that about 0.45% of 510(k)s result in Class I recalls.
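The arithmetic described for Hall's study can be retraced in a few lines. This is a sketch built only from the numbers quoted above, not a reproduction of the study's own analysis; the variable names are illustrative.

```python
# Figures as reported in the article for Hall's study.
total_submissions_2000_2009 = 39_747   # 510(k) submissions over ten years
class_i_recalls_2005_2009 = 89         # 510(k)s tied to Class I recalls

yearly_average = total_submissions_2000_2009 / 10   # about 3,974.7 per year
five_year_estimate = yearly_average * 5             # about 19,873 submissions

recall_rate = class_i_recalls_2005_2009 / five_year_estimate
no_recall_rate = 1 - recall_rate

print(f"Estimated 510(k)s in five years: {five_year_estimate:.1f}")
print(f"Class I recall rate: {recall_rate:.2%}")
print(f"Share with no Class I recall: {no_recall_rate:.2%}")
```

The result, a recall rate a little under half a percent, is consistent with the "over 99.5%" figure in Hall's testimony.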
In her written testimony, Zuckerman objected to this methodology. “Submissions are not appropriate for use as a denominator because many devices that were submitted were not cleared, and even those that were cleared were not necessarily ever sold in the U.S.,” she wrote. “That makes Hall’s denominator much too large and his calculation of the percentage of 510(k) devices that were recalled much too small.” She also objected to the decision to draw the average number of submissions from a 10-year time period. She says this would result in the inclusion of an unknown number of devices that had already been recalled by 2005, as well as a number of faulty devices that had not yet been recalled by 2009.
Zuckerman credited the Battelle study for analyzing devices that were cleared and not merely submitted, but objected to the denominator including devices from a 10-year period, because the numerator is limited to devices that were recalled in a five-year period.
Michael Lavine, a professor of mathematics and statistics at the University of Massachusetts-Amherst, says Zuckerman makes some legitimate points about Hall’s study. “I thought those were good critiques,” Lavine says. Regarding her assessment of the Battelle report, however, Lavine has some reservations. “I think Battelle maybe could have been a little more thorough,” he says, “but I don’t see [Zuckerman’s] critiques [about that] being as strong as the ones about Hall.
“It’s clear that the two time periods should not be exactly the same, because when a device comes to market, it may take a while for us to realize what flaws there are,” Lavine says. “So you want to look at things that were approved for market some time before the recall. And I think there could be honest disagreements about what is the appropriate time period of approvals that we should look at. I think there are fair and honest differences about that.”
Zuckerman may have raised a fair point of dissent regarding Battelle’s choice of denominator, Hayter says, but he would have liked her to posit a better solution. “She can’t argue that Battelle’s question is not an important question to answer,” Hayter says.
Hall had an answer for Zuckerman’s criticisms at the Senate hearing. He pointed out that his study attempted to measure whether the recalls resulted from problems that would have been evident during premarket evaluation or whether the problems developed later.
“We think that’s critical because many recalls, the majority, have nothing to do with the premarket system,” Hall told the Senate committee. “To analyze a premarket system using recalls that have nothing to do with the premarket system creates a result that has little validity.”
Hall also took issue with the way Zuckerman did not account for the larger picture of device approval rates. He reasoned that because about 10 times as many products go through the 510(k) process as go through the PMA process, it should be expected that there would be more recalled products that went through the 510(k). Hall also objected to what he saw as a linking of the recall classifications with the product classifications. “Those are two separate questions,” he told the Senate. “For example, you can have a very low-risk device that, because of a particular issue, has a very high risk to it.”
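Hall's volume argument can be illustrated with a small calculation. The sketch below takes only the "about 10 times as many" ratio from his testimony; the assumption that both pathways have the same per-device recall rate is purely hypothetical, the function name is made up for illustration, and devices that reach the market by other routes are ignored.

```python
def expected_recall_share(volume_ratio: float, rate_ratio: float = 1.0) -> float:
    """Expected share of recalls from pathway A versus pathway B.

    volume_ratio: how many times more devices go through A than B.
    rate_ratio:   A's per-device recall rate divided by B's (hypothetical).
    """
    weighted = volume_ratio * rate_ratio
    return weighted / (weighted + 1.0)


# 510(k) volume is about 10 times PMA volume; equal recall rates are assumed.
share_if_rates_equal = expected_recall_share(10)
print(f"Expected 510(k) share of recalls at equal rates: {share_if_rates_equal:.1%}")
```

Under those assumptions, 510(k) devices would be expected to account for 10 of every 11 recalls, which is why a majority share of recalls does not by itself show that one pathway is riskier per device.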
Former Duke University statistics professor David Peterson says that Hall’s assertions regarding the greater overall number of 510(k) submissions may represent a legitimate objection to Zuckerman’s analysis, depending on what question is being addressed. “If Zuckerman’s point is simply that most serious recalls come through the 510(k) process, it may be helpful or at least interesting to know that of all devices introduced into the market, an even greater proportion of them are approved through the 510(k) process, which is Hall’s point,” he says. “However, Hall’s point does not refute Zuckerman’s.”
Peterson explains that if Zuckerman is merely arguing that any devices that have gone through the 510(k) process have resulted in Class I recalls, “then it doesn’t really matter whether we throw in a denominator or not; the measure is still different from zero, and thus supports her point. All of the studies agree that there were such recalls.”
In her submitted testimony and later in her spoken testimony, Zuckerman seemed to stress that the main problem is that Class I recalls are occurring at all for products cleared through 510(k), which she says was meant for low-risk products.
Peterson and Lavine both took interest in one aspect of Hall’s study that seems to have gone relatively unnoticed in the broader debate. Hall concluded that approximately 55% of all recalls between 2005 and 2009 could be traced to postmarket issues. Citing this figure, Hall argued that “a majority of all Class I recalls involve problems or issues that arose after market release and could not be affected by premarket approval systems or requirements.” However, Peterson and Lavine both pointed out that this means nearly half of all recalls could be traced to premarket issues.
“While [Hall’s study] seems to come down pretty solidly on the side of saying that there’s really nothing much that needs fixing,” Peterson says, it “takes apart the results of the different processes in a way that suggests there are areas that could stand improvement, mostly having to do with devices that fail … because of design difficulties. And although Hall doesn’t seem to fasten on that particularly, it strikes me that that may be the most important result of all the studies…”
Lavine says Hall’s majority is only one in the most semantic sense.
“It’s true that it’s more than half,” Lavine says, regarding the number of recalls that can be blamed on postmarket issues, “but it’s not much more than half, so I didn’t find that very convincing.”
“I think he somewhat misinterprets his own message,” Peterson says of Hall.
The statisticians all say the three studies provide valuable information, though they vary as to just how useful they find that information to be.
Peterson says Zuckerman and Hall are “both reaching for an answer to a question that doesn’t make too much sense, which is, ‘The present system is okay or the present system is not okay.’ Because it seems to me almost axiomatic that the present system simply is, and it most likely can be improved, and that’s a more important issue than whether the present system is okay.”
The way the studies in question are presented is not unusual, says Brad Carlin.
“There’s only an objective truth if you and I can agree on what it is you want to measure,” says Carlin, a biostatistics professor at the University of Minnesota. Carlin, who did not examine the studies discussed at the hearing, says he “constantly” sees statistics manipulated in the course of rhetorical debate.
“People are not that numerate in general,” Carlin says. “So they rely on experts to tell them what this stuff means, and it’s easy to manipulate the data in your favor.”
Hayter cautions that it’s important to consider the whole picture.
“And for a given situation, you could just take out one set of numbers that would suggest one thing at first glance, and you could take another set of numbers and it might suggest another thing at first glance,” he says. “First glances can be deceptive.”
The Government Accountability Office (GAO) also presented a follow-up assessment of recommendations it made to FDA in 2009 regarding the 510(k) process. This testimony (which indicted FDA for clearing too many high-risk devices through 510(k) and failing to properly handle recalls) did not make claims about recall rates, and thus did not play a large role in the debate over the studies described above.
process improvement
This is an interesting article. Peterson’s argument - that Hall’s study is irrelevant because processes can always be improved - is academic. (Full disclosure: I haven’t read any of the studies.) Of course processes can always be improved, but do they HAVE to be improved? In order to manage costs - in any endeavor from healthcare to automobiles to space exploration - society has to agree on an acceptable level of risk. Is it one or two shuttle explosions out of 100 launches? Is it 1 or 100 drivers killed due to brake failures? I think this was the question that Hall was trying to answer: is it imperative that the FDA change the 510(k) process, or should we be spending our tax dollars and licensing fees on something more important? Hall apparently concludes that the recall rate for 510(k) cleared devices is acceptable. Instead of arguing over the statistics in the various studies, perhaps we should be arguing over what constitutes an acceptable recall rate and how we define it.