Gerard J. Prud'homme
For more specific information on clinical trials, refer to the topics below:
- INTERNATIONAL CONFERENCE ON HARMONIZATION: PRINCIPLES OF GOOD CLINICAL PRACTICE
- ESSENTIAL ELEMENTS OF A CLINICAL TRIAL PROTOCOL
From the enactment of the Medical Device Amendments in 1976 to the early 1990s, more than 600 medical devices were cleared to market through FDA's premarket approval (PMA) process. Only a very small number of these PMA applications relied on data from randomized clinical trials. Throughout that period, FDA's Center for Devices and Radiological Health (CDRH) accepted as common practice the use of historical controls in support of applications for medical device approval. In the past two or three years, however, CDRH has taken significant steps toward imposing more stringent requirements on clinical trials used in support of PMA applications, and clinical studies are now required much more frequently to support 510(k) applications.
Why has the randomized clinical trial recently become the reference standard for medical device clinical studies? One response is that the device industry needed to catch up to the requirements of good science, and that such trials offer the only way to control selection biases among treatment groups. A more jaundiced view is that the push for randomized clinical trials is principally an infusion of "drug science" into device clinical studies. The truth probably lies somewhere between these two views.
In the evolution of device clinical trials, 1993 was a watershed year. It was in March of that year that the "Final Report of the Committee for Clinical Review," commonly known as the Temple report, was issued.1 That report described a pattern of deficiencies in the product applications the committee had reviewed, and identified lack of attention to basic study design as the fundamental flaw. The report concluded that the deficiencies were so serious as to impede the agency's ability to make judgments about the safety and effectiveness or substantial equivalence of the devices described by the applications.
Notably, the Temple committee was overwhelmingly composed of physicians and biostatisticians from FDA's Center for Drug Evaluation and Research (CDER); hence the beginning of "drug science" influence in 1993. Bruce Burlington, who was appointed director of CDRH that year, had spent the previous five years as deputy director of CDER's Office of Drug Evaluation II. Later in the year, Susan Alpert, who had previously been a medical officer at CDER, became acting director (and later director) of the device center's Office of Device Evaluation (ODE). Both Burlington and Alpert are strong proponents of well-designed clinical trials to answer questions about the safety and efficacy of investigational devices.
In a speech on March 31, 1993, Burlington commented that "the Temple report is not an indictment of the past, but a consideration for how we might do things in the future." Later that year, CDRH issued a more definitive statement about the future in a draft document entitled "Medical Device Clinical Study Guidance."2 The intent of that document was "to instruct sponsors on clinical trial purpose and process" and to provide "the elements of good clinical study design, conduct, and analysis." Through the guidance, FDA made it clear that randomized clinical trials would be the wave of future device studies. In the section on study design, for example, the document states: "Other methods of treatment assignment can be devised . . . but, unless an explicit randomization scheme is used, it is difficult to ensure that the resulting assignments are free from . . . possible biases."
To promulgate the message of the draft guidance, FDA cosponsored a videoteleconference, "The Principles of Good Clinical Study Design," in January 1994. In early 1995, CDRH issued a draft document, "Clinical Trial Guidance for Non-Diagnostic Medical Devices," which reiterated many of the principles stated in the September 1993 guidance document.3 Indications are that the agency is planning to release other clinical study guidances, including one for in vitro diagnostics.
Taken together, these developments demonstrate that the device center's perspective on what constitutes good science for medical device clinical studies has changed dramatically in the past few years. The emphasis on randomized, blinded studies and a bias against the use of historical controls have seemingly become preferred ODE policy.
The design of modern medical device clinical studies must be set in the general context of good clinical practices (GCPs) guidelines that have developed substantially over the past two or three decades. Although the World Health Organization issued a guidance document on GCPs several years ago, FDA's version of a GCP guidance is of much more recent vintage.4 In August 1995, the agency published a draft "Guideline on Good Clinical Practices" under the auspices of the International Conference on Harmonization (ICH).5 The objective of the ICH guidance is to provide a unified standard that will facilitate mutual acceptance of clinical data by regulatory authorities in the United States, the European Union, and Japan. The document defines good clinical practice as an international ethical and scientific quality standard for the "design, conduct, performance, monitoring, auditing, recording, analyses, and reporting of clinical trials that provides assurance that the data and reported results are credible and accurate, and that the rights, integrity, and confidentiality of trial subjects are protected." FDA's CDER and Center for Biologics Evaluation and Research were among the six ICH sponsors.
The ICH guidance describes 13 basic principles ranging from the premise that all clinical trials should be conducted in accordance with well-accepted ethical principles to the notion that quality assurance should be built into all aspects of the study (see box, this page). Although CDRH was not a sponsor of this document, it is likely that the device center will pay close attention to these guidelines as future medical device clinical studies are evaluated.
Because the scientific integrity of a clinical trial and the credibility of its data depend substantially on the design of the trial, FDA's investigational device exemption (IDE) regulations require sponsors to submit an investigational plan for any clinical study involving a significant-risk device. The most important element of such a plan is the study protocol, which details all components of the proposed study.
The study protocol should be designed to address all of the basic questions to be examined by the investigation. These include the specific objectives of the study, the controls to be used, the number of patients to be enrolled, the type of masking to be used, what follow-up information will be collected, and many other issues. Taken together, the essential elements of a study protocol form a unified plan to address all such questions in order to determine whether the investigational device has a particular clinical effect. As described below, these essential elements are well recognized among experts in clinical trials, and are referred to in both the September 1993 FDA guidance and the ICH GCP document.
Objectives. Ultimately, the regulatory questions that FDA must answer in determining whether to clear a device to market are whether it is safe and effective (for devices undergoing PMA review), or substantially equivalent to a predicate product (for devices undergoing 510(k) review). However, neither of these questions is sufficiently specific to be used in developing a clinical protocol.
The first and key element of any study protocol must be a definition of the primary objective of the study or the essential research question to be tested. The study sponsor must state clearly the objectives of the study, and must formulate a specific hypothesis that will be tested to determine if the investigational device is safe and effective. FDA and the device sponsor should always agree on the study hypothesis before the first patient is enrolled. A flawed hypothesis will lead to a flawed clinical trial, from which the data will be insufficient to support a marketing application.
In deciding on the study objectives and hypothesis, several questions must be considered. Is the goal of the trial to show that the device performs better than or equivalently to the control? Is symptomatic relief sought or is a marked change in the disease process desired? Is the device to be used as the sole treatment regimen or will it be used as an adjunct to specified conventional therapy? The hypothesis will also invariably be tied to the types of patients to be studied.
In drafting the study hypothesis, the sponsor should always focus on the claims that will form the basis for marketing the device. For Class III devices, most marketing claims must have some foundation in the clinical study. Accordingly, the manufacturer should consider the study hypothesis and its design as part of its overall strategy for determining what marketing claims are necessary to have a commercially viable device.
Type of Trial. The study protocol should describe the specific type of trial to be conducted. Although clinical studies can be conducted at a single site, FDA strongly prefers that multicenter studies be used for devices that will require PMA submissions. Studies may be blinded or open, although blinding offers certain theoretical advantages. Study designs may be parallel, crossover, or factorial in nature--or variations on these themes.
Probably the most common design for device clinical trials is the parallel design, in which patients are assigned to one of two or more study groups, given the intervention for that group (that is, treatment with the device or with an appropriate control), and then followed to determine the outcome. A variation on the parallel design occurs in studies that are designed so that the patient acts as his or her own control. In such studies, baseline measurements are taken of the patient, the treatment or intervention is applied, the patient is followed, and the same measures are repeated. The before and after measurements are then compared, enabling patients to act as their own controls.
In a crossover design, each patient is given two or more treatments in a specified order. For example, some patients will receive treatment using the investigational device first followed by the control, and other patients will be treated using the control first. To avoid creating a carryover effect from the prior treatment, a washout period intervenes between study periods.
Factorial designs are also sometimes used in medical device clinical studies. In such trials, patients are assigned to one of two interventions (e.g., a new device or an active alternative therapy), to a control, or to both interventions. This type of study design is useful for assessing whether either intervention alone is effective, or whether a stronger or detrimental effect occurs when a combination of both treatments is received.
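As a rough illustration of the factorial layout described above, the following Python sketch enumerates the arms of a two-intervention study and computes a simple additive interaction term from per-arm success rates. The factor names, the function names, and the additive definition of interaction are assumptions made for illustration; they are not drawn from any FDA guidance.

```python
from itertools import product

# Hypothetical factor names; the article does not prescribe any.
FACTORS = ("device", "alternative_therapy")


def factorial_arms(factors=FACTORS):
    """Return every on/off combination of the interventions (2**k arms)."""
    return [
        dict(zip(factors, levels))
        for levels in product((False, True), repeat=len(factors))
    ]


def interaction(rate_neither, rate_a_only, rate_b_only, rate_both):
    """Combined effect minus the sum of the two separate effects.

    A value near zero suggests the interventions act additively; a positive
    value suggests a stronger combined effect, a negative value a weaker
    (possibly detrimental) one.
    """
    effect_a = rate_a_only - rate_neither
    effect_b = rate_b_only - rate_neither
    return (rate_both - rate_neither) - (effect_a + effect_b)
```

With two interventions, `factorial_arms()` yields four arms: control, device only, alternative therapy only, and both.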
End Points. End points, or response variables, should be defined clearly and precisely. The sponsor should select a set of outcome variables that are as informative as possible, clinically relevant, and least prone to bias. Defining the specific end points in advance facilitates the tailoring of the study design and calculation of sample size. To ensure that there is agreement on the appropriateness of the sponsor's choices, important end points should be discussed with FDA prior to the initiation of the study.
End points can be either objective or subjective, depending on the device and the particular indications being studied, but they should be capable of unbiased assessment. Quantitative or categorical variables can form the basis for an end point. Thus, a change from one discrete state to another (e.g., living to dead), from one disease stage to another (e.g., from active disease to remission), or from one level of a continuous variable to another (e.g., level of pain) may underlie an end point. Sometimes response variables may be formed by combining a group of specified, individual measurements. Such a strategy can be useful when any one event would be likely to occur too infrequently to be observed in a reasonable number of patients, or when a combination of measurements is needed to comprehend whether the patient has truly improved clinically (e.g., a composite arthritis index that combines scores for stiffness; grip strength; and pain, tenderness, or swelling of the joints). The sponsor must also determine when during the course of the study the primary end point is to be measured.
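The two ways of combining measurements mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the component names, the equal default weights, and the function names are hypothetical, and a real composite index would use whatever scoring rules the sponsor and FDA agree on.

```python
def composite_event(events):
    """True if any one of the component events occurred for the patient.

    `events` maps a component name to True/False.
    """
    return any(events.values())


def composite_index(scores, weights=None):
    """Weighted sum of component scores; equal weights unless specified.

    `scores` maps a component name (e.g., stiffness, grip strength, pain)
    to a numeric score. The scales are assumed to be comparable already.
    """
    if weights is None:
        weights = {name: 1.0 for name in scores}
    return sum(weights[name] * value for name, value in scores.items())
```

The first function suits rare individual events; the second suits an index, such as the arthritis example, in which no single measurement captures clinical improvement.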
The primary objective of the study must be addressed by primary end points, which form the principal bases for determining whether the device is safe and effective. To minimize confusion about the outcome of the trial, it is wise to limit the number of primary end points to one or two. A patient's outcome relative to a primary end point often results in the patient's response to treatment being defined as a success or failure, or in the patient being categorized as a responder or nonresponder. Understandably, the calculation of required sample size is based on an analysis of the primary end points. These should be distinguished from secondary end points, which are designed to address secondary study objectives.
For proving the safety or effectiveness of a device, FDA generally recommends against the use of surrogate end points--outcomes that are not themselves readily discernible as a clinical benefit to a patient but that may be correlated to a clinical benefit. For example, an improvement in some laboratory parameter may demonstrate that the device is working, but may not be widely understood as a clinical benefit to the patient. In some cases, however, FDA may find surrogate end points acceptable. A reduction in serum cholesterol, for example, may be an acceptable surrogate end point because its relationship to a clinical benefit is well described in the scientific literature. Where a surrogate end point is to be relied upon, it is essential that the study sponsor obtain a clear agreement with FDA that such an end point is appropriate.
Patient Population. The population of patients to be included in the clinical trial must be described in the study's eligibility criteria. These criteria will have a determinant effect on the ability of the sponsor to recruit patients as well as on the capacity of the study results to be generalized. It is therefore essential that sponsors develop clear, unambiguous inclusion and exclusion criteria when planning a clinical trial.
In practice, the patients actually enrolled in a study form a subset of the population defined by its eligibility criteria. Results of a study can only be generalized legitimately to patients similar to those enrolled in it. Thus, if the only patients enrolled in a study are those with mild or moderate disease, it may be difficult to apply its results to patients having a severe stage of the disease.
Inclusion and exclusion criteria typically relate to subject demographics such as age and sex, as well as to the stage of the disease being studied, pregnancy status, the patient's history with regard to certain chronic diseases, use of concomitant medications, the likelihood that the patient will complete all follow-up, and the presence of other confounding factors. FDA wants to ensure that investigators do not select patients based on personal preferences, thereby limiting the applicability of the device or masking some unidentified exclusion criteria. Study sponsors should therefore establish some mechanism to ensure that all eligible patients are offered an opportunity to participate in the study. To ensure comparability between the device and control groups, inclusion and exclusion criteria should be the same for all patients in the study.
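One way to keep eligibility decisions uniform across investigators is to write the criteria down as explicit rules that are applied identically to every candidate. The Python sketch below assumes a hypothetical set of criteria (an age range, the disease stages under study, pregnancy status, and an excluded concomitant medication); the thresholds and field names are invented for illustration only.

```python
from dataclasses import dataclass

# Hypothetical criteria; a real protocol defines its own.
MIN_AGE, MAX_AGE = 18, 75
STAGES_STUDIED = {"mild", "moderate"}


@dataclass
class Candidate:
    age: int
    disease_stage: str
    pregnant: bool
    on_excluded_medication: bool


def screening_failures(candidate):
    """Return the list of criteria the candidate fails (empty if eligible)."""
    reasons = []
    if not MIN_AGE <= candidate.age <= MAX_AGE:
        reasons.append("age outside range")
    if candidate.disease_stage not in STAGES_STUDIED:
        reasons.append("disease stage not studied")
    if candidate.pregnant:
        reasons.append("pregnant")
    if candidate.on_excluded_medication:
        reasons.append("excluded concomitant medication")
    return reasons


def is_eligible(candidate):
    """The same rules apply whether the patient ends up in device or control."""
    return not screening_failures(candidate)
```

Recording the reasons for each screening failure, rather than a bare yes or no, also makes it easier to show afterward that patients were not selected on the basis of personal preference.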
The group of enrolled patients can be either homogeneous or heterogeneous in composition. A homogeneous study population may make the assessment of efficacy more straightforward, because it does not include prognostically distinct subgroups. However, if eligibility criteria are defined too narrowly, recruitment may be hampered and study results may not be readily generalizable. Heterogeneous populations may afford an opportunity to discover whether the device is effective in different subgroups of patients, but may necessitate a larger sample size to account for those subgroups. Determination of the acceptable level of heterogeneity for a study population is important for study sponsors.
Formulation of eligibility criteria should be guided by a sponsor's concern for patient safety and the need to demonstrate efficacy. Patients who will likely benefit from the device are obvious candidates. Subjects for whom the treatment is thought to be harmful or who are likely to withdraw from the study prematurely are often excluded. Marketing claims also should be considered in drafting eligibility criteria. If the company's marketing experts say that a specific patient population is essential for the ultimate commercial success of the device, that population must be represented in the study.
Investigational Device. The protocol should contain a characterization of the investigational device design, as well as a description of its principles and means of operation. The sponsors should provide instructions regarding the manner in which the device should be operated, and should define the duration, frequency, and extent of its application.
Control Group. In recent years, ODE's unmistakable message to device manufacturers has been that some type of control is essential to the conduct of a device clinical trial. Having a control group permits the study sponsor to reason that the observed improvement in the treatment group at the end of the trial is due to differences between the investigational device and the control, and not to other factors.
Because there may be a wide range of biological variations among study subjects, and because different patients may respond differently to any given intervention, use of an uncontrolled clinical trial usually makes it impossible to ascertain whether a new device has made a difference in outcome. A controlled clinical trial enables sponsors to compare the effects of an investigational device in one group of patients to the effects of a control in another group of patients. Use of a control group permits the safety and effectiveness of an investigational device to be more clearly observed and subsequently evaluated in comparison to another therapy.
Several types of controls can be incorporated into clinical trials. Under certain limited circumstances, studies can make use of self controls: patients for whom certain clinical variables are measured before and after implementation of the investigational device, and who therefore serve as their own controls. Active concurrent controls are patients under the direct care of the study investigator who are assigned to an intervention other than the investigational device (e.g., a placebo or alternative treatment); for comparative purposes, data about these patients are recorded on case report forms (CRFs). Passive concurrent controls may receive an alternative intervention, but are not under the direct care of the study investigator. Historical controls are a prior series of patients who should be comparable to the device group patients and who may or may not have received an active intervention; by definition, they are nonrandomized and nonconcurrent controls.
Historical controls are no longer in favor at ODE because it is difficult to establish that they are comparable to patients in the device group, and often the desired follow-up measures are not uniformly documented among them. Moreover, if an improved device is compared with historical controls using an older similar device, improvement in complication rates may derive solely from improved surgical techniques or improved concomitant therapy, rather than from a clinically superior device. Consequently, FDA's unmistakable preference is for clinical trials that make use of active concurrent controls. In the agency's view, use of concurrent controls makes it easier to evaluate comparability at baseline between the control and device groups and to ensure that both groups will be handled similarly during the course of the study.
Without doubt, it will be very difficult in the near future to rely on historical controls for device clearance. If a sponsor believes that use of such controls is justified, detailed discussions with FDA are essential, especially concerning the issue of matching the controls with the active patients. If the proposed control group is one described in the scientific literature, FDA and the sponsor must agree on whether that literature provides sufficient detail about the control patients to provide an adequate basis for comparison.
Selecting controls for device studies is more problematic than for drug studies. Devices or medications selected as controls should have been previously approved by FDA. Examples of controls that can be used in clinical trials of medical devices include older versions of the same device, different devices, sham devices (devices that look the same as the test device, but do not deliver therapy), medications for the same intended use, other surgical procedures, and no therapy (that is, the natural history of the medical problem if left untreated). When selecting a control, manufacturers should bear in mind that it is often easier to obtain market approval from FDA when the new device performs better than an alternative therapy than when its performance is merely equivalent to the control.
Assignment of Intervention. At baseline--before treatment with the investigational device or the control begins--the control and device groups should be similar, so that differences in outcome may be reasonably attributed to the device being studied. Otherwise, it is often difficult to meaningfully compare the rates of therapeutic success in the two groups, even with statistical adjustment. Accordingly, it is essential that relevant factors be assessed at baseline to determine whether the treatment and control groups are comparable, and whether statistical adjustment is feasible if they are imbalanced.
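Baseline comparability can be checked in many ways. One common summary, shown here as a minimal Python sketch, is the standardized difference between the group means of a baseline variable. The choice of this statistic and the function name are assumptions made for illustration; the article does not specify a method.

```python
from math import sqrt
from statistics import mean, variance


def standardized_difference(device_values, control_values):
    """Difference in group means divided by the pooled standard deviation.

    Values near zero indicate that the groups are similar on this baseline
    variable. Each group needs at least two observations.
    """
    difference = mean(device_values) - mean(control_values)
    pooled_sd = sqrt((variance(device_values) + variance(control_values)) / 2)
    if pooled_sd == 0:
        if difference == 0:
            return 0.0
        raise ValueError("groups differ but show no variability")
    return difference / pooled_sd
```

Because the result is unit-free, the same function can be applied to age, baseline disease scores, or any other continuous prognostic factor, which makes large imbalances easy to spot before any outcome data are analyzed.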
Assignment of patients to the investigational device group and the control group in a systematic manner that avoids selection bias is an important aspect of sound study design. Selection bias occurs when patients with certain characteristics are more readily assigned--intentionally or unintentionally--to one treatment group than to another. The result of selection bias is that patients who are characterized by important prognostic factors may be disproportionately assigned to one group, thus confounding the interpretation of any differences in outcome between the groups.
It is generally accepted that randomization is the preferred method of assigning patients to study groups in order to minimize selection bias. Randomization tends to guard against imbalances between groups, protects against conscious or unconscious actions of study investigators that could lead to biased assignment of patients, and provides the probabilistic basis for most statistical analyses. Randomization can be carried out centrally by the sponsor, or locally by each study investigator.
Depending on the study, randomization procedures can be tailored to specific needs. For example, in block randomization an equal number of patients are assigned to the various treatment groups from a specified number of enrollees (e.g., randomizing three patients to the device group and three patients to the control group for every block of six patients). In most medical device clinical trials patients are assigned in equal numbers to the device and control groups, but this is not always necessary. If it is practically difficult to recruit an equal number of control patients or if there is a considerable body of knowledge describing how the control group is likely to behave, then a higher ratio of device patients may be appropriate.
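The block randomization scheme described above can be sketched as follows. This is an illustrative Python example, not a validated randomization system: the function name, the arm labels, and the use of a seeded pseudorandom generator are assumptions, and a real trial would also need allocation concealment and an audit trail.

```python
import random


def block_randomize(n_patients, block_size=6, arms=("device", "control"), seed=None):
    """Assign patients to arms in shuffled blocks.

    Within every complete block each entry of `arms` appears the same number
    of times, so group sizes stay balanced as enrollment proceeds.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    per_arm = block_size // len(arms)
    assignments = []
    while len(assignments) < n_patients:
        block = list(arms) * per_arm
        rng.shuffle(block)
        assignments.extend(block)
    return assignments[:n_patients]
```

With the defaults, every block of six patients contains three device and three control assignments in random order. An unequal ratio can be obtained by repeating an arm label: `arms=("device", "device", "control")` with a block size of six gives four device patients for every two controls.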
Sample Size. In its 1993 report, the Temple committee observed that clinical trials in which sample size requirements were not carefully considered lacked statistical power, that is, the critical ability to detect device effects of clinical importance.1 Because the number of patients to be recruited is significant to all phases of the clinical study, this issue must be considered early in the planning stage.
FDA frequently requires that sponsors provide a complete statistical justification for their proposed sample size. Sample sizes for studies of therapeutic devices typically vary from fewer than 100 to nearly 300 patients; for studies of diagnostic devices, several hundred to more than 1000 samples may be needed.
Intuitively, the greater the anticipated clinically meaningful differences between patients treated with a new device and those treated with a control, the smaller will be the number of patients required to demonstrate those differences. Similarly, the smaller the anticipated differences, the larger will be the number of patients required to detect whether there is a real difference. When a company is designing a study to show that its device is as effective as another, however, intuition may not be a sufficient guide for selecting an adequate sample size.
The sample size required for a clinical study depends on the particular hypothesis that the sponsor intends to test. For example, the null hypothesis may be that the proportion of patients with a successful outcome in the investigational device group is the same as that in the control group. In testing this hypothesis through the clinical study, two types of error can be made. A Type I error occurs when the null hypothesis is incorrectly rejected--that is, when it is concluded that the test device is better, when in reality it is no better than the control. The probability of this error is referred to as alpha (α), the level of significance. A Type II error occurs when the null hypothesis is incorrectly accepted--that is, when it is concluded that there is no difference even though the test device is better. The probability of this error is referred to as beta (β), and 1 − β is the power of the test, or the ability to detect a real difference of a specified magnitude between treatments. ODE usually prefers to see hypotheses tested at a 5% level of significance with at least 80% power.
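To make the relationship among significance level, power, and sample size concrete, the Python sketch below uses the standard normal-approximation formula for comparing two proportions with a two-sided test. The formula choice, the function name, and the example success rates are assumptions for illustration; an actual protocol should rely on a statistician's calculation suited to the specific design.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p_control, p_device, alpha=0.05, power=0.80):
    """Approximate patients needed per group to detect p_device vs. p_control.

    alpha is the two-sided significance level (Type I error probability);
    power is 1 minus the Type II error probability.
    """
    if p_control == p_device:
        raise ValueError("success rates must differ to define a sample size")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_device * (1 - p_device)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_device - p_control) ** 2)
```

Under these assumptions, detecting an improvement from a 60% success rate in the control group to 80% in the device group at the 5% level with 80% power requires about 79 patients per group. Halving the anticipated difference (60% versus 70%) raises the requirement several-fold, which is the intuition described in the preceding paragraphs.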
Factors that affect determination of sample size include the primary end point to be analyzed, the selected probabilities of Type I and Type II errors, and assumptions about the anticipated success rates in the device and control groups. When planned or anticipated, subgroup analyses, significant variations between study centers, prerandomization stratification, and study dropouts may also affect the proposed sample size. Whether a study is designed to show a difference in effectiveness or equivalence to a predicate device will also have an impact on sample size calculations.
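Of the factors listed above, anticipated dropouts are the simplest to account for numerically. A minimal Python sketch, assuming a single overall dropout rate that applies equally to both groups (the function name and the rates used below are hypothetical):

```python
from math import ceil


def enrollment_target(n_evaluable, dropout_rate):
    """Patients to enroll so that about n_evaluable complete follow-up.

    dropout_rate is the expected fraction lost before the primary end point
    is measured, e.g. 0.10 for 10%.
    """
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be at least 0 and less than 1")
    return ceil(n_evaluable / (1 - dropout_rate))
```

For example, if 79 evaluable patients per group are needed and 10% are expected to drop out, `enrollment_target(79, 0.10)` returns 88. Adjustments for subgroup analyses, center effects, or stratification are more involved and are not covered by this sketch.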
Masking. If patients are aware they are receiving a certain treatment, they may imagine they are experiencing certain beneficial or adverse effects. If investigators know the intervention assigned, some subjects may not be followed as closely as others, or some adjunctive therapies may be disproportionately applied to one group of patients. To reduce the potential for bias arising from these sources, clinical trials often make use of masking, also called blinding.
Masking can take various forms in clinical trials. In an unblinded study, both the patient and study investigator know what treatment has been assigned. In a typical single-blind study, the investigator knows what treatment has been assigned, but the patient does not. In a double-blind study, neither the investigator nor the patient knows the assignment of intervention. Sometimes modified double-blind studies are conducted, where the study investigator responsible for implementing the device knows the treatment assignment, but the observer responsible for evaluating safety and effectiveness outcomes is unaware of the treatment assignment. Triple-blind studies include those where, in addition to the patient and investigator being blinded, the committee monitoring or analyzing the study results does not know the identity of the groups.