Design of Experiments: An Overview and Application Example

Medical Device & Diagnostic Industry

John S. Kim and James W. Kalb

A strategy for planning research known as design of experiments (DOE) was first introduced in the early 1920s when a scientist at a small agricultural research station in England, Sir Ronald Fisher, showed how one could conduct valid experiments in the presence of many naturally fluctuating conditions such as temperature, soil condition, and rainfall. The design principles that he developed for agricultural experiments have been successfully adapted to industrial and military applications since the 1940s.

In the past decade, the application of DOE has gained acceptance in the United States as an essential tool for improving the quality of goods and services. This recognition is partially due to the work of Genichi Taguchi, a Japanese quality expert, who promoted the use of DOE in designing robust products--those relatively insensitive to environmental fluctuations. It is also due to the recent availability of many user-friendly software packages, improved training, and accumulated successes with DOE applications.

DOE techniques are not new to the health-care industry. Medical researchers have long understood the importance of carefully designed experiments. These techniques, however, have not been applied as rigorously in the product and design phases as in the clinical evaluation phase of product development. The recent focus by FDA on process validation underscores the need for well-planned experimentation. Such experiments can provide data that will enable device manufacturers to identify the causes of performance variations and to eliminate or reduce such variations by controlling key process parameters, thereby improving product quality.

Properly designed and executed experiments will generate more-precise data while using substantially fewer experimental runs than alternative approaches. They will lead to results that can be interpreted using relatively simple statistical techniques, in contrast to the information gathered in observational studies, which can be exceedingly difficult to interpret. This article discusses the concept of process validation and shows how simple two-level factorial experimental designs can rapidly increase the user's knowledge about the behavior of the process being studied.


The purpose of process validation is to accumulate data that demonstrate with a high degree of confidence that the process will continue to produce products meeting predetermined requirements. Because such capability is necessary to ensure that products perform safely and effectively, process validation is required by FDA's good manufacturing practices (GMP) regulation. For products that will be exported to the European Union, the International Organization for Standardization's ISO 9000 series of standards also requires that certain processes be identified, validated, and monitored.

Table I shows the sequence of events in the product development cycle that lead to process validation, along with the tasks to be accomplished at each phase and selected tools to be used. As the table indicates, during the process development phase the process should be evaluated to determine what would happen when conditions occur that stress it. Such studies, often called process characterization, can be done by varying the key elements of the process (i.e., equipment, materials, and input parameters such as temperature, pressure, and so forth) and determining which sources of variation have the most impact on process performance. One proven method to determine the sources of variability is DOE.

The process should also be challenged to discover how outputs change as process variables fluctuate within allowable limits. This testing is essential to learning what steps must be taken to protect the process if worst-case conditions for input variables ever occur during actual manufacturing operations. Once again, an effective method for studying various combinations of variables is DOE. In particular, simple two-level factorial and fractional factorial designs are useful techniques for worst-case-scenario studies.


One traditional method of experimentation is to evaluate only one variable (or factor) at a time--all of the variables are held constant during test runs except the one being studied. This type of experiment reveals the effect of the chosen variable under set conditions; it does not show what would happen if the other variables also changed.

For example, blood coagulation rate could be studied as a function of ion concentration and the concentration of the enzyme thrombin. To measure the effect of varying thrombin levels, the ion concentration is held constant at a prechosen low level. Since there is variability in the coagulation time measurement, at least two experiments should be run at each point, for a total of four runs. Figure 1 shows the design and hypothetical results for such an experiment. The average effect of changing the thrombin level from low to high is the average at the high level minus the average at the low level, or

\[
\frac{30 + 29}{2} - \frac{10 + 9}{2} = 29.5 - 9.5 = 20.
\]

Similarly, to measure the effect of ion concentration, thrombin is held at its low level and another experiment is performed with ion concentration at its high level. Again, two runs are necessary to determine the average effect. Using the results shown in Figure 1, the average effect of ion concentration is

\[
\frac{20 + 22}{2} - \frac{10 + 9}{2} = 21 - 9.5 = 11.5.
\]

After a total of six runs it is known that, at the low ion concentration, the coagulation rate goes up with an increasing thrombin level and that, at the low thrombin concentration, the coagulation rate goes up with an increasing ion level. But what would happen if both variables were at their high level? If the effect of each factor stayed the same, the result would be a simple combination of the two effects. For the above example, such an assumption would give the sum of the low-level average and the two average effects, or

9.5 + 20 + 11.5 = 41.
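
The one-factor-at-a-time arithmetic above can be sketched in a few lines of Python. The variable names are illustrative only, and the readings are the hypothetical values quoted from Figure 1.

```python
from statistics import mean

# Hypothetical duplicate readings quoted in the text (Figure 1).
baseline = [10, 9]        # thrombin low, ion low
thrombin_high = [30, 29]  # thrombin high, ion low
ion_high = [20, 22]       # thrombin low, ion high

# Effect = average at the high level minus average at the low level.
thrombin_effect = mean(thrombin_high) - mean(baseline)
ion_effect = mean(ion_high) - mean(baseline)

# Additive prediction for both factors high; it holds only if the
# two factors do not interact.
predicted_both_high = mean(baseline) + thrombin_effect + ion_effect

print(thrombin_effect, ion_effect, predicted_both_high)
```

The prediction is only as good as the no-interaction assumption, which this design gives no way to check.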

It was Fisher's idea that it was much better to vary all the factors at once using a factorial design, in which experiments are run for all combinations of levels for all of the factors. With such a study design, testing will reveal what the effect of one variable would be when the other factors are changing. Using a factorial design for the blood coagulation example, as shown in Figure 2, running a test with both variables at their high level yielded a rate of 60, not 41 as previously estimated. If the goal of the study were to maximize coagulation rate, it would be important to discover this synergistic response, and it could not be detected with the one-factor-at-a-time experiment.

Another advantage of the factorial design is its efficiency. As indicated in the figure, only one run would be needed for each point, since there will be two runs at each level of each factor. Thus, the factorial design allows each factor to be evaluated with the same precision as in the one-factor-at-a-time experiment, but with only two-thirds the number of runs (four instead of six). Montgomery has shown that this relative efficiency of the factorial experiments increases as the number of variables increases (see bibliography). In other words, the effort saved by such internal replication becomes even more dramatic as more factors are added to an experiment.

Calculation of the Main Effects. With a factorial design, the average main effect of changing thrombin level from low to high can be calculated as the average response at the high level minus the average response at the low level, or, using the data from Figure 2,

\[
\frac{60 + 30}{2} - \frac{20 + 10}{2} = 45 - 15 = 30.
\]

Similarly, the average main effect of ion concentration is the average response at the high level minus the average response at the low level, or

\[
\frac{20 + 60}{2} - \frac{10 + 30}{2} = 40 - 20 = 20.
\]

The fact that these effects have a positive value indicates that the response (i.e., the coagulation rate) increases as the variables increase. The larger the magnitude of the effect, the more critical the variable.
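
As a minimal sketch, the two main-effect calculations can be written as one small function. The dictionary layout and the name `main_effect` are illustrative choices, and the four responses are the Figure 2 values as quoted in the formulas above.

```python
from statistics import mean

# Responses from the 2 x 2 factorial (Figure 2), keyed by
# (thrombin level, ion level) with -1 = low and +1 = high.
response = {(-1, -1): 10, (+1, -1): 30, (-1, +1): 20, (+1, +1): 60}

def main_effect(factor_index):
    """Average response at the high level minus average at the low level."""
    high = [y for levels, y in response.items() if levels[factor_index] == +1]
    low = [y for levels, y in response.items() if levels[factor_index] == -1]
    return mean(high) - mean(low)

thrombin_effect = main_effect(0)  # (60 + 30)/2 - (20 + 10)/2
ion_effect = main_effect(1)       # (20 + 60)/2 - (10 + 30)/2
print(thrombin_effect, ion_effect)
```

Each effect uses all four runs, which is where the factorial design gets its efficiency.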

Estimate of the Interaction. A factorial design makes it possible not only to determine the main effects of each variable, but also to estimate the interaction (i.e., synergistic effect) between the two factors, a calculation that is impossible with the one-factor-at-a-time experiment design. As shown in Figure 2, the effects of thrombin at the low and high levels of ion concentration are 20 and 40, respectively. Thus, the effect of thrombin concentration depends upon the level of ion concentration; in other words, there is an interaction between the two variables. The interaction effect is the average difference between the effect of thrombin at the high level of ion concentration and the effect of thrombin at the low level of ion concentration, or

\[
\frac{40 - 20}{2} = 10.
\]


A two-factor, two-level factorial design is normally set up by building a table using minus signs to show the low levels of the factors and plus signs to show the high levels of the factors. Table II shows a factorial design for the application example. The first column in the table shows the run number for the four possible runs. The next two columns show the level of each main factor, A and B, in each run, and the fourth column shows the resulting level of the interaction between these factors, which is found by multiplying their coded levels (-1 or +1). Columns 5 and 6 show the actual values assigned to the low and high variable levels in the design. Test runs using each of these four combinations constitute the experiment. The last column contains the responses from the experiment, which in Table II are the data from Figure 2. Filling in this column requires the hard work of running each experiment and then recording the result.
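
The coded table lends itself to a short sketch in which every effect, including the interaction, is a signed sum of the responses. The layout below is an assumed rendering of the table described above (A = thrombin, B = ion concentration), and the helper name `effect` is illustrative.

```python
# Coded design table for the two-factor example.
# Responses are the Figure 2 values quoted in the text.
runs = [
    # (A,  B,  response)
    (-1, -1, 10),
    (+1, -1, 30),
    (-1, +1, 20),
    (+1, +1, 60),
]

def effect(column):
    """Contrast estimate: sum(sign * response) divided by half the run count."""
    return sum(sign * y for sign, y in column) / (len(column) / 2)

a_col = [(a, y) for a, b, y in runs]
b_col = [(b, y) for a, b, y in runs]
ab_col = [(a * b, y) for a, b, y in runs]  # interaction = product of coded levels

effect_a, effect_b, effect_ab = effect(a_col), effect(b_col), effect(ab_col)
print(effect_a, effect_b, effect_ab)
```

The same signed-sum rule extends unchanged to designs with more factors, which is why the coded form is convenient.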

In some studies there may be more than two important variables. For example, pH level has an important influence on coagulation rate and could be a third factor in the example experiment. The resulting three-factor, two-level design is shown in Table III and Figure 3. With three two-level factors, eight experiments will be required, and there will be four replicates of each level of each factor, further increasing the precision of the result. There will be three two-factor interactions and a three-factor interaction to evaluate. Usually, interactions involving three or more factors are not important and can be disregarded.

As in a two-factor experiment, the average effect of each factor can be calculated by subtracting the average response at the low level from the average response at the high level. Using the data from Figure 3, the effect of thrombin would be

\[
\frac{30 + 60 + 40 + 70}{4} - \frac{10 + 20 + 20 + 30}{4} = 50 - 20 = 30.
\]

Table IV lists all of the effects in the blood coagulation experiment.
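
The sketch below shows how all seven effects of a three-factor, two-level design could be computed from the coded levels. Table III, Table IV, and Figure 3 are not reproduced here, so the assignment of responses to the four high-pH corners is an assumption, chosen only to agree with the thrombin sums quoted above. Only the thrombin effect of 30 is confirmed by the text.

```python
from itertools import combinations, product
from math import prod

factors = ["thrombin", "ion", "pH"]

# Standard order: the first factor changes fastest.
design = [tuple(reversed(levels)) for levels in product((-1, +1), repeat=3)]

# ASSUMED corner assignment (the low-pH half matches Figure 2; the
# high-pH half is a guess consistent with the quoted thrombin sums).
responses = [10, 30, 20, 60, 20, 40, 30, 70]

def effect(indices):
    """Signed sum over the product column, divided by half the run count."""
    signs = [prod(run[i] for i in indices) for run in design]
    return sum(s * y for s, y in zip(signs, responses)) / (len(design) / 2)

effects = {}
for size in (1, 2, 3):
    for idx in combinations(range(3), size):
        effects["*".join(factors[i] for i in idx)] = effect(idx)

for name, value in effects.items():
    print(name, value)
```

With real data, the interaction entries of such a table show at a glance which higher-order terms can safely be ignored.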


It is well recognized that the planning activities that precede the actual test runs are critical to the successful resolution of the experimenter's problem. In planning an experiment, it is necessary to limit any bias that may be introduced by the experimental units or experimental conditions. Strategies such as randomization and blocking can be used to minimize the effect of nuisance or noise elements.

Consider what would happen in the application example if the evaluation of coagulation rate were sensitive to ambient temperature and the temperature rose during the experiment. If the test runs were performed in the order in Table III--all of the low-pH combinations followed by all of the high-pH ones--the effect of the temperature change would be assigned to pH, thereby confusing an unknown trend with a design factor. By randomizing the order in which the test combinations are run, researchers can eliminate the effects of unknown trending variables on the results of the experiment.

Blocking can be used to prevent experimental results being influenced by variations from batch to batch, machine to machine, day to day, or shift to shift. In the eight-run, three-factor study, for example, let's assume there was only enough thrombin enzyme in a batch for four mixes. Let's also assume any batch-to-batch difference could affect the conclusions. Then the two batches could be assigned to the two coded levels (-1 or +1) of the three-factor interaction, which is shown as ABC in the design illustrated in Table III. This strategy is called blocking a factor on ABC. (ABC can be used as the blocking factor because the three-factor interaction is regarded as unimportant.) Because the study design is balanced, each batch of thrombin will be used the same number of times for each level of each factor. Thus, its influence is averaged out and is removed from the analysis.

In Figure 4, the two levels of the blocking factor, ABC, are shown as circles and squares. When 20 was added to each circle (the low level of ABC) and the effects of each variable recalculated, the results did not differ from those shown in Table IV. The effect of thrombin, for example, became

\[
\frac{60 + 70 + 30 + 80}{4} - \frac{20 + 50 + 30 + 20}{4} = 60 - 30 = 30.
\]

Since only the ABC effect changed by the magnitude of the difference between batches, the batch-to-batch difference had been successfully removed from the experiment by its inclusion in the experimental setup. Without this blocking, the determination of the variables' effects would have been less precise, or missed altogether.
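
The blocking argument can be checked numerically. In the sketch below, the corner responses are an assumed assignment (the figures are not reproduced here) chosen to agree with the sums quoted in the text, and a batch offset of 20 is added to every run at the low level of ABC.

```python
from itertools import product
from math import prod

# Standard order: the first factor (thrombin) changes fastest.
design = [tuple(reversed(r)) for r in product((-1, +1), repeat=3)]
responses = [10, 30, 20, 60, 20, 40, 30, 70]  # ASSUMED corner assignment

def effect(ys, indices):
    """Signed sum over the product column, divided by half the run count."""
    signs = [prod(run[i] for i in indices) for run in design]
    return sum(s * y for s, y in zip(signs, ys)) / (len(design) / 2)

# Block = sign of ABC; add the batch offset to the low-ABC block.
shifted = [y + 20 if prod(run) == -1 else y for run, y in zip(design, responses)]

thrombin_before = effect(responses, (0,))
thrombin_after = effect(shifted, (0,))
abc_change = effect(shifted, (0, 1, 2)) - effect(responses, (0, 1, 2))
print(thrombin_before, thrombin_after, abc_change)
```

Because every main-effect and two-factor column is balanced against ABC, the offset cancels everywhere except in the ABC estimate itself.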


One disadvantage of two-level factorial designs is that the size of the study increases by a factor of two for each additional factor. For example, with eight factors, 256 runs would theoretically be necessary. Fortunately, because three-factor and higher-order interactions are rarely important, such intensive efforts are seldom required. For most purposes, it is only necessary to evaluate the main effects of each variable and the two-factor interactions, which can be done with only a fraction of the runs in a full factorial design. Such designs are called Resolution V designs. If there are some two-factor interactions that are known to be impossible, one can further reduce the number of runs by using Resolution IV designs. Table V compares the number of runs in full and fractional factorial designs with from two to eight variables. For the earlier example of eight factors, one can create an efficient design that may require as few as 16 runs.
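
To make the run-count saving concrete, the sketch below builds a 16-run design for eight factors by starting from a full factorial in four base factors and defining the other four as products of three base columns. The particular generators (E = BCD, F = ACD, G = ABC, H = ABD) are an assumption here; they are one common textbook choice that yields a Resolution IV design, not necessarily the one behind Table V.

```python
from itertools import combinations, product

# Full factorial in the four base factors A, B, C, D: 16 runs.
base = list(product((-1, +1), repeat=4))

# Assumed generators: E = BCD, F = ACD, G = ABC, H = ABD.
design = [
    (a, b, c, d, b * c * d, a * c * d, a * b * c, a * b * d)
    for a, b, c, d in base
]

full_runs = 2 ** 8           # runs in the full eight-factor factorial
fraction_runs = len(design)  # runs in the fractional design

# Check that the eight main-effect columns are mutually orthogonal,
# so each main effect can be estimated independently of the others.
columns = list(zip(*design))
orthogonal = all(
    sum(x * y for x, y in zip(c1, c2)) == 0
    for c1, c2 in combinations(columns, 2)
)
print(full_runs, fraction_runs, orthogonal)
```

The price of the smaller design is aliasing: in a Resolution IV fraction, two-factor interactions are confounded with one another, which is why prior knowledge about negligible interactions matters.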


Another disadvantage of two-level designs is that the experimental runs cannot detect whether there are curvilinear effects in the region of optimum settings. To check on this possibility, every factorial design should include a center point at the zero (0) level of all the factors. If curvature is present, the response at this point will be much larger or smaller than the response expected from the linear model. Figure 5, for example, shows a response that has a maximum between the low and high levels used in the two-level factorial design.
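
A simple curvature check compares the observed center-point response with the average of the factorial corners, which is what a model containing only main effects and two-factor interactions predicts at the center. In this sketch the corner values are the Figure 2 responses; the center-point readings are hypothetical, invented purely for illustration.

```python
from statistics import mean

factorial_responses = [10, 30, 20, 60]  # Figure 2 corner values from the text
center_responses = [45, 47]             # HYPOTHETICAL center-point readings

# Without curvature, the center response should be close to the
# average of the corner responses.
curvature = mean(center_responses) - mean(factorial_responses)
print(curvature)
```

Whether a given difference is large enough to matter depends on the run-to-run variability, which replicated center points also help to estimate.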

If curvature is present, the factorial design can be expanded to allow estimation of the response surface. One way to do this is to add experimental points. The central composite design shown in Figure 6 uses the factorial design as the base and adds what are known as star points. Special methods are available to calculate these star points, which provide desirable statistical properties to the study results. The result of such an expanded design is usually a contour plot of the response surface or a surface plot, such as Figure 7, which clearly shows a maximum.
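
As an illustration of how such a design is assembled, the sketch below lists the coded points of a two-factor central composite design. The star-point distance used here, the fourth root of the number of factorial runs, is the usual "rotatable" convention; it is an assumption for this example, since the text does not say which method was used for Figure 6.

```python
from itertools import product

k = 2                     # number of factors
alpha = (2 ** k) ** 0.25  # rotatable star distance (assumed convention)

factorial_points = list(product((-1.0, +1.0), repeat=k))
center_point = [(0.0,) * k]

# Star points: one factor at +/- alpha, all other factors at zero.
star_points = []
for i in range(k):
    for sign in (-1.0, +1.0):
        point = [0.0] * k
        point[i] = sign * alpha
        star_points.append(tuple(point))

ccd = factorial_points + center_point + star_points
for point in ccd:
    print(point)
```

For two factors this gives nine distinct settings (four corners, one center, four star points), enough to fit a full quadratic response surface.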


Carefully planned, statistically designed experiments offer clear advantages over traditional one-factor-at-a-time alternatives. These techniques are particularly useful tools for process validation, where the effects of various factors on the process must be determined. Not only is the DOE concept easily understood, the factorial experiment designs are easy to construct, efficient, and capable of determining interaction effects. Results are easy to interpret and lead to statistically justified conclusions. The designs can be configured to block out extraneous factors or expanded to cover response surface plotting.

Those implementing a DOE strategy will find that computer software is an essential tool for developing and running factorial experiments. The Design-Expert program was used to create the response surface in Figure 7, for example. Other user-friendly DOE software includes BBN, CADE, Design-Ease, JMP, Statistica, and Statgraphics. Finally, for those who wish to learn more about DOE, a bibliography has been included here.


Box GEP, and Draper N, Empirical Model Building and Response Surfaces, New York, Wiley, 1987.

Box GEP, Hunter W, and Hunter JS, Statistics for Experimenters, New York, Wiley, 1978.

Montgomery DC, Design and Analysis of Experiments, 3rd ed, New York, Wiley, 1990.

Ross PJ, Taguchi Techniques for Quality Engineering, New York, McGraw-Hill, 1988.

John S. Kim and James W. Kalb are the director, corporate statistical resources, and the senior applications scientist, respectively, at Medtronic, Inc. (Minneapolis). Kim is also a member of the MD&DI editorial advisory board.

Originally published March, 1996
