New Interval Offers Confidence—Without Limits

Confidence intervals are serious business.

John Zorich

August 9, 2010


Recently, FDA told a medical device start-up company that, in regard to the company’s proposed clinical trial, “The equivalence of the device to the predicate can be demonstrated if the confidence interval for the difference in the mean values for the tested parameter excludes a difference larger than 20% from the predicate.” Unfortunately, the company could not meet that FDA mandate because in order to reduce the width of the confidence interval so that it would achieve that exclusion, a much larger number of patients was required than the company could afford to evaluate.

The term confidence interval was coined and first published by J. Neyman in 1934, when he applied it to binomial as well as variables data. He described confidence intervals as ranges “in which we may assume are contained the values of the estimated characters of the population.”1 As applied to a “proportion . . . of individuals in the sample” (which in this article will be represented by PS) that has been derived from an unknown “proportion . . . of individuals in the population” (PP in this article) whose “distribution . . . is then a binomial,” he defined the confidence interval for PP as having the form PL ≤ PS ≤ PU, where PL is the lower confidence limit, PU is the upper confidence limit, and PP is assumed with a specified level of confidence to be somewhere in the interval PL to PU.2 A few months later, the first rigorous method for calculating such an interval was published.3


There is only one generally accepted method for calculating confidence intervals for variables data; that method involves t tables and the standard error of the mean, as described in any basic statistics textbook. However, there are many methods in use for calculating confidence intervals for binomial data, each such method resulting in different confidence limits and different interval widths.4,5


The reason that there are many binomial methods is partly historical and partly theoretical. The historical part is that, prior to the widespread availability of computers, binomial calculations were difficult. For that reason, simpler-to-calculate alternatives were developed; as was said several decades ago, “To calculate [large-sample binomial] probabilities would be an almost insurmountable task. Therefore, some method of approximation must be used.”6 The theoretical part is the lack of agreement on criteria for judging an interval (this is discussed in more detail later in this article). That disagreement was present from the birth of the interval concept in 1934: Neyman’s description for the interval was the equivalent of PL ≤ PS ≤ PU, whereas Clopper & Pearson’s description was PL < PS < PU; notice the use of ≤ versus <.2,7

Some of those many binomial confidence interval methods involve formulaic calculations. For example, what is commonly referred to as the Wald formula uses a binomial standard deviation coupled with a standard normal distribution Z-table to calculate an approximate confidence interval. Wald intervals are based on the fact that when sample size is large or when PS is near 50%, the normal distribution is a reasonable model of the binomial, even if such an application is “not completely accurate.”8 Other methods involve trial and error. For example, in order to calculate each confidence limit for what is commonly called the Exact binomial confidence interval, repeated attempts must be made to determine the proportion that yields a cumulative binomial histogram probability of exactly half the chosen significance level (this is discussed in more detail later).

It has been said recently that the Wald interval is “in virtually universal use.”9 Similarly, the National Institute of Standards and Technology (NIST) “e-Handbook of Statistical Methods” Web site states that the Wald formula is the “confidence [interval] expression most frequently used.”10 And Wald’s was the only method included in a binomial calculation spreadsheet issued in 1998 by CDRH for general use in handling clinical trials data.11 By contrast, the Exact interval is the sole method provided in some mainstream statistical software programs (e.g., Statgraphics),12 and the Exact is the only non-Z-table method for calculating binomial proportion confidence limits that is mentioned on the NIST Web site;10 indeed, some statisticians (e.g., Agresti and Coull) refer to the Exact method as the gold standard.13

The implicit if not explicit focus of interest in a binomial confidence interval is its largest and smallest values, i.e., PU and PL. For example, a clinical trial might be considered successful only if the lower confidence limit on the outcome success rate is larger than a protocol-specified value. Because of such focus, this article introduces a new criterion for comparing the validity of confidence interval methods (since 1934, many other criteria have been proposed).14 The new criterion is this: Are the confidence limits of the interval reasonable? If it is unreasonable to conclude that PL and PU could be PP, then it is unreasonable to use the confidence interval method that generated them. And being reasonable is what even J. L. Fleiss has urged: “A confidence interval for a statistical parameter is a set of values that are . . . reasonable candidates for being the true underlying value.”15

Reasonable Confidence Intervals and Limits

Before defining what it means to be reasonable, it is important to understand the basis of the definition. A test of reasonableness is equivalent to performing a binomial test of significant difference using what Fleiss has called the “traditional statistical approach” for an “inference for a single proportion.” It involves “calculating the probability, assuming the null hypothesis holds, of obtaining the outcome that actually occurred, plus the probabilities of all other outcomes as extreme as, or more extreme than, the one that was observed; and rejecting the null hypothesis in favor of the alternative hypothesis if the sum of all these probabilities—the so-called p-value—is less than or equal to a predetermined level, denoted by α, called the significance level.”16

Based on that approach, a confidence interval is reasonable only if such a test results in a conclusion of “not statistically different” when the test compares the observed sample proportion (PS) to either of the two most extreme values in the interval, namely the upper and lower confidence limits (PU and PL, respectively). In terms a bit more mathematical, unreasonable is defined as follows (for sample size = N, observed number of successes in that sample = K, and K/N = PS): In regard to the probability distribution histograms derived from limits PL and PU, if either of the distribution tails in which K is found represents a cumulative probability of occurrence of less than or equal to α/2, we conclude that PS is statistically significantly different from the limit that generated that distribution, and that, therefore, the confidence interval PL to PU is unreasonable.

Figure 1. Example of a probability distribution derived from an “unreasonable” lower (a) and upper (b) confidence limit on an observed 10 successes in a sample size = N.

In that definition, we use α/2 rather than α because we are performing a two-sided test twice. The first test determines whether a random sample proportion differs from PL; the second test determines whether that random sample proportion differs from PU; in each case, we focus on tails that represent α/2 of the distribution (see Figures 1a and 1b). That is the standard way to approach such tests.10,17–19

It is important to note that the test of significance just described is performed using the probability distribution histograms generated from the two confidence limits (PL and PU) rather than being performed using the single probability distribution histogram generated from the observed sample proportion (PS). Such an approach is used because PL and PU are together considered to be the “null hypothesis;” based on the new criterion described above, the question to answer is this: Is it reasonable to assume that the random sample proportion PS could have been obtained from a population that had a proportion equal to either PL or PU (i.e., could either of them be Fleiss’s “true underlying value,” PP)?15,17,18

This next discussion examines the reasonableness of Wald limits and explains how to calculate them. As demonstrated on the NIST Web site,20 calculation of the upper and lower confidence limits of a Wald (normal approximation) binomial confidence interval uses the following formula:

PS ± Zα/2 × SDPS,

where PS is the observed sample proportion, Zα/2 is the two-tailed value from a normal distribution Z-table at the chosen α significance level, and SDPS is the binomial standard deviation for the observed proportion, calculated as the square root of PS(1 − PS)/N. The limits of such an interval can be calculated as shown below, using Microsoft Excel (subscript W indicates the Wald method, Normsinv and Sqrt are MS Excel functions that output Z-table values and square roots respectively, asterisk (*) is the MS Excel symbol for multiply, and the other terms are as defined earlier):

PUW = PS + Normsinv(1 − α/2) * Sqrt(PS * (1 − PS)/N)
PLW = PS − Normsinv(1 − α/2) * Sqrt(PS * (1 − PS)/N)

Because the normal approximation assumption becomes less valid as the sample size becomes smaller or as PS departs farther from 0.500 (50%), the Wald calculation is typically restricted to situations in which both of the following are true: N(PS) > 5 and N(1 – PS) > 5.  Even FDA’s spreadsheet includes the warning “minimum [N(PS), N(1 – PS)]. . . must be > 5 to use normal approximation.”11
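The Wald calculation and its sample-size restriction can be sketched in a few lines of Python, as an alternative to the spreadsheet formulas. This is only an illustration: the function name wald_limits is hypothetical, and NormalDist().inv_cdf from the Python standard library stands in for Normsinv.

```python
from math import sqrt
from statistics import NormalDist


def wald_limits(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Return (PLW, PUW), the Wald lower and upper confidence limits."""
    ps = k / n                                   # observed sample proportion PS
    # Restriction quoted in the text: N(PS) and N(1 - PS) must both exceed 5.
    if min(n * ps, n * (1 - ps)) <= 5:
        raise ValueError("min[N(PS), N(1 - PS)] must be > 5 to use normal approximation")
    z = NormalDist().inv_cdf(1 - alpha / 2)      # plays the role of Normsinv(1 - alpha/2)
    sd = sqrt(ps * (1 - ps) / n)                 # binomial standard deviation SDPS
    return ps - z * sd, ps + z * sd


plw, puw = wald_limits(10, 100)
print(f"PLW = {plw:.6f}, PUW = {puw:.6f}")
```

Different implementations of the inverse normal function can disagree in the later decimal places, so results from this sketch and from a given spreadsheet version should be expected to match only to about six decimals.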

Figure 2. Probability distribution for binomial proportion. Binomial proportion = PLW = 0.041201116.

Let’s examine the reasonableness of Wald confidence limits for the following situation: α = 5% (and therefore confidence = 1 – α = 95%), sample size = N = 100, successes = K = 10, PS = K/N = 10/100 = 0.1, and N(PS) = 10.
The resulting limits are PLW = 0.041201116 and PUW = 0.158798884. The question is: how reasonable are they? Let’s focus just on PLW (the lower limit). The probability distribution for a population whose proportion equals that PLW is shown in Figure 2. Notice that the probability of occurrence of the observed sample result or a more extreme result (i.e., K ≥ 10) is less than α/2 = 2.5% (in fact, it is 0.8%); it can be concluded that PS is statistically significantly different from PLW. Based on our definition of reasonableness, it is therefore unreasonable to conclude that PLW = PP. If it is unreasonable to conclude that PLW = PP (at α = 5%), then it is unreasonable to consider PLW to PUW to be the 1 – α = 95% confidence interval. As evidenced by this example, Wald intervals and confidence limits can be unreasonable.
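The reasonableness check applied to PLW can be reproduced by summing binomial probabilities directly. The sketch below assumes the definition given earlier (a limit is unreasonable if the tail containing K has probability less than or equal to α/2); the function names are hypothetical and only the Python standard library is used.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials with success proportion p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def tail_containing_k(k: int, n: int, limit: float) -> float:
    """Cumulative probability of the tail in which the observed K is found.

    For a limit below PS = K/N the relevant tail is K..N;
    for a limit above PS it is 0..K.
    """
    if limit < k / n:
        return sum(binom_pmf(i, n, limit) for i in range(k, n + 1))
    return sum(binom_pmf(i, n, limit) for i in range(0, k + 1))


def limit_is_reasonable(k: int, n: int, limit: float, alpha: float = 0.05) -> bool:
    """A limit is unreasonable when the tail probability is <= alpha/2."""
    return tail_containing_k(k, n, limit) > alpha / 2


plw = 0.041201116                      # Wald lower limit quoted in the text
tail = tail_containing_k(10, 100, plw)
print(f"P(K >= 10 | P = PLW) = {tail:.4f}")
print("PLW reasonable?", limit_is_reasonable(10, 100, plw))
```

The same function can be pointed at any candidate limit from any method, which is the comparison the rest of this article carries out.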

To examine the reasonableness of Exact limits, it is essential to know how to calculate them. What are sought (by trial and error) are two proportions, one larger and one smaller than the observed sample proportion (PS); each must have a cumulative binomial probability of “exactly” α/2 for obtaining the observed sample result or a more extreme value (i.e., 0 to K, or K to N). An MS Excel spreadsheet can be used to calculate the limits as accurately as, for example, Statgraphics, to at least a billionth of a probability unit (equal to 9 places to the right of the decimal point); how to do so is described on the NIST Web site.10 The following is a generic example of applying the NIST/MS Excel method (α, N, K, PP, PS, PU, and PL are as defined above; subscript E indicates the Exact method; and P is a proportion sought):



PUE  =   the value of P (P > PS) needed to ensure that the MS Excel function Binomdist(K, N, P, True) outputs a probability value of α/2 precisely (to the desired number of significant digits).

PLE  =   the value of P (P < PS) needed to ensure that the MS Excel function Binomdist(K – 1, N, P, True) outputs a probability value of 1 – α/2 precisely (to the desired number of significant digits).

Figure 3. Probability distribution for binomial proportion. Binomial proportion = PLE = 0.049004689.

Those formulas, applied to the situation evaluated previously (N = 100, K = 10, PS = 0.1, and α = 5%), result in the following Exact limits: PLE = 0.049004689 and PUE = 0.176222598. The probability distribution for a population whose proportion equals that PLE is shown in Figure 3. Notice that the probability of occurrence of the observed sample result or a more extreme result (i.e., K ≥ 10) is precisely equal to α/2 = 2.5%; the entire histogram bar representing K = 10 (the value observed in the sample) is found in the 2.5% tail of the distribution; it can therefore be concluded that PS = K/N is statistically significantly different from PLE. Based on our definition of reasonableness, it is therefore unreasonable to conclude that PLE = PP. If it is unreasonable to conclude that PLE = PP (at α = 5%), then it is unreasonable to consider PLE to PUE to be the 95% confidence interval. As evidenced by this example, Exact confidence intervals and limits can be unreasonable.
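The trial-and-error search for the Exact limits can be automated with a simple bisection. The following is a minimal sketch rather than the NIST/MS Excel procedure itself: binom_cdf stands in for Binomdist(k, N, P, True), and solve_p and exact_limits are hypothetical helper names. It handles only the two-sided case 0 < K < N.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); the role played by Binomdist(k, n, p, True)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_p(k: int, n: int, target: float, lo: float, hi: float) -> float:
    """Bisect for the P in [lo, hi] at which binom_cdf(k, n, P) equals target.

    Relies on the cumulative probability falling as P rises (true for 0 <= k < n).
    """
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > target:
            lo = mid          # cumulative probability still too large: P must be larger
        else:
            hi = mid
    return (lo + hi) / 2


def exact_limits(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Return (PLE, PUE) as defined in the text."""
    if not 0 < k < n:
        raise ValueError("this sketch covers only 0 < K < N")
    ps = k / n
    ple = solve_p(k - 1, n, 1 - alpha / 2, 0.0, ps)   # P < PS, Binomdist(K - 1, ...) = 1 - alpha/2
    pue = solve_p(k, n, alpha / 2, ps, 1.0)           # P > PS, Binomdist(K, ...) = alpha/2
    return ple, pue


ple, pue = exact_limits(10, 100)
print(f"PLE = {ple:.9f}, PUE = {pue:.9f}")
```

Bisection is used here only because it is easy to verify; any root-finding method that brackets the solution between 0 and PS (or between PS and 1) would serve.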

Two formulas for the Reasonable confidence limits are being introduced in this article; the first uses the binomial distribution and the second uses the beta distribution. The limits of a “reasonable binomial confidence interval” are defined as follows (where the subscript R indicates the Reasonable method):

PUR  =     the value of P (P > PS) needed to ensure that the MS Excel function Binomdist(K − 1, N, P, True) outputs a probability value of α/2 precisely  (to the desired number of significant digits).

PLR  =     the value of P (P < PS) needed to ensure that the MS Excel function Binomdist(K, N, P, True) outputs a probability value of 1 – α/2 precisely  (to the desired number of significant digits).

Notice that the first term in the MS Excel functions for Reasonable limits is changed by a value of 1 from its corresponding Exact function (in the PU definitions, the change is from K to K – 1, and in the PL definitions it is from K – 1 to K). That was done to ensure that PS is not statistically significantly different from either PLR or PUR. In effect, those two formulas identify the widest possible confidence interval such that the observed sample proportion is not statistically significantly different from any point in the interval, most especially the highest and lowest points, namely PUR and PLR (the meaning of statistically significantly different was discussed briefly earlier, and it will be discussed in more detail later in this article).

Figure 4. Probability distribution for binomial proportion. Binomial proportion = PLR = 0.056207020 (a) and Binomial proportion = PLR = 0.056207020 (b).

If those Reasonable method formulas are applied to the situation evaluated previously (N = 100, K = 10, PS = 0.10, and α = 5%), they result in the following Reasonable limits: PLR = 0.056207020 and PUR = 0.163982255. The probability distribution for a population whose proportion equals that PLR is shown in Figures 4a and 4b. Notice that, in Figure 4a, the probability of occurrence of the observed sample result or a more extreme result (i.e., K ≥ 10) equals approximately 5.5%; in Figure 4b, the entire histogram bar representing K = 10 is found in the (1 − α/2)% body + lower_tail (i.e., in the lower 97.5% of the distribution); therefore, the observed sample proportion (PS = K/N = 0.10) is not statistically significantly different from PLR, and therefore PLR could possibly be PP. Similarly, as seen in Table I, it is reasonable to conclude that PUR could possibly be PP, because the probability of the observed sample result or a more extreme result (i.e., K ≤ 10) is approximately 4.9% and the probability of K ≤ 9 is precisely 2.5%. Therefore, because both PLR and PUR could possibly be PP (at α = 5%), it is indeed reasonable to consider PLR to PUR to be the 95% confidence interval.
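The Reasonable limits and the tail probabilities quoted in that example can be checked numerically. The sketch below assumes the binomial definitions of PLR and PUR given above and solves them by bisection; reasonable_limits and its helpers are hypothetical names, and binom_cdf again stands in for Binomdist(k, N, P, True).

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_p(k: int, n: int, target: float, lo: float, hi: float) -> float:
    """Bisect for P in [lo, hi] with binom_cdf(k, n, P) equal to target."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def reasonable_limits(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Return (PLR, PUR); the first Binomdist argument is shifted by 1 versus Exact."""
    if not 0 < k < n:
        raise ValueError("Reasonable limits are not defined for K = 0 or K = N")
    ps = k / n
    plr = solve_p(k, n, 1 - alpha / 2, 0.0, ps)      # Exact uses K - 1 here
    pur = solve_p(k - 1, n, alpha / 2, ps, 1.0)      # Exact uses K here
    return plr, pur


plr, pur = reasonable_limits(10, 100)
print(f"PLR = {plr:.9f}, PUR = {pur:.9f}")
# Tail probabilities discussed in the text, each to be compared with alpha/2 = 2.5%:
print(f"P(K >= 10 | P = PLR) = {1 - binom_cdf(9, 100, plr):.3f}")
print(f"P(K <= 10 | P = PUR) = {binom_cdf(10, 100, pur):.3f}")
```

Both printed tail probabilities should come out above 2.5%, which is the property that distinguishes these limits from the Exact ones.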

Table I. Assuming a population has PXX proportion of successes, this table lists the probability for the listed range of  K successes, in a sample of size N = 100.

At first glance, the Wald method has an advantage over Reasonable and Exact ones: Calculation of Wald limits can be performed without trial and error and therefore can be automated using simple computer applications such as MS Excel functions. Upon further investigation, it can be seen that the beta-distribution formulas that have been used to approximate Exact limits 21 without using trial and error can be modified to also approximate Reasonable limits. As shown in Table II, the following MS Excel formulas approximate the output of the Reasonable binomial formulas shown above, to at least a millionth of a probability unit (equal to six places to the right of the decimal point) (α, N, K, PUR , and PLR are as defined earlier, and BetaInv is an MS Excel function):

PUR (Beta) = 1 − BetaInv(α/2, N − K + 1, K)
PLR (Beta) = 1 − BetaInv(1 − α/2, N − K, K + 1)
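The link between the beta formulas and the binomial definitions of PUR and PLR follows from a standard identity relating the cumulative binomial distribution to the regularized incomplete beta function, which is the cumulative distribution that BetaInv inverts. The notation below (X for the number of successes and I_x(a, b) for the regularized incomplete beta function) is introduced here for illustration and is not part of the original formulas.

```latex
\begin{align*}
\Pr(X \le k) &= I_{1-P}(N-k,\; k+1),
  \qquad X \sim \mathrm{Binomial}(N, P),\; 0 \le k < N \\[4pt]
\Pr(X \le K-1) = \tfrac{\alpha}{2}
  &\;\Longrightarrow\; I_{1-P}(N-K+1,\; K) = \tfrac{\alpha}{2}
  \;\Longrightarrow\; P_{UR} = 1 - \mathrm{BetaInv}\!\left(\tfrac{\alpha}{2},\; N-K+1,\; K\right) \\[4pt]
\Pr(X \le K) = 1 - \tfrac{\alpha}{2}
  &\;\Longrightarrow\; I_{1-P}(N-K,\; K+1) = 1 - \tfrac{\alpha}{2}
  \;\Longrightarrow\; P_{LR} = 1 - \mathrm{BetaInv}\!\left(1 - \tfrac{\alpha}{2},\; N-K,\; K+1\right)
\end{align*}
```

The first line is the identity; the second and third substitute k = K − 1 and k = K, matching the first argument of Binomdist in the PUR and PLR definitions respectively.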

Table II. Reasonable 95% confidence limits for K number of successes in sample size N = 100.

The most extreme values that PS can take are 0.0 and 1.0. When PS equals 0.0 precisely, no method can calculate a lower confidence limit (PL), because proportions (i.e., probabilities) lower than zero are undefined; and when PS equals 1.0 precisely, no method can calculate PU, because proportions above unity are likewise undefined. Not surprisingly, the PL for PS = 1.0 and the PU for PS = 0.0 are called one-sided limits.

Such one-sided limits can be calculated by the Exact method but not by Wald or Reasonable ones. Wald limits for PS = 0.0 or 1.0 cannot be calculated because in either case the binomial standard deviation itself equals 0.0, no matter what the sample size is (recall that SDPS = square root of PS(1 − PS)/N). Similarly, Reasonable limits cannot be calculated because if K = 0 (PS = 0.000), then the PUR formula value K − 1 is meaningless; and if K = N (PS = 1.000), then the Binomdist function in the PLR formula always outputs a probability of 1.000 and therefore can never equal the sought-after value of 1 − α/2 for any P < PS.

Figure 5. Wald, Exact, and Reasonable 95% confidence limits when N = 100 and sample proportion (PS) = 0.10.

On the other hand, Reasonable and Exact methods (but not the Wald method) share the following advantage: they can calculate two-sided confidence limits for any PS greater than zero and less than unity, no matter how small or large (e.g., if sample size is a million and K = 1, then PS = 0.000001 = 1.000E-6; and PLR = 0.242E-6).

A common criterion for evaluation of confidence interval methods is coverage. That term refers to the percentage of time the interval can be expected to include PP (the “true underlying value”). Typically, that percentage is determined experimentally4,5,14 by generating confidence intervals for thousands of random samples drawn from populations of known proportions (i.e., known PPs), and then determining what percentage of those intervals contain the corresponding PP. As seen in Figure 5, Reasonable confidence limits are completely contained within Exact ones, and therefore Reasonable intervals can be expected to have slightly less coverage; experimental results support that conclusion (see Table III).
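Coverage for a single known PP can also be computed by enumeration instead of random sampling, which is a different technique from the simulation described above. The sketch below is written under that assumption: it sums the binomial probabilities of every K whose interval would contain PP. Rather than solving for the limits, it uses the equivalent tail conditions implied by the Exact and Reasonable definitions. All function names are hypothetical.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k); evaluates to 0 when k < 0 because the sum is empty."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


def exact_covers(k: int, n: int, pp: float, alpha: float = 0.05) -> bool:
    """PLE < PP < PUE, expressed through the tail probabilities at PP."""
    upper_tail = 1 - binom_cdf(k - 1, n, pp)      # P(X >= K | PP)
    lower_tail = binom_cdf(k, n, pp)              # P(X <= K | PP)
    return upper_tail > alpha / 2 and lower_tail > alpha / 2


def reasonable_covers(k: int, n: int, pp: float, alpha: float = 0.05) -> bool:
    """PLR < PP < PUR; K = 0 and K = N never cover, as those limits are undefined."""
    return (binom_cdf(k, n, pp) < 1 - alpha / 2
            and binom_cdf(k - 1, n, pp) > alpha / 2)


def coverage(pp: float, n: int, covers, alpha: float = 0.05) -> float:
    """Probability that a random sample of size n yields an interval containing PP."""
    return sum(binom_pmf(k, n, pp) for k in range(n + 1) if covers(k, n, pp, alpha))


for name, rule in (("Exact", exact_covers), ("Reasonable", reasonable_covers)):
    print(f"{name:10s} coverage at PP = 0.10, N = 100: {coverage(0.10, 100, rule):.4f}")
```

Because the result is a sum over a discrete set of K values, it changes in jumps as PP varies, so a single PP is not representative of a method's coverage overall.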

Table III. Percentage of 95% confidence intervals that include PP. Each percentage is based on 10,000 random samples of N = 100, drawn from a population of proportion = PP, using Statgraphics Centurion XV. The beta distribution was used to calculate Exact and Reasonable intervals.

The distinctiveness of Reasonable intervals and their limits is not apparent when the limits are plotted in the still-common manner introduced in 1934 by Neyman (see Figure 6).22 However, alternative plotting methods (see Figure 7) clearly demonstrate that Reasonable intervals are narrower than Wald or Exact ones. As a result, only Reasonable intervals are narrow enough to have upper and lower limits that are truly reasonable. As Neyman insisted, “Confidence intervals should be as narrow as possible.”23

Recommendations

With variables data, a test of significance is mathematically equivalent to use of a confidence interval; i.e., a borderline value being compared with the sample result is considered significantly different if it is outside the sample’s confidence interval, but nonsignificant if inside.24 Likewise, a borderline binomial proportion is classically viewed as being inside or outside the sample proportion’s confidence interval. But such a view is misleading for proportions because they are based on counts. Because counts can take only discrete values, the observed result’s probability distribution histogram bar appears as if it spans the border between significance and nonsignificance when the histogram is based on a borderline value.

Figure 6. Upper and lower 95% confidence limits (N = 100).

Figure 7. Lengths of confidence intervals (PU − PL). N = 100 and confidence = 95%.

For example, as discussed previously, in the case of N = 100 and observed result K = 10 (see Figure 3), probability distribution histograms based on PP = PLE = 0.0490 result in the histogram bar for K = 10 being in the upper 2.5% tail of the distribution. Similarly, if the histogram is based upon PP = PLR = 0.0562, then K = 10 is in the 97.5% body + lower_tail of the distribution (see Figure 4b). The problem is that histograms based upon P values between PLE and PLR result in the observed-K histogram bar being neither fully in the 2.5% upper tail nor fully in the 97.5% body + lower_tail (see Figures 8a and 8b). In such cases, how do you objectively conclude significance or nonsignificance? Either conclusion could be viewed as unreasonably subjective and arbitrary. A solution, introduced in this article, is to always use not one but two confidence intervals: the Exact and the Reasonable.

Before explaining that solution, a brief history is useful (in the following discussion, “Lδ” is the likelihood of obtaining the observed result δ, assuming the null hypothesis is true). The concept of statistical significance has undergone much change since the precursors to modern tests of significance were developed in the 1800s. In that century, δ was not considered significant unless it was extremely unlikely.25 As decades passed, the requirement for significance became less extreme. By the 1930s, it could be said that “it is conventional among certain workers to adopt the following rule: If Lδ ≥ 0.05, δ is not significant; if Lδ ≤ 0.01, δ is significant; if 0.05 > Lδ > 0.01, our conclusions about δ are doubtful, and we cannot say with much certainty whether the deviation is significant or not until we have additional information”26 (the original text uses a different symbol than L). In effect, the region between 0.01 and 0.05 was considered a zone of uncertainty (a term not in the original text).

Figure 8. Probability distribution for “borderline.” Proportion = 0.0493 (vs. PLE = 0.0490) (a); proportion = 0.0560 (vs. PLR = 0.0562) (b). The bar representing K = 10 successes has an individual probability of 1.6%; thus almost all of its area (15/16) is in a tail of size 2.6% – 0.1% = 2.5% (a). The bar representing K = 10 successes has an individual probability of 2.9%; thus part of its area (1/29) is in a tail of size 2.4% + 0.1% = 2.5% (b).

On the basis of that background, perhaps the best solution to the problem of borderline proportions is to consider the range of values between PLE and PLR, and between PUR and PUE, to be zones of uncertainty (see Figure 9). Using that approach, and assuming the sample size has the power to detect a clinically significant difference, if the null hypothesis proportion (PNH) being compared with the study result (PS) is outside of the study result’s Exact confidence interval (i.e., PNH ≤ PLE or PUE ≤ PNH), you can claim statistical significance. If PNH is inside the corresponding Reasonable interval (i.e., PLR ≤ PNH ≤ PUR), you can claim statistical nonsignificance. However, if PNH is in either of the zones of uncertainty (i.e., either PLE < PNH < PLR or PUR < PNH < PUE), then the results are statistically inconclusive.
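The three-way decision rule can be written as a small function. In the sketch below, the name classify and the returned strings are hypothetical. The treatment of a PNH that falls exactly on a limit (significant at an Exact limit, nonsignificant at a Reasonable limit) is an assumption, made so that the zones of uncertainty are the open intervals between the limits. The limits used in the example are the N = 100, K = 10 values given earlier in this article.

```python
def classify(pnh: float, ple: float, plr: float, pur: float, pue: float) -> str:
    """Classify a null-hypothesis proportion PNH against Exact and Reasonable limits."""
    if not (ple < plr <= pur < pue):
        raise ValueError("expected PLE < PLR <= PUR < PUE")
    if pnh <= ple or pnh >= pue:
        return "significant"          # outside the Exact interval
    if plr <= pnh <= pur:
        return "nonsignificant"       # inside the Reasonable interval
    return "inconclusive"             # in a zone of uncertainty


# 95% limits for N = 100, K = 10 (PS = 0.10), as given in the text
limits = dict(ple=0.049004689, plr=0.056207020, pur=0.163982255, pue=0.176222598)

for pnh in (0.04, 0.05, 0.10, 0.17, 0.20):
    print(f"PNH = {pnh:.2f}: {classify(pnh, **limits)}")
```

For example, a protocol-specified value of 0.05 lies between PLE and PLR, so under this rule a study observing 10 successes in 100 would be reported as inconclusive with respect to it.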

FDA Notes

A search of the Web site FDA.gov finds recent submissions, panel opinions, and guidance documents that include a variety of binomial confidence interval methods, the most common being the Exact and the Score (the Score formula is an elaborated classic Wald, from the perspective that it uses a binomial standard deviation and not one but a few copies of a normal distribution Z-table value; Score confidence limits are found midway between Exact and Reasonable ones). In a 2007 statistical guidance document, FDA recommends “score confidence intervals, and alternatively, Exact (Clopper-Pearson) confidence intervals.”27 Exact versus Score methods were compared briefly in a 2009 FDA publication. It noted, “An advantage with the Score method is that... it can be calculated directly. Score confidence bounds tend to yield narrower confidence intervals than Clopper-Pearson [i.e., Exact] confidence intervals, resulting in a larger lower confidence bound.”28 It is interesting to note that the recommendation was based on ease of calculation rather than on theoretical correctness.

Figure 9. Exact and Reasonable 95% confidence limits when N = 100 and sample proportion (PS) is 0.10.

In 2003, Medtronic Neurological received approval for its Active Dystonia Therapy (deep brain stimulation system) by using the classic Wald method almost exclusively. “Exact 95% confidence intervals were used when the # (%) of patients was 0 (0%) because the normal approximation to the binomial does not provide a confidence interval. In every other case, the normal approximation to the binomial was used to calculate confidence intervals” even though in very many of those cases the N(PS) > 5 criterion (mentioned above) was not met.29
The new methods proposed in this article have not yet been used in any regulatory submission to FDA, Health Canada, or a notified body, as far as the author has been able to ascertain.

Conclusion

The most commonly used binomial confidence intervals can have confidence limits that are statistically significantly different from the random sample proportion from which they were derived. Therefore, such confidence limits are unreasonable to consider as being the proportion of the population from which the random sample was taken. A more defensible choice of confidence intervals is presented in this article, namely a Reasonable confidence interval for a binomial proportion, the extreme values of which are Reasonable confidence limits.

Although such an interval is slightly narrower than other intervals and thus offers less coverage, it is more reasonable to use because it is the widest possible range that contains no values that are statistically significantly different from the sample proportion on whose basis the interval was calculated. Alternatively, an even more reasonable approach is to use a combination of Exact and Reasonable limits, coupled with the concept of zones of uncertainty.

References

  1. J Neyman, “On the Two Different Aspects of the Representative Method,” Journal of the Royal Statistical Society 97, no. 4 (1934): 562.

  2. Neyman (1934): 589.

  3. C Clopper and E Pearson, “The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial,” Biometrika 26, no. 4 (1934): 409.

  4. LD Brown et al., “Confidence Intervals for a Binomial Proportion and Asymptotic Expansions,” The Annals of Statistics 30, no. 1 (2002): 160–201.

  5. RG Newcombe, “Two-Sided Confidence Intervals for the Single Proportion: Comparison of Seven Methods,” Statistics in Medicine 17 (1998): 857–872.

  6. H Bancroft, Introduction to Biostatistics (New York: Hoeber Medical Division of Harper & Row, 1957): 106.

  7. Clopper (1934): 404.

  8. H Motulsky, Intuitive Biostatistics (New York: Oxford University Press, 1995): 18.

  9. Brown (2002): 160.

  10. NIST/SEMATECH e-Handbook of Statistical Methods, last updated: 7/18/2006; available from Internet: www.itl.nist.gov/div898/handbook/prc/section2/prc241.htm.

  11. Two Group Confidence Interval & Power Calculator, version 1.6 (Rockville, MD: FDA, March 26, 1998).

  12. Statgraphics Centurion XV, version 15.2.12 (Warrenton, VA: StatPoint Inc., 1982–2007).

  13. LD Brown et al., “Interval Estimation for a Binomial Proportion,” Statistical Science 16, no. 2 (2001): 117.

  14. MD deB Edwardes, “The Evaluation of Confidence Sets With Application to Binomial Intervals,” Statistica Sinica 8 (1998): 393–409.

  15. JL Fleiss et al., Statistical Methods for Rates and Proportions, 3rd ed. (Hoboken, NJ: Wiley, 2003): 22.

  16. Fleiss (2003): 18–19.

  17. JL Phillips Jr., How to Think About Statistics, revised ed. (New York: Freeman, 1992): 62–64.

  18. W Mendenhall, Introduction to Probability and Statistics, 5th ed. (North Scituate, MA: Duxbury Press, 1979): 231–232.

  19. NIST: www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm.

  20. NIST: www.itl.nist.gov/div898/handbook/prc/section2/prc24.htm.

  21. K Krishnamoorthy, Handbook of Statistical Distributions with Applications (Boca Raton, FL: Taylor & Francis Group, 2006): 38.

  22. Neyman (1934): 590.

  23. Neyman (1934): 563.

  24. Motulsky (1995): 106–117.

  25. SM Stigler, The History of Statistics: The Measurement of Uncertainty before 1900 (Cambridge, MA: Belknap Press of Harvard University Press, 1986): 300ff.

  26. JF Kenney, Mathematics of Statistics (Part One & Part Two) (New York: D. Van Nostrand Co., 1939): Part Two, 117.

  27. Statistical Guidance on Reporting Results from Studies Evaluating Diagnostic Tests, Draft Guidance (Rockville, MD: FDA, March 13, 2007): 23.

  28. Assay Migration Studies for In Vitro Diagnostic Devices, Draft Guidance (Rockville, MD: FDA, January 5, 2009): 31.

  29. Summary of Safety and Probable Benefit, Humanitarian Device Exemption (HDE) Number: H020007 (Rockville, MD: FDA, April 15, 2003): 9.

John Zorich is an independent consultant and contractor in the areas of regulatory compliance and statistical methods.
