An article reports the following results for a study designed to test the efficacy of a drug by comparing two groups (treatment vs placebo) of 15 patients each: the observed (raw) difference D=+1.52 in favour of the treatment, and a "Student t test" with t=+0.683, q=28 degrees of freedom, p=0.50, nonsignificant.
I would be interested in an interval estimate (frequentist confidence interval, or fiducial-Bayesian credible interval) in order to assess whether the inefficacy of the treatment has really been proved.
Can I get an interval estimate for the true difference?
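Yes! For a 100(1-α)% interval, it is sufficient to know $t_{1-\alpha/2}$: the (1-α/2) upper point (quantile) of the Student distribution with q degrees of freedom.
Since t is D divided by its standard error, the ratio D/t recovers that standard error, and the 100(1-α)% interval estimate (frequentist or fiducial-Bayesian interval) for the true difference δ can be immediately deduced:
$$\left[\; D - \frac{D}{t}\,t_{1-\alpha/2}\;,\;\; D + \frac{D}{t}\,t_{1-\alpha/2} \;\right]$$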
We find here for α = 0.05 and q=28 degrees of freedom $t_{0.975}$ = +2.0484, hence the 95% interval [-3.04,+6.08] (of course it is assumed that D and t are computed with appropriate accuracy).
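The computation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `interval_from_t` and its parameter names are hypothetical, and the critical value is passed in as read from a Student table (here 2.0484, the value quoted above) rather than computed.

```python
def interval_from_t(D, t, t_crit):
    """Interval estimate for the true difference from a reported D and t.

    D      : observed (raw) difference
    t      : reported Student t statistic (same sign as D, non-zero)
    t_crit : (1 - alpha/2) point of the Student distribution with q
             degrees of freedom, taken from a table (assumption: it is
             supplied by the caller, not computed here)
    """
    if t == 0:
        raise ValueError("t must be non-zero to recover the standard error")
    standard_error = D / t              # t = D / SE, hence SE = D / t
    half_width = standard_error * t_crit
    return D - half_width, D + half_width


# 'Placebo' example: D=+1.52, t=+0.683, q=28, t_0.975=2.0484
low, high = interval_from_t(1.52, 0.683, 2.0484)
print(f"[{low:+.2f}, {high:+.2f}]")
```

With the values reported in the article, the two bounds round to -3.04 and +6.08, in agreement with the interval given above.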
Interpretation
This interval can be interpreted as a 95% "frequentist" confidence interval or as a 95% "fiducial-Bayesian" interval.
Student's example (1908)
In his original article about the "t test", Student illustrates his test with an inference about the difference between the additional hours of sleep gained by the use of two soporifics.
The observed average (raw) difference is D=+1.58.
In modern terms, we compute the t test statistic t=+4.06 (with q=9 degrees of freedom).
We find here for α = 0.05 and q=9 degrees of freedom $t_{0.975}$ = +2.2622, hence the 95% interval [+0.70,+2.46] (of course it is assumed that D and t are computed with the appropriate accuracy).
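As a check, the arithmetic for Student's example can be written out step by step from the values quoted above, with D/t playing the role of the standard error. Intermediate values are rounded to three decimals here, so the last digit could differ slightly if D and t were known more precisely.

```latex
\begin{align*}
\frac{D}{t} &= \frac{1.58}{4.06} \approx 0.389 \\
\frac{D}{t}\, t_{0.975} &\approx 0.389 \times 2.2622 \approx 0.880 \\
\left[\, D - 0.880 \,,\; D + 0.880 \,\right] &= \left[\, +0.700 \,,\; +2.460 \,\right]
\end{align*}
```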
Interval estimate and significance test
A formula equivalent to the previous formula is
$$\left[\; D\left(1 - \frac{t_{1-\alpha/2}}{t}\right)\;,\;\; D\left(1 + \frac{t_{1-\alpha/2}}{t}\right) \;\right]$$
If |t| = $t_{1-\alpha/2}$, the t test is "exactly significant" at two-sided level α (p=α) ⇔ the interval is [0,2D] (if D>0) or [2D,0] (if D<0).
If |t| > $t_{1-\alpha/2}$, the t test is significant at two-sided level α (p<α) ⇔ the interval does not include 0.
This is the case in Student's example: the p-value is p=0.003.
If |t| < $t_{1-\alpha/2}$, the t test is nonsignificant at two-sided level α (p>α) ⇔ the interval includes 0.
This is the case in the 'placebo' example: the p-value is p=0.50.
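The correspondence between the test and the interval can be checked numerically. The sketch below is illustrative only: the helper names (`equivalent_interval`, `is_significant`, `excludes_zero`) are hypothetical, the critical values are those quoted in the text, and D and t are assumed to be non-zero and of the same sign.

```python
def equivalent_interval(D, t, t_crit):
    """Interval [D(1 - t_crit/t), D(1 + t_crit/t)].

    Assumes D and t are non-zero and share the same sign, which holds
    whenever t was computed as D divided by a positive standard error.
    """
    ratio = t_crit / t
    return D * (1 - ratio), D * (1 + ratio)


def is_significant(t, t_crit):
    """Two-sided t test at level alpha: significant when |t| > t_crit."""
    return abs(t) > t_crit


def excludes_zero(interval):
    """True when 0 lies strictly outside the interval."""
    low, high = interval
    return low > 0 or high < 0


# (label, D, t, t_crit) with the values quoted in the text
examples = [
    ("placebo", 1.52, 0.683, 2.0484),
    ("Student", 1.58, 4.06, 2.2622),
]
for label, D, t, t_crit in examples:
    interval = equivalent_interval(D, t, t_crit)
    # The two criteria agree: significant <=> 0 outside the interval
    print(label, is_significant(t, t_crit), excludes_zero(interval))

# Boundary case: |t| equal to the critical value gives [0, 2D]
print(equivalent_interval(1.0, 2.0484, 2.0484))
```

For each example the two booleans printed on a line coincide, and the boundary case returns an interval with one bound at 0 and the other at 2D.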
Conceptual confusions
Even experts in statistics are not immune from conceptual confusions.
For instance, Rosnow and Rosenthal (1996, page 336*) interpret the specific interval [0,+0.532] as "a 77% [frequentist] confidence interval" (given D=+0.532 and the one-sided p-value for the usual t test p=0.115, hence 77%=(1-2×0.115)100%).
If we repeat the experiment, 2D and the p-value will be different, and, in the long run, the proportion of intervals [2D,0] or [0,2D] (according to the sign of D) that contain the true value of the difference will not be 77%.
Clearly, 77% is here a data-dependent probability, which needs a Bayesian approach to be correctly interpreted.
[*Computing contrasts, effect sizes, and counternulls on other people's published data: General procedures for research consumers. Psychological Methods, 1, 331-340.]
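The long-run argument can be illustrated by a small simulation. This is a rough sketch under simplifying assumptions that are not in the text: D is drawn from a normal distribution with a known standard error, and the standard error (0.45) and the true differences tried are hypothetical values chosen only for illustration.

```python
import random
from statistics import NormalDist


def coverage_of_zero_to_2D(delta, se, n_rep=20000, seed=1):
    """Long-run proportion of intervals [0, 2D] (or [2D, 0]) containing delta.

    Simplifying assumption: D is drawn from a normal distribution with
    mean delta and a known standard error se.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        D = rng.gauss(delta, se)
        low, high = min(0.0, 2 * D), max(0.0, 2 * D)
        if low <= delta <= high:
            hits += 1
    return hits / n_rep


se = 0.45  # hypothetical standard error
for delta in (0.1, 0.5, 2.0):  # hypothetical true differences
    simulated = coverage_of_zero_to_2D(delta, se)
    # For delta > 0 the interval covers delta exactly when D >= delta / 2
    theoretical = NormalDist().cdf(delta / (2 * se))
    print(f"delta={delta}: simulated {simulated:.3f}, theoretical {theoretical:.3f}")
```

Under these assumptions the proportion depends on the unknown true difference (for a positive true difference it ranges from about one half to nearly one), so no single frequentist confidence level such as 77% can be attached to this family of intervals.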
Remark: Student and the interpretation of the p-value
Student wrote in 1908: "the probability is .9985 [1-p/2] or the odds are about 666 to 1 that 2 is the better soporific".
This is clearly a Bayesian (or fiducial) statement, and certainly not an orthodox frequentist statement!
Be careful!
It is only in the fiducial-Bayesian framework that you can state: "there is a 99.85% chance that the true difference is positive" and "there is a 97.5% chance that it is larger than +0.70".
If you adopt the frequentist framework, you must ban any colloquialism such as "I am 95% confident that the true difference lies between +0.70 and +2.46", because it suggests that the confidence level is a measure of uncertainty after the data have been seen, which it is not.
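The two fiducial-Bayesian probabilities quoted above can be approximated with a short computation. The sketch below rests on an assumption made explicit here: the fiducial-Bayesian distribution of δ is taken to be D + (D/t)·T, where T follows a Student distribution with q degrees of freedom, which is the distribution that reproduces the interval formula given earlier. The function names are hypothetical, and the Student distribution function is obtained by plain numerical integration rather than from a statistical library.

```python
import math


def student_pdf(x, q):
    """Density of the Student distribution with q degrees of freedom."""
    const = math.gamma((q + 1) / 2) / (math.sqrt(q * math.pi) * math.gamma(q / 2))
    return const * (1 + x * x / q) ** (-(q + 1) / 2)


def student_cdf(x, q, steps=2000):
    """P(T <= x), by Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = student_pdf(0.0, q) + student_pdf(x, q)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * student_pdf(i * h, q)
    return 0.5 + total * h / 3


def prob_delta_greater_than(value, D, t, q):
    """Fiducial-Bayesian P(delta > value), with delta = D + (D/t) * T_q."""
    standard_error = D / t
    return student_cdf((D - value) / standard_error, q)


# Student's example: D=+1.58, t=+4.06, q=9
print(round(prob_delta_greater_than(0.0, 1.58, 4.06, 9), 4))
print(round(prob_delta_greater_than(0.70, 1.58, 4.06, 9), 3))
```

The two printed values should be close to the 0.9985 and 0.975 quoted above; small discrepancies in the last digit are expected because D and t are used here in rounded form.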
'Interaction' example
Consider an experiment involving two crossed factors, Age and Treatment, each with two levels.
The means of the four experimental conditions (with 10 subjects in each) are respectively 5.77 (a1,t1), 5.25 (a2,t1), 4.83 (a1,t2) and 4.71 (a2,t2).
The interaction effect can be characterized by the difference of differences:
D = (5.77-4.83) - (5.25-4.71) = +0.40
The ANOVA F ratio for this effect is F=0.47, p=0.50 (with 1 and q=36 degrees of freedom).
Given the property that the F ratio for a contrast is the square of the t statistic, we replace D/t with the absolute value of D/$\sqrt{F}$.
We find here the 95% interval [-0.78,+1.58] (of course it is assumed that D and F are computed with appropriate accuracy).
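The same computation starting from an F ratio can be sketched as follows. The function name `interval_from_F` is hypothetical, and the critical value 2.0281 is an assumption in the sense that it is not given in the text: it is the tabulated 0.975 point of the Student distribution with 36 degrees of freedom.

```python
import math


def interval_from_F(D, F, t_crit):
    """Interval for a one-degree-of-freedom contrast from D and its F ratio.

    Uses |D| / sqrt(F) in place of D / t, since F = t**2 for a contrast.
    t_crit is the (1 - alpha/2) point of the Student distribution with the
    denominator degrees of freedom, supplied by the caller.
    """
    if F <= 0:
        raise ValueError("F must be positive")
    standard_error = abs(D) / math.sqrt(F)
    half_width = standard_error * t_crit
    return D - half_width, D + half_width


# Difference of differences from the four cell means
D = (5.77 - 4.83) - (5.25 - 4.71)
# 2.0281: table value of t_0.975 for 36 degrees of freedom (not given in the text)
low, high = interval_from_F(D, 0.47, 2.0281)
print(f"[{low:+.2f}, {high:+.2f}]")
```

With the reported values, the bounds round to -0.78 and +1.58, in agreement with the interval stated above. Taking the absolute value of D keeps the standard error positive whatever the sign of the contrast.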