An article reports the following results for a study designed to test the efficacy of a drug by comparing two groups (treatment *vs* placebo) of 15 patients each:

the observed (raw) difference **D=+1.52** in favour of the treatment,

a "Student t test": **t=+0.683**, **q=28** degrees of freedom, p=0.50, **nonsignificant**.

I would be interested in an interval estimate (frequentist confidence interval, or fiducial-Bayesian credible interval) in order to assess whether the inefficacy of the treatment has really been proved.

**Can I get an interval estimate for the true difference?**

Yes! For a **100(1-α)%** interval, it is sufficient to know **t**_{1-α/2}: the **(1-α/2)** quantile (the upper α/2 point) of the Student distribution with **q** degrees of freedom.

The 100(1-α)% interval estimate (frequentist or fiducial-Bayesian interval) for the true difference *δ* can be **immediately** deduced:

###
[ D - (D/t) t_{1-α/2} , D + (D/t) t_{1-α/2} ]

We find here for **α = 0.05** and **q=28** degrees of freedom **t**_{0.975}= +2.0484, hence the 95% interval **[-3.04,+6.08]** (of course it is assumed that **D** and **t** are computed with appropriate accuracy).
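The computation above can be sketched in a few lines of Python. The reasoning is that t = D/e, where e is the standard error of D, so the ratio D/t recovers e and the interval is D ± e·t_{1-α/2}. The function name `interval_from_t` and its argument names are illustrative only, and the critical value is passed in by hand (here the 2.0484 quoted above) rather than computed; a library routine such as SciPy's `scipy.stats.t.ppf` could supply it instead.

```python
def interval_from_t(d, t, t_crit):
    """Interval estimate for the true difference from a reported D and t.

    d      : observed (raw) difference D
    t      : reported t statistic (must be nonzero)
    t_crit : t_{1-alpha/2} for the relevant degrees of freedom
    """
    if t == 0:
        raise ValueError("t must be nonzero to recover the standard error")
    std_err = abs(d / t)              # D/t is the standard error of D
    half_width = std_err * t_crit
    return d - half_width, d + half_width


# 'Placebo' example: D = +1.52, t = +0.683, q = 28, t_{0.975} = 2.0484
low, high = interval_from_t(1.52, 0.683, 2.0484)
print(f"[{low:+.2f}, {high:+.2f}]")   # prints [-3.04, +6.08]
```

Because only D, t and the critical value are needed, the same call works for any published t test that reports these quantities.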

###
Interpretation

This interval can be interpreted as a 95% "frequentist" confidence interval or as a 95% "fiducial-Bayesian" interval.

##
Student's example (1908)

In his original article about the "t test", Student illustrates his test with an inference about the difference between the additional hours of sleep gained by the use of two soporifics. The observed average (raw) difference is **D=+1.58**. In modern terms, we compute the t test statistic **t=+4.06** (with **q=9** degrees of freedom).

We find here for **α = 0.05** and **q=9** degrees of freedom **t**_{0.975}= +2.2622, hence the 95% interval **[+0.70,+2.46]** (of course it is assumed that **D** and **t** are computed with appropriate accuracy).

###
Interval estimate and significance test

A formula equivalent to the previous formula is

###
[ D ( 1 - t_{1-α/2}/t ) , D ( 1 + t_{1-α/2}/t ) ]

If **|t| = t**_{1-α/2}, the t test is **"exactly significant"** at two-sided level α (**p=α**) ⇔ the interval is [**0**,**2D**] (if **D>0**) or [**2D**,**0**] (if **D<0**).

If **|t| > t**_{1-α/2}, the t test is **significant** at two-sided level α (**p<α**) ⇔ the interval **does not include 0**.

This is the case in Student's example: the p-value is **p=0.003**.

If **|t| < t**_{1-α/2}, the t test is **nonsignificant** at two-sided level α (**p>α**) ⇔ the interval **includes 0**.

This is the case in the 'placebo' example: the p-value is **p=0.50**.
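The correspondence between the significance test and the interval can be checked numerically. The following is a minimal sketch using the equivalent formula D(1 ± t_{1-α/2}/t); the helper name `interval_and_verdict` is hypothetical, and the critical values are those quoted in the two examples.

```python
def interval_and_verdict(d, t, t_crit):
    """Return the interval, whether the test is significant, and whether 0 is inside."""
    if t == 0:
        raise ValueError("t must be nonzero")
    ratio = t_crit / abs(t)
    # D and t share the same sign, so sorting gives (lower, upper) for D < 0 as well
    low, high = sorted((d * (1 - ratio), d * (1 + ratio)))
    significant = abs(t) > t_crit
    includes_zero = low < 0 < high
    return (low, high), significant, includes_zero


# Student's example: significant, interval excludes 0
student = interval_and_verdict(1.58, 4.06, 2.2622)
# 'Placebo' example: nonsignificant, interval includes 0
placebo = interval_and_verdict(1.52, 0.683, 2.0484)

for name, (interval, significant, includes_zero) in (("Student", student), ("placebo", placebo)):
    print(name, [round(x, 2) for x in interval], significant, includes_zero)
```

In both cases "significant" and "includes 0" come out as opposites, which is the equivalence stated above.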

###
Conceptual confusions

Even experts in statistics are not immune to *conceptual* confusions. For instance, Rosnow and Rosenthal (1996, page 336*) interpret the specific interval [0,+0.532] as "a **77%** [*frequentist*] confidence interval" (given **D=+0.532** and the one-sided **p-value** for the usual t test **p=0.115**, hence 77%=(1-2×0.115)100%). If we repeat the experiment, **2D** and the **p-value** will be different, and, in a long run of repetitions, the proportion of intervals [**2D,0**] or [**0,2D**] (according to the sign of **D**) that contain the true value of the difference will not be 77%. Clearly, 77% is here a *data dependent* probability, which needs a Bayesian approach to be correctly interpreted.

[*Computing contrasts, effect sizes, and counternulls on other people's published data: General procedures for research consumers. *Psychological Methods*, *1*, 331-340.]
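The claim that intervals of the form [0,2D] do not have a fixed 77% long-run coverage can be illustrated with a small calculation. This is a rough sketch under a simplifying assumption that is not in the article: D is taken to be normally distributed around the true difference δ with a known standard error. Under that assumption, δ lies in the interval exactly when D has the sign of δ and |D| ≥ |δ|/2, so the coverage is Φ(|δ|/(2·se)). The function name `coverage_zero_to_2d` is hypothetical.

```python
import math


def coverage_zero_to_2d(delta_over_se):
    """Long-run proportion of intervals [0, 2D] (or [2D, 0]) that contain delta.

    Assumption (not from the article): D ~ Normal(delta, se^2) with known se.
    delta_over_se is the true difference divided by the standard error.
    """
    if delta_over_se == 0:
        return 1.0  # 0 is always an endpoint of the interval
    z = abs(delta_over_se) / 2
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF at z


for ratio in (0.5, 1.0, 2.0, 4.0):
    print(ratio, round(coverage_zero_to_2d(ratio), 3))
```

Under this assumption the coverage depends entirely on the unknown δ, ranging from just above 50% to nearly 100%, so no single frequentist confidence level can be attached to such an interval.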

###
Remark: Student and the interpretation of the p-value

Student wrote in 1908: **"the probability is .9985 [1-p/2] or the odds are about 666 to 1 that 2 is the better soporific"**. This is clearly a **Bayesian** (or **fiducial**) statement, and certainly not an *orthodox* frequentist statement!

###
Be careful!

It is only in the **fiducial-Bayesian** framework
that you can state:
"there is a **99.85%** chance that the true difference is **positive**"
and
"there is a **97.5%** chance that it is **larger than +0.70**".

If you adopt the **frequentist** framework, you must **ban** any colloquialism such as
"I am **95% confident** that the true difference lies between +0.70 and +2.46", which
suggests that the confidence level is a measure of uncertainty
*after the data have been seen*, which it is not.

##
'Interaction' example

Consider an experiment involving two crossed factors *Age* and *Treatment*, each with two levels. The means of the four experimental conditions (with 10 subjects in each) are respectively 5.77 (a1,t1), 5.25 (a2,t1), 4.83 (a1,t2) and 4.71 (a2,t2).

The interaction effect can be characterized by the difference of differences:

**D = (5.77-4.83) - (5.25-4.71) = +0.40**

The ANOVA F ratio for this effect is **F=0.47**, p=0.50 (with 1 and **q=36** degrees of freedom).

Since the F ratio for a one-degree-of-freedom contrast is the square of the t statistic, we replace D/t with the absolute value of D/√F.

We find here the 95% interval **[-0.78,+1.58]** (of course it is assumed that **D** and **F** are computed with appropriate accuracy).
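The same calculation starting from an F ratio can be sketched as follows. The function name `interval_from_f` is illustrative, and the critical value t_{0.975} = 2.0281 for 36 degrees of freedom is an assumption taken from standard tables, since the text does not quote it.

```python
import math


def interval_from_f(d, f, t_crit):
    """Interval estimate for a contrast from its observed value D and its F ratio.

    d      : observed contrast D
    f      : F ratio with 1 numerator degree of freedom (must be positive)
    t_crit : t_{1-alpha/2} for the denominator degrees of freedom
    """
    if f <= 0:
        raise ValueError("F must be positive")
    std_err = abs(d) / math.sqrt(f)   # |D|/sqrt(F) plays the role of D/t
    half_width = std_err * t_crit
    return d - half_width, d + half_width


# Difference of differences from the four cell means
d = (5.77 - 4.83) - (5.25 - 4.71)
# t_{0.975} with 36 degrees of freedom: assumed value from standard tables
low, high = interval_from_f(d, 0.47, 2.0281)
print(f"[{low:+.2f}, {high:+.2f}]")   # prints [-0.78, +1.58]
```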
