ERIS - Statistical work

Equipe Raisonnement Induction Statistique

Some questions

The abuses of interpretation	There is interval and interval!	I have the test Can I get an interval?
Even statisticians...	What is the probability of finding again...	A problem with socks	There is random and random

The abuses of interpretation of significance tests

Consider an experiment involving two crossed factors, Age and Treatment, each with two levels. The means of the four experimental conditions (with 10 subjects in each) are respectively 5.77 (a1,t1), 5.25 (a2,t1), 4.83 (a1,t2) and 4.71 (a2,t2).
The following typical comments, based on ANOVA F tests, are found in an experimental journal:
"the only significant effect is a main effect of treatment (F[1,36]=6.39, p=0.016), reflecting a substantial improvement";
and again
"clearly, there is no evidence (F[1,36]=0.47, p=0.50) of an interaction".

The reader is strongly led to believe that both a large main effect of treatment and a small interaction effect have been demonstrated.

Do you agree with these conclusions?

There is nothing of the kind!

The difference between the two observed treatment means is:
       d = (5.77+5.25)/2 - (4.83+4.71)/2 = +0.74
We deduce the interval estimate (95% confidence interval or credible interval):
       [+0.15 , +1.33]

The interaction effect can be characterized by the difference of differences:
       d = (5.77-4.83) - (5.25-4.71) = +0.40
We deduce the interval estimate (95% confidence interval or credible interval):
       [-0.78 , +1.58].

This clearly shows that one cannot conclude both that there is a substantial difference between the treatment means and that the interaction effect is small, or at least relatively negligible (and still less that it is null).
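Both intervals can be recovered from the reported F ratios alone. The following is a minimal sketch in Python, assuming that each effect is a one-degree-of-freedom contrast (so that t is the square root of F) and that the interval is d ± (d/t) times the 0.975 Student quantile with 36 degrees of freedom. The helper names (`t_pdf`, `t_cdf`, `t_quantile`, `interval_from_f`) are illustrative, and the quantile is obtained by simple numerical integration rather than from a statistics library.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """CDF by Simpson's rule on [0, |x|], using the symmetry of the density."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, df):
    """Quantile for p >= 0.5, found by bisection on the CDF."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def interval_from_f(d, f_ratio, df_error, level=0.95):
    """Interval d ± (|d| / sqrt(F)) * t_crit for a 1-df contrast."""
    std_error = abs(d) / math.sqrt(f_ratio)
    t_crit = t_quantile(1 - (1 - level) / 2, df_error)
    return d - std_error * t_crit, d + std_error * t_crit

# Cell means quoted in the text.
m = {"a1t1": 5.77, "a2t1": 5.25, "a1t2": 4.83, "a2t2": 4.71}
d_treatment = (m["a1t1"] + m["a2t1"]) / 2 - (m["a1t2"] + m["a2t2"]) / 2
d_interaction = (m["a1t1"] - m["a1t2"]) - (m["a2t1"] - m["a2t2"])

treatment_ci = interval_from_f(d_treatment, 6.39, 36)
interaction_ci = interval_from_f(d_interaction, 0.47, 36)
print([round(b, 2) for b in treatment_ci])
print([round(b, 2) for b in interaction_ci])
```

Under these assumptions the two printed intervals should agree, after rounding, with the ones quoted above. The interaction interval is about twice as wide as the treatment interval because a difference of differences has twice the standard error of a difference of averaged means.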

There is interval and interval!

In an introductory statistics textbook, part of a series for the general public whose goal is to give the reader the possibility to "access the deep intuitions in the field", one can find the following interpretation of a confidence interval for a proportion.

"If in an opinion poll of size 1000, the observed proportion P is equal to 0.613, the proportion π to estimate has a probability 0.95 of lying in the range: [0.58,0.64]"

Do you agree with this interpretation?

If you are not (yet) a Bayesian and your real intuition is that this interpretation is either right, or perhaps wrong but in any case desirable, you should seriously ask yourself whether you are a Bayesian "without knowing it".

In the frequentist framework, probabilities cannot be assigned to the possible values of the parameter. If, as in this example, the bounds computed for the observed sample are [0.58,0.64], the event "0.58<π<0.64" is either true or false (because π is fixed), and we cannot give it a probability (other than 1 or 0).

The correct interpretation of the 95% confidence interval is the following:
"95% of the confidence intervals computed over the set of all samples (all the samples that could be drawn from the population) contain the true value π".
Each interval in isolation has either a 0% or a 100% probability of containing the true value.
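The long-run reading can be illustrated by simulation. The sketch below is in Python and rests on two assumptions that are not in the textbook quotation: the interval is taken to be the usual large-sample interval p ± 1.96·sqrt(p(1-p)/n), and the "true" proportion is arbitrarily set to 0.6 for the simulated polls. The function names are illustrative.

```python
import math
import random

def wald_interval(successes, n, z=1.96):
    """Usual large-sample 95% interval for a proportion."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def coverage(true_pi, n, n_samples, seed=0):
    """Fraction of simulated samples whose interval contains true_pi."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        successes = sum(rng.random() < true_pi for _ in range(n))
        lower, upper = wald_interval(successes, n)
        hits += lower < true_pi < upper
    return hits / n_samples

# The poll from the quotation: observed proportion 0.613 with n = 1000.
print([round(b, 2) for b in wald_interval(613, 1000)])

# Long-run behaviour over many hypothetical polls (true_pi = 0.6 is assumed).
print(coverage(true_pi=0.6, n=1000, n_samples=2000))
```

The 95% describes the procedure across the simulated polls, where the coverage fraction should come out close to 0.95. For any single poll, the computed interval either contains the fixed value of π or it does not.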

Ironically, it is the natural (Bayesian) interpretation of confidence intervals in terms of "a fixed interval having a 95% chance of including the true value of interest" which is their appealing feature.

Note: the difference between the two interpretations is not merely semantic.

I have the test statistic, can I get an interval?

I have found an article that reports the results of a study designed to test the efficacy of a drug by comparing two groups (treatment vs placebo) of 15 patients each. The article gives the observed difference d=+1.52 in favour of the treatment, and a "Student t test": t=+0.683, 28 degrees of freedom, p=0.50, nonsignificant.
I would be interested in an interval estimate (frequentist confidence interval, or fiducial-Bayesian credible interval) in order to assess whether the inefficacy of the treatment has really been demonstrated.

Is it possible?

Yes!

For a 100(1-α)% interval, it is sufficient to know t{1-α/2}: the 1-α/2 quantile (upper α/2 point) of the Student distribution with q degrees of freedom.
The 100(1-α)% interval estimate (frequentist or fiducial-Bayesian interval) for the true difference δ can be immediately deduced, since d/t is the standard error of d:

[ d - (d/t) t{1-α/2} , d + (d/t) t{1-α/2} ]

We find here for α = 0.05 and q=28 degrees of freedom t{0.975}= +2.0484, hence the 95% interval (of course it is assumed that d and t are computed with the needed accuracy):

[-3.04,+6.08]
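The computation is short enough to check by hand or in a few lines of code. A minimal Python sketch follows, in which the function name `interval_from_t` is illustrative and the Student quantile 2.0484 is taken from the text rather than computed:

```python
def interval_from_t(d, t, t_crit):
    """Interval d ± (d/t) * t_crit, where d/t is the standard error of d."""
    std_error = abs(d / t)
    return d - std_error * t_crit, d + std_error * t_crit

# Values reported in the article; 2.0484 is the 0.975 quantile for 28 df.
lower, upper = interval_from_t(d=1.52, t=0.683, t_crit=2.0484)
print(round(lower, 2), round(upper, 2))
```

Taking the absolute value of d/t keeps the bounds in the right order when the observed difference and the t statistic are both negative.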

This interval can be interpreted as a 95% "frequentist" confidence interval or as a 95% "fiducial-Bayesian" interval.

Even statisticians...

Consider the results of a study designed to test the efficacy of a drug by comparing two groups (treatment vs placebo) of 15 patients each.
The drug is to be considered clinically interesting by experts in the field if the unstandardized difference between the treatment mean and the placebo mean is more than +3.
The observed difference is d=+1.52. The difference is nonsignificant (t=+0.683, p=0.50).

What conclusion would you draw for the efficacy of the drug?

Answer spontaneously (without computation)

From a normative viewpoint, the task involves the following simple and general result:
the 95% interval estimate (frequentist or fiducial-Bayesian) for the true difference δ is approximately

d ± 2(d/t), hence here [-2.93, +5.97]

This very simple approximation is generally sufficient (the exact interval is [-3.04,+6.08]). This straightforward and easily interpretable result should theoretically prevent the abusive interpretation of a nonsignificant result as "proof of the null hypothesis".

Clearly, the data here do not allow one to conclude that the drug is ineffective (because of the large observed variability).
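The reasoning expected of the respondents can be written out as a rule of thumb. In the Python sketch below, the function names and the three verdict labels are hypothetical, chosen only for illustration; the threshold of +3 is the clinical criterion stated above.

```python
def approx_95_interval(d, t):
    """Rule of thumb for a 95% interval: d ± 2 * (d/t)."""
    half_width = 2 * abs(d / t)
    return d - half_width, d + half_width

def verdict(lower, upper, threshold):
    """Classify the evidence about 'true difference > threshold'."""
    if lower > threshold:
        return "clinically interesting"
    if upper < threshold:
        return "not clinically interesting"
    return "inconclusive"

lower, upper = approx_95_interval(d=1.52, t=0.683)
print(round(lower, 2), round(upper, 2), verdict(lower, upper, threshold=3))
```

Because the interval extends both below 0 and above +3, the data are compatible with a useless drug and with a clinically interesting one, which is why neither conclusion is warranted.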

However, faced with this situation, 84% of the professional applied statisticians and 85% of the psychologists (all with experience in processing and analyzing experimental data) concluded that the drug was ineffective.
Moreover, they were then told that the experiment had initially been planned with 30 subjects in each group and that the result presented was in fact an interim result. They were then asked whether they would decide to stop the experiment and conclude:
More than half of them perceived the situation as very favorable and decided to stop (57% of the statisticians and 53% of the psychologists).

What is the probability of finding again...

In a study comparing an experimental condition to a control condition, a difference of +1.82 between the two means has been observed. The difference is significant at the two-tailed 0.05 level: t=+2.09, 19 degrees of freedom, p=0.05.

(1) What, for you, is the probability that, in a replication of the experiment, the observed difference will be positive?
(2) What, for you, is the probability that the observed difference will be positive, and the result of Student's test will be at least significant?

Answer spontaneously (without computation)

From a normative viewpoint, since there is no a priori information external to the experiment, it seems reasonable to base the prediction on the data only.
Hence, the fiducial-Bayesian answer is:
(1) 0.92
(2) 0.50
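The text gives only the two answers, not the computation. One way to arrive at values of this size is sketched below in Python, under assumptions that are not stated in the source: a noninformative prior, a replication of the same design and size, so that the predictive distribution of the replicated difference is d + sqrt(2)·(d/t)·T with T a Student variable with 19 degrees of freedom, and, for question (2) only, the simplification that the replication has the same standard error d/t as the original study. All function names are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """CDF by Simpson's rule on [0, |x|], using the symmetry of the density."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, df):
    """Quantile for p >= 0.5, found by bisection on the CDF."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

t_obs, df = 2.09, 19

# (1) d_rep > 0 is equivalent to T > -t_obs / sqrt(2).
p_same_sign = t_cdf(t_obs / math.sqrt(2), df)

# (2) Simplified: the replication is significant and positive when
#     d_rep > t_crit * (d/t), i.e. T > (t_crit - t_obs) / sqrt(2).
t_crit = t_quantile(0.975, df)
p_significant = 1 - t_cdf((t_crit - t_obs) / math.sqrt(2), df)

print(round(p_same_sign, 2), round(p_significant, 2))
```

The second probability is close to one half because the observed t is almost exactly at the significance threshold, so a replication is about as likely to fall below it as above it.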

The majority of the psychology researchers surveyed underestimated the first probability and overestimated the second.
The striking finding is that half of researchers gave numerically close values for the two probabilities. About one third of the subjects even gave exactly the same answer.

Killeen's p_rep is the probability of again finding a difference of the same sign in a replication. Hence, it answers the first question.

A problem with socks

A pair of socks is (blindly) drawn from a drawer in which there are a pair of red socks and a pair of green socks.
Consider the following results:
Result 1: A pair of socks that match (two red or two green) is obtained
Result 2: A pair of socks that do not match (one red and one green) is obtained

Do you think there is:
1) more chance of obtaining result 1
2) more chance of obtaining result 2
3) an equal chance of obtaining the two results

Answer spontaneously (without computation)

The correct response is: 2) more chance of obtaining result 2

There is more chance of obtaining two socks that do not match (one red and one green).
Number the socks in the drawer: 1 2 3 4 (say socks 1 and 2 are red, socks 3 and 4 are green).
There are 6 different possible draws: 12, 13, 14, 23, 24, 34. Only 12 and 34 are matching pairs,

hence four chances out of six of obtaining two socks that do not match.
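The enumeration can be checked mechanically. A small Python sketch follows, in which the assignment of colours to sock numbers is an illustrative labelling (any labelling with two socks of each colour gives the same counts):

```python
from itertools import combinations

# Socks 1 and 2 are red, socks 3 and 4 are green (labels are illustrative).
colour = {1: "red", 2: "red", 3: "green", 4: "green"}

draws = list(combinations(colour, 2))  # the 6 unordered draws of two socks
mismatched = [pair for pair in draws if colour[pair[0]] != colour[pair[1]]]

print(len(draws), len(mismatched))
print(len(mismatched) / len(draws))
```

The equiprobability bias comes from counting the two outcomes "match" and "no match" rather than the six equally likely draws behind them.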

If you answered "an equal chance of obtaining the two results" (equiprobability bias), you are part of the majority.
You are in good company, since a referee, expected to be an expert in the field, wrote to us:
"A pair of matching socks is blindly drawn from a drawer containing two pairs of different socks. But with to red and two green socks, the probability of drawing two matching socks is equal to drawing one red and one green, p=.5."

There is random and random

Consider the two following events:

"The fact that a pair of socks that match is obtained from a blind draw of two socks from a drawer in which there are two pairs of different socks"

"The fact that a planted seed germinates or not"

Do you think that randomness is involved or not in each of these two events?

Answer spontaneously

Of course, there is no "right answer"!

Three groups of subjects have been questioned: undergraduate students, researchers in psychology, and researchers in mathematics and statistics.
A large majority of individuals agree on the first item [socks] and consider it as random because "it is possible to compute 'easily' a probability". However, this item is less often considered as random within the psychology group than within the two other groups.
In contrast, individuals are divided on the second item [seed]. Two main conceptions have been observed: either randomness is involved because "a probabilistic reasoning is involved", or randomness is not involved because "there is a great part of determinism" or because "causal factors can be identified".

A distinctive feature of the mathematicians is that some of them explicitly referred to two kinds of randomness: a "mathematical" randomness, when it is easy to compute an objective probability (typically "the socks"), and a randomness "by ignorance", when it is not easy to compute a probability for lack of an available standard probabilistic model (typically "the seed").
