ERIS - Statistical research

The statistical research of ERIS concerns methods for analysing experimental data. The privileged fields of application are experimental psychology and clinical trials in medicine and pharmacology. These fields have two specific features: on the one hand, complex experimental designs are generally used, with precise objectives; on the other hand, experimental results must be accepted by a large community.

1. Criticisms of usual significance tests

"The test provides neither the necessary nor the sufficient scope or type of knowledge that basic scientific social research requires." (D.E. Morrison & R.E. Henkel)

Although the use of Null Hypothesis Significance Testing (NHST) has been criticized on both theoretical and methodological grounds by the most eminent and experienced scientists, it is still required in most scientific publications as an unavoidable norm. Our conclusion is that NHST is an inadequate tool whose use is socially adapted but methodologically unsuited, and that it is promoted through the misleading guidelines of standard textbooks.

Methodology of experimental data analysis: A study of the practice of statistical tests among psychology researchers; normative, prescriptive and descriptive approaches
And... what about the researcher's point of view?
The use of statistical tests by psychology researchers: Normative, descriptive and prescriptive aspects
Fisher: Responsible, not guilty

The abuses of interpretation of significance tests

Consider an experiment involving two crossed factors, Age and Treatment, each with two levels. The means of the four experimental conditions (with 10 subjects in each) are respectively 5.77 (a1,t1), 5.25 (a2,t1), 4.83 (a1,t2) and 4.71 (a2,t2).
The following typical comments, based on ANOVA F tests, can be found in an experimental journal:
"the only significant effect is a main effect of treatment (F[1,36]=6.39, p=0.016), reflecting a substantial improvement";
and again
"clearly, there is no evidence (F[1,36]=0.47, p=0.50) of an interaction".

The reader is strongly led to believe that both a large main effect of treatment and a small interaction effect have been demonstrated.

Do you agree with these conclusions?
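One way to check these conclusions is to turn the reported F ratios back into effect estimates with interval bounds. The Python sketch below assumes that both F ratios are computed against the same error mean square (36 degrees of freedom) and recovers that mean square from the treatment F; the critical value 2.028 is the 97.5% quantile of Student's t with 36 df, taken from a standard table. Under the usual noninformative prior, the same bounds can also be read as 95% fiducial-Bayesian credible intervals. The contrast names and helper functions are illustrative only.

```python
import math

# Cell means of the 2 x 2 (Age x Treatment) example, 10 subjects per cell.
means = {
    ("a1", "t1"): 5.77, ("a2", "t1"): 5.25,
    ("a1", "t2"): 4.83, ("a2", "t2"): 4.71,
}
cells = [("a1", "t1"), ("a2", "t1"), ("a1", "t2"), ("a2", "t2")]
n = 10                # subjects per cell
F_TREATMENT = 6.39    # reported F[1,36] for the treatment main effect
T975_DF36 = 2.028     # 97.5% quantile of Student's t, 36 df (from a t table)

# Contrast coefficients, in the order of `cells`.
TREATMENT = [0.5, 0.5, -0.5, -0.5]   # mean(t1) - mean(t2)
INTERACTION = [1, -1, -1, 1]         # (a1 - a2 under t1) - (a1 - a2 under t2)


def estimate(coefs):
    """Observed value of a contrast among the cell means."""
    return sum(c * means[cell] for c, cell in zip(coefs, cells))


def sum_of_squares(coefs):
    """Contrast sum of squares for equal cell sizes: n * L^2 / sum(c^2)."""
    return n * estimate(coefs) ** 2 / sum(c * c for c in coefs)


# Assumption: every effect is tested against the same error mean square,
# so it can be recovered from the treatment F ratio (1 numerator df).
mse = sum_of_squares(TREATMENT) / F_TREATMENT


def interval(coefs):
    """95% interval: estimate +/- t * sqrt(MSE * sum(c^2) / n)."""
    se = math.sqrt(mse * sum(c * c for c in coefs) / n)
    est = estimate(coefs)
    return est - T975_DF36 * se, est + T975_DF36 * se


d_treat = estimate(TREATMENT)
d_inter = estimate(INTERACTION)
F_inter = sum_of_squares(INTERACTION) / mse   # should match the reported 0.47
treat_ci = interval(TREATMENT)
inter_ci = interval(INTERACTION)

print(f"treatment:   {d_treat:.2f}  [{treat_ci[0]:.2f}, {treat_ci[1]:.2f}]")
print(f"interaction: {d_inter:.2f}  [{inter_ci[0]:.2f}, {inter_ci[1]:.2f}]")
print(f"recomputed F for interaction: {F_inter:.2f}")
```

Run as is, it gives a treatment effect of 0.74 with bounds of about [0.15, 1.33], an interaction contrast of 0.40 with bounds of about [-0.79, 1.59], and a recomputed interaction F of 0.47, consistent with the reported value. Under these assumptions the treatment effect could be as small as 0.15, while the interaction could even exceed the treatment effect. Neither a "substantial improvement" nor the absence of an interaction has been demonstrated.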

Time for new publication guidelines?

"Habit is habit and not to be flung out of the window by any man,but coaxed downstairs a step at a time." (Mark Twain)

In psychology especially, change could come from the Task Force on Statistical Inference, which the American Psychological Association charged with studying the role of NHST in psychological research.

[Wilkinson, L., and Task Force on Statistical Inference, APA Board of Scientific Affairs (1999) - Statistical Methods in Psychology Journals: Guidelines and Explanations. American Psychologist, 54, 594-604.
Azar, B. (1999) - APA statistics task force prepares to release recommendations for public comment. APA Monitor Online, 30, 5.]

Going beyond traditional significance tests: Towards new publication norms

"The essence of science is replication: a scientist should always be concerned about what would happen if he or another scientist were to repeat his experiment." (Guttman).

In 2006, the Association for Psychological Science introduced a new publication norm in the author guidelines of Psychological Science:

Statistics

Effect sizes should accompany major results. In addition, authors are encouraged to use p_rep rather than p values (see the article by Killeen in the May 2005 issue of Psychological Science, Vol. 16, pp. 345-353).

Killeen's p_rep ("probability of replication") now routinely appears in Psychological Science.
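As a rough illustration of how p_rep relates to p, the Python sketch below follows our reading of Killeen (2005). The one-tailed p value is converted into a z score; because a replication adds its own sampling error, the predicted distribution of the replication's effect has twice the variance of the original estimate, hence the division by √2. It also evaluates the closed-form approximation 1 / (1 + (p / (1 - p))^(2/3)). The use of a one-tailed p and of a normal (known-variance) approximation are assumptions of this sketch, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def p_rep(p_one_tailed):
    """Probability that a replication yields an effect of the same sign.

    Sketch under a normal approximation: z is the score of the original
    result; the replication's predicted effect has variance 2 * sigma^2,
    so the relevant score is z / sqrt(2).
    """
    z = STD_NORMAL.inv_cdf(1 - p_one_tailed)
    return STD_NORMAL.cdf(z / math.sqrt(2))


def p_rep_approx(p_one_tailed):
    """Closed-form approximation: 1 / (1 + (p / (1 - p)) ** (2/3))."""
    p = p_one_tailed
    return 1 / (1 + (p / (1 - p)) ** (2 / 3))


for p in (0.25, 0.05, 0.01):
    print(f"p = {p:.2f}  p_rep = {p_rep(p):.3f}  approx = {p_rep_approx(p):.3f}")
```

For a one-tailed p of 0.05, both versions give a p_rep of roughly 0.88: a "significant" result is far from guaranteeing that a replication will even point in the same direction.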

More about p_rep...

New difficulties with confidence intervals

"It would not be scientifically sound to justify a procedure by frequentist arguments and to interpret it in Bayesian terms." (H. Rouanet)

Confidence intervals could quickly become a compulsory norm in experimental publications. However, for many reasons stemming from their frequentist (Neyman and Pearson) conception, confidence intervals can hardly be viewed as the ultimate method.
Indeed the appealing feature of confidence intervals is the result of a fundamental misunderstanding. As is the case with significance tests, the frequentist interpretation of a 95% confidence interval involves a long run repetition of the same experiment: in the long run 95% of computed confidence intervals will contain the "true value" of the parameter; each interval in isolation has either a 0 or 100% probability of containing it.
Treating the data as random even after they have been observed is so strange that the orthodox frequentist interpretation of confidence intervals makes no sense to most users.

Isn't everyone a Bayesian?
And if you were a Bayesian without knowing it?

Not all intervals are alike!

In an introductory statistics textbook, part of a series for the general public whose goal is to give the reader the opportunity to "access the deep intuitions in the field", one finds the following interpretation of a confidence interval for a proportion:

"If in an opinion poll of size 1000, the observed proportion P is equal to 0.613, the proportion π to estimate has a probability 0.95 of lying in the range: [0.58,0.64]"

Do you agree with this interpretation?

If you are not (yet) a Bayesian, and if your real intuition is that this interpretation is either right, or perhaps wrong but in any case desirable, you should seriously ask yourself whether you are not a Bayesian "without knowing it".

In the frequentist framework, the possible values of the parameter cannot be assigned probabilities. If, as in this example, the bounds computed for the observed sample are [0.58,0.64], the event "0.58<π<0.64" is either true or false (because π is fixed), and we cannot give it a probability other than 1 or 0.

The correct interpretation of the 95% confidence interval is the following:
"95% of computed confidence intervals for the set of all samples (all samples that it is possible to draw from the population) contain the true value π".
Each interval in isolation has either a 0 or 100% probability of containing the true value.
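This long-run reading can be checked by simulation. The Python sketch below assumes that the textbook used the usual normal-approximation (Wald) interval, which reproduces the bounds [0.58,0.64] from P = 0.613 and n = 1000. It then repeats a hypothetical poll many times with the true proportion fixed at π = 0.6 (a value chosen only for illustration) and counts how often the computed interval contains π. The probability 0.95 belongs to the procedure, not to any single interval.

```python
import math
import random

Z975 = 1.96  # 97.5% quantile of the standard normal distribution


def wald_interval(p_hat, n):
    """Normal-approximation 95% confidence interval for a proportion."""
    half_width = Z975 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def coverage(pi, n, repetitions, seed=2024):
    """Fraction of simulated polls whose interval contains the fixed true pi."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        successes = sum(rng.random() < pi for _ in range(n))
        low, high = wald_interval(successes / n, n)
        hits += low < pi < high
    return hits / repetitions


obs_low, obs_high = wald_interval(0.613, 1000)
print(f"observed interval: [{obs_low:.2f}, {obs_high:.2f}]")  # → [0.58, 0.64]

# Hypothetical true proportion pi = 0.6; the result should be close to 0.95.
cov = coverage(pi=0.6, n=1000, repetitions=2000)
print(f"long-run coverage: {cov:.3f}")
```

Each simulated interval either contains π = 0.6 or does not. Only the proportion of successes over many repetitions approaches 0.95.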

Ironically, it is the natural (Bayesian) interpretation of confidence intervals in terms of "a fixed interval having a 95% chance of including the true value of interest" which is their appealing feature.
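The Bayesian reading can also be made concrete. Assuming a uniform Beta(1, 1) prior for π (one common noninformative choice; with n = 1000, a Jeffreys Beta(1/2, 1/2) prior gives practically the same answer), the posterior distribution for 613 favourable answers out of 1000 is Beta(614, 388). The Python sketch below uses Monte Carlo to estimate the posterior probability that π lies inside the normal-approximation interval computed from these data. The result is close to 0.95, which is exactly the statement made by the textbook.

```python
import math
import random

Z975 = 1.96  # 97.5% quantile of the standard normal distribution


def posterior_probability(successes, n, low, high, draws=100_000, seed=7):
    """Monte Carlo estimate of P(low < pi < high | data).

    Assumes a uniform Beta(1, 1) prior, so the posterior is
    Beta(1 + successes, 1 + n - successes).
    """
    rng = random.Random(seed)
    a, b = 1 + successes, 1 + n - successes
    inside = sum(low < rng.betavariate(a, b) < high for _ in range(draws))
    return inside / draws


n, successes = 1000, 613
p_hat = successes / n
half_width = Z975 * math.sqrt(p_hat * (1 - p_hat) / n)
prob = posterior_probability(successes, n, p_hat - half_width, p_hat + half_width)
print(f"posterior probability of the interval: {prob:.3f}")  # close to 0.95
```

Numerically, the frequentist interval and the Bayesian credible interval nearly coincide here. What differs is the meaning of "0.95".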

Note: the difference between the two interpretations is not merely semantic.

Criticisms of usual significance tests	The Bayesian therapy	Development of alternative inference methods	The Bayesian Analysis of Comparisons	The likelihood principle: A need to rethink
Study of new distributions	Other application fields	Adaptive designs	Statistical inference and causal analysis	Methodological and didactical implications

2. The Bayesian therapy

Won't the Bayesian choice be unavoidable?

"We [statisticians] will all be Bayesians in 2020, and then we can be a united profession." (D.V. Lindley)

We argue that Bayesian methods are ideally suited for creating a change of emphasis in the presentation and interpretation of experimental results. We suggest using "noninformative" Bayesian methods as a therapy for curing the misuses and abuses of NHST.
For many years, we have worked toward this goal with colleagues in France, developing standard "noninformative" Bayesian methods for the most familiar situations encountered in experimental data analysis.

Beyond the significance test controversy: Prime time for Bayes?
Uses, abuses and misuses of significance tests in the scientific community: Won't the Bayesian choice be unavoidable?

The fiducial Bayesian methods

"Maybe Fisher's biggest blunder [fiducial inference] will become a big hit in the 21st century." (B. Efron)

In order to promote these Bayesian methods, it seemed important to us to give them a more explicit name than "standard", "noninformative" or "reference". We propose to call them fiducial Bayesian. This deliberately provocative name pays tribute to Fisher's work on scientific inference for research workers. It indicates their specificity and their aim to express "what the data have to say".
These fiducial Bayesian methods are concrete proposals for bypassing the shortcomings of NHST and improving current statistical methodology and practice.

New ways in statistical methodology: From significance tests to Bayesian inference
Bayesian methods for experimental data analysis


3. Development of alternative statistical inference methods

"A common misconception is that Bayesian analysis is a subjective theory; this is neither true historically nor in practice." (J. Berger)

Our goal is to develop general alternative methods that are better suited to the needs of users. Bayesian inference is a privileged theoretical framework, at least as objective as traditional frequentist inference.
The fiducial-Bayesian methods have been applied many times to real data and have been well accepted by experimental journals, for example:

Memorization of narratives: Immediate and delayed recognition of statements by 7-, 8- and 10-year-old children
Orientation of attention and sensory gating: An evoked potential and RT study in cat
From production to selection of interpretations for novel conceptual combinations: A developmental approach.

"Bayesian posterior probabilities are exactly what scientists want." (S.N. Goodman & J.A. Berlin)

From significance tests to Bayesian inference

I have the test statistic, can I get an interval?

I have found an article that reports the results of a study designed to test the efficacy of a drug by comparing two groups (treatment vs placebo) of 15 patients each. The article gives the observed difference d=+1.52 in favour of the treatment, and a "Student t test": t=+0.683, 28 degrees of freedom, p=0.50, nonsignificant.
I would be interested in an interval estimate (a frequentist confidence interval or a fiducial-Bayesian credible interval) in order to assess whether the inefficacy of the treatment has really been proved.

Is it possible?
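It is, at least under the usual assumptions of the Student t test. Since t = d / SE, the standard error of the difference can be recovered as SE = d / t, and a 95% interval is d ± t(0.975; 28 df) × SE. The Python sketch below does this arithmetic; the critical value 2.048 (97.5% quantile of Student's t with 28 df) is taken from a standard table, and the helper name is hypothetical. Under the usual noninformative prior, the same bounds are also those of the 95% fiducial-Bayesian credible interval for the true difference.

```python
T975_DF28 = 2.048  # 97.5% quantile of Student's t with 28 df (from a t table)


def interval_from_t(d, t, t_crit):
    """95% interval for a mean difference recovered from its t statistic.

    Because t = d / SE, the standard error is |d / t| (t must be nonzero).
    """
    if t == 0:
        raise ValueError("t must be nonzero to recover the standard error")
    se = abs(d / t)
    return d - t_crit * se, d + t_crit * se


low, high = interval_from_t(d=1.52, t=0.683, t_crit=T975_DF28)
print(f"95% interval for the difference: [{low:.2f}, {high:.2f}]")
```

For this article the sketch gives roughly [-3.04, 6.08]. The data are compatible with a large effect in favour of the treatment as well as with a sizeable effect against it, so the nonsignificant p = 0.50 is far from proving inefficacy.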

4. The Bayesian Analysis of Comparisons

"ANOVA may be the most commonly used statistical procedure. It is assuredly the most commonly misused statistical procedure!" (D.A. Berry)

The Bayesian Analysis of Comparisons (diagram): specific questions; integrating traditional analysis of variance procedures (t tests, F tests, etc.) and extending them with Bayesian (and frequentist) procedures.


10. Methodological and didactical implications

"In fact, I find it easier teaching Bayesian statistics than frequentist statistics. There is a single, pivotal notion - Bayes' rule - that describes the process of learning. Bayes' rule is especially easy to teach, and it is easy for students to use." (D.A. Berry)

Consulting

Psychologists - Pharmaceutical companies.

Teaching of Bayesian methods for the analysis of experimental data
