"The field of statistics continues to flourish despite, and partly because of, its foundational controversies."
(Efron, 1978)
    
Quotes about statistical inference
2010

Authors

Bruno LECOUTRE
Retired C.N.R.S. research director

Jacques POITEVINEAU
Retired C.N.R.S. research engineer


        Generalities

      "If pressed, we would probably argue that Bayesian statistics (with emphasis on objective Bayesian methodology) should
      be the type of statistics that is taught to the masses, with frequentist statistics being taught primarily to
       advanced statisticians." (Bayarri & Berger, 2004, page 59)
      

      "What is done in practice is to use the
      confidence procedure on a series of different problems - not use the confidence
      procedure for a series of repetitions of the same problem with different data
      (which would typically make no sense in practice)." (Bayarri & Berger, 2004, page 60)
      

      "ANOVA may be the most commonly used statistical procedure. It is assuredly
      the most commonly misused statistical procedure!" (Berry, 1996, page 395)
      

      "There were far too many studies to plan and too much data to analyze to worry seriously
      about what the p-values and confidence coefficients produced by the package actually meant."
      (Breslow, 1990, page 269)
      

      "Statistical techniques must be chosen and used to aid, but not to replace, relevant thought." (Bryan-Jones & Finney, 1983)
      

      "My recommendation is to give always a look at the data, since the eye
      of the expert is in most simple (i.e. low-dimensional) cases better than automatic tests." (D'Agostini, 2000)
      

      "[...] statisticians believe that statistics exists as a discipline in its own right,
      even if they can't agree on its exact nature." (Efron, 1978)
      

      "The world of applied statistics seems to need an effective compromise between Bayesian and frequentist ideas." (Efron, 1998)
      

      "[...] the familiar optimality criteria of statistics are in fact in conflict with scientific principles [...]"
      (Fraser & Reid, 1990)
      

      "We need statistical thinking, not rituals." (Gigerenzer, 1998)
      

      "Statistics has unfortunately achieved almost the status of a superstition in some quarters in psychology, and I hope,
      in all humility, that this text sets a slightly more liberal and rational one." (Hays, 1963,
      pages vi-vii)
      

      "But if there is ever a conflict between
      the use of a statistical technique and common sense, then common sense comes first." (Hays, 1973, page 386)
      

      "For many years, statistics textbooks have
      followed this 'canonical' procedure: (1) the reader is warned not to use
      the discredited methods of Bayes and Laplace, (2) an orthodox method is
      extolled as superior and applied to a few simple problems, (3) the
      corresponding Bayesian solutions are not worked out or described in any
      way. The net result is that no evidence whatsoever is offered to substantiate
      the claim of superiority of the orthodox method. [...] The orthodox results are
      satisfactory only when they agree closely (or exactly) with the Bayesian
      results. No contrary example has yet been produced. [...] We conclude that
      orthodox claims of superiority are totally unjustified; today the original
      statistical methods of Bayes and Laplace stand in a position of proven
      superiority in actual performance, that places them beyond the reach of mere
      ideological or philosophical attacks. It is the continued teaching and use of
      orthodox methods that is in need of justification and defense." (Jaynes, 1976, page 175)
      

      "[...] I should think that orthodox teachers would be very troubled by the following
      situation. Who have made the important advances in statistical practice in this
      century? Others will judge differently, but my own list is:
      'Student', Jeffreys, Fisher, Wiener, von Neumann, Shannon, Wald,
      Zellner, Burg, Skilling. Here we find a chemist, a physicist, a eugenicist, two
      mathematicians, an economist, an astronomer, two engineers - and only one
      professional statistician! Whatever
      list one makes, I think he will find that most of the important
      advances have come from outside the profession, and had to make their way
      against the opposition of most statisticians." (Jaynes, 1985, page 46)
      

      "The statistician can provide guidance as to what the statistics mean; but the
      individual consumer of the statistics remains the ultimate judge of whether the
      evidence of any experiment is convincing.
      Statistics cannot substitute for good
      judgement, nor can it transform a flawed experiment into a valid one. Where an
      experiment cannot distinguish between two equally capable explanations, no
      amount of statistical analysis will change that situation. Where data are at
      the margins of detectability, the solution is to design a better experiment,
      not more statistics." (Jefferys, 1992)
      

      "The difficulty is that the solution to this problem [finding the most powerful test] has no relevance
      per se to the problems of applied statistics..." (Kempthorne, 1977)
      

      "I know of no field where the foundations are of such practical importance as in statistics." (Lindley, 1972)
      

      "[...] whether the probabilities should only refer to data and 
      be based on frequency or whether they should also apply to hypotheses and be regarded as measures of beliefs." (Lindley, 1993)
      

      "At any rate what I feel quite sure at the moment to be needed is
      simple illustration of the new [Bayesian] notions on real, everyday statistical
      problems." (E.S. Pearson, 1962)
      

      "This points to the difference between statistics as an effort to learn, to get at the truth, and decision theory - a
      difference that was emphasized by Fisher in some of his disputes with Neyman." (Lehmann, 1998)
      

      "But we
      must question the value of statistical research 'stimulated by its
      mathematical, rather than practical, aspects' [McDermott & Wang, in Perlman
      & Wu, 1999, page 375] when such work produces impractical procedures
      that are then promoted (fortunately unsuccessfully) to the applied community."
      (Perlman & Wu, 1999, page 378)
      

      "We hope that we have alerted statisticians
      to the dangers inherent in uncritical application of the NP [Neyman &
      Pearson] criterion, and, more generally, convinced them to join Fisher, Cox and
      many others in carefully weighing the scientific relevance and logical
      consistency of any mathematical criterion proposed for statistical theory." (Perlman & Wu, 1999, page 381)
      

      "Neyman and Pearson contributed vitally to our understanding by their formulation
      of statistical problems, but they have never claimed their methods were more than ad hoc procedures with
      some pleasant properties. Their methods, while extremely ingenious and useful,
      are not completely satisfactory, let alone uniquely objective and scientific." (Pratt, 1962)
      

      "Statistical 'recipes' are followed blindly, and ritual has taken over from scientific thinking." (Preece)
      

      "I cannot see how anyone could now agree with this [Fisher's 1935 quote about experiments and null hypotheses]." (Preece)
      

      "[Neyman-Pearson theory] does not address the problem of representing and interpreting statistical evidence, and the
      decision rules derived from NP theory are not appropriate tools for
      interpreting data as evidence." (Royall, 1997, page 58)
      

      [A need for] "[...] development of diagnostic tools with a greater emphasis on assessing
      the usefulness of an assumed model for specific purposes at hand rather than on whether the model is true." (Tiao & Xu, 1993)
      

      "It is far better to arrive at an appropriate answer to the right question, which is often vague, than the exact
      answer to the wrong question, which can always be made precise." (Tukey, 1962)
      
      
      
               
      
        
      Null Hypothesis Significance Testing [NHST]
      
      "Somehow there has developed a widespread
      belief that statistical analysis is legitimate only if it includes significance
      testing. This belief leads to, and is fostered by, numerous introductory
      statistics texts that are little more than catalogues of techniques for
      performing significance tests." (Altman, 1985)
      

      "The test is like a gauge on the dashboard" (Anonymous psychology researcher, in
      Lecoutre M.-P., 2000, page 77)
      

      "Tests of the null hypothesis that there is
      no difference between certain treatments are often made in the analysis of agricultural
      or industrial experiments in which alternative methods or processes are
      compared. Such tests are [...] totally irrelevant. What are needed are
      estimates of magnitudes of effects, with standard errors." (Anscombe, 1956)
      

      "The common practice of reporting only the
      significance level of the test and not the data on which it was calculated
      (often justified on grounds of space) ensures that no conflict with other
      research can be detected." (Atkins & Jarrett, 1981, page 101)
      

      "It is hardly surprising that empirical
      views of science, and the structure of careers and institutions in social
      science, provide fertile ground for the use of procedures [significance tests]
      which tend to disguise the inadequacy of measurements, and the lack of
      developed theoretical explanations - as well as discouraging debate about
      alternative procedures."
      (Atkins & Jarrett, 1981, page 105)
      

      "The test of significance does not provide
      the information concerning psychological phenomena characteristically
      attributed to it; [...] a great deal of mischief has been associated with its use." (Bakan, 1966, page 423)
      

      "We need to get on with the business of
      generating psychological hypotheses and proceed to do investigations and make
      inferences which bear on them, instead of, as so much of our literature would
      attest, testing the statistical null hypothesis in any number of contexts in
      which we have every reason to suppose that it is false in the first place." (Bakan, 1967,
      in Morrison & Henkel, 1970, page 251)
      

      "When we reach a point where our
      statistical procedures are substitutes instead of aids to thought, and we are
      led to absurdities, then we must return to common sense."
      (Bakan, 1967, in Morrison & Henkel, 1970)
      

        "This device [hypothesis testing] tells you the chance of getting a 
      particular result, given that your theory is true. But that isn't what you 
      want to know. You already know that you have gotten these results, and talking 
      about the probability of getting them is somehow silly. What you want to know 
      is the probability of your theory being true, given that you got these 
      results." (Becker, 1998, page  21)
      
      

      "Statistics looks very bad when it
      recommends a conclusion that clearly contradicts common sense."
      (Berger & Wolpert, 1988, page 141)
      

      "In a world in which only significant
      results are published, this makes researchers into gamblers, whose careers
      depend on the outcome of the chance events they are attempting to control
      for."
      (Blaich,
      1998, page 194)
      

      "It is our belief that the great reliance
      placed by many sociologists on tests of significance is chiefly an attempt to
      provide scientific legitimacy to empirical research without adequate
      theoretical significance." (Camilleri, 1962)
      

      [NHST] "is not only useless, it is also harmful because it is interpreted to mean something it is not."
      (Carver, 1978, page 392)
      

      [NHST] a "corrupt form of the scientific method" (Carver, 1993, page 288)
      

      "The best research articles are those that
      include no tests of statistical significance." (Carver, 1993, page 289)
      

      "[...] a Bayesian is someone who doesn't
      understand what a frequentist is, and a frequentist is someone who doesn't
      understand what a Bayesian is" [from charles@clef.demon.co.uk,
      http://www.lns.cornell.edu/spr/2002-03/msg0040564.html, June 2003]
      

      "Many of the tests reported in the Journal [of Wildlife Management] and the [Wildlife Society]
      Bulletin are unnecessary." (Cherry, 1998, page 947)
      

      "[NHST] does not tell us what we want to know, and we so much want to know what we want
      to know that, out of desperation, we nevertheless believe that it does!" (Cohen 1994, page 997)
      

      "[NHST] has not only failed to support the advance of psychology as a science but also
      has seriously impeded it." (Cohen, 1994, page 997)
      

      "When passing null hypothesis tests becomes the criterion for
      successful predictions, as well as for journal publications, there is no
      pressure on the psychology researcher to build a solid, accurate theory; all he
      or she is required to do, it seems, is produce
      'statistically significant' results." (Dar, 1987, page 149)
      

      "[NHST] An automatic routine" (Falk & Greenbaum, 1995)
      

      "[NHST] fail[s] to give us the information we need [...] induce[s] the illusion that we
      have it." (Falk & Greenbaum, 1995, page 94)
      

      "Rigid dependence upon significance tests in single experiments is to be deplored." (Finney)
      

      "The statistical examination of a body of
      data is thus logically similar to the general alternation of inductive and deductive
      methods throughout the sciences. A hypothesis is conceived and defined with all
      necessary exactitude; its logical consequences are ascertained by a deductive
      argument; these consequences are compared with the available observations; if
      these are completely in accord with the deductions, the hypothesis is justified
      at least until fresh and more stringent observations are available."
      (Fisher, 1990/1925, page 8)
      

      "[...] for the tests of significance are
      used as an aid to judgement, and should not be confused with automatic
      acceptance tests, or 'decision functions'." (Fisher, 1990/1925, page 128)
      

      "Though recognizable as a psychological
      condition of reluctance, or resistance to the acceptance of a proposition, the
      feeling induced by a test of significance has an objective basis in that the
      probability statement on which it is based is a fact communicable to, and
      verifiable by, other rational minds. The level of significance in such cases
      fulfils the conditions of a measure of the rational grounds for the disbelief
      it engenders. It is more primitive, or elemental than, and does not justify,
      any exact probability statement about the proposition." (Fisher, 1990/1956, page 46)
      

      "Whereas, the only populations that can be
      referred to in a test of significance have no objective reality, being
      exclusively the product of the statistician's imagination through the
      hypotheses which he has decided to test, or usually indeed of some specific
      aspects of these hypotheses." (Fisher, 1990/1956, page 81)
      

      "[The] statistician who advertises the
      [scientifically unacceptable] procedure is guilty of professional
      misconduct." (Fraser & Reid, in Brown, 1990, page 507)
      

      "A way of thinking that has survived
      decades of ferocious attacks is likely to have some value." [An anonymous
      reviewer, in Frick, 1996, page 379]
      

      "Thus null hypothesis testing is an optimal
      method for demonstrating sufficient evidence for an ordinal claim."
      (Frick, 1996, page 379)
      

      [NHST] "A mechanical behavior" (Gigerenzer, 1991)
      

      "The null hypothesis significance test (NHST) should not even exist, much less thrive
      as the dominant method for presenting statistical evidence in the social sciences.
      It is intellectually bankrupt and deeply flawed on logical and practical grounds.
      More than a few authors have convincingly demonstrated this [...]" (Gill, 2004, page 39)
      

      "In psychology, it [NHST] has been
      practiced like ritualistic handwashing and sustained by wishful thinking about
      its utility." (Gigerenzer, 1998)
      

      "This ritual [NHST] discourages theory
      development by providing researchers with no incentive to specify
      hypotheses." (Gigerenzer, 1998)
      

      "It is misleading to tell a Student he must decide on his significance test in
      advance, although it is correct according to the Fisherian technique."
      (Good, 1976, page 54)
      

      The "star worshippers" (Guttman, 1983)
      

      "Despite their wide use in scientific
      journals such as The Journal of Wildlife Management, statistical hypothesis
      tests add very little value to the products of research. Indeed, they
      frequently confuse the interpretation of data." (Johnson, 1999, page 63)
      

      "There is a rising feeling among
      statisticians that hypothesis tests [...] are not the most meaningful analyses." (Jones, 1984)
      

      "At its worst, the results of statistical
      hypothesis testing can be seriously misleading, and at its best, it offers no
      informational advantage over its alternatives." (Jones & Matloff, 1986)
      

      "[...] in fact, focusing on p values and rejecting null hypotheses
      actually distracts us from reaching our goals: deciding whether data support
      our scientific hypothesis and are practically significant or useful."
      (Kirk, 1996, page 755)
      

      "Seventy-five years of null hypothesis testing has taught us the folly of blindly adhering to a
      ritualized procedure."
      (Kirk, 2001, page 217)
      

      "I believe
      that clear rationales for hypothesis testing (unified or not) should replace
      murky decision-theoretic metaphors, and that this replacement will facilitate
      improvements in both teaching and practice." (Krantz, 1999, page 1380)
      

      "Because of the relative simplicity of its
      structure, significance testing has been overemphasized in some presentations
      of statistics, and as a result some students come mistakenly to feel that
      statistics is little else than significance testing." (Kruskal)
      

      "It is, alas, tempting, when the problem is complex and has too many degrees of freedom,
      to use them [statistical tests] mechanically and to defer to their
      'cold judgment'. This is a mistake that has been pointed out time and again in the statistical literature
      [...] and that perhaps stems from those 'remnants of magic that linger in everyone's heart'
      of which Alfred Sauvy spoke." (Ladiray, 2002, pages 6-7)
      

      "However the use of NHST is such an integral part of scientists' behavior that its misuses and abuses should not be
      discontinued by flinging it out of the window." (Lecoutre, Lecoutre & Poitevineau, 2001, page 413)
      

      "Few concepts in the social sciences have wielded more discriminatory power over the status of knowledge claims
      than that of statistical significance." (Litle, 2001, page 363)
      

      "I believe part of the difficulty with the current use of NHST is the exaggerated
      practical implications that have come to be attached to its results."
      (Locascio, 1999)
      

      "Despite
      the stranglehold that hypothesis testing has on experimental psychology, I find
      it difficult to imagine a less insightful means of transiting from data to conclusions." (Loftus, 1991, page 103)
      

      "Null Hypothesis Statistical Testing, as typically utilized, is barren as
      a means of transiting from data to conclusions." (Loftus, 1996)
      

      "Problems stemming from the fact that
      hypothesis tests do not address questions of scientific interest."
      (Matloff, 1991)
      

      "I suggest to you that Sir Ronald [Fisher] has
      befuddled us, mesmerized us, and led us down the primrose path. I believe that
      the almost universal reliance on merely refuting the null hypothesis as the
      standard method for corroborating substantive theories in the soft areas is a
      terrible mistake, is basically unsound, poor scientific strategy, and one of
      the worst things that ever happened in the history of psychology." (Meehl, 1978, page 817)
      

      "Some hesitation about the unthinking use
      of significance tests is a sign of statistical maturity." (Moore & McCabe, 1993)
      

      "[...] thus, any difference in the groups
      on a particular variable in a given assignment will have some calculable
      probability of being due to errors in the assignment procedure..."
      (Morrison & Henkel, 1969, in Morrison & Henkel, 1970, pages 195-196)
      

      "The test provides neither the necessary nor the sufficient scope or type of knowledge
      that basic scientific social research requires."
      (Morrison & Henkel, 1969, in Morrison & Henkel, 1970, page 198)
      

      "In addition to important technical errors,
      fundamental errors in the philosophy of science are frequently involved in this
      indiscriminate use of the tests [of significance]." (Morrison & Henkel, 1969,
      in Morrison & Henkel, 1970)
      

      "The question many researchers (especially
      those interested in the application of science to solve practical problems)
      want to ask is whether the effects are large enough to make a real difference.
      The statistical tests most frequently encountered in the social and behavioral
      sciences do not directly address this question."
      (Murphy & Myors, 1999, page 234)
      

      "The grotesque emphasis on significance
      tests in statistics courses of all kinds [...] is taught to people, who if they
      come away with no other notion, will remember that statistics is about tests
      for significant differences. [...] The apparatus on which their statistics
      course has been constructed is often worse than irrelevant, it is misleading
      about what is important in examining data and making inferences." (Nelder)
      

      "I contend
      that the general acceptance of statistical hypothesis testing is one of the
      most unfortunate aspects of 20th century applied science." (Nester, 1996)
      

      "[...] if the null hypothesis is not
      rejected, it is usually because the N is too small. If enough data are
      gathered, the hypothesis will generally be rejected. If rejection of the null
      hypothesis were the real intention in psychological experiments, there usually
      would be no need to gather data." (Nunnally, 1960)
      

      "Probably few methodological issues have
      generated as much controversy among sociobehavioral scientists as the use of
      [Null Hypothesis Significance] tests."
      (Pedhazur & Schmelkin, 1991, page 198)
      

      "This reinforces the well-documented but
      oft-neglected fact that the Neyman-Pearson theory desideratum of a more (or
      most) powerful size alpha test may be scientifically inappropriate; the same is
      true for the criteria of unbiasedness and alpha-admissibility." (Perlman & Wu, 1999, page 355)
      

      "[...] Berger and Hsu (1996, page 192)
      make the following statement: "We believe that notions of size, power, and
      unbiasedness are more fundamental than 'intuition'..." In our opinion,
      such a statement places the credibility of statistical science at serious risk
      within the scientific community. If we are indeed teaching our students to
      disregard intuition in scientific inquiry, then a fundamental reassessment of
      the mission of mathematical statistics is urgently needed." (Perlman & Wu, 1999, page 366)
      [Berger's reply: "If we are indeed teaching our students to disregard intuition in scientific inquiry,
      then a fundamental reassessment of the mission of mathematical statistics is urgently needed." (page 373)]
      
      
      "[...] The researcher who
      presents a significant result, like the winner of a sporting event,
      is often the object of suspicion and must pass a check before the
      result is ratified (published). This is the role of journal editors and referees,
      whose reservations the experimenter is often confronted with.
      Unfortunately, the norm is so well established that these reservations bear almost
      exclusively on the validity of the tests (Was the right test used? Are the
      conditions of application satisfied? Etc.) and not on their relevance (Does the
      test really answer the question asked?)." (Poitevineau, 1998, page 11)
      
        "Tests [of hypotheses] provide a poor model
        of most real problems, usually so poor that their objectivity is tangential and
        often too poor to be useful." (Pratt, 1976)
        

        "This reduces the role of tests essentially
        to convention. Convention is useful in daily life, law, religion, and politics,
        but it impedes philosophy." (Pratt, 1976)
        

        "Over-emphasis on significance-testing continues." (Preece)
        

        "Given the many attacks on it,
        null-hypothesis testing should be dead." (Rindskopf, 1997, page 319)
        

        "[...] there is the current prestige of exact tests
        in statistics. The magic of 'exactness' must be qualified of course.
        Student's t-test was (and still is) an exact test too!"
        (Rouanet & Bert, 2000, page 121)
        

        "The stranglehold that conventional null
        hypothesis significance testing has clamped on publication standards must be broken." (Rozeboom, 1960,
        in Morrison & Henkel, 1970, page 230)
        

        "The traditional null hypothesis significance-test method, more appropriately called 'null hypothesis decision
        [NHD] procedure', of statistical analysis is here vigorously excoriated for its inappropriateness as a method of
        inference." (Rozeboom, 1960, in Morrison & Henkel, 1970, page 230)
        

        "[NHST] Surely the most bone-headedly misguided procedure ever institutionalised in the rote training of science
        students." (Rozeboom, 1997, page 335)
        

        "The 'religion of statistics' with its rites such as the use of the profoundly
        mysterious symbols of the religion NS, *, **, and mirabile dictu ***." (Salsburg, 1985)
        

        "One of the dangers of small samples is the
        discarding of valid results simply because of the relatively high probability
        that they might have occurred by chance." (Selvin, 1957, in Morrison & Henkel, 1970, page 110)
        

        [NHST] "such tests do not provide the information that many researchers assume they do" (Shaver, 1993, page 294)
        

        [NHST] "diverts attention and energy from
        more appropriate strategies such as replication and consideration of the
        practical or theoretical significance of results" (Shaver, 1993, page 294)
        

 
        "One of the chief drawbacks of the F-test in ANOVA is that by itself, F(dfb,dfw)
        tells us hardly anything useful about what effects our experiment has had."
        (Smithson, 2000, page 238)
        

        "Fisher [...] appears to have placed an undue emphasis on the significance test." (Street, 1990)
        

        "Superficial understanding of significance
        testing has led to serious distortions, such as researchers interpreting
        significant results involving large effect sizes." (Thompson, 1989, page 2)
        

        "Tired researchers, having collected data
        from hundreds of subjects, then conduct a statistical test to evaluate whether
        there were a lot of subjects, which the researchers already know, because they
        collected the data and they are tired."
        (Thompson, 1993a, page 363)
        

        "Never use the unfortunate expression
        'accept the null hypothesis'." (Wilkinson and Task Force on Statistical Inference, 1999)
        

        "We believe that although unreasonable claims are sometimes made for the test of
        significance and that although many have sinned in implicitly treating
        statistical significance as proof of a favored explanation, still the social scientist
        is better off for using the significance test than for ignoring it. More
        precisely, it is our judgment that although the test of significance is
        irrelevant to the interpretation of the cause of a difference, still it does
        provide a relevant and useful way of assessing the relative likelihood that a
        real difference exists and is worthy of interpretive attention, as opposed to
        the hypothesis that the set of data could be a haphazard arrangement."
        (Winch & Campbell, 1969, in Morrison & Henkel, 1970, page 199)
        

        "The present writers think that the
        indiscriminate cataloguing of trivial effects is, in fact, a major problem in
        psychology today..." (Wilson et al., 1967)
        

        "We reason that it is very important to
        have a formal and nonsubjective way of deciding whether a given set of data
        shows haphazard or systematic variation. [...] And we believe it is important
        not to leave the determination of what is systematic or haphazard arrangement
        of data to the intuition of the investigator." (Winch & Campbell, 1969, 
in 
        Morrison & Henkel, 1970, page  206)
        

        "The emphasis on tests of significance, and
        the consideration of the results of each experiment in isolation, have had the
        unfortunate consequence that scientific workers have often regarded the execution
        of a test of significance on an experiment as the ultimate objective. Results
        are significant or not significant and this is the end of it." (Yates, 1951)
        
      
      
                  
              
        Bayesian inference
        
        "[...] challenge of scientific objectivity. This is the ultimate stronghold of the
        frequentist viewpoint. If the 21st century is Bayesian, my guess is that it
        will be some combination of subjective, objective and empirical Bayesian, not
        significantly less complicated and contradictory than the present situation." (Efron, 1978)
        

        "A widely accepted objective Bayes theory,
        which fiducial inference was intended to be, would be of immense theoretical
        and practical importance. A successful objective Bayes theory would have to
        provide good frequentist properties in familiar situations, for instance,
        reasonable coverage probabilities for whatever replaces confidence intervals." (Efron, 1998)
        

        "Bayesian inference might, in principle,
        fill the void created by abandoning significance-testing [...] Implementation
        of Bayesian analysis, however, requires subjective assessments of prior
        distributions, and often involves technical problems." (Falk & Greenbaum, 1995)
        

        "It is small wonder they [Bayesians] are
        still treated as a kind of lunatic fringe preaching a doctrine so pure and
        untainted by the real world as to make it useful for little other than
        academics furthering their research careers'." (Freeman, 1993, page 1450)
        

        "Bayesian posterior probabilities are exactly what scientists want." (Goodman & Berlin, 1994, page 203)
        

        "Confidence intervals should play an
        important role when setting sample size, and power should play no role once the
        data have been collected. [...] In this commentary, we present the reasons why
        the calculation of power after a study is over is inappropriate and how
        confidence intervals can be used during both study design and study
        interpretation." (Goodman & Berlin, 1994, page 200)
        

        "For interpretation of observed results, the concept of power has no place, and confidence intervals, likelihood, or
        Bayesian methods should be used instead." (Goodman & Berlin, 1994, page 205)
        

        "In fact, one can argue that the objections
        to subjective probability and Bayes are a nineteenth century aberration which
        has been redressed in the last thirty years - except by psychologists."
        (Gregson, 1998, page 202)
        

        "[Bayesian analysis provides] direct probability statements - which are what most people wrongly assume they are
        getting from conventional statistics." (Grunkemeier & Payne, 2002, page 1901)
        

        "It could be argued that since most physicians use statement A [the probability the true mean value is in the
        interval is 95%] to describe 'confidence' intervals, what they really want are
        'probability' intervals. Since to get them they must use Bayesian methods, then
        they are really Bayesians at heart!" (Grunkemeier & Payne, 2002, page 1904)
        

        "As long as we are uncertain about values of parameters, we will fall into the Bayesian camp." (Iversen, 2000, page 10)
        

        "Bayesian statistics, because of its straightforward interpretation, and because the assumptions are out in the
        open, offers a way to clarify and sharpen our thinking about experiments, and
        by giving us new insight about why parapsychological experiments are not having
        their intended effect of convincing a skeptical scientific world, they can
        point out research directions that might be more fruitful." (Jefferys, 1992)
        

        "Again it is not clear why such a set [the confidence
        interval] should be of interest unless one makes the natural error of thinking
        of the parameter as random and the confidence set as containing the parameter
        with a specified probability. Again, this is a statement only a Bayesian can
        make, although confidence intervals are often so misinterpreted. I find the
        classical quantities useless for making decisions and believe that they are
        widely misinterpreted as Bayesian because the Bayesian quantities are more
        natural." (Kadane, 1995, page 316)
        

        "The utility of the Bayesian approach is increasingly being recognized by the scientific establishment."
        (Krueger & Funder, 2001)
        

        "In our view, the way in which the Bayesian approach is used in an area of research
        reflects the maturity of this field." (Krueger & Funder, 2001)
        
        "Since 1973, the Analysis of Comparisons has incorporated Bayesian techniques,
        both classical and contemporary (Jeffreys, Lindley, etc.), but using them with
        a fiducial motivation (Fisher). These techniques seem to us in fact the best
        suited to remedy the shortcomings [...] of traditional significance
        tests." (Lecoutre, Rouanet & Denhière, 1988, page 384)
        
        "We [statisticians] will all be Bayesians in 2020, and then we can be a united profession."
        (Lindley, in Smith, 1995, page 317)
        

        "This is the likelihood principle according to which values of x, other than that observed, play no role in inference." (Lindley, 2000)
        

        "A non-Bayesian states that there is a 95%
        chance that the [obtained] confidence interval contains the true value of the
        population mean. A Bayesian would say there is a 95% chance that the population
        mean falls between the obtained limits. One is a probability statement about
        the interval, the other about the population parameter." (Phillips, 1973, page 335)
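
        Phillips' distinction can be made concrete with a small simulation — an
        illustrative sketch, not part of the quoted text, with invented parameter
        values. Under repeated sampling, about 95% of the computed intervals contain
        the fixed true mean: the frequentist probability attaches to the
        interval-generating procedure, not to the parameter.

        ```python
        import math
        import random

        def coverage(true_mean=10.0, sigma=2.0, n=30, reps=2000, z=1.96, seed=1):
            """Fraction of 95% z-intervals (sigma known) that contain the true mean."""
            rng = random.Random(seed)
            half = z * sigma / math.sqrt(n)
            hits = 0
            for _ in range(reps):
                xbar = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
                if xbar - half <= true_mean <= xbar + half:
                    hits += 1
            return hits / reps

        print(round(coverage(), 3))  # close to 0.95
        ```

        Each individual interval either contains the true mean or it does not; the
        95% is a long-run property of the procedure, which is exactly the statement
        Phillips attributes to the non-Bayesian.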
        

        "Null-hypothesis tests are not completely
        stupid, but Bayesian statistics are better." (Rindskopf, 1998)
        

        "The motivation for using this methodology
        [a Bayesian approach] is practical rather than ideological."
        (Spiegelhalter, Freedman & Parmar,
        1994, page 357)
        

        "This state of affairs [the reluctance of
        scientists to use Bayesian inferential procedures in practice] appears to be
        due to a combination of factors including philosophical conviction, tradition,
        statistical training, lack of 'availability', computational difficulties,
        reporting difficulties, and perceived resistance by journal editors."
        (Winkler, 1974, page 129)
        
        
                 
        
        
         0.05/Choice of the level of significance
        
        
        "Another problem associated with the test of significance: the particular level of significance chosen for an
        investigation is not a logical consequence of the theory of statistical inference." (Camilleri, 1962,
        in Morrison & Henkel, 1970)
        

        "It is convenient to draw the line at about the level at which we can say:
        'Either there is something in the treatment, or a coincidence has occurred such as does not
        occur more than once in twenty trials.' [...] If one in twenty does not seem high enough odds, we may, if we prefer
        it, draw the line at one in fifty (the 2 per cent point), or once in a hundred
        (the 1 per cent point). Personally, the writer prefers to set a low standard of
        significance at the 5 per cent point, and ignore entirely all results which
        fail to reach that level. A scientific fact should be regarded as
        experimentally established only if a properly designed experiment rarely
        fails to give this level of significance." (Fisher, 1926, page 504)
        

        "[...]
        the sanctified (and
        sanctifying) magic .05 level. This basis for decision has played a remarkable
        role in the social sciences and in the lives of social scientists. In governing
        decisions about the status of null hypotheses, it came to determine decisions
        about the acceptance of doctoral dissertations and the granting of research
        funding, and about publication, promotion, and whether to have a baby just now.
        Its arbitrary unreasonable tyranny has led to data fudging of varying degrees
        of subtlety from grossly altering data to dropping cases where there
        'must have been errors'." (Cohen, 1990, page 1307)
        

        "Do people, scientists and nonscientists,
        generally feel that an event which occurs 5% of the time or less is a rare
        event? If the answer [...] is 'Yes,' [...] then the adoption of the level as a
        criterion for judging outcomes is justifiable." (Cowles and Davis, 1982, page 557)
        

        Fisher "advocated 5% as the standard level" (Lehmann, 1993). [But see Fisher,
        1990c/1956, page 45: "[...] for in fact no scientific worker has a fixed level
        of significance at which from year to year, and in all circumstances, he
        rejects hypotheses; he rather gives his mind to each particular case in the
        light of his evidence and his ideas."]
        

        "[...] the degree of conviction is not
        even approximately the same in two situations with equal significance levels.
        5% in to-day's small sample does not mean the same as 5% in to-morrow's large
        one." (Lindley, 1957, page 189)
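
        Lindley's point — that a 5% result carries different evidential weight at
        different sample sizes — can be illustrated with a textbook normal-mean
        calculation. This is a sketch under assumed conventions not in the quote
        itself: known unit variance, a point null mu = 0, and a N(0, 1) prior on mu
        under the alternative. For data sitting exactly at z = 1.96 (two-sided
        p ≈ .05), the Bayes factor favors the alternative when n is small but favors
        the null when n is very large:

        ```python
        import math

        def bf01(z, n, tau=1.0):
            """Bayes factor for H0: mu = 0 vs H1: mu ~ N(0, tau^2),
            with data x_i ~ N(mu, 1) summarized by z = xbar * sqrt(n)."""
            w = n * tau**2 / (1.0 + n * tau**2)   # shrinkage weight
            return math.sqrt(1.0 + n * tau**2) * math.exp(-0.5 * z**2 * w)

        for n in (10, 10000):
            print(n, round(bf01(1.96, n), 2))
        # BF01 < 1 (evidence against H0) at n = 10, but
        # BF01 > 1 (evidence for H0) at n = 10000: same p-value, opposite conclusions.
        ```

        This is the Jeffreys-Lindley paradox in miniature: the same significance level
        maps to quite different degrees of conviction as n grows.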
        

        "The current obsession with .05 [...] has
        the consequence of differentiating significant research findings and those best
        forgotten, published studies from unpublished ones, and renewal of grants from
        termination. It would not be difficult to document the joy experienced by a
        social scientist when his F ratio or t value yields significance at .05, nor
        his horror when the table reads 'only' .10 or .06. One comes to internalize the
        difference between .05 and .06 as 'right' vs. 'wrong', 'creditable' vs.
        'embarrassing', 'success' vs. 'failure'." (Skipper, Guenther & Nass, 1967)
        

        "Blind adherence to the .05 level denies
        any consideration of alternative strategies, and it is a serious impediment to
        the interpretation of data." (Skipper, Guenther & Nass, 1967)
        

        "Surely, God loves the .06 nearly as much as the .05." (Rosnow & Rosenthal, 1989, page 1277)
        

        "It may not be an exaggeration to say that
        for many Ph.D. students, for whom the .05 has acquired almost an ontological
        mystique, it can mean joy, a doctoral degree, and a tenure-track position at a
        major university if their dissertation p is less than .05. However, if
        the p is greater than .05, it can mean ruin, despair, and their
        advisor's thinking of a new control condition that should be run." (Rosnow & Rosenthal, 1989, page 1277)
        

        [alpha=.05] "A deliberate attempt to offer
        a standardized, public method for objectifying an individual scientist's
        willingness to make an inference." (Wilson, Miller & Lower, 1967, page 191)
        
        
      
        
        
        
        Conception of probability
        
        
        "Identifying probability with frequency is like confusing a table with the English word 'table'." (D'Agostini, 2000)
        

        "The subject of a probability statement, if
        we know what we are talking about, is singular and unique; we have some degree
        of uncertainty about its value, and it so happens that we can specify the exact
        nature and extent of our uncertainty by means of the concept of Mathematical
        Probability as developed by the great mathematicians of the 17th century,
        Fermat, Pascal, Leibnitz, Bernoulli and their immediate
        followers." [...] "The probability statements refer to the particular
        throw or to the particular result of shuffling the cards, on which the gambler
        lays his stake. The state of rational uncertainty in which he is placed may be
        equated to that of the different
        situation which can be imagined in which his throw is chosen at random out of
        an aggregate of throws, or of shufflings, which might equally well have
        occurred, though such aggregates exist only in imagination." (Fisher, 1959, page 22)
        

        "For Fisher, probability appears as a measure of uncertainty applicable in certain cases but,
        regretfully, not in all cases. For me, it is solely the answer to the question 'How frequently this or
        that happens.'" (Neyman, 1952, page 187)
        

        "[...] Isn't this equivalent to discussing
        the probabilities of hypotheses themselves, which would be useless? E.g.,
        it would be useless to discuss the probability of Student's hypothesis because
        this would be the same as the probability of mu = 0. As mu is an unknown
        constant, the probability of mu being equal to zero must be either
        P{mu = 0} = 0 or P{mu = 0} = 1 and, without obtaining precise
        information as to whether mu is equal to zero or not, it would be
        impossible to decide what is the value of P{mu = 0}.
        Undoubtedly, mu is an unknown constant and, as far as we
        deal with the theory of probability as described in my first two lectures, it
        is useless to consider P{mu = 0}." (Neyman, 1952, page 56)
        
        
                  
              
       
        
        
                Confidence intervals/Interval estimates
        
        
        "[Confidence intervals] in general, the best
        reporting strategy. The use of confidence intervals is therefore strongly
        recommended." (APA Publication Manual, 2001, page 22)
        

        "Significance testing in general has been a greatly overworked
        procedure, and in many cases where significance statements have been made it would
        have been better to provide an interval within which the value of the parameter
        would be expected to lie." (Box, Hunter & Hunter, 1978)
        

        "Confidence intervals avoid the problems of classic significance
        tests. They do not require a-priori hypotheses, nor do they test trivial
        hypotheses. Confidence intervals comprise the information of a significance
        test and are considerably easier to understand, which results in their didactic
        superiority." (Brandstaetter, 1999, page 43)
        

        "The question
        as to whether confidence intervals should replace significance tests or not can
        be answered with a guarded "yes". Confidence intervals contain the
        information of a significance test, therefore there is no loss of information
        and no risk involved when confidence intervals replace significance tests.
        Taken together, confidence intervals in addition to replications, graphic
        illustrations and meta-analyses seem to represent a methodically superior
        alternative to significance tests. Hence, in the long run, confidence intervals
        appear to promise a more fruitful avenue for scientific research."
        (Brandstaetter, 1999, page 43)
        

        "Not all statistically significant
        differences are clinically significant. Fortunately, confidence intervals can
        address both clinical and statistical significance." (Braitman, 1991, page 515)
        

        "In a large majority of problems
        (especially location problems) hypothesis testing is inappropriate: Set up the
        confidence interval and be done with it!" (Casella & Berger, 1987)
        

        "Scientists often finish their analysis by quoting a P-value, but this is not the right place to stop.
        One still wants to know how large the effect is, and a confidence interval should be given where
        possible." (Chatfield, 1988, page 51)
        

        "It is far more informative to provide a
        confidence interval." (Cohen, 1990, page 1310)
        

        "Objection has sometimes been made that the
        method of calculating Confidence Limits by setting an assigned value such as 1%
        on the frequency of observing 3 or less (or at the other end of observing 3 or
        more) is unrealistic in treating the values less than 3, which have not been
        observed, in exactly the same manner as the value 3, which is the one that has
        been observed. This feature is indeed not very defensible save as an
        approximation." (Fisher, 1956, page 66, italics added)
        

        "[...] a
        confidence interval can function to indicate which values could not be rejected
        by a two-tailed test with alpha at .05. In this function, the confidence
        interval could replace the report of null hypothesis for just one value,
        instead of communicating the outcome of the tests of all values as null
        hypotheses." (Frick, 1996, page 383)
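
        The duality Frick describes — a 95% confidence interval collects exactly the
        values a two-tailed alpha = .05 test would not reject — can be checked
        numerically. This is an illustrative sketch for a z-interval with known sigma;
        the sample values are invented:

        ```python
        import math

        def z_ci_and_test(xbar, sigma, n, z_crit=1.959964):
            """Return the 95% z-interval and a two-tailed z-test for H0: mu = mu0."""
            half = z_crit * sigma / math.sqrt(n)
            lo, hi = xbar - half, xbar + half

            def rejected(mu0):
                # reject H0 when |z| exceeds the critical value
                return abs((xbar - mu0) / (sigma / math.sqrt(n))) > z_crit

            return (lo, hi), rejected

        (lo, hi), rejected = z_ci_and_test(xbar=5.0, sigma=2.0, n=25)
        # a value just inside the interval is not rejected; just outside, it is
        print(rejected(lo + 0.01), rejected(hi + 0.01))  # -> False True
        ```

        So reporting the interval communicates the outcome of the test of every
        candidate value at once, not merely the test of a single null value.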
        

        "Confidence intervals should play an
        important role when setting sample size, and power should play no role once the
        data have been collected. [...] In this commentary, we present the reasons why
        the calculation of power after a study is over is inappropriate and how
        confidence intervals can be used during both study design and study
        interpretation." (Goodman & Berlin, 1994, page 200)
        

        "For interpretation of observed results,
        the concept of power has no place, and confidence intervals, likelihood, or
        Bayesian methods should be used instead." (Goodman & Berlin, 1994, page 205)
        

        "When making inferences about parameters
        [...] hypothesis tests should seldom be used if confidence intervals are
        available [...] the confidence intervals could lead to opposite practical
        conclusions when a test suggests rejection of H0
        [...] even though H0 is not rejected, the confidence interval gives
        more useful information." (Graybill, 1976)
        

        "We cannot escape the logic of NHST by turning to point estimates and confidence
        intervals". (Hagen, 1997, page 22)
        

        "For problems where the usual null
        hypothesis defines a special value for a parameter, surely it would be more
        informative to give a confidence range for that parameter." (Hinkley, 1987)
        

        "How about 'alpha and beta risks' and
        'testing the null hypothesis'? [...] The very beginning language employed by
        the statistician describes phenomena in which engineers/physical scientists
        have little practical interest! They want to know how many, how much, and how
        well [...] Required are interval estimates. We offer instead hypothesis tests
        and power curves." (Hunter, 1990)
        

        "Point estimates and their associated CIs are
        much easier for students and researchers to understand and, as a result, are
        much less frequently misinterpreted. Any teacher of statistics knows that it is
        much easier for students to understand point estimates and CIs than
        significance testing with its strangely inverted logic. This is another plus
        for point estimates and CIs."
        (Hunter & Schmidt, 1997, page 56)
        

        "Reporting of results in terms of confidence intervals instead of hypothesis tests should
        be strongly encouraged." (Jones, 1984)
        

        "We recommend that authors display the
        estimate of the difference and the confidence limit for this difference."
        (Jones & Matloff, 1986)
        

        "Prefer confidence intervals when they are available." (Jones & Tukey, 2000)
        

        "Again it is not clear why such a set [the
        confidence interval] should be of interest unless one makes the natural error
        of thinking of the parameter as random and the confidence set as containing the
        parameter with a specified probability. Again, this is a statement only a
        Bayesian can make, although confidence intervals are often so misinterpreted. I
        find the classical quantities useless for making decisions and believe that
        they are widely misinterpreted as Bayesian because the Bayesian quantities are
        more natural." (Kadane, 1995, page 316)
        

        "The preference of many mathematical
        statisticians for confidence interval procedures over significance tests is
        understandable since both procedures involve the same assumptions, but
        confidence interval procedures provide an experimenter with more information."
        (Kirk, 1982)
        

        "I believe that science is best served when
        researchers focus on the size of effects and their practical significance.
        Questions regarding the size of effects are addressed with descriptive
        statistics and confidence intervals. It is hard to understand why researchers
        have been so reluctant to embrace confidence intervals." (Kirk, 2001, page 214)
        

        "It is easy to [...] throw out an
        interesting baby with the nonsignificant bath water. Lack of statistical
        significance at a conventional level does not mean that no real effect is
        present; it means only that no real effect is clearly seen from the data. That
        is why it is of the highest importance to look at power and to compute
        confidence intervals." (Kruskal)
        

        "Estimation procedures provide more
        information [than significance tests]: they tell one about reasonable
        alternatives and not just about the reasonableness of one value."
        (Lindley, 1986)
        

        "It is usually wise to give a confidence
        interval for the parameter in which you are interested." (Moore and McCabe)
        

        "The researcher armed with a confidence
        interval, but deprived of the respectability of statistical significance must
        work harder to convince himself and others of the importance of his findings.
        This can only be good." (Oakes, 1986, page 66)
        

        "Although the underlying logic is
        essentially similar they [confidence intervals] are not couched in the pseudo
        scientific hypotheses testing language of significance tests. They do not carry
        with them decision-making implications, but, by giving a plausible range for
        the unknown parameter, they provide a basis for a rational decision should one
        be necessary. Should sample size be inadequate this is signaled by the sheer
        width of the interval." (Oakes, 1986, pages 66-67)
        

        "Above all, interval estimates are estimates of effect size. It is incomparably
        more useful to have a plausible range for the value of a parameter than to
        know, with whatever degree of certitude, what single value is untenable."
        (Oakes, 1986, page 67)
        

        "A confidence interval certainly gives more
        information than the result of a significance test alone [...]
        I [...] recommend its use [standard error of each mean]."
        (Perry, 1986)
        

        "A non-Bayesian states that there is a 95%
        chance that the [obtained] confidence interval contains the true value of the
        population mean. A Bayesian would say there is a 95% chance that the population
        mean falls between the obtained limits. One is a probability statement about
        the interval, the other about the population parameter." (Phillips, 1973, page 335)
        

        "Frequentist reasoning allows that investigators may use the word confidence
        for the specific numerical interval, but they are explicitly forbidden to use
        the term probability when making inferences for the same interval.
        It is perhaps not surprising that students often have difficulty with this distinction."
        (Pruzek, 1997, pages 288-289)
        

        "Unfortunately, knowing that 95% of an
        infinite number of 95% confidence intervals would contain the population mean
        is not the inference that a researcher ordinarily desires. What usually is
        desired is not an inference about psi [the parameter of interest] based on an
        infinite number of confidence intervals but an inference about psi based on the
        results of the specific confidence interval that is obtained in practice."
        (Reichardt & Gollob, 1997, page 263)
        

        "It would not be scientifically sound to justify a procedure [confidence intervals] by frequentist arguments and to
        interpret it in Bayesian terms." (Rouanet, 1998, page 54)
        

        "Whenever possible, the basic statistical
        report should be in the form of a confidence interval." (Rozeboom, 1960,
        in Morrison & Henkel, 1970, page 227)
        

        "Prior to the appearance of Fisher's 1932 and 1935 texts, data analysis in individual
        studies was typically conducted using point estimates and confidence intervals." (Schmidt, 1996, page 121)
        

        "If we mindlessly
        interpret a confidence interval with reference to whether the interval subsumes
        zero, we are doing little more than nil hypothesis statistical testing"
        (Thompson, 1998, page 800)
        

        "An improved quantitative science would
        emphasize the use of confidence intervals (CIs), and especially CIs for effect sizes." (Thompson, 2002, page 25)
        

        "Probably the greatest ultimate importance among all types of
        statistical procedures we now know, belongs to confidence procedures."
        (Tukey, 1960)
        

        "Interval estimates should be given for any
        effect sizes involving principal outcomes. Provide intervals for correlations
        and other coefficients of association or variation whenever possible."
        (Wilkinson and Task Force on Statistical Inference, 1999)
        
        
                 
 
        
        
        Effect sizes/Magnitude of effects
        
        
        "The general principle to be followed [...] is to provide the reader not only with
        information about statistical significance but also with enough information to
        assess the magnitude of the observed effect or relationship." (APA Publication Manual, 2001, page 26)
        

        "Tests of the null hypothesis that there is
        no difference between certain treatments are often made in the analysis of
        agricultural or industrial experiments in which alternative methods or
        processes are compared. Such tests are [...] totally irrelevant. What are
        needed are estimates of magnitudes of effects, with standard errors."
        (Anscombe, 1956)
        

        "If the test of significance is really of
        such limited appropriateness [...]. At the very least it would appear that we
        would be much better if we were to attempt to estimate the magnitude of the
        parameters in the populations; and recognize that we then need to make other
        inferences concerning the psychological phenomena which may be manifesting
        themselves in these magnitudes." (Bakan, 1967, in Morrison & Henkel, 1970, page 250)
        

        "Nothing is more important in educational and psychological research than making sure
        that the effect size of results is evaluated when tests of statistical
        significance are used." (Carver, 1993, page 289)
        

        "In many experiments it seems obvious that
        the different treatments must have produced some difference, however small, in
        effect. Thus the hypothesis that there is no difference is unrealistic: the
        real problem is to obtain estimates of the sizes of the differences."
        (Cochran & Cox, 1957)
        

        "Estimates and measures of variability are
        more valuable than hypothesis tests." (Cormack, 1985)
        

        "Statistical significance is quite
        different from scientific significance and [...] therefore estimation, at least
        roughly, of the magnitude of effects is in general essential regardless of
        whether statistically significant departure from the null hypothesis is achieved." (Cox, 1977, page 61)
        

        "The primary purpose of analysis of
        variance is to produce estimates of one or more error mean squares, and not (as
        is often believed) to provide significance tests." (Finney)
        

        "I conclude that effect sizes are the single best index of the relationship between theoretical predictions
        and the obtained data." (Harris, 1991)
        

        "The commonest agricultural experiments
        [...] are fertilizer and variety trials. In neither of these is there any
        question of the population treatment means being identical [...] the objective
        is to measure how big the differences are." (Healy)
        

        "Preoccupation with testing 'is there an interaction' in factorial experiments, [...] emphasis should be on 'how strong
        is the interaction?'" (Jones, 1984)
        

        "A null hypothesis rejection means that the researcher is pretty sure of the
        direction of the difference. Is this any way to develop psychological theory? I think not.
        How far would physics have progressed if their researchers had focused on discovering
        ordinal relationships? What we want to know is the size of the difference between
        A and B and the error associated with our estimate; knowing that A is
        greater than B is not enough." (Kirk, 1996, page 754)
        

        "The tests of null hypotheses of zero
        differences, of no relationships, are frequently weak, perhaps trivial
        statements of the researcher's aims [...] in many cases, instead of the tests
        of significance it would be more to the point to measure the magnitudes of the
        relationships, attaching proper statements of their sampling variation. The
        magnitudes of relationships cannot be measured in terms of levels of
        significance." (Kish, in Morrison & Henkel, 1970)
        

        "It is unfortunate that the reporting of effect sizes has been framed as
        a controversy. Reporting of effect sizes is, instead, simply good scientific practice." (Hyde, 2001, page 228)
        

        "The experimental aim should not be to
        establish whether changes have occurred, but rather to estimate whether changes
        have occurred in excess of some stipulated magnitude and importance. When a
        'significant difference' has been established, investigators must then measure
        the size of the effect and consider whether it is of any biological or medical
        importance." (Lutz & Nimmo, 1977, page 77)
        

        "The question many researchers (especially
        those interested in the application of science to solve practical problems)
        want to ask is whether the effects are large enough to make a real difference.
        The statistical tests most frequently encountered in the social and behavioral
        sciences do not directly address this question."
        (Murphy & Myors, 1999, page 234)
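
        The gap between "significant" and "large enough to matter" is easy to exhibit
        arithmetically. The sketch below uses the standard one-sample relation
        t = d * sqrt(n), where d is the standardized mean difference; the t value and
        sample sizes are invented for illustration:

        ```python
        import math

        def d_from_t(t, n):
            """Standardized effect size implied by a one-sample t statistic."""
            return t / math.sqrt(n)

        for n in (16, 10000):
            print(n, round(d_from_t(2.5, n), 3))
        # t = 2.5 is "significant" in both cases, but it implies d = 0.625
        # (a sizable effect) at n = 16 versus d = 0.025 (negligible) at n = 10000.
        ```

        The same test statistic, and hence essentially the same p-value, is compatible
        with effects of very different practical importance — which is why the quotes
        in this section insist on reporting magnitudes, not just significance.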
        

  
        "Unfortunately, many researchers do not report [...] [effect sizes] along with their
        F-test results. This is a pity." (Smithson, 2000, page 245)
        

        "The most commonly occurring weakness [...]
        is [...] undue emphasis on tests of significance, and failure to recognise that
        in many types of experimental work estimates of treatment effects, together
        with estimates of the errors to which they are subject, are the quantities of
        primary interest." (Yates)
        
                
                 
        
        
        Fiducial inference
        
        
        "Maybe Fisher's biggest blunder [the
        fiducial inference] will become a big hit in the 21st century."
        (Efron, 1998, page 107)
        

        "It is sometimes asserted that the fiducial
        method generally leads to the same results as the method of Confidence
        Intervals. It is difficult to understand how this can be so, since it has been
        firmly laid down that the method of confidence intervals does not lead to
        probability statements about parameters." (Fisher, 1959, page 26)
        

        "When knowledge a priori
        in the form of mathematically exact probability statements
        is available, the fiducial argument is not used, but that of Bayes. Usually
        exact knowledge is absent, and, when the experiment can be so designed that
        estimation can be exhaustive, similar probability statements
        a posteriori may be inferred by the fiducial argument." (Fisher, 1990/1935, page 198)
        

        "[...] for there is no other method [the
        fiducial method] ordinarily available for making correct statements of
        probability about the real world."
        (Fisher, 1990/1935, pages 198-199)
        
        "Since 1973, the Analysis of Comparisons has incorporated Bayesian techniques,
        both classical and contemporary (Jeffreys, Lindley, etc.), while using them with a fiducial
        motivation (Fisher). These techniques seem to us indeed the best suited
        to remedy the shortcomings [...] of traditional significance
        tests." (Lecoutre, Rouanet & Denhière, 1988, page 384)
        
        "It seems reasonable to postulate that the no-knowledge a priori
        distribution in classical inverse probability theory should be that distribution which, when
        experimental data capable of yielding a fiducial argument are now given,
        results in an a posteriori distribution identical with the corresponding fiducial
        distribution." (Rozeboom, 1960, page 229)
        

        "The
        fiducial philosophy of inference is an alternative to, and compensates for, the
        deficiencies of the other two procedures of inference [Bayesian inference,
        frequentist inference]. It is unfortunate that its importance has been unduly
        overlooked." (Wang, 2000, page 105)
        
                
                 
        
        
        Frequentist (orthodox, classical) inference
        
        
        "The classical design of clinical trials is dictated by the eventual analysis. If the design varies from that planned then
        classical analysis is impossible." (Berry, 1987, page 181)
        

        "The steamroller of frequentism is not slowed by words." (Berry, 1993)
        

        "In contrast to the logical development and
        intuitive interpretations of the Bayesian approach, frequentist methods are
        nearly impossible to understand, even for the best students." (Berry, 1997)
        

        "Students in frequentist courses may learn very well how to calculate confidence intervals and P values, but they cannot
        give them correct interpretations. I stopped teaching frequentist methods when
        I decided that they could not be learned." (Berry, 1997)
         

        "All probability statements in the frequentist approach are about possible data that could
        have been observed, but were not. These statements aren't of much scientific use." (Bolstad, 2005)
        

        "In attempts to teach the 'correct' interpretation of frequentist
        procedures, we are fighting a losing battle." (Freeman, 1993, page 1446)
        

        "Maybe we should banish our use of the word probability and substitute
        how often, instead, if we stay with the frequentist approach. Then, perhaps we can stay frequentists and still be
        honest with ourselves." (Iversen, 2000, page 9)
        

        "Some say that Bayesianism has feet of clay (the need to specify a prior); but at
        least its feet are out in the open for everyone to see and criticise. By
        contrast, frequentist statistics has no clothes, for it calculates an
        irrelevant number and pretends that this tells us something important about the
        hypotheses we are interested in." (Jefferys, 1995, page 122)
        

        "Classical statistics was invented to make
        statistical inference 'objective.' In fact, classical statistics is no more
        objective than Bayesian statistics, but by hiding its subjectivity it gives the
        illusion of objectivity." (Jefferys, 1992)
        

        "Interestingly, the sampling distribution that orthodox theory does allow us to use is nothing
        more than a way of describing our prior knowledge about the 'noise'
        (measurement errors). Thus, orthodox thinking is in the curious position of
        holding it decent to use prior information about noise, but indecent to use
        prior information about the 'signal' of interest." (Jaynes, 1985, page 30)
        
                
                 
        
        
        Misinterpretations of NHST and confidence intervals
        
        
        "[the confidence level] a measure of the
        confidence we have that the interval does indeed contain the parameter of interest" (Aczel, 1995, page 205)
        

        "[a significant result] indicates that the chances of the finding being random is only 5 percent or less." (Azar, 1999)
        

        "The psychological literature is filled with misinterpretations of the nature of the tests of significance."
        (Bakan, 1967, in Morrison & Henkel, 1970, page 239)
        

        "In addition to the overall interpretative bias there was a very strong interaction between the training and the transfer
        problems [chi2(1)=14.71, p<0.001]." (Bassock et al., 1995)
        

        "Subject's performance was not affected by differences in the size of the assigned and the receiving sets
        [chi2(1)=0.08, n.s.], so we combined the results of subjects [...]" (Bassock et al., 1995)
        

        "An alternative approach to estimation is to extend the concept of error bound to
        produce an interval of values that is likely to contain the true value of the
        parameter." (Bhattacharyya & Johnson, 1977, page 243)
        

        "Many instructors err in describing confidence intervals and even some texts err. But whether texts or instructors
        err in explaining them, students do not understand them. And they carry this
        misunderstanding with them into later life. Calculating a confidence interval
        is easy. But everyone except the cognoscenti believes that when one calculates
        95% confidence limits of 2.6 and 7.9 say, the probability is 95% that the
        parameter in question lies in the interval from 2.6 to 7.9." (Berry, 1997)
        

        "P values are nearly as obscure as confidence intervals." (Berry, 1997)
        

        "Students in frequentist courses may learn very well how to calculate confidence intervals and P values,
        but they cannot give them correct interpretations. I stopped teaching frequentist methods when
        I decided that they could not be learned." (Berry, 1997)
        

        "Inevitably, students (and essentially everyone else) give an inverse or Bayesian twist to frequentist
        measures such as confidence intervals and P values." (Berry, 1997)
        

        "[...] when a statistician rejects the null
        hypothesis at a certain level of confidence, say .05, he may be fairly well
        assured (p = .95) that the
        alternative statistical hypothesis is correct."
        (Bolles, 1962, page 639)
        

        "Ask your colleagues how they perceive the statement '95% confidence level lower bound
        of 77.5 GeV/c2 is obtained for the mass of the Standard Model Higgs boson'. I
        conducted an extensive poll in July 1998, personally and by electronic mail.
        The result is that for the large majority of people the above statement means
        that 'assuming the Higgs boson exists, we are 95% confident that the Higgs mass
        is above that limit, i.e. the Higgs boson has 95% chance (or probability) of
        being on the upper side, and 5% chance of being on the lower side', which is
        not what the operational definition of that limit implies." (D'Agostini, 2000)
        

        "[...] we assert that the population mean probably falls within the interval that we have established."
        (Elifson, Runyon & Haber, 1990, page 367)
        
        "As we have said, it is worth investigating whether a transformation of the scale of the
        x's can lead to a linear scheme, that is, to a non-significant
        F2." (Faverge, 1975, volume 2, page 268)
        
        "Further, two additional 2×2 chi-square tests found class status (graduate vs.
        undergraduate) to be independent of whether students appear to hold misconceptions
        (chi2 = 3.5, df=1, p>.05) and whether students passed the test
        (chi2 = 3.02, df=1, p>.05)." (Hirsch & O'Donnell, 2001,
        page 10)
        

        "I see these answers [about confidence intervals: '95% of the intervals would fall between the two values of the
        parameter', '95% chance that the actual value will be contained within the
        confidence interval' [...] as cries in the wilderness about how the world view we try to
        construct for our customers is not a world view our customers are comfortable with." (Iversen, 2000, page 8)
        

        "Dobyns, Jahn, and others [...] publish the observed p-value, calling this
        'the probability of obtaining this result by chance.' Such a use of p-values is illegitimate and
        not condoned by standard statistical theory." (Jefferys, 1995, page 595)
        

        "A random sample can be used to specify a segment or interval on the number line such that the parameter has a high
        probability of lying on the segment. The segment is called a confidence interval." (Kirk, 1982, page 42)
        

        "We can be 95% confident that the
        population mean is between 114.06 and 119.94."
        (Kirk, 1982, page 43)
        
        "Consulting the tables simply allows one to say that the hypothesis stated
        at the outset cannot be rejected. It is true that, in practice, many will say, and this by a strict
        abuse of language, that the 3 groups show no significant difference
        between them, that they belong to the same population.
        The correct interpretation is indeed: 'the hypothesis stated
        at the outset cannot be rejected'." (Mialaret, 1996, page 127)
        
        "Since the value 0 is
        contained in the confidence interval, the null hypothesis
        that the two series of values have the same mean cannot be rejected. In other words,
        we will say that the seeding had no effect on the fishermen's
        catch." (Mialaret, 1996, page 112)
        
        "In these conditions [a p-value of 1/15], the odds of 14 to 1 that this loss
        was caused by seeding [of clouds] do not appear negligible to us." (Neyman et al., 1969)
        

        "[...] 95 [chances] out of 100 that the observed difference will hold up in future investigations."
        (Nunnally, 1975, page 195; quoted by Carver, 1978)
        

        "[95%
        CI] an interval such that the probability is 0.95 that the interval contains
        the population value."
        (Pagano, 1990, page 288)
        
        "From the value
        of the chi-square and the number of degrees of freedom, the software computes the
        exact probability. If one sets a risk threshold of 5%, a probability
        below this threshold means that the sampling error is small, and one
        assumes that there is a dependence between the 2 row and column variables.
        Chance accounts for less than 5 chances in 100 in the
        observed distribution of the counts in the table. Chance and sampling
        error are considered negligible. The hypothesis
        of independence is rejected." (QUESTION, Grimmer logiciels, 1993)
        
        "A confidence interval is an assertion that an unknown parameter lies in a computed range,
        with a specified probability." (Rinaman, Heil, Strauss, Mascagni & Sousa, 1996, page 608)
        
        "For example, if in a survey of size 1000 one finds [frequency] = 0.613,
        the proportion pi to be estimated has a probability of 0.95 of lying in the range: [...]
        [0.58; 0.64]." (Robert Cl., 1995, pages 221-222)
        
        "The majority of researchers in psychology use a statistical
        significance test to decide whether the results obtained
        confirm or refute their hypothesis. This test makes it possible to establish the
        probability of obtaining such results rather than those corresponding to
        the null hypothesis, a statistical postulate attributing behavioral
        variations to sampling and measurement errors, as well as to
        chance." (Robert M., 1995, page 66)
        
        "In summary, the probability [of the effect] was established for several samples of psychologists. For one
        N of 20, p=.88; for one N of 19, p=.996; for a smaller N of 2,
        p='1.00' and for another N of 2, p='0.00'." (Rosenthal & Gaito, 1964)
        

        "From the table the probability is 0.9985 [1-p] or the odds are 666 to 1 that [soporific] 2 is the better
        soporific." (Student, 1908, page 21)
        

        "The fact that statistical experts and investigators publishing in the best journals cannot consistently interpret the
        results of these analyses is extremely disturbing. Seventy-two years of
        education have resulted in minuscule, if any, progress toward correcting this
        situation. It is difficult to estimate the handicap that widespread, incorrect,
        and intractable use of a primary data analytic method has on a scientific
        discipline, but the deleterious effects are doubtless substantial..." (Tryon, 1998, page 796)
        

        "Most psychologists and other users of statistics believe that this minimum significance level is the 'probability
        that the results are due to chance' and many applied statistics texts support this belief." (Wilson, 1961, page 230)
        
                
                 
        
        Multiple comparison test procedures
        
        
        "One of the attractions of the Bayesian approach is that there is no need to introduce a penalty term for performing
        thousands of simultaneous tests; Bayesian testing has a built-in penalty or 'Ockham's razor' effect."
        (Scott & Berger, 2003)
        

        "I have failed to find a single instance in
        which the Duncan test was helpful, and I doubt whether any of the alternative
        tests [multiple range significance tests] would please me better."
        (Finney)
        

        "The blind need frequent warnings and help
        in avoiding the multiple comparison test procedures that some editors demand
        but that to me appear completely devoid of practical utility." (Finney)
        

        "Multiple comparison methods have no place at all in the interpretation of data." (Nelder)
        

        "The ritualistic use of multiple-range tests-often
        when the null hypothesis is a priori untenable [...] is a disease." (Preece)
        
        
                
        
        
        Null hypothesis
        
        
        "There is really no good reason to expect
        the null hypothesis to be true in any population [...] Why should any
        correlation coefficient be exactly .00 in the population? [...] why should
        different drugs have exactly the same effect on any population parameter."
        (Bakan, 1967, in Morrison & Henkel, 1970)
        

        "[this] is patently absurd and not in fact what scientists do. They do not test the same hypothesis over and over
        again." (Camilleri, 1962)
        

        "Statistical significance testing sets up a straw man, the null hypothesis, and tries to knock him down."
        (Carver, 1978, page 381)
        

        "The research worker has been oversold on hypothesis
        testing. Just as no two peas in a pod are identical, no two treatment means
        will be exactly equal. [...] It seems ridiculous [...] to test a hypothesis
        that we a priori know is almost certain to be false." (Chew, 1976)
        

        "In many experiments it seems obvious that
        the different treatments must have produced some difference, however small, in
        effect. Thus the hypothesis that there is no difference is unrealistic: the
        real problem is to obtain estimates of the sizes of the differences."
        (Cochran & Cox, 1957)
        

        "Exact truth of a null hypothesis is very unlikely except in a genuine uniformity trial." (Cox)
        

        "In typical applications, one of the hypotheses-the null hypothesis-is known by all concerned to be false from the
        outset." (Edwards, Lindman & Savage, 1963)
        

        "A null hypothesis that yields under two
        different treatments have identical expectations is scarcely very plausible,
        and its rejection by a significance test is more dependent upon the size of an
        experiment than upon its untruth." (Finney)
        

        "Is it ever worth basing analysis and
        interpretation of an experiment on the inherently implausible null hypothesis
        that two (or more) recognizably distinct cultivars have identical yield capacities?" (Finney)
        

        "[...] it would therefore, add greatly to
        the clarity with which the tests of significance are regarded if it were
        generally understood that tests of significance, when used accurately, are
        capable of rejecting or invalidating hypotheses, in so far as they are
        contradicted by the data: but that they are never capable of establishing them
        as certainly true." (Fisher, 1929, page 192)
        

        "[...] and it should be noted that the null
        hypothesis is never proved or established, but is possibly disproved, in the
        course of experimentation. Every experiment may be said to exist in order to
        give the facts a chance of disproving the null hypothesis."
        (Fisher, 1990/1935, page 16)
        

        "It is evident that the null hypothesis
        must be exact, that is free from vagueness and ambiguity, because it must
        supply the basis of the 'problem of distribution', of which the test of
        significance is the solution." (Fisher, 1990/1935, page 16)
        

        "We are typically not terribly concerned with Type 1 error because we
        rarely believe that it is possible for the null hypothesis to be strictly true."
        (Gelman, Hill & Yajima, 2009)        
        

        "A large enough sample will usually lead to
        the rejection of almost any null hypothesis [...] Why bother to carry out a
        statistical experiment to test a null hypothesis if it is known in advance that
        the hypothesis cannot be exactly true." (Good, 1983)
        

        "The commonest agricultural experiments
        [...] are fertilizer and variety trials. In neither of these is there any
        question of the population treatment means being identical [...] the objective
        is to measure how big the differences are." (Healy)
        

        "When we formulate the hypothesis that the
        sex ratio is the same in two populations, we do not really believe that it
        could be exactly the same." (Hodges & Lehmann, 1954)
        

        "All populations are different, a priori." (Jones & Matloff, 1986)
        

        "Because point hypotheses, while
        mathematically convenient, are never fulfilled in practice, convert them to
        limiting cases of interval hypotheses." (Jones & Tukey, 2000)
        

        "Welcome to the Journal of Articles in Support of the Null Hypothesis.
        In the past other journals and reviewers have exhibited a bias against articles
        that did not reject the null hypothesis. We seek to change that by offering an outlet
        for experiments that do not reach the traditional significance levels (p < .05).
        Thus, reducing the file drawer problem, and reducing the bias in psychological literature.
        Without such a resource researchers could be wasting their time examining empirical questions
        that have already been examined. We collect these articles and provide them to
        the scientific community free of cost."
        (Journal of Articles in Support of the Null Hypothesis, 2002)
        

        "No one, I think, really believes in the possibility of sharp null hypotheses
        that two means are absolutely equal in noisy sciences." (Kempthorne)
        

        "It is ironic that a ritualistic adherence
        to null hypothesis significance testing has led researchers to focus on
        controlling the Type I error that cannot occur because all null hypotheses are
        false." (Kirk, 1996, page 747)
        

        "Another criticism of standard significance
        tests is that in most applications it is known beforehand that the null
        hypothesis cannot be exactly true." (Kruskal)
        

        "[...] rejecting a typical null hypothesis is like rejecting the proposition that
        the moon is made of green cheese." (Loftus, 1996)
        

        "Unless one of the variables is wholly
        unreliable so that the values obtained are strictly random, it would be foolish
        to suppose that the correlation between any two variables is identically equal
        to 0.0000 [...] (or that the effect of some treatment or the difference between
        two groups is exactly zero)."
        (Lykken, 1968, in Morrison & Henkel, 1970)
        

        "The test is asking whether a certain
        condition holds exactly, and this exactness is almost never of scientific
        interest." (Matloff, 1991)
        

        "With regard to a goodness-of-fit test to
        answer whether certain ratios have given exact values, 'we know a priori this
        is not true; no model can completely capture all possible genetical
        mechanisms'." (Matloff, 1991)
        

        "The number of stars by itself is relevant only to the question of whether H0
        is exactly true - a question which is almost always not of interest to us, especially because we usually know a
        priori that H0 cannot be exactly true." (Matloff, 1991)
        

        "We usually know in advance of testing that the null hypothesis is false." (Morrison & Henkel, 1969,
        in Morrison & Henkel, 1970)
        

        "The null-hypothesis models [...] share a
        crippling flaw: in the real world the null hypothesis is almost never true, and
        it is usually nonsensical to perform an experiment with the sole aim of
        rejecting the null hypothesis." (Nunnally, 1960)
        

        "If rejection of the null hypothesis were
        the real intention in psychological experiments, there usually would be no need
        to gather data." (Nunnally, 1960)
        

        "The mere rejection of a null hypothesis provides only meager information." (Nunnally, 1960)
        

        "And when, as so often, the test is of a
        hypothesis known to be false [...] the relevance of the conventional testing
        approach remains to be explicated." (Pratt, 1976)
        

        "Null hypotheses of no difference are
        usually known to be false before the data are collected [...] when they are, their
        rejection or acceptance simply reflects the size of the sample and the power of
        the test, and is not a contribution to science." (Savage, 1957)
        

        "One feature [...] which requires much more
        justification than is usually given, is the setting up of unplausible null
        hypotheses. For example, a statistician may set out a test to see whether two
        drugs have exactly the same effect, or whether a regression line is exactly
        straight. These hypotheses can scarcely be taken literally." (Smith, 1960)
        

         "Most researchers mindlessly test only nulls of no difference or of no relationship because most statistical packages
        only test such hypotheses." (Thompson, 1998, page 799)
        

        "It is foolish to ask 'Are the effects of A
        and B different?' They are always different - for some decimal place."
        (Tukey, 1991, page 100)
        

        "The worst, i.e., most dangerous, feature
        of 'accepting the null hypothesis' is the giving up of explicit uncertainty [...]
        Mathematics can sometimes be put in such black-and-white terms, but our
        knowledge or belief about the external world never can." (Tukey, 1991)
        

        "Never use the unfortunate expression
        'accept the null hypothesis'." (Wilkinson and Task Force on Statistical Inference, 1999)
        

        "In many experiments [...] it is known that the null hypothesis customarily tested, i.e. that the treatments produce no
        effects, is certainly untrue [...]." (Yates, 1964, page 320)
        

        "The occasions [...] in which quantitative data are collected solely with the object of proving or disproving a given
        hypothesis are relatively rare." (Yates)
        
                
                 
        
        One-sided vs two-sided tests
        
        
        "I regard the one-sided vs. two-sided
        p value debate to be silly."
        (Berry, 1991, page 86)
        

        "While the popularity of one-tailed tests
        is undoubtedly attributable in part to the overwillingness of psychologists as
        a group to make use of the statistical recommendations they have most recently read, [...]" (Burke, 1953)
        
                
                
        
        Power
        
        
        "In either case it is inappropriate for
        authors to claim that there is no effect if the results of a test yield a
        nonsignificant P-value and low power." (Cherry, 1998, page 949)
        

        "[...] the question of the probability that
        his investigation would lead to statistically significant results, i.e., its power?" (Cohen, 1969, page vii)
        

        "A salutary effect of power analysis is
        that it draws one forcibly to consider the magnitude of effects. In psychology,
        and especially in soft psychology, under the sway of the Fisherian scheme,
        there has been little consciousness of how big things are. [...] Because
        science is inevitably about magnitudes, it is not surprising how frequently
        p values are treated as surrogates for
        effect sizes. [...] In retrospect, it seems to me simultaneously quite
        understandable yet also ridiculous to try to develop theories about human
        behavior with p values from Fisherian
        hypothesis testing and no more than a primitive sense of effect size. And I
        wish I were talking about the long, long ago." (Cohen, 1990, page 1309)
        

        "Confidence intervals should play an
        important role when setting sample size, and power should play no role once the
        data have been collected. [...] In this commentary, we present the reasons why
        the calculation of power after a study is over is inappropriate and how
        confidence intervals can be used during both study design and study
        interpretation." (Goodman & Berlin, 1994, page 200)
        

        "For interpretation of observed results,
        the concept of power has no place, and confidence intervals, likelihood, or
        Bayesian methods should be used instead." (Goodman & Berlin, 1994, page 205)
        

        "We would not entirely rule out the use of power-type concepts in data analysis,
        but their application is extremely limited." (Hoenig & Heisey, 2001, page 23)
        

        "Power calculations tell us how well we might be able to characterize nature in the future given a particular state and
        statistical study design, but they cannot use information in the data to tell us about the likely states of nature."
         (Hoenig & Heisey, 2001, page 23)
        
                
                 
        
        Predictive probabilities
        
        
        "An essential aspect of the process of
        evaluating design strategies is the ability to calculate predictive
        probabilities of potential results." (Berry, 1991, page 81)
        
                        
                                       
        
        p-values
        
        
        "A Neyman-Pearson error probability, alpha,
        has the actual frequentist interpretation that a long series of alpha level tests
        will reject no more than 100alpha% of true H0, but the data-dependent
        P-values have no such interpretation. P-values do not even fit easily into any
        of the conditional frequentist paradigms." (Berger & Delampady, 1987)
        

        "P-values calculated assuming fixed sample sizes may be reasonable as measures of
        extremity, to answer the question 'How unusual are the data if H0 is
        true?', but one should not take them too seriously." (Berry, 1985, page 525)
        

        "P values are nearly as obscure as confidence intervals." (Berry, 1997)
        

        "It is very bad practice to summarise an important investigation solely by a value of P." (Cox, 1982, page 327)
        

        "p-values can be used to spot a possible
        problem, but certainly not to draw scientific conclusions or to take decisions."
        (D'Agostini, 2000, page 18)
        

        "Although p-values are not the most direct index of this information [about
        the strength of evidence], they provide a reasonable surrogate within the
        constraints posed by the mechanics of traditional hypothesis testing." (Dixon, 1998, page 391)
        

        "The actual value of p [...] indicates the strength of the evidence against the
        hypothesis." (Fisher, 1990/1925, page 80)
        

        "The current widespread practice of using p-values as the main means of assessing and
        reporting the results of clinical trials cannot be defended." (Freeman, 1993, page 1443)
        

        "Even statisticians seem to have very little idea of how the interpretation of
        p-values should depend on sample size." (Freeman, 1993)
        

        "[...] whatever assumptions one makes, the observed p-value is not a valid estimate
        of the probability that the null hypothesis is true, and in fact, it always underestimates this probability
        by a large factor." (Jefferys, 1995, page 597)
        

        "P-values, as provided by orthodox statistical methods, can be and often are misunderstood even by those who use
        them every day. Data-dependent P-values contain subtle traps that make their
        interpretation hazardous." (Jefferys, 1992)
        

        "If P is small, that means that there have been
        unexpectedly large departures from prediction. But why should these be stated
        in terms of P? The latter gives the probability of departures, measured in a
        particular way, equal to or greater than the observed set, and the contribution
        from the actual value is nearly always negligible.
        What the use of P implies, therefore, is that a hypothesis that may be true may be rejected
        because it has not predicted observable results that have not occurred.
        This seems a remarkable procedure." (Jeffreys, 1961, page 385, italics added)
        

        "It is a travesty to describe a 
p value [...] as 'simple, objective and
        easily interpreted' [...] To use it as a measure of closeness between model and
        data is to invite confusion." (Healy)
        

        "Editors must be bold enough to take responsibility for deciding which studies are good and which are not, without
        resorting to letting the 
p value of the significance tests determine this decision." (Lykken, 1968)
        

        "[...] 
p values may be preferred to confidence intervals precisely because
        
p values, in some ways, present less information than confidence intervals."
        (Reichardt & Gollob, 1997, page 275)
        

        "There is no statistical sense to significance levels." (Rubin, 1969)
        

        "Need we - should we - stick to
        
p=0.05 if what we seek is a
        relatively pure list of appearances? No matter where our cutoff comes, we will
        not be sure of all appearances. Might it not be better to adjust the critical
        
p moderately - say to 0.03 or 0.07 - whenever such a less standard value seems to offer a greater fraction of
        presumably real appearances among those significant at the critical 
p? We would then use different
        modifications for different sets of data. No one, to my knowledge, has set
        himself the twin problems of how to do this and how well doing this in a
        specific way performs." (Tukey, 1969, page 85)
        
                
                               
        
        Replication of experiments
        
        
        "[...] smaller samples produce statistics
        more frequently which deviate widely from parameter than do large samples. Thus
        the large differences in a small sample must always be replicated in large
        samples to assess substantive importance." (Gold, 1958, 
in  Morrison & Henkel, 1970, page 108)
        

        "The essence of science is replication: a scientist should always be concerned about what would happen if he or another
        scientist were to repeat his experiment." (Guttman, 1983)
        
                
                 
        
        Scientific/Experimental research
        
        
        "We find that null hypothesis testing is
        uninformative when no estimates of means or effect size and their precision are
        given. Contrary to common dogma, tests of statistical null hypotheses have
        relatively little utility in science and are not a fundamental aspect of the
        scientific method." (Anderson, Burnham & Thompson, 2000,
        page 912)
        

        "Nor do you find experimentalists typically
        engaged in disproving things. They are looking for appropriate evidence for
        affirmative conclusions. Even if the mediate purpose is the disestablishment of
        some current idea, the immediate objective of a working scientist is likely to
        be gain affirmative evidence in favor of something that will refute the
        allegation which is under attack." (Berkson, 1942)
        

        "However, statistics is not merely a set of methods for analyzing data. It is also a way for integrating data into the
        scientific process." (Berry, 1995, preface)
        

        "There is no doubt that most scientists would disavow knowledge of
        Bayesian methods. But these same scientists think and reason like Bayesians,
        whether or not they know Bayes' theorem. Namely, they update what they think on
        the basis of the results of experiments." (Berry, 1997)
        

        "The resultant magnification of the
        importance of formal hypothesis tests has inadvertently led to underestimation
        by scientists of the area in which statistical methods can be of value and to a
        wide misunderstanding of their purpose." (Box, 1976)
        

        "In the past, the need for probabilities
        expressing prior belief has often been thought of, not as a necessity for all
        scientific inference, but rather as a feature peculiar to Bayesian inference.
        This seems to come from the curious idea that an outright assumption does not
        count as a prior belief [...] I believe that it is impossible logically to
        distinguish between model assumptions and the prior distribution of the parameters." (Box, 1980)
        

        "In problem of scientific inference we
        would usually, were it possible, like the data to 'speak by themselves'."
        (Box & Tiao, 1973, page 2)
        

        "But in psychology, like it or not, NHSTP
        [Null Hypothesis Significance Test Procedure] is the principal tool for testing
        substantive hypotheses in theory-corroborating studies. and in that capacity it
        is not only inadequate, but may be destructive to psychology as a scientific
        discipline." (Dar, 1998, page 196)
        

        "The attempts that have been made to
        explain the cogency of tests of significance in scientific research, by
        reference to supposed frequencies of possible statements, based on them, being
        right or wrong, thus seems to miss the essential nature of such tests. [...]
        However, the calculation is absurdly academic, for in fact no scientific worker
        has a fixed level of significance at which from year to year, and in all
        circumstances, he rejects hypotheses; he rather gives his mind to each
        particular case in the light of his evidence and his ideas." (Fisher, 1990/1956, pages 44-45)
        

        "I shall emphasize some of the various ways
        in which this operation [acceptance procedures] differs from that by which improved
        theoretical knowledge is sought in experimental research. This emphasis is
        primilary necessary because the needs and purposes of workers in the
        experimental sciences have been so badly misunderstood and
        misrepresented." (Fisher 1990/1956, pages 81-82)
        

        "None of the meaningful questions in drawing conclusions from research results - such as how probable are the
        hypotheses? how reliable are the results? what is the size and impact of the
        effect that was found? - is answered by the test." (Falk & Greenbaum, 1995, page 94)
        

        "The single most important problem with
        null hypothesis testing provides researcher with no incentive to specify either
        their own research hypotheses [...] Testing an unspecified hypothesis against
        chance may be all we can do in situations where we know very little. But when
        used as a general ritual, this method ironically ensures that we continue to
        know very little." (Gigerenzer, 1998, page 200)
        

        "The subjectivist states his judgements,
        whereas the objectivist sweeps them under the carpet by calling assumptions
        knowledge, and he basks in the glorious objectivity of science." (Good, 1973)
        

        "However, conventions about significant
        result should not be turned into canons of good scientific practice. Even more
        emphatically, a convention must not be made a superstition. [...] It is a grave
        error to evaluate the 'goodness' of an experiment only in terms of the
        significance level of its results." (Hays, 1973, page 385)
        

        "Based on my own experience, most good research psychologists consult only occasionally
        with statistical experts. Thus [...] experts [...] more often see poor practice
        [...]. Such situations [...] offer the statistician little insight concerning
        the effective roles of statistical methods in good scientific work."
        (Krantz, 1999, page 1375)
        

        "Psychology
        will be a much better science when we change the way we analyze data."
        (Loftus, 1996)
        

        "Problems stemming from the fact that
        hypothesis tests do not address questions of scientific interest."
        (Matloff, 1991)
        

        "The test provides neither the necessary nor the sufficient scope or type of knowledge
        that basic scientific social research requires." (Morrison & Henkel, 1969,
        
in  Morrison &Henkel, 1970, page 198)
        

        "The use of significance tests involves the researcher in the process of making firm
        'reject' or 'accept' decisions on each test of each null hypothesis on the
        basis of a formal, firm, and frequently arbitrary criterion, the significance
        level. This 
decision making process  is antithetical to the 
information accumulation process
        of scientific inference." (Morrison & Henkel, 1970, page 309)
        

        "Why do editors think that 
P-value dominated analysis constitutes a scientific procedure?"
        (Nelder, 1996, quoted by Cherry, 1998, page 947)
        

        "Without hoping to know whether each separate hypothesis is true or false, we may search for rules to govern our
        behaviour with regard to them, in following which we insure that, in the long
        run of experience, we shall not be too often wrong." (Neyman & Pearson, 1933a, page 291)
        

        "We have suggested that a statistical test may be regarded as a rule of behaviour to be applied repeatedly in our
        experience when faced with the same set of alternative hypotheses."
        (Neyman & Pearson, 1933b, page 509)
        

        "[...] Berger and Hsu (1996, page 192)
        make the following statement: "We believe that notions of size, power, and
        unbiasedness are more fundamental than 'intuition'..." In our opinion,
        such a statement places the credibility of statistical science at serious risk
        within the scientific community. If we are indeed teaching our students to
        disregard intuition in scientific inquiry, then a fundamental reassessment of
        the mission of mathematical statistics is urgently needed." (Perlman & Wu, 1999, page 366)
        [Berger's reply: "If we are indeed teaching our students to disregard intuition in scientific inquiry, then a fundamental
        reassessment of the mission of mathematical statistics is urgently needed." (page 373)]
        

        "We hope that we have alerted statisticians
        to the dangers inherent in uncritical application of the NP [Neyman &
        Pearson] criterion, and, more generally, convinced them to join Fisher, Cox and
        many others in carefully weighing the scientific relevance and logical
        consistency of any mathematical criterion proposed for statistical
        theory." (Perlman & Wu, 1999, page 381)
        

        "While several serious objections to the method [the null hypothesis significance-test method] are raised, its most
        basic error lies in mistaking the aim of a scientific investigation to be a
        
decision, rather than a 
cognitive evaluation of propositions." (Rozeboom, 1960, 
in 
        Morrison & Henkel, 1970, page 230)
        

        "The null-hypothesis significance test treats 'acceptance' or 'rejection' of a hypothesis as though these were
        decisions one makes. 
But the primary aim of a scientific experiment is not to precipitate decisions,
        but to make an appropriate adjustment in the degree to which one accepts, or believes, the hypothesis or hypotheses
        being tested." (Rozeboom, 1960, 
in  Morrison & Henkel, 1970, page 221)
        

        "It is the claim of this paper that, rather
        than being the 'only correct way to analyse a clinical trial', this paradigm
        [the 'intent to treat' paradigm] is a warning that we should heed Fisher's
        original observation that the N-P [Neyman-Pearson] formulation is irrelevant to
        scientific research." (Salsburg, 1994, page 334)
        

        "Statistical significance testing retards the growth of scientific knowledge; it never makes
        a positive contribution. After decades of unsuccessful efforts, it now appears possible
        that reform of data analysis procedures will finally succeed. If so, a major
        impediment to the advance of scientific knowledge will have been removed."
        (Schmidt & Hunter, in Harlow, Mulaik & Steiger, 1997, chap. 3, page 37)
        

        [Statistical significance tests] "provide a façade of scientism
        in research. For many in educational research, being quantitative is equated
        with being scientific...despite the fact that some scientists and many
        psychologists [...] have managed very well without inferential statistics." (Shaver,
        1992, page 2)
        

        "A common problem for statistical inference
        is to determine, in terms of probability, whether observed differences between
        two samples signify that the populations sampled are themselves really different." (Siegel, 1956, page 2)
        

        "Establishing that a correlation exists
        between two variables may be the ultimate aim of a research, [...]."
        "It is, of course, of some interest to be able to state the degree of
        association between two sets of scores from a given group of subjects. But it
        is perhaps of greater interest to be able to say whether or not some observed
        association in a sample of scores indicates that the variables under study are
        most probably associated in the population from which the sample was drawn." (Siegel, 1956, page 195)
        

        "Science becomes an automated, blind research for mindless tabular asterisks using thoughtless hypotheses."
        (Thompson, 1998, page 799)
        

        "The tyranny of the N-P [Neyman-Pearson] theory in many branches of empirical science is detrimental, not advantageous,
        to the course of science." (Wang, 1993)
        

        "In many experiments [...] it is known that
        the null hypothesis customarily tested, i.e. that the treatments produce no
        effects, is certainly untrue; such experiments are in fact undertaken with the
        different purpose of assessing the magnitude of the effects. Fisher himself was
        of course well aware of this, as is evinced in many of his own analyses of
        experimental data, but he did not, I think, sufficiently emphasise the point in
        his expository writings. Scientists were thus encouraged to expect final and
        definite answers from their experiments in situations in which only slow and
        careful accumulation of information could be hoped for. And some of them,
        indeed, came to regard the achievement of a significant result as an end in itself." (Yates, 1964, page 320)
        

        "The emphasis given to formal tests of significance [...] has resulted in [...] an undue concentration of effort by
        mathematical statisticians on investigations of tests of significance
        applicable to problems which are of little or no practical importance [...] and
        [...] it has caused scientific research workers to pay undue attention to the
        results of the tests of significance [...] and too little to the estimates of
        the magnitude of the effects they are investigating." (Yates)
        

        "The unfortunate consequence that scientific workers have often regarded the execution of a test of significance
        on an experiment as the ultimate objective." (Yates)
        
                
                 
        
        Statistical significance/nonsignificance
        
        
        "Typically, mere difference from zero is totally uninteresting." (Abelson, 1997, page 121)
        

        "Rather than ask if these differences are statistically significant, it seems more important to ask if they are of
        educational importance." (Chatfield, 1985)
        

        "1. A significant effect is not necessarily the same thing as an interesting effect;
        2. A non-significant effect is not necessarily the same thing as no difference." (Chatfield, 1988, page 51)
        

        "Researchers and journal editors as a whole
        tend to (over)rely on 'significant differences' as the definition of meaningful
        research." (Craig, Eison & Metze, 1976, page 282)
        

        "However, conventions about significant
        result should not be turned into canons of good scientific practice. Even more
        emphatically, a convention must not be made a superstition. [...] It is a grave
        error to evaluate the 'goodness' of an experiment only in terms of the
        significance level of its results." (Hays, 1973, page 385)
        

        "Acceptability of a statistically
        significant result [...] promotes a high output of publication. Hence the
        argument that the techniques work has a tempting appeal to young biologists, if
        harassed by their seniors to produce results, or if admonished by editors to
        conform to a prescribed ritual of analysis before publication. [...] the plea
        for justification by works [...] is therefore likely to fall on deaf ears,
        unless we reinstate reflective thinking in the university curriculum."
        (Hogben, 1957)
        

        "I believe we can already detect signs of
        such deterioration in the growing volume of published papers  - especially
        in the domain of animal behaviour - recording so-called significant
        conclusions which an earlier vintage would have regarded merely as private
        clues for further exploration." (Hogben, 1957,
        
in 
        Morrison &
        Henkel, 1970, page 21)
        

        "We can already detect signs of such
        deterioration in the growing volume of published papers [...] recording
        so-called significant conclusions which an earlier vintage would have regarded
        merely as private clues for further exploration." (Hogben, 1957)
        

        "To use statistics adequately, one must
        understand the principles involved and be able to judge whether obtained
        results are statistically significant
        
and 
        whether they are meaningful in
        the particular research context." (Kerlinger, 1989, pages 318-319)
        

        "Significance should stand for meaning and
        refer to substantive matter. [...] I would recommend that statisticians discard
        the phrase 'test of significance'." (Kish, 1957,
        
in 
        Morrison & Henkel, 1970)
        

        "Statistical significance of a sample bears
        no necessary relationship to possible subject-matter significance."
        (Kruskal)
        

        "It is easy to [...] throw out an
        interesting baby with the nonsignificant bath water. Lack of statistical
        significance at a conventional level does not mean that no real effect is
        present; it means only that no real effect is clearly seen from the data. That
        is why it is of the highest importance to look at power and to compute
        confidence intervals." (Kruskal)
        

        "We are also concerned about the use of
        statistical significance - P values - to measure importance; this is like the old
        confusion of substantive with statistical significance." (Kruskal &
        Majors, 1989)
        

        "Statistical significance (alpha and 
p values) and practical significance
        (effect sizes) are not 
competing  concepts.
        They are 
complementary ones." (Levin, 1993, page 379)
        

        "It is important to ask whether we really
        want to test the existence or nonexistence of a relation. Suppose a relation is
        extremely weak: Is such a relation of interest? Probably not, in most cases;
        yet a large enough sample would find such a relation to be significantly
        different from chance. On the other hand, an extremely strong relationship
        would be found not significantly different from chance if the sample were very
        small. (Lipset, Trow & Coleman, 1956,
        
in Morrison & Henkel, 1970, page 85)
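
        Lipset, Trow and Coleman's point is easy to check numerically. The following is a
        minimal sketch, not from the quoted study: using a two-sample z-test with known unit
        variance (effect sizes and sample sizes are our own illustrative choices), a trivially
        small effect reaches p < .05 with a huge sample, while a large effect fails to with a
        tiny one.

```python
# Illustrative sketch (assumed numbers, not from Lipset, Trow & Coleman):
# a two-sample z-test with known sigma = 1 shows that significance tracks
# sample size, not the size of the effect.
from math import sqrt, erfc

def z_test_p(effect_sd: float, n_per_group: int) -> float:
    """Two-sided p-value for a difference of means of `effect_sd`
    standard deviations, n observations per group, sigma = 1."""
    z = effect_sd / sqrt(2.0 / n_per_group)
    return erfc(z / sqrt(2.0))  # two-sided normal tail probability

p_weak_large_n = z_test_p(0.05, 20000)   # trivially small effect, huge sample
p_strong_small_n = z_test_p(1.00, 4)     # large effect, tiny sample

print(f"weak effect, n=20000 per group: p = {p_weak_large_n:.2g}")
print(f"strong effect, n=4 per group:   p = {p_strong_small_n:.2g}")
```

        The weak effect is "significant" and the strong one is not, which is exactly the
        inversion of substantive and statistical significance the quote describes.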
        

        "The idea that one should proceed no further with an analysis, once a non-significant
        
F-value for treatments is found, has led many experimenters to
        overlook important information in the interpretation of their data." (Little, 1981)
        

        "The moral of this story is that the finding of statistical significance is perhaps the least important attribute of
        a good experiment: it is 
never a sufficient condition for concluding that a theory has been corroborated, that a
        useful empirical fact has been established with reasonable confidence - or that
        an experimental report ought to be published." (Lykken, 1968)
        

        "The finding of statistical significance is
        perhaps the least important attribute of a good experiment."
        (Lykken,
        
in 
        Morrison & Henkel, 1970)
        

        "Editors must be bold enough to take
        responsibility for deciding which studies are good and which are not, without
        resorting to letting the 
p value of the significance tests determine this decision." (Lykken, 1968)
        

        "Scientists care about whether a result is
        statistically significant, but they should care much more about whether it is meaningful." (McCloskey, 1995)
        

        "Too many users of the analysis of variance
        seem to regard the reaching of a mediocre level of significance as more
        important than any descriptive specification of the underlying averages." (McNemar, 1960)
        

        "So much of what should be regarded as
        preliminary gets published, then quoted as the last word, which it usually is
        because the investigator is too willing to rest on the laurels that come from
        finding a significant difference. Why should he worry about the degree of
        relationship or its possible lack of linearity." (McNemar, 1960)
        

        "In psychological and sociological investigations
        involving very large numbers of subjects, it is regularly found that almost all
        correlations or differences between means are statistically significant."
        (Meehl, 1967,
        
in
        Morrison &
        Henkel, 1970)
        

        [significant] "cancerous" "misleading" (Meehl, 1997, page 421)
        

        "[...] I am cautioning that we must not get
        caught up in the misguided belief that having statistically significant things
        makes our research significant." (Moore, 1992)
        

        "[...] there is ample evidence that it is
        impossible to use the term 'significant' in a statistical context and avoid the
        erroneous connotations of that term (for writers
        
and readers)." (Morrison & Henkel, 1969, 
in  Morrison & Henkel, 1970, page 198)
        

        "To have
        the latter [scientific inference] we will have to have much more than the
        façade that claims of significance provide."
        (Morrison
        & Henkel, 1969,
        
in 
        Morrison &
        Henkel, 1970, page 198)
        

        "We should not feel proud when we see the
        psychologist smile and say 'the correlation is significant beyond the .01
        level.' Perhaps that is the most that he can say, but he has no reason to smile." (Nunnally, 1960)
        

        "To make measurements and then ignore their
        magnitude would ordinarily be pointless. Exclusive reliance on tests of
        significance obscures the fact that statistical significance does not imply
        substantive significance." (Savage, 1957)
        

        "I hope most researchers understand that 
significant (statistically) and 
important 
        are two different things. Surely the term 
significant was ill chosen" (Schafer, 1993, page 387)
        

        "For many years, medical research has overrated the importance of
        
p-values and thereby statistical and clinical significance have been mixed up and
        misinterpreted." (Schmidt, 1995, page 483)
        

        "Is there anybody who would believe that
        the two values are exactly the same? The problem is to get a reliable estimates
        for the difference. You want not statistical significance but practical
        significance." (Schmitt, 1969, page 255)
        

        "Nonsignificance was generally interpreted
        [in the
        
Journal of Abnormal Psychology,
        1984] as confirmation of the null hypothesis (if this was the research
        hypothesis), although the median power was as low as .25 in these cases."
        (Seldmeier & (Gigerenzer, 1989)
        

        "Many users of tests confuse statistical
        significance with substantive importance or with size of association."
        (Selvin, 1957,
        
in 
        Morrison &
        Henkel, 1970,
        page 
        106)
        

        "Moreover, the tendency to dichotomy
        resulting from judging some results 'significant' and other 'nonsignificant'
        can be misleading both to professional and lay audiences." (Skipper, Guenther & Nass, 1967)
        

        "There is no guarantee, form SS [Statistical Significance] that the mean difference is greater than
        infinitesimal." (Sohn, 1998, page 299)
        

        "In many experiments it is well known [...]
        that there are differences among the treatments. The point of the experiment is
        to estimate [...] and provide [...] standard errors. One of the consequences of
        this emphasis on significance tests is that some scientists [...] have come to
        see a significant result as an end in itself." (Street, 1990)
        

        "The emphasis on significance levels tends
        to obscure a fundamental distinction between the size of an effect and its
        statistical significance." (Tversky & Kahneman, 1971)
        

        "The interpretations which have commonly
        been drawn from recent studies indicate clearly that we are prone to conceive
        of statistical significance as equivalent to social significance. These two
        terms are essentially different and ought not to be confused. [...] Differences
        which are statistically significant are not always socially important. The
        corollary is also true: differences which are not shown to be statistically
        significant may nevertheless be socially significant." (Tyler, 1931, pages 115-117)
        

        "The experimenter must keep in mind that significance at the 5% level will only coincide with practical significance by
        chance!" (Upton, 1992)
        

        "But is vital that a
        
statistically significant
        difference should not necessarily be
        assumed to be an
        
important
        difference. [...] It is extremely important that doctors give thought to these
        matters and that they are not persuaded by advertisers or others to accept
        statistically significant differences in the performance of drugs as
        necessarily indicating a difference of practical importance of value."
        (Wade & Waterhouse, 1977,
        page 412)
        

        "The word 'significant' could be abolished
        [...] Based on a dictionary definition, one might expect that results that are
        declared significant would be important, meaningful, or consequential. Being
        'significant at an arbitrary probability level' [...] ensures none of these." (Warren, 1986)
        

        "Results are significant or not significant and this is the end of it." (Yates, 1951)
        
                
                 
        
        
        Editorial policies/Guidelines/Statistical Education
        
        
        "We estimated that 47% (SE=3.9%) of the
        P-values in the Journal of Wildlife Management lacked estimates of means or
        effect sizes or even the sign of the difference in means or other parameters.
        We find that null hypothesis testing is uninformative when no estimates of
        means or effect size and their precision are given. Contrary to common dogma,
        tests of statistical null hypotheses have relatively little utility in science
        and are not a fundamental aspect of the scientific method. We recommend their
        use be reduced in favor of more informative approaches." (Anderson, Burnham & Thompson, 2000, page 912)
        

        "If pressed, we would probably argue that
        Bayesian statistics (with emphasis on objective Bayesian methodology) should be
        the type of statistics that is taught to the masses, with frequentist
        statistics being taught primarily to advanced statisticians" (Bayarri & Berger, 2003, page 3)
        

        "Most elementary statistical procedures have an objective Bayesian interpretation (and
        indeed many were first derived in the inverse probability days of objective Bayesianism).
        Teaching the procedures with this interpretation is much easier than teaching them with
        frequentist interpretations: it is quite a bit easier to understand '
theta is in the interval
        (2.31,4.42) with degree-of-belief probability 0.95' than to understand 'the confidence
        procedure C(
x) will contain 
theta with probability 0.95 if it were repeatedly used with
        random data from the model for a fixed 
theta, and the interval for the given data happened
        to be (2.31,4.42)'." (Berger, 2004, pages 8-9)
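
        The frequentist reading Berger paraphrases can be made concrete with a small
        simulation (a sketch with illustrative numbers of our own choosing): the procedure
        xbar +/- 1.96 * sigma / sqrt(n) covers a fixed theta in roughly 95% of repeated
        samples, and that long-run coverage of the procedure is all the "95% confidence"
        statement asserts.

```python
# Sketch of the repeated-sampling meaning of a 95% confidence interval:
# with theta fixed, the interval procedure covers it in about 95% of samples.
# All numbers (theta, sigma, n, reps) are illustrative assumptions.
import random
from math import sqrt

random.seed(0)
theta, sigma, n, reps = 3.0, 1.0, 25, 10000
half_width = 1.96 * sigma / sqrt(n)  # normal-theory 95% half-width

covered = 0
for _ in range(reps):
    xbar = sum(random.gauss(theta, sigma) for _ in range(n)) / n
    if xbar - half_width <= theta <= xbar + half_width:
        covered += 1

print(f"empirical coverage: {covered / reps:.3f}")  # close to 0.95
```

        Nothing in the simulation says anything about the probability that theta lies in
        any one realized interval, which is precisely the interpretive gap the quote points to.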
                

        "In presenting the main results of a study
        it is good practice to provide confidence intervals rather than to restrict the
        analysis to significance tests. Only by doing so can authors give readers
        sufficient information for a proper conclusion to be done." (Berry, 1986,
        
The Medical Journal of Australia, Editorial)
        

        "Therefore, intending authors are urged to express their main
        conclusions in confidence interval form (possibly with the addition of a
        significance test, although strictly that would provide no extra
        information)." (Berry, 1986, 
The Medical Journal of Australia, Editorial)
        

        "Significance tests are intended solely to
        address the viability of the null hypothesis that a treatment has no effect,
        and not to estimate the magnitude of the treatment effect. Researchers are
        advised to move away from significance tests and to present instead an estimate
        of effect size bounded by confidence intervals."
        (Borenstein, 1997, 
Annals of Allergy, Asthma, & Immunology, page 5)
        

        "The statistical descriptors known as
        confidence intervals can increase the ability of readers to evaluate
        conclusions drawn form small trials."
        (Braitman, 1988,
        
Annals of Internal Medicine, Editorial)
        

        "[...] the point estimate both summarizes the sample and infers the true value; it should
        always be reported. Confidence intervals should be used to assess the clinical
        significance as well as the statistical significance of the main study results.
        When space permits, presenting all the raw data for important results (for
        example, in a graph) is best; this is practical only for relatively small
        studies. In reporting results of statistical tests, exact
        
P values are preferable to verbal statements of 'statistical significance' (or
        
P<0.05) or of nonsignificance (
P>0.05)
        because they contain more information." (Braitman, 1991, 
Annals of Internal Medicine, Editorial)
        

        "In a large majority of problems
        (especially location problems) hypothesis testing is inappropriate: Set up the confidence
        interval and be done with it!" (Casella & Berger, 1987)
        

        "My main recommendation is for wildlife researchers to stop taking statistical testing so seriously."
        (Cherry, 1998, page 951)
        

        "Since power is a direct monotonic function of sample size, it is recommended that
        investigators use larger sample sizes than they customarily do. It is further
        recommended that research 
plans be
        routinely subjected to power analysis, using as conventions the criteria of
        population effect size employed in this survey." (Cohen, 1962, page 153)
        

        "Rather, we are proposing that indices of
        association are another part of a composite picture a researcher is building
        when he reports data suggesting one or more variables are important in
        understanding a particular behavior." (Craig, Eison & Metze, 1976, page 282)
        

        "I think that much clarity will be achieved if we remove from scientific parlance the
        misleading expressions 'confidence intervals' and 'confidence levels'." (D'Agostini, 2000)
        

        "Together with many recent critics of NHT,
        we also urge reporting of important hypothesis tests in enough descriptive
        detail to permit secondary uses such as meta-analysis."
        (Greenwald, Gonzalez, Harris &
        Guthrie, 1996, page 175)
        

        "Of course, researchers must be exposed to hypothesis tests and p values in their statistical education if for no
        other reason than so they are able to read their literatures. However, more emphasis should be placed on general principles
        and less emphasis on mechanics." (Hoenig & Heisey, 2001, page 23)
        

        "As statisticians, we owe it to researchers
        using statistics in their research to make clear the impact statistics has on
        their work and enable them to choose Bayesian methods. We should train
        researchers well enough to make it possible for them to understand the role
        Bayesian statistics can play in their work." (Iversen, 2000, page 10)
        

        "Authors are required to report and interpret magnitude-of-effect measures in conjunction with every
        
p value that is reported." (Heldref Foundation, 1997, 
Journal of Experimental Education, Guidelines)
        

        "As a teacher, I therefore feel that to
        continue the time honoured practice - still in effect in many schools - of
        teaching pure orthodox statistics to students, with only a passing sneer at
        Bayes and Laplace, is to perpetuate a tragic error which has already wasted
        thousands of man-years of our finest mathematical talent in pursuit of false
        goals. If this talent had been directed toward understanding Laplace's
        contributions and learning how to use them properly, statistical practice
        would be far more advanced than it is." (Jaynes, 1976, page 256)
        

        "Reporting of results in terms of confidence intervals instead of hypothesis tests should be strongly
        encouraged." (Jones, 1984)
        

        "We recommend that authors display the
        estimate of the difference and the confidence limit for this difference." (Jones & Matloff, 1986)
        

        "The only remedy [...] is for journal editors to be keenly aware of the problems associated with hypothesis tests,
        and to be sympathetic, if not strongly encouraging, toward individuals who are
        taking the initial lead in phasing them out." (Jones & Matloff, 1986)
        

        "The reader will find that no traditional significance tests have been reported in connection with the statistical
        results in this volume. This is intentional policy rather than accidental
        oversight." (Kendall, 1957, 
in  Morrison & Henkel, 1970, page 87)
        

        "Evaluations of the outcomes of psychological treatments are favourably enhanced when the
        published report includes not only statistical significance and the required
        effect size but also a consideration of clinical significance. That is, [...]
        it is also important for the evaluator to consider the degree to which the
        outcomes are clinically significant (e.g., normative comparisons)."
        (Kendall, 1997, 
Journal of Consulting and Clinical Psychology, Editorial)
        

        "As a remedial step, I would recommend that statisticians discard the phrase 'test of
        significance', perhaps in favor of the somewhat longer but proper phrase
        'test against the null hypothesis' or the abbreviation 'TANH'."
        (Kish, 1959, 
in  Morrison & Henkel, 1970, page 139)
        

        "We suggest that the sole effective therapy for curing its [NHST] 'ills' is a 
smooth transition
        towards the Bayesian paradigm." (Lecoutre, Lecoutre & Poitevineau, 2001, page 413)
        

        "In this book, no statistical tests of significance have been used." (Lipset, Trow & Coleman, 1956,
        
in  Morrison & Henkel, 1970, page 81)
        

        "In particular, I offer the following guidelines. 1. By default,
        data should be conveyed as a figure depicting sample means
        
with associated standard errors and/or, where appropriate, standard deviations.
        2. More often than not, inspection of such a figure will
        immediately obviate the necessity of any hypothesis-testing procedures. In such
        situations, presentation of the usual hypothesis information
        (
F values, 
p values, etc.) will be discouraged." (Loftus, 1993,
        
Memory & Cognition, Editorial comment)
        

        "When a 'significant difference' has been
        established, investigators must then measure the size of the effect and
        consider whether it is of any biological or medical importance." (Lutz & Nimmo, 1977,
        
European Journal of Clinical Investigation, Editorial)
        

        "It is usually wise to give a confidence
        interval for the parameter in which you are interested." (Moore & McCabe)
        

        "If an author decides not to present an
        effect size estimate along with the outcome of a significance test, I will ask
        the author to provide specific justification for why effect sizes are not
        reported. So far, I have not heard a good argument against presenting effect
        sizes. Therefore, unless there is a real impediment to doing so, you should
        routinely include effect size information in the papers you submit."
        (Murphy, 1997, 
Journal of Applied Psychology, Editorial)
        

        "In reporting results, authors should still
        provide measures of variability and address the issue of the generalizability
        and reliability of their empirical findings across people and materials. There
        are a number of acceptable ways to do this, including reporting MSEs and
        confidence intervals and, in case of within-subject or within-items designs,
        the number of people or items that show the effect in the reported
        direction." (Neeley, 1995, 
Learning, Memory, and Cognition, 
21, page 261)
        

        "A confidence interval certainly gives more information than the result of a significance test alone [...]
        I [...] recommend its use [standard error of each mean]." (Perry, 1986)
        

        "The norm should be that only a standard error is quoted for comparing means from an experiment." (Preece)
        

        "[...] confidence intervals are unlikely to be widely reported in the literature unless their use is
        encouraged, or at least not penalized, by the publication criteria of
        journals." (Reichardt & Gollob, 1997, page 282)
        

        "Bayesian hypothesis testing is reasonably well developed [...] and well worth inclusion in the arsenal
        of any data analyst." (Robinson & Wainer, 2002, page 270)
        

        "In the past, journals have encouraged the
        routine use of tests of statistical significance; I believe the time has now
        come for journals to encourage routine use of confidence intervals instead."
        (Rothman, 1978, 
The New England Journal of Medicine, Editorial)
        

        "Whenever possible, the basic statistical report should be in the form of a confidence interval."
        (Rozeboom, 1960, 
in  Morrison & Henkel, 1970)
        

        "Accepting the proposition that significance testing should be discontinued and replaced by point estimates and
        confidence intervals entails the difficult effort of changing the beliefs and practices
        of a lifetime. Naturally such a prospect provokes resistance. Researchers would
        like to believe there is a legitimate rationale for refusing to make such a
        change." (Schmidt & Hunter, 1997, page 49)
        

        "It just seemed high time that someone stirred the Bayesian pot on an elementary level so that practitioners, rather
        than theorists, could start discussions and supply feedback to one another." (Schmitt, 1969, preface)
        

        "It is recommended that, when inferential statistical analysis is performed, CIs [confidence intervals] should accompany
        point estimates and conventional hypothesis tests wherever possible." (Sim & Reid, 1999, page 186)
        

        "We will go further [than mere encouragement]. Authors reporting statistical significance will be 
required
        to both report and interpret effect sizes. However, these effect sizes may be of various forms, including
        standardized differences, or uncorrected (e.g.,
        r², R², eta²) or corrected (e.g.,
        adjusted R², omega²)
        variance-accounted-for statistics." (Thompson, 1994,
        
Educational and Psychological Measurement, Guidelines)
        

        "It is proposed to judge the clinical relevance and importance by means of four
        values, fixed in discussions with the clinician before commencement of the
        study, and to proceed by testing non-zero null hypotheses (shifted
        null hypotheses) where the 'clinically relevant difference'
        is the shift parameter." (Victor, 1987, page 109)
        

        "It would seem to us to be easier for those
        who design clinical trials to continue to use the usual form of tests of
        significance based on the null hypothesis. But it is vital that a
        statistically significant difference should not necessarily be assumed to be an
        important difference. [...] It is extremely important that doctors [...] are not
        persuaded by advertisers or others to accept statistically significant
        differences in the performance of drugs as necessarily indicating a difference
        of practical importance or value." (Wade & Waterhouse, 1977,
        
British Journal of Clinical Pharmacology, Editorial)
        

        "It is hard to imagine a situation in which
        a dichotomous accept-reject decision is better than reporting an actual
        
p value or, better still, a confidence interval. 
        Never use the unfortunate expression 'accept the null hypothesis.' Always provide
        some effect-size estimate when reporting a 
p value." (Wilkinson and Task
        Force on Statistical Inference, 1999)
        

        "Interval estimates should be given for any
        effect sizes involving principal outcomes. Provide intervals for correlations
        and other coefficients of association or variation whenever possible."
        (Wilkinson and Task Force on Statistical Inference, 1999)
        

        "Always present effect sizes for primary
        outcomes. If the units of measurement are meaningful on a practical level
        (e.g., number of cigarettes smoked per day), then we usually prefer an unstandardized
        measure (regression coefficient or mean difference) to a standardized
        measure." (Wilkinson and Task Force on Statistical Inference, 1999)
        

        "Provide information on sample size and the
        process that led to sample size decisions. Document the effect sizes, sampling
        and measurement assumptions, as well as analytic procedures used in power
        calculations. Because power computations are most meaningful when done before
        data are collected and examined, it is important to show how effect-size
        estimates have been derived from previous research and theory in order to
        dispel suspicions that they might have been taken from data used in the study
        or, even worse, constructed to justify a particular sample size."
        (Wilkinson and Task Force on Statistical Inference, 1999)
        

        "We encourage researchers to use CIs to present their research findings, rather than relying on p-values alone."
        (Wolfe & Cumming, 2004, page 138)