Calculating the main alternatives to null-hypothesis-significance testing in between-subject experimental designs

  1. Balluerka Lasa, Nekane
  2. Vergara Iraeta, Ana Isabel
  3. Arnau Gras, Jaume
Journal:
Psicothema

ISSN: 0214-9915

Year of publication: 2009

Volume: 21

Issue: 1

Pages: 141-151

Type: Article

Abstract

Calculating the main alternatives to null-hypothesis significance testing in between-subjects experimental designs. This article addresses the current controversy surrounding null-hypothesis significance testing in psychological research. It reviews the main criticisms and counter-criticisms raised by the procedure's detractors and defenders, as well as the alternatives that, endorsed by the APA Task Force on Statistical Inference, have been proposed to replace or complement it. In addition, it shows how these alternative indices can be computed with the SPSS statistical package in a randomized factorial design. The aim is to provide applied researchers with a set of resources for analyzing and interpreting the results of any study using indicators that lend a high degree of validity to the inferences drawn.
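The article demonstrates these computations in SPSS; as a rough, language-neutral illustration of the kind of indices it discusses (eta squared, the less biased omega squared, Cohen's d, and a confidence interval for a mean difference), here is a minimal Python sketch for a two-group layout. The data are invented for illustration, and the tabled critical value t(.975, df = 18) ≈ 2.101 is hard-coded rather than looked up.

```python
import statistics as st

# Two hypothetical treatment groups (illustrative data only, n = 10 each)
a = [5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 6.1, 5.2, 4.6, 5.8]
b = [4.2, 3.9, 4.8, 4.1, 4.5, 3.7, 4.9, 4.3, 4.0, 4.4]

n1, n2 = len(a), len(b)
grand = st.mean(a + b)

# ANOVA sums of squares for a one-way, two-group layout
ss_between = n1 * (st.mean(a) - grand) ** 2 + n2 * (st.mean(b) - grand) ** 2
ss_within = (sum((x - st.mean(a)) ** 2 for x in a)
             + sum((x - st.mean(b)) ** 2 for x in b))
ss_total = ss_between + ss_within
df_within = n1 + n2 - 2
ms_within = ss_within / df_within

# Effect-size indices: eta squared (biased upward in small samples)
# and omega squared (a less biased estimate); here k - 1 = 1 group df
eta_sq = ss_between / ss_total
omega_sq = (ss_between - 1 * ms_within) / (ss_total + ms_within)

# Cohen's d from the pooled (within-groups) standard deviation
diff = st.mean(a) - st.mean(b)
d = diff / ms_within ** 0.5

# 95% CI for the mean difference; t_crit is the tabled t(.975, df = 18)
t_crit = 2.101
se_diff = (ms_within * (1 / n1 + 1 / n2)) ** 0.5
ci = (diff - t_crit * se_diff, diff + t_crit * se_diff)

print(round(eta_sq, 3), round(omega_sq, 3), round(d, 2))
print(tuple(round(x, 2) for x in ci))
```

Reporting the interval alongside the point estimate, as the Task Force recommends, conveys both the magnitude of the effect and the precision of its estimate, which a bare p value does not.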

References

  • Abelson, R.P. (1995). Statistics as principled argument. Hillsdale, NJ: Erlbaum.
  • Allen, M., & Preiss, R. (1993). Replication and meta-analysis: A necessary connection. Journal of Social Behavior and Personality, 8(6), 9-20.
  • Bakan, D. (1966). The tests of significance in psychological research. Psychological Bulletin, 66, 423-437.
  • Balluerka, N., Gómez, J., & Hidalgo, M.D. (2005). The controversy over null hypothesis significance testing revisited. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 1(2), 55-70.
  • Baril, G.L., & Cannon, J.T. (1995). What is the probability that null hypothesis testing is meaningless? American Psychologist, 50, 1098-1099.
  • Binder, A. (1963). Further considerations on testing the null hypothesis and the strategy and tactics of investigating theoretical models. Psychological Review, 70, 107-115.
  • Bracey, G.W. (1991). Sense, non-sense and statistics. Phi Delta Kappan, 73, 335.
  • Brandstätter, E. (1999). Confidence intervals as an alternative to significance testing. Methods of Psychological Research Online, 4(2), 33-46.
  • Carver, R.P. (1978). The case against statistical significance testing. Harvard Educational Review, 48(3), 378-399.
  • Carver, R.P. (1993). The case against statistical significance testing revisited. Journal of Experimental Education, 61, 287-292.
  • Chase, L.J., & Chase, R.B. (1976). A statistical power analysis of applied psychological research. Journal of Applied Psychology, 61, 234-237.
  • Chow, S.L. (1987). Experimental psychology: Rationale, procedures and issues. Calgary, Alberta, Canada: Detselig Enterprises.
  • Chow, S.L. (1988). Significance test or effect size? Psychological Bulletin, 103, 105-110.
  • Chow, S.L. (1989). Significance tests and deduction: Reply to Folger (1989). Psychological Bulletin, 106, 161-165.
  • Chow, S.L. (1991). Some reservations about power analysis. American Psychologist, 46, 1088-1089.
  • Chow, S.L. (1996). Statistical significance: Rationale, validity and utility. Beverly Hills, CA: Sage.
  • Chow, S.L. (1998a). Précis of statistical significance: Rationale, validity and utility. Behavioral and Brain Sciences, 21, 169-239.
  • Chow, S.L. (1998b). What statistical significance means. Theory and Psychology, 8, 323-330.
  • Clark-Carter, D. (1997). The account taken of statistical power in research published in the British Journal of Psychology. British Journal of Psychology, 88, 71-83.
  • Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65, 145-153.
  • Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
  • Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45(12), 1304-1312.
  • Cohen, J. (1994). The earth is round (p<.05). American Psychologist, 49, 997-1003.
  • Cook, T.D., & Campbell, D.T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Chicago: Rand McNally.
  • Cortina, J.M., & Dunlap, W.P. (1997). On the logic and purpose of significance testing. Psychological Methods, 2(2), 161-172.
  • Cowles, M., & Davis, C. (1982). On the origins of the .05 level of statistical significance. American Psychologist, 37(5), 553-558.
  • Cox, D.R. (1977). The role of significance tests. Scandinavian Journal of Statistics, 4, 49-70.
  • Cumming, G., & Finch, S. (2001). A primer on the understanding, use and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61, 532-574.
  • Dixon, P. (1998). Why scientists value p values. Psychonomic Bulletin and Review, 5, 390-396.
  • Erwin, E. (1998). The logic of null hypothesis testing. Behavioral and Brain Sciences, 21, 197-198.
  • Estes, W.K. (1997). On the communication of information by displays of standard errors and confidence intervals. Psychonomic Bulletin and Review, 4(3), 330-341.
  • Falk, R., & Greenbaum, C.W. (1995). Significance tests die hard: The amazing persistence of a probabilistic misconception. Theory and Psychology, 5, 75-98.
  • Fidler, F. (2002). The fifth edition of the APA publication manual: Why its statistics recommendations are so controversial. Educational and Psychological Measurement, 62(5), 749-770.
  • Fidler, F., & Thompson, B. (2001). Computing correct confidence intervals for ANOVA fixed-and random-effects effect sizes. Educational and Psychological Measurement, 61(4), 575-604.
  • Finch, S., Cumming, G., & Thomason, N. (2001). Reporting of statistical inference in the Journal of Applied Psychology: Little evidence of reform. Educational and Psychological Measurement, 61(2), 181-210.
  • Fisher, R.A. (1925). Statistical methods for research workers. London: Oliver & Boyd.
  • Fisher, R.A. (1937). The design of experiments. London: Oliver & Boyd.
  • Folger, R. (1989). Significance tests and the duplicity of binary decisions. Psychological Bulletin, 106, 155-160.
  • Frías, D., García, J.F., & Pascual, J. (1994). Estudio de la potencia de los trabajos publicados en Psicológica. Estimación del número de sujetos fijando alfa y beta. III Simposium de Metodología de las Ciencias Sociales y del Comportamiento (pp. 1057-1063). Santiago de Compostela: Servicio de Publicaciones de la Universidad de Santiago de Compostela.
  • Frick, R.W. (1995). Accepting the null hypothesis. Memory & Cognition, 23(1), 132-138.
  • Frick, R.W. (1996). The appropriate use of null hypothesis testing. Psychological Methods, 1(4), 379-390.
  • Glass, G.V. (1976). Primary, secondary and meta-analysis of research. Educational Researcher, 5, 3-8.
  • Glass, G.V., McGaw, B., & Smith, M.L. (1981). Meta-analysis in social research. Beverly Hills, CA: Sage.
  • Grant, D.A. (1962). Testing the null hypothesis and the strategy and tactics of investigating theoretical models. Psychological Review, 69, 54-61.
  • Greenwald, A.G., González, R., Harris, R.J., & Guthrie, D. (1996). Effect sizes and p values: What should be reported and what should be replicated? Psychophysiology, 33, 175-183.
  • Hagen, R.L. (1997). In praise of the null hypothesis statistical test. American Psychologist, 52(1), 15-24.
  • Harris, R.J. (1991). Significance tests are not enough: The role of effect-size estimation in theory corroboration. Theory and Psychology, 1, 375-382.
  • Hays, W.L. (1963). Statistics for psychologists. New York, NY: Holt, Rinehart & Winston.
  • Hays, W.L. (1994). Statistics (4th ed.). New York: Holt, Rinehart and Winston.
  • Hubbard, R., & Armstrong, J.S. (1994). Replications and extensions in marketing: Rarely published but quite contrary. International Journal of Research in Marketing, 11, 233-248.
  • Hubbard, R., Parsa, A.R., & Luthy, M.R. (1997). The spread of statistical significance testing in psychology: The case of the Journal of Applied Psychology, 1917-1994. Theory and Psychology, 7(4), 545-554.
  • Hubbard, R., & Ryan, P.A. (2000). The historical growth of statistical significance testing in psychology and its future prospects. Educational and Psychological Measurement, 60(5), 661-681.
  • Huberty, C.J. (1987). On statistical testing. Educational Researcher, 16(8), 4-9.
  • Huberty, C.J. (2002). A history of effect size indices. Educational and Psychological Measurement, 62, 227-240.
  • Hunter, J.E. (1997). Need: A ban on the significance test. Psychological Science, 8, 3-7.
  • Jeffreys, H. (1934). Probability and scientific method. Proceedings of the Royal Society of London, Series A, 146, 9-16.
  • Johnson, D.H. (1999). The insignificance of statistical significance testing. Journal of Wildlife Management, 63, 763-772.
  • Kazdin, A.E., & Bass, D. (1989). Power to detect differences between alternative treatments in comparative psychotherapy outcome research. Journal of Consulting and Clinical Psychology, 57, 138-147.
  • Kelley, T.L. (1935). An unbiased correlation ratio measure. Proceedings of the National Academy of Sciences, 21, 554-559.
  • Killeen, P. R. (2005). An alternative to null-hypothesis significance tests. Psychological Science, 16(5), 345-353.
  • Kirk, R.E. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56(5), 746-759.
  • Kirk, R.E. (2001). Promoting good statistical practices: Some suggestions. Educational and Psychological Measurement, 61(2), 213-218.
  • Lindsay, R.M., & Ehrenberg, A.S.C. (1993). The design of replicated studies. American Statistician, 47, 217-228.
  • Loftus, G.R. (1991). On the tyranny of hypothesis testing in the social sciences. Contemporary Psychology, 36, 102-105.
  • Loftus, G.R. (1995). Data analysis as insight: Reply to Morrison and Weaver. Behavior Research Methods, Instruments and Computers, 27, 57-59.
  • Loftus, G.R. (1996). Psychology will be a much better science when we change the way we analyze data. Current Directions in Psychological Science, 5, 161-171.
  • Mayo, D. (1996). Error and the growth of experimental knowledge. Chicago: The University of Chicago Press.
  • Meehl, P.E. (1990a). Appraising and amending theories: The strategy of Lakatosian defence and two principles that warrant it. Psychological Inquiry, 1, 108-141.
  • Meehl, P.E. (1990b). Why summaries of research on psychological theories are often uninterpretable. Psychological Reports, 66, 195-244.
  • Meehl, P.E. (1991). Why summaries of research on psychological theories are often uninterpretable. In R.E. Snow & D.E. Wiley (Eds.): Improving inquiry in social science: A volume in honor of Lee J. Cronbach (pp. 13-59). Hillsdale, NJ: Erlbaum.
  • Meehl, P.E. (1997). The problem is epistemology, not statistics: Replace significance tests by confidence intervals and quantify accuracy of risky numerical predictions. In L.L. Harlow, S.A. Mulaik & J.H. Steiger (Eds.): What if there were no significance tests? (pp. 391-423). Hillsdale, NJ: Erlbaum.
  • Monterde, H., Pascual, J., & Frías, M.D. (2006). Errores de interpretación de los métodos estadísticos: importancia y recomendaciones. Psicothema, 18(4), 848-856.
  • Murphy, K.R. (1990). If the null hypothesis is impossible, why test it? American Psychologist, 45(3), 403-404.
  • Nickerson, R.S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5(2), 241-301.
  • Oakes, M. (1986). Statistical inference: A commentary for social and behavioral sciences. New York: Wiley.
  • Olejnik, S., & Algina, J. (2000). Measures of effect size for comparative studies: Applications, interpretations and limitations. Contemporary Educational Psychology, 25, 241-286.
  • Pearson, K. (1905). Mathematical contributions to the theory of evolution: XIV. On the general theory of skew correlations and nonlinear regression (Draper’s Company Research Memoirs, Biometric Series II). London: Dulau.
  • Pollard, P. (1993). How significant is "significance"? In G. Keren & C. Lewis (Eds.): A handbook for data analysis in the behavioural sciences: Volume 1. Methodological issues. Hillsdale, NJ: Erlbaum.
  • Robinson, D., & Levin, J. (1997). Reflections on statistical and substantive significance, with a slice of replication. Educational Researcher, 26(5), 21-26.
  • Robinson, D.H., & Wainer, H. (2001). On the past and future of null hypothesis significance testing. Princeton: Statistics & Research Division.
  • Rosenthal, R. (1983). Assessing the statistical and social importance of the effects of psychotherapy. Journal of Consulting and Clinical Psychology, 51, 4-13.
  • Rosenthal, R. (1984). Meta-analytic procedures for social research. Beverly Hills, CA: Sage.
  • Rosnow, R.L., & Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological science. American Psychologist, 44, 1276-1284.
  • Rossi, J.S. (1990). Statistical power of psychological research: What have we gained in 20 years? Journal of Consulting and Clinical Psychology, 58, 646-656.
  • Rozeboom, W.W. (1960). The fallacy of the null hypothesis significance test. Psychological Bulletin, 57, 416-428.
  • Sánchez, J., Valera, A., Velandrino, A.P., & Marín, F. (1992). Un estudio de la potencia estadística en Anales de Psicología. Anales de Psicología, 8, 19-32.
  • Schafer, W.D. (1993). Interpreting statistical significance and non-significance. Journal of Experimental Education, 61, 383-387.
  • Schmidt, F.L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychological Methods, 1(2), 115-129.
  • Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105, 309-316.
  • Serlin, R.C., & Lapsley, D.K. (1985). Rationality in psychological research: The good-enough principle. American Psychologist, 40, 73-83.
  • Serlin, R.C., & Lapsley, D.K. (1993). Rational appraisal of psychological research and the good-enough principle. In G. Keren & C. Lewis (Eds.): A handbook of data analysis in behavioural sciences: Volume 1. Methodological Issues (pp. 199-228). Hillsdale, NJ: Erlbaum.
  • Shaver, J. (1985). Chance and nonsense: A conversation about interpreting tests of statistical significance. Phi Delta Kappan, 67(1), 138-141.
  • Shaver, J. (1993). What statistical significance testing is and what it is not. Journal of Experimental Education, 61(4), 293-316.
  • Smithson, M. (2001). Correct confidence intervals for various regression effect sizes and parameters: The importance of noncentral distributions in computing intervals. Educational and Psychological Measurement, 61(4), 605-632.
  • Snyder, P., & Lawson, S. (1993). Evaluating results using corrected and uncorrected effect size estimates. Journal of Experimental Education, 61, 334-349.
  • Snow, R.E. (1998). Inductive strategy and statistical tactics. Behavioral and Brain Sciences, 21, 219.
  • Steiger, J.H., & Fouladi, R.T. (1997). Noncentral interval estimation and the evaluation of statistical models. In L.L. Harlow, S.A. Mulaik & J.H. Steiger (Eds.): What if there were no significance tests? (pp. 221-258). Hillsdale, NJ: Erlbaum.
  • Thompson, B. (1992). Two and one-half decades of leadership in measurement and evaluation. Journal of Counseling and Development, 70, 434-438.
  • Thompson, B. (1996). AERA editorial policies regarding statistical significance testing: Three suggested reforms. Educational Researcher, 25(2), 26-30.
  • Thompson, B. (2002). "Statistical", "practical" and "clinical": How many kinds of significance do counsellors need to consider? Journal of Counseling and Development, 80, 64-71.
  • Thompson, B., & Baugh, F. (2002). Using effect sizes, confidence intervals and especially confidence intervals for effect sizes: New APA and journal mandates for improved practices. Metodología de las Ciencias del Comportamiento, special number, 539-543.
  • Tukey, J.W. (1991). The philosophy of multiple comparisons. Statistical Science, 6, 100-116.
  • Valera, A., Sánchez, J., & Marín, F. (2000). Contraste de hipótesis e investigación psicológica española: análisis y propuestas. Psicothema, 12(2), 549-552.
  • Weitzman, R.A. (1984). Seven treacherous pitfalls of statistics, illustrated. Psychological Reports, 54(2), 355-363.
  • Wilkinson, L., & the Task Force on Statistical Inference (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604.