Use of multilevel logistic regression to identify the causes of differential item functioning

  1. Balluerka Lasa, Nekane
  2. Gorostiaga Manterola, Arantxa
  3. Gómez Benito, Juana
  4. Hidalgo Montesinos, María Dolores
Journal: Psicothema

ISSN: 0214-9915

Year of publication: 2010

Volume: 22

Issue: 4

Pages: 1018-1025

Type: Article

Abstract

Use of multilevel logistic regression to identify the causes of differential item functioning. Given the importance of tests as tools for assessment and decision-making in psychology and education, the possibility that some of their items function differentially is a central concern for psychometricians. Recent decades have brought important advances in techniques designed to detect differential item functioning (DIF). However, there have been few findings on identifying its causes. This paper addresses the problem from the perspective of multilevel analysis. Drawing on a case study from the field of cross-cultural comparisons, multilevel logistic regression is used to: 1) identify the item characteristics associated with the presence of DIF; 2) estimate the proportion of variation in the DIF coefficients explained by those characteristics; and 3) evaluate alternative explanations for DIF by comparing the explanatory power or fit of different models. Comparing these models made it possible to confirm one of the two alternatives (familiarity with the stimulus) and to rule out the other (the topic of study) as the cause of differential item functioning in the groups compared.
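The abstract names two quantitative steps: estimating how much of the variation in the item DIF coefficients is explained by item characteristics, and comparing competing models. The following is a minimal sketch of that arithmetic only, assuming a two-level specification in which each item's DIF coefficient varies around a level-2 regression on item characteristics with residual variance tau (the usual proportional-reduction-in-variance measure from hierarchical linear modeling). It does not reproduce the paper's model or estimates; the function names and all numbers are hypothetical, and whether a deviance comparison is valid depends on the estimation method used.

```python
import math

# Hypothetical helpers illustrating the arithmetic described in the abstract;
# they are not taken from the authors' analysis or from any specific package.


def proportion_explained(tau_null: float, tau_conditional: float) -> float:
    """Proportional reduction in the level-2 variance of the DIF coefficients.

    tau_null: variance of the item DIF coefficients in a model with no
        item-level predictors.
    tau_conditional: residual variance after adding item characteristics.
    """
    if tau_null <= 0:
        raise ValueError("tau_null must be positive")
    return (tau_null - tau_conditional) / tau_null


def deviance_difference_test(dev_reduced: float, dev_full: float, df: int = 1):
    """Likelihood-ratio (deviance difference) test between nested models.

    Only one added parameter (df = 1) is handled, using the identity
    P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    if df != 1:
        raise NotImplementedError("this sketch only covers df = 1")
    stat = dev_reduced - dev_full
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Made-up variance estimates for two rival item-level explanations.
    tau_null = 0.40
    alternatives = {"familiarity": 0.10, "topic": 0.38}
    for name, tau in alternatives.items():
        print(name, round(proportion_explained(tau_null, tau), 2))

    # Made-up deviances for a model without and with one item predictor.
    stat, p = deviance_difference_test(103.84, 100.0)
    print(f"deviance difference = {stat:.2f}, p = {p:.3f}")
```

Comparing the proportion of variance each rival characteristic explains mirrors the "explanatory power" route mentioned in the abstract; the deviance test applies only when one model is nested in the other.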