AzterTest: Open-Source Linguistic and Stylistic Analysis Tool

  1. Kepa Bengoetxea
  2. Amaia Aguirregoitia
  3. Itziar González-Dios
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2020

Issue: 64

Pages: 61-68

Type: Article

Other publications in: Procesamiento del lenguaje natural

Abstract

Text analysis is a useful procedure for helping education professionals select the most suitable texts for their students. This task requires analyzing several text characteristics (e.g., syntactic complexity, word variety), which is mostly done manually. In this article, we present AzterTest, an open-source tool for linguistic and stylistic analysis. AzterTest computes 153 features and achieves an accuracy of 90.09% when distinguishing three reading levels (elementary, intermediate, and advanced). AzterTest is also available as a web tool.
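To make the abstract concrete, here is a minimal, purely illustrative sketch (not AzterTest's actual API or feature set) of the kind of pipeline it describes: extracting two simple text characteristics of the sort mentioned above, word variety (type-token ratio) and a rough proxy for syntactic complexity (mean sentence length), and mapping them to one of the three reading levels. The function names and thresholds are hypothetical; AzterTest instead computes 153 features and feeds them to a trained classifier.

```python
import re

def extract_features(text: str) -> dict:
    """Compute two toy readability features from raw text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    return {
        # Rough proxy for syntactic complexity.
        "mean_sentence_length": len(words) / max(len(sentences), 1),
        # Word variety: distinct words over total words.
        "type_token_ratio": len(set(words)) / max(len(words), 1),
    }

def reading_level(feats: dict) -> str:
    """Threshold one feature into three levels (illustrative only)."""
    if feats["mean_sentence_length"] < 12:
        return "elementary"
    if feats["mean_sentence_length"] < 20:
        return "intermediate"
    return "advanced"

sample = "The cat sat. The dog ran. They played together in the sun."
feats = extract_features(sample)
print(reading_level(feats))  # short sentences, so "elementary"
```

A real system would replace the hand-picked threshold with a classifier trained on labeled texts, which is how the 90.09% accuracy reported above is obtained.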

Funding information

We acknowledge the following projects: DL4NLP (KK-2019/00045), DeepReading RTI2018-096846-B-C21 (MCIU/AEI/FEDER, UE) and BigKnowledge for Text Mining, BBVA.

References

  • Aluísio, S., L. Specia, C. Gasperin, and C. Scarton. 2010. Readability assessment for text simplification. In Proceedings of the NAACL HLT 2010 Fifth Workshop on Innovative Use of NLP for Building Educational Applications, pages 1–9. ACL.
  • Boros, T., S. D. Dumitrescu, and R. Burtica. 2018. NLP-Cube: End-to-end raw text processing with neural networks. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pages 171–179.
  • Cer, D., Y. Yang, S.-y. Kong, N. Hua, N. Limtiaco, R. S. John, N. Constant, M. Guajardo-Cespedes, S. Yuan, C. Tar, et al. 2018. Universal sentence encoder. arXiv preprint arXiv:1803.11175.
  • Chall, J. S. and E. Dale. 1995. Readability Revisited: The New Dale–Chall Readability Formula. Brookline Books, Cambridge, MA.
  • Dell’Orletta, F., S. Montemagni, and G. Venturi. 2011. READ-IT: assessing readability of Italian texts with a view to text simplification. In Proceedings of the Second Workshop on Speech and Language Processing for Assistive Technologies, SLPAT ’11, pages 73–83. ACL.
  • Feng, L., M. Jansche, M. Huenerfauth, and N. Elhadad. 2010. A comparison of features for automatic readability assessment. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pages 276–284. ACL.
  • Flesch, R. 1948. A new readability yardstick. Journal of applied psychology, 32(3):221.
  • François, T. and C. Fairon. 2012. An AI readability formula for French as a foreign language. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 466–477. ACL.
  • Gonzalez-Dios, I., M. J. Aranzabe, A. Díaz de Ilarraza, and H. Salaberri. 2014. Simple or complex? Assessing the readability of Basque texts. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 334–344, Dublin, Ireland, August. DCU and ACL.
  • Graesser, A. C., D. S. McNamara, and J. M. Kulikowich. 2011. Coh-Metrix Providing Multilevel Analyses of Text Characteristics. Educational Researcher, 40(5):223–234.
  • Gunning, R. 1968. The technique of clear writing. McGraw-Hill New York.
  • Hall, M., E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. 2009. The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter, 11(1):10–18.
  • Hancke, J., S. Vajjala, and D. Meurers. 2012. Readability classification for German using lexical, syntactic, and morphological features. In COLING 2012: Technical Papers, pages 1063–1080.
  • Landwehr, N., M. Hall, and E. Frank. 2005. Logistic model trees. Machine Learning, 59(1-2):161–205.
  • Madrazo, I. and M. S. Pera. 2019. Multiattentive recurrent neural network architecture for multilingual readability assessment. Transactions of the Association for Computational Linguistics, 7:421–436.
  • Mc Laughlin, G. H. 1969. SMOG grading: A new readability formula. Journal of Reading, 12(8):639–646.
  • Miller, G. A. 1995. WordNet: A lexical database for English. Communications of the ACM, 38(11):39–41.
  • OECD. 2016. PISA 2015 Results in Focus. OECD Publishing.
  • Parodi, G. 2006. Discurso especializado y lengua escrita: Foco y variación. Estudios filológicos, (41):165–204.
  • Petersen, S. E. and M. Ostendorf. 2009. A machine learning approach to reading level assessment. Computer Speech & Language, 23(1):89–106.
  • Platt, J. 1998. Sequential minimal optimization: A fast algorithm for training support vector machines. Technical Report MSR-TR-98-14.
  • Qi, P., T. Dozat, Y. Zhang, and C. D. Manning. 2019. Universal dependency parsing from scratch. arXiv preprint arXiv:1901.10457.
  • Quispesaravia, A., W. Perez, M. A. S. Cabezudo, and F. Alva-Manchego. 2016. Coh-Metrix-Esp: A Complexity Analysis Tool for Documents Written in Spanish. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pages 4694–4698.
  • Scarton, C. and S. M. Aluísio. 2010. Coh-Metrix-Port: A readability assessment tool for texts in Brazilian Portuguese. In Proceedings of the 9th International Conference on Computational Processing of the Portuguese Language, Extended Activities Proceedings, PROPOR, volume 10.
  • Si, L. and J. Callan. 2001. A statistical model for scientific readability. In Proceedings of the tenth international conference on Information and knowledge management, pages 574–576. ACM.
  • Speer, R., J. Chin, A. Lin, S. Jewett, and L. Nathan. 2018. LuminosoInsight/wordfreq: v2.2, October.
  • Vajjala, S. and I. Lucic. 2018. OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification. In Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 297–304. ACL.
  • Venegas, R. 2008. Interfaz computacional de apoyo al análisis textual:“el manchador de textos”. RLA. Revista de lingüística teórica y aplicada, 46(2):53–79.
  • Štajner, S. and H. Saggion. 2013. Readability Indices for Automatic Evaluation of Text Simplification Systems: A Feasibility Study for Spanish. In Proceedings of the Sixth International Joint Conference on Natural Language Processing, pages 374–382, Nagoya, Japan, October. Asian Federation of Natural Language Processing.
  • Weide, R. 2005. The Carnegie Mellon Pronouncing Dictionary [cmudict.0.6].
  • Zeman, D. and J. Hajič, editors. 2018. Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. ACL, Brussels, Belgium, October.