AzterTest: Herramienta de Análisis Lingüístico y Estilístico de Código Abierto

  1. Kepa Bengoetxea
  2. Amaia Aguirregoitia
  3. Itziar González-Dios
Journal: Procesamiento del lenguaje natural

ISSN: 1135-5948

Publication date: 2020

Issue: 64

Pages: 61-68

Type: Article


Abstract

Text analysis is a useful process to assist teachers in the selection of the most suitable texts for their students. This task demands the analysis of several text features (e.g. syntactic complexity, word variety, etc.), which is mostly done manually. In this paper, we present an open-source tool for linguistic and stylistic analysis called AzterTest. AzterTest calculates 153 features and obtains 90.09% accuracy when classifying texts into three reading levels (elementary, intermediate, and advanced). AzterTest is also available as a web tool.
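
For readers unfamiliar with readability scoring, the sketch below is a minimal, illustrative Python example of assigning a text to the three levels mentioned in the abstract using only the classic Flesch Reading Ease formula cited in the references below (Flesch, 1948). The syllable heuristic, the score cut-offs, and the level names are assumptions made for illustration; AzterTest's actual classifier is trained on its 153 features rather than on fixed thresholds.

```python
import re

def count_syllables(word):
    """Rough syllable estimate: count runs of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Flesch (1948): 206.835 - 1.015*(words/sentence) - 84.6*(syllables/word)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))

def reading_level(text):
    """Map the score to three levels; the cut-offs (70, 50) are illustrative only."""
    score = flesch_reading_ease(text)
    if score >= 70:
        return "elementary"
    if score >= 50:
        return "intermediate"
    return "advanced"

print(reading_level("The cat sat on the mat. It was warm and soft."))
```

A feature-based tool such as AzterTest replaces the single formula above with a large set of lexical, syntactic, and cohesion features fed to a supervised classifier, which is what yields the reported 90.09% accuracy.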

Funding information

We acknowledge the following projects: DL4NLP (KK-2019/00045), DeepReading RTI2018-096846-B-C21 (MCIU/AEI/FEDER, UE), and BigKnowledge for Text Mining (BBVA).

Bibliographic references

  • Aluísio, S., L. Specia, C. Gasperin, and C. Scarton. 2010. Readability assessment for text simplification. In Proceedings of the NAACL HLT 2010 Fifth Workshop on Innovative Use of NLP for Building Educational Applications, pages 1–9. ACL.
  • Boroș, T., S. D. Dumitrescu, and R. Burtica. 2018. NLP-Cube: End-to-end raw text processing with neural networks. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pages 171–179.
  • Cer, D., Y. Yang, S.-y. Kong, N. Hua, N. Limtiaco, R. S. John, N. Constant, M. Guajardo-Cespedes, S. Yuan, C. Tar, et al. 2018. Universal sentence encoder. arXiv preprint arXiv:1803.11175.
  • Chall, J. S. and E. Dale. 1995. Readability Revisited: The New Dale–Chall Readability Formula. Brookline Books, Cambridge, MA.
  • Dell’Orletta, F., S. Montemagni, and G. Venturi. 2011. READ-IT: assessing readability of Italian texts with a view to text simplification. In Proceedings of the Second Workshop on Speech and Language Processing for Assistive Technologies, SLPAT ’11, pages 73–83. ACL.
  • Feng, L., M. Jansche, M. Huenerfauth, and N. Elhadad. 2010. A comparison of features for automatic readability assessment. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pages 276–284. ACL.
  • Flesch, R. 1948. A new readability yardstick. Journal of Applied Psychology, 32(3):221.
  • François, T. and C. Fairon. 2012. An AI readability formula for French as a foreign language. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 466–477. ACL.
  • Gonzalez-Dios, I., M. J. Aranzabe, A. Díaz de Ilarraza, and H. Salaberri. 2014. Simple or complex? Assessing the readability of Basque texts. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 334–344, Dublin, Ireland, August. DCU and ACL.
  • Graesser, A. C., D. S. McNamara, and J. M. Kulikowich. 2011. Coh-Metrix Providing Multilevel Analyses of Text Characteristics. Educational Researcher, 40(5):223–234.
  • Gunning, R. 1968. The technique of clear writing. McGraw-Hill, New York.
  • Hall, M., E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. 2009. The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter, 11(1):10–18.
  • Hancke, J., S. Vajjala, and D. Meurers. 2012. Readability classification for German using lexical, syntactic, and morphological features. In COLING 2012: Technical Papers, pages 1063–1080.
  • Landwehr, N., M. Hall, and E. Frank. 2005. Logistic model trees. Machine Learning, 59(1-2):161–205.
  • Madrazo, I. and M. S. Pera. 2019. Multiattentive recurrent neural network architecture for multilingual readability assessment. Transactions of the Association for Computational Linguistics, 7:421–436.
  • Mc Laughlin, G. H. 1969. SMOG grading: a new readability formula. Journal of Reading, 12(8):639–646.
  • Miller, G. A. 1995. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41.
  • OECD. 2016. PISA 2015 Results in Focus. OECD Publishing.
  • Parodi, G. 2006. Discurso especializado y lengua escrita: Foco y variación. Estudios filológicos, (41):165–204.
  • Petersen, S. E. and M. Ostendorf. 2009. A machine learning approach to reading level assessment. Computer Speech & Language, 23(1):89–106.
  • Platt, J. 1998. Sequential minimal optimization: A fast algorithm for training support vector machines. Technical Report MSR-TR-98-14.
  • Qi, P., T. Dozat, Y. Zhang, and C. D. Manning. 2019. Universal dependency parsing from scratch. arXiv preprint arXiv:1901.10457.
  • Quispesaravia, A., W. Perez, M. A. S. Cabezudo, and F. Alva-Manchego. 2016. Coh-Metrix-Esp: A Complexity Analysis Tool for Documents Written in Spanish. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pages 4694–4698.
  • Scarton, C. and S. M. Aluísio. 2010. Coh-Metrix-Port: a readability assessment tool for texts in Brazilian Portuguese. In Proceedings of the 9th International Conference on Computational Processing of the Portuguese Language, Extended Activities Proceedings, PROPOR, volume 10.
  • Si, L. and J. Callan. 2001. A statistical model for scientific readability. In Proceedings of the Tenth International Conference on Information and Knowledge Management, pages 574–576. ACM.
  • Speer, R., J. Chin, A. Lin, S. Jewett, and L. Nathan. 2018. LuminosoInsight/wordfreq: v2.2, October.
  • Vajjala, S. and I. Lucic. 2018. OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification.
  • Venegas, R. 2008. Interfaz computacional de apoyo al análisis textual: "El manchador de textos". RLA. Revista de lingüística teórica y aplicada, 46(2):53–79.
  • Štajner, S. and H. Saggion. 2013. Readability Indices for Automatic Evaluation of Text Simplification Systems: A Feasibility Study for Spanish. In Proceedings of the Sixth International Joint Conference on Natural Language Processing, pages 374–382, Nagoya, Japan, October. Asian Federation of Natural Language Processing.
  • Weide, R. 2005. The Carnegie Mellon Pronouncing Dictionary [cmudict 0.6].
  • Zeman, D. and J. Hajič, editors. 2018. Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. ACL, Brussels, Belgium, October.