Grammatical error correction for Spanish health records

Salvador Lima López; Naiara Pérez; Montserrat Cuadros Oller

Journal:

Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2021

Issue: 66

Pages: 121-132

Type: Article

Abstract

This article presents the first work on grammatical error correction for clinical texts in Spanish. We describe a set of experiments based on neural networks and data augmentation, in which we achieve an F0.5 score of 70.89. We also present two corpora created for this task: the IMEC corpus, a manually corrected medical corpus, and the TMAE corpus, a corpus of clinical texts augmented with errors.

Funding information

This work has been supported by Vicomtech and partially funded by the projects DeepText (KK-2020-00088, SPRI, Basque Government) and DeepReading (RTI2018-096846-B-C21, MCIU/AEI/FEDER, UE). We also want to thank Olatz Pérez de Viñaspre, who has collaborated in the research behind this article and whose contributions have been essential.

Funders

Steadman Philippon Research Institute United States
Ministerio de Ciencia, Innovación y Universidades Spain
Eusko Jaurlaritza Spain
- RTI2018-096846-B-C21
European Regional Development Fund European Union
Agencia Estatal de Investigación Spain

References

Aguilar Ruíz, M. J. 2013. Las normas ortográficas y ortotipográficas de la nueva Ortografía de la lengua española (2010) aplicadas a las publicaciones biomédicas en español: una visión de conjunto. Panace@: Revista de Medicina, Lenguaje y Traducción, XIV(37):101–120.
Atkinson, K. 2020. GNU Aspell 0.61 documentation.
Barrault, L., O. Bojar, M. R. Costa-jussà, C. Federmann, M. Fishel, Y. Graham, B. Haddow, M. Huck, P. Koehn, S. Malmasi, C. Monz, M. Müller, S. Pal, M. Post, and M. Zampieri. 2019. Findings of the 2019 conference on machine translation (WMT19). In Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1), pages 1–61, Florence, Italy. Association for Computational Linguistics.
Bello Gutiérrez, P. 2016. Aprendiendo a redactar mejor tus informes. Curso de Actualización Pediatría, pages 391–400.
Beloki, Z., X. Saralegi, K. Ceberio, and A. Corral. 2020. Grammatical error correction for Basque through a seq2seq neural architecture and synthetic examples. Procesamiento del Lenguaje Natural, 65:13–20, September.
Bojanowski, P., E. Grave, A. Joulin, and T. Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146.
Boletín Oficial del Estado. 2015. Real decreto 9/2015, de 6 de febrero, por el que se regula el registro de actividad de atención sanitaria especializada.
Bryant, C. 2019. Automatic annotation of error types for grammatical error correction. University of Cambridge.
Bryant, C., M. Felice, Ø. E. Andersen, and T. Briscoe. 2019. The BEA-2019 shared task on grammatical error correction. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 52–75, Florence, Italy. Association for Computational Linguistics.
Chollampatt, S. and H. T. Ng. 2018. A multilayer convolutional encoder-decoder neural network for grammatical error correction. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, pages 5755–5762, New Orleans, Louisiana, USA. AAAI Press.
Dahlmeier, D., H. T. Ng, and S. M. Wu. 2013. Building a large annotated corpus of learner English: The NUS corpus of learner English. In Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications, pages 22–31, Atlanta, Georgia. Association for Computational Linguistics.
Davidson, S., A. Yamada, P. Fernandez Mira, A. Carando, C. H. Sanchez Gutierrez, and K. Sagae. 2020. Developing NLP tools with a new corpus of learner Spanish. In Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 7238–7243, Marseille, France. European Language Resources Association.
Felice, M. 2016. Artificial error generation for translation-based grammatical error correction. Number 895.
Felice, M., C. Bryant, and T. Briscoe. 2016. Automatic extraction of learner errors in ESL sentences using linguistically enhanced alignments. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 825–835, Osaka, Japan. The COLING 2016 Organizing Committee.
Gamon, M., J. Gao, C. Brockett, A. Klementiev, W. B. Dolan, D. Belenko, and L. Vanderwende. 2008. Using contextual speller techniques and language modeling for ESL error correction. In Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I, pages 449–456. Asia Federation of Natural Language Processing.
Granger, S. 1998. The computer learner corpus: a versatile new source of data for SLA research.
Grundkiewicz, R., M. Junczys-Dowmunt, and K. Heafield. 2019. Neural grammatical error correction systems with unsupervised pre-training on synthetic data. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 252–263, Florence, Italy. Association for Computational Linguistics.
Heafield, K. 2011. KenLM: Faster and smaller language model queries. In Proceedings of the Sixth Workshop on Statistical Machine Translation, pages 187–197, Edinburgh, Scotland. Association for Computational Linguistics.
Intxaurrondo, A. 2018. SPACCC. Funded by the Plan de Impulso de las Tecnologías del Lenguaje (Plan TL).
Lee, J. and S. Seneff. 2008. Correcting misuse of verb forms. In Proceedings of ACL-08: HLT, pages 174–182, Columbus, Ohio, USA. Association for Computational Linguistics.
Lima López, S., N. Pérez, M. Cuadros, and G. Rigau. 2020. NUBes: A Corpus of Negation and Uncertainty in Spanish Clinical Texts. In Proceedings of the 12th Language Resources and Evaluation Conference (LREC2020), pages 5772–5781, Marseille, France. European Language Resources Association.
Macdonald, N. H. 1983. Human factors and behavioral science: The UNIX Writer’s Workbench software: Rationale and design. The Bell System Technical Journal, 62(6):1891–1908.
Ministerio de Sanidad. 2018. Recursos físicos, actividad y calidad de los servicios sanitarios.
Mizumoto, T., Y. Hayashibe, M. Komachi, M. Nagata, and Y. Matsumoto. 2012. The effect of learner corpus size in grammatical error correction of ESL writings. In Proceedings of COLING 2012: Posters, pages 863–872, Mumbai, India. The COLING 2012 Organizing Committee.
Náplava, J. and M. Straka. 2019. Grammatical error correction in low-resource scenarios. In Proceedings of the 2019 EMNLP Workshop W-NUT: The 5th Workshop on Noisy User-generated Text, pages 346–356, Hong Kong, China. Association for Computational Linguistics.
Ng, H. T., S. M. Wu, T. Briscoe, C. Hadiwinoto, R. Susanto, and C. Bryant. 2014. The CoNLL-2014 shared task on grammatical error correction. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task, pages 1–14. Association for Computational Linguistics.
Ng, H. T., S. M. Wu, Y. Wu, C. Hadiwinoto, and J. Tetreault. 2013. The CoNLL-2013 shared task on grammatical error correction. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning: Shared Task, pages 1–12. Association for Computational Linguistics.
Och, F. J. 2003. Minimum error rate training in statistical machine translation. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 160–167, Sapporo, Japan, July. Association for Computational Linguistics.
Omelianchuk, K., V. Atrasevych, A. Chernodub, and O. Skurzhanskyi. 2020. GECToR – Grammatical Error Correction: Tag, Not Rewrite. In Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 163–170, Seattle, WA, USA. Association for Computational Linguistics.
Rei, M., M. Felice, Z. Yuan, and T. Briscoe. 2017. Artificial error generation with machine translation and syntactic patterns. CoRR, abs/1707.05236.
Richardson, S. D. and L. C. Braden-Harder. 1988. The experience of developing a large-scale natural language text processing system: Critique. In Second Conference on Applied Natural Language Processing, pages 195–202, Austin, Texas, USA. Association for Computational Linguistics.
Sennrich, R., B. Haddow, and A. Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725, Berlin, Germany. Association for Computational Linguistics.
Shen, L., A. Sarkar, and F. J. Och. 2004. Discriminative reranking for machine translation. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004, pages 177–184, Boston, Massachusetts, USA. Association for Computational Linguistics.
Tajiri, T., M. Komachi, and Y. Matsumoto. 2012. Tense and aspect error correction for ESL learners using global context. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 198–202, Jeju Island, Korea. Association for Computational Linguistics.
Terroba Reinares, A. R. 2015. Mejora de la calidad del informe clínico de alta hospitalaria desde el punto de vista lingüístico. Universidad de La Rioja.
Tetreault, J., J. Foster, and M. Chodorow. 2010. Using parse features for preposition selection and error detection. In Proceedings of the ACL 2010 Conference Short Papers, pages 353–358, Uppsala, Sweden. Association for Computational Linguistics.
Vaswani, A., N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30. Curran Associates, Inc., pages 5998–6008.
Villegas, M., A. Intxaurrondo, A. Gonzalez Agirre, M. Marimon, and M. Krallinger. 2018. The MeSpEN resource for English-Spanish medical machine translation and terminologies: Census of parallel corpora, glossaries and term translations. In Proceedings of the LREC 2018 Workshop “MultilingualBIO: Multilingual Biomedical Text Processing”, pages 32–39. European Language Resources Association.
Xie, Z., A. Avati, N. Arivazhagan, D. Jurafsky, and A. Y. Ng. 2016. Neural language correction with character-based attention. CoRR, abs/1603.09727.
Yannakoudakis, H., Ø. E. Andersen, A. Geranpayeh, T. Briscoe, and D. Nicholls. 2018. Developing an automated writing placement system for ESL learners. Applied Measurement in Education, 31(3):251–267.
Yannakoudakis, H., T. Briscoe, and B. Medlock. 2011. A new dataset and method for automatically grading ESOL texts. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 180–189, Portland, Oregon, USA. Association for Computational Linguistics.

Data source: Dialnet