Distance Metric Learning for Explainability in Complex Problems

Suárez Díaz, Juan Luis

Supervised by:

Salvador García López Co-supervisor
Francisco Herrera Triguero Co-supervisor

Defended at: Universidad de Granada

Defense date: 9 January 2024

Jury:

Pedro González Garcia President
Alberto Fernández Hilario Secretary
María José del Jesús Díaz Rapporteur
Javier del Ser Lorente Rapporteur
Daniel Peralta Cámara Rapporteur

Type: Thesis

Teseo: 826292

Abstract

The major technological advances of recent years have led to the generation of large amounts of data from which, if properly processed and analyzed, relevant information can be extracted for many fields such as science, business or communication. Machine learning, which is part of the process known as knowledge discovery in databases, is emerging as a discipline focused on the study of techniques that allow machines to learn from data, in the sense of recognizing patterns or drawing inferences from previously unseen data. Within machine learning, there is a set of algorithms called similarity-based learning algorithms, which are inspired by one of the most powerful mechanisms of human learning: recognizing objects based on their similarity to other previously seen objects. These algorithms require a distance or similarity measure that assesses how similar any two samples are. This measure is crucial to their correct functioning, since the quality of the results depends on it.

This thesis addresses the distance metric learning problem, which consists in learning the distance or similarity measure from the data itself, so that it can later be used successfully in similarity-based learning algorithms. Specifically, the project tackles the study and development of distance metric learning algorithms and their application to novel or uncommon machine learning problems, beyond the classic classification or regression problems. The thesis addresses the following objectives:

1. The study of distance metric learning and its algorithms from both a theoretical and an experimental point of view. To this end, the development of a software library with the highest-performing algorithms in the field is proposed, as well as a tutorial that includes a theoretical review, an experimental study and an analysis of the results of the algorithms.

2. The development of new distance metric learning algorithms for unconventional or singular problems, i.e., machine learning problems beyond the classic standards of classification or regression. To achieve this goal, three distance metric learning algorithms have been developed and proposed for the first time in this thesis, each addressing a different singular problem: imbalanced, ordinal and monotonic classification.

3. The development of deep metric learning models to tackle complex problems. In recent years, deep learning has revolutionized the field of machine learning, and this revolution has also reached distance metric learning. Deep metric learning proposes new models for learning distances that open up a new range of possibilities in this field. In addition, these methods have proven effective in some of the most challenging problems in deep learning, such as those where little data is available. In this regard, the thesis includes a proposal for a deep metric learning model applied to a natural language processing problem with data scarcity.

4. The analysis of the explainability of the developed models, based on the explainable characteristics of similarity-based learning algorithms and on how learning a distance influences these characteristics.

5. The specialization of the developed proposals for their application to real problems. This objective is addressed together with the third, through the natural language processing problem treated there.

This thesis project successfully addresses the above objectives and thus leaves notable contributions in the field of distance metric learning. The software library and tutorial developed here provide a solid basis for understanding the state of the art in traditional distance metric learning, and a practical starting point for those who want to enter the discipline. The algorithms proposed in the second objective provide new perspectives on less common problems in machine learning. The deep metric learning models developed in the third objective show the potential of combining deep learning and distance metric learning, and are also applied in a real case study, thus addressing the fifth objective. Finally, the explainability study proposed in the fourth objective analyzes, for one of the developed algorithms, how distance metric learning can influence the explainability of the resulting model.