Implementación y optimización de algoritmos para aprendizaje automático con teoría de perturbaciones

Ortega Tenezaca, Delfín Bernabé

Supervised by:

Cristian-Robert Munteanu (Director)
Aliuska Duardo Sánchez (Director)

Defence university: Universidade da Coruña

Defence date: 31 March 2023

Committee:

Enrique Onieva Caracuel (Chair)
A. Pazos (Secretary)
Miren Josune Pérez Estrada (Committee member)

Type: Thesis

Teseo: 801195

Abstract

A huge amount of data on complex systems of very varied nature (biomolecular, economic, social, etc.) has now been accumulated. These systems are highly relevant in areas such as biomolecular sciences, biomedical engineering, and social and legal sciences. Artificial Intelligence (AI) and/or Machine Learning (ML) techniques can be useful for predicting properties of interest in these systems. This requires at least two main steps. The first is to collect similar information from many known systems in order to train AI/ML models. The second, indispensable step is the numerical quantification of the structural information, the conditions external to the system, and the properties to be predicted; this step defines the numeric input and output variables used to train the AI/ML algorithms. Unfortunately, complex systems are generally made up of several subsystems, and information about the system as a whole and about its parts cannot be found in a single source. It is common, however, to find information on each subsystem and its properties in various scattered sources.
To solve this problem, the NIFPTML = NI + IF + PT + ML algorithms have been developed. They involve the following stages. In the NI (Network Invariant) stage, complex networks are used to represent the different systems and/or their subsystems, and the invariants of these networks are calculated to quantify their structure. In the IF (Information Fusion) stage, information from various sources is fused to obtain an enriched data set. In the PT (Perturbation Theory) stage, PT operators process the information by quantifying the perturbations/deviations of the structural variables with respect to their expected values for different subsets of categorical variables. Finally, in the ML (Machine Learning) stage, different AI/ML algorithms are trained in order to find predictive models.
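The NI, IF and PT stages described above can be illustrated with a minimal Python sketch. It is not the SOFT.PTML implementation: the toy networks, the `conditions` table, the choice of average node degree as the invariant, and the helper names (`average_degree`, `fuse`, `add_pt_operator`) are all assumptions made for illustration. The expected value of a variable within a categorical subset is assumed here to be estimated by the subset mean.

```python
from collections import defaultdict
from statistics import mean

# NI stage (hypothetical data): each system is a small network given as an edge list.
networks = {
    "sysA": [(1, 2), (2, 3), (3, 1), (3, 4)],
    "sysB": [(1, 2), (2, 3)],
    "sysC": [(1, 2), (1, 3), (1, 4), (1, 5)],
    "sysD": [(1, 2), (2, 3), (3, 4), (4, 1)],
}

# Second, separate source (hypothetical): a categorical condition per system.
conditions = {"sysA": "assay1", "sysB": "assay1", "sysC": "assay2", "sysD": "assay2"}


def average_degree(edges):
    """One simple network invariant: mean node degree, 2|E| / |V|."""
    nodes = {node for edge in edges for node in edge}
    return 2 * len(edges) / len(nodes)


def fuse(networks, conditions):
    """IF stage: join the two sources on the system id, keeping shared ids only."""
    return [
        {"id": sid, "V": average_degree(edges), "c": conditions[sid]}
        for sid, edges in networks.items()
        if sid in conditions
    ]


def add_pt_operator(rows):
    """PT stage: deviation of V from its mean within each category c."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["c"]].append(row["V"])
    expected = {c: mean(values) for c, values in groups.items()}
    return [{**row, "dV": row["V"] - expected[row["c"]]} for row in rows]


table = add_pt_operator(fuse(networks, conditions))
for row in table:
    print(row["id"], row["c"], round(row["V"], 3), round(row["dV"], 3))
```

In the ML stage, columns such as `V` and `dV` would serve as numeric inputs for whichever AI/ML algorithm is trained; that step is omitted here to keep the sketch free of external dependencies.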
The NIFPTML algorithms have been applied in these areas and the results published in the literature. Unfortunately, there is no user-friendly software application for regular users of these algorithms, so developers of NIFPTML algorithms need to use several different tools for the different stages. There is also a lack of knowledge about the legal implications of developing computational algorithms such as NIFPTML for scientific research in these areas. This thesis proposes to develop (program) a beta version of a software application, SOFT.PTML, in which the NIFPTML algorithms are implemented for the first time in a single application. In addition, the usefulness of this program will be demonstrated on different practical problems in the aforementioned areas, such as drug design, the discovery of nanomaterials, and the study of legal systems. Lastly, an analysis of the legal implications of developing and applying this type of algorithm in research will be provided.