Variable selection for data aggregated from different sources with group of variable structure

Broc, Camilo Lucien

Supervised by:

Borja Calvo Molinos Director

Defence university: Universidad del País Vasco - Euskal Herriko Unibertsitatea

Defence date: 14 November 2019

Committee:

Christophe Ambroise Chair
Stéphane Robin Secretary
Borja Calvo Molinos Committee member
Astrid Jourdan Committee member
Hélène Jacquemin Gadda Committee member

Department:

Ciencia de la Computación e Inteligencia Artificial

Type: Thesis

Teseo: 154306

Abstract

Over the last decades, the amount of genetic data available on populations has grown dramatically. On the one hand, refinements in chemical technologies have made it possible to extract individuals' genomes at an affordable cost. On the other hand, consortia of institutions and laboratories around the world have enabled the collection of data on a wide variety of individuals and populations. This wealth of data has raised hopes of understanding the deepest mechanisms involved in the functioning of our cells. Genetic epidemiology, in particular, is the field that studies the relationship between genetic features and the onset of a disease. Such analyses have required dedicated statistical methods, largely because of the dimensions of the available data: in genetics, the information is spread over a large number of variables relative to the number of observations. This dissertation presents two contributions. The first, a project called PIGE (Pathway Interaction Gene Environment), deals with the assessment of gene-environment interactions. The second aims to develop variable selection methods for data that have a group structure in both the variables and the observations.