DIGIT-BIO metaprogramme. Photo credit: @REZOOmarketing
Theses

DIGIT-BIO provides support for interdisciplinary PhD theses that meet its scientific objectives.

Co-funded doctoral contracts

The DIGIT-BIO metaprogramme co-funds two doctoral contracts each year, covering 50% of thesis costs. Thesis topics supported by the metaprogramme must be developed within an interdisciplinary context; these theses are typically supervised jointly by researchers from two different disciplines and address interface questions.

  • Further information on applying to the metaprogramme for co-funding: INRAE Intranet (restricted access)

Accredited thesis topics

The metaprogramme can also accredit thesis topics that fall within its research themes. Accreditation allows doctoral students and their supervisors to join the DIGIT-BIO scientific community (participating in seminars and events organised by the metaprogramme), and accredited students may apply for occasional grants to support certain activities.

 

PhD theses in progress

2023

  • Title : Large-scale integration of matched multi-omics data
  • Doctoral Student : Jeong Hwan Ko (jeong-hwan.ko@inrae.fr)
  • Supervisors : Nathalie Vialaneix (INRAE, MIA-T), Andrea Rau (INRAE, GABI)

This thesis project aims to develop statistical analysis methods to explore inter-species genetic diversity. More specifically, it is part of the AgroDiv project (Agroecology and Digital PEPR), which aims to finely characterize genetic variability across an extensive collection of plant and animal species of potential interest for agriculture, particularly in a context of strong environmental and climate change. In this context, the doctoral student will develop omics data integration methods to explore the relationships between genetic markers of interest and transcriptomic and epigenetic signals. These methods will serve as the basis for defining relationships that are specific to a given species or, on the contrary, conserved across species. They will also allow a better characterization of the variability of regulatory phenomena within a given breed.

  • Title : DeepSelectGene: Deep Learning for Genotype Data and its Application in Genomic Selection
  • Doctoral Student : Sihan Xie (sihan.xie@inrae.fr)
  • Supervisors : Eric BARREY (UMR GABI, INRAE), Blaise HANCZAR (IBISC, Université d’Evry Val d’Essonne), Julien Chiquet (MIA Paris-Saclay, INRAE)

Deep learning (DL) methods are beginning to be used as predictive models of phenotypes from genotype data, both for human diseases and for production traits in the genomic selection of domestic animals. These models require training on large numbers of genotype-phenotype pairs. This may not always be feasible for species in which only a few thousand animals have been genotyped.
This thesis will address this limitation by successively using two types of DL methods. First, a generative DL method, for example generative adversarial networks (GANs), will be trained on a small but qualitatively representative real data set and used to simulate genotype data. This will artificially increase the size of the database needed to properly train a second, predictive DL model that predicts phenotype from genotype (50-800K SNPs). This second model adopts a simpler structure for prediction. It will be optimized based on our initial exploratory studies on the subject during the GenIALearn project on bovine genomic selection (MP DIGIT-BIO 2022-2024).
In summary, this thesis will propose a novel method for simulating genotype data. The method is beneficial for i) enhancing understanding of the genetic determinism involved in phenotype formation; and ii) generating the additional quasi-real data needed to train the DL prediction model. The phenotype prediction model can therefore be applied with just a few thousand genotype-phenotype pairs and will subsequently be improved as the database is progressively enriched. The topic lies in a highly interdisciplinary area of genomic modeling, encompassing genetics, genomics, statistics and data science, and is at the forefront of AI applications in genomics.

  • Title : Characterization and algorithmic modeling of root nitrogen in a heterogeneous nitrate environment.
  • Doctoral Student : Cannelle Armengaud (cannelle.armengaud@inrae.fr)
  • Supervisors : Sandrine Ruffel (INRAE, IPSiM)

The root system, the anchoring organ of plants, is responsible for acquiring water and nutrients. Because plants often grow in heterogeneous soils, they must optimize the balance between resource use and soil exploration, the latter being necessary but costly. The aim of this project is to better understand and model the behavior of the root system when it faces a choice between environments that differ in nutrient availability. An optimized split-root experimental system will be used to measure root system architecture dynamically in response to nitrate heterogeneity in the model species Arabidopsis. These measurements will then be modeled algorithmically within the mathematical framework of multi-armed bandits. The exploitation of an environment also depends on transport activity. Its response dynamics with respect to development and growth will be characterized in a range of genotypes of interest (e.g., ecotypes whose root systems appear very different under heterogeneous conditions). This component will then be integrated into the model.
This interdisciplinary project responds to two challenges: i) to identify relevant targets that decouple the functions of development/growth and transport to improve nitrate acquisition in plants, and ii) to establish bio-inspired decision algorithms that could potentially provide optimization solutions in very diverse fields such as robotics or computational engineering.

  • Title : Coupling genome-based and energy-based approaches of syntrophic microbial interactions for the modelling of high-rate anaerobic digestion
  • Doctoral Student : Sahak Yeghiazaryan (sahak.yeghiazaryan@inrae.fr)
  • Supervisors : Nicolas Bernet (INRAE, LBE), Théodore Bouchez (INRAE, PROSE)

Methanization is one of the few mature technologies available for producing energy in the form of biogas from organic waste or sludge. The process involves a complex cascade of biochemical reactions catalyzed by a variety of microorganisms. These reactions are described in a widely used reference model (ADM1: Anaerobic Digestion Model No. 1). Today, the development of anaerobic digestion requires processes that operate at higher loading rates and handle a wider range of organic matter (co-digestion). Unfortunately, ADM1 cannot correctly predict performance under these conditions, which severely limits current engineering capabilities. Indeed, the syntrophic oxidation of organic acids, which is the kinetically limiting step, is not currently included in the model. In this thesis project, we propose to develop thermodynamically constrained metabolic models of syntrophic interactions, and then to reduce these models for inclusion in ADM1. Iterating between modelling and experiments on synthetic communities will allow the models to be validated step by step. We believe that this project will enable the development of more robust and predictive methanization models, in line with current engineering needs.

2022

  • Title : Copula-based network inference for multi-omics data
  • Doctoral Student : Ekaterina Tomilina (ekaterina.tomilina@inrae.fr)
  • Supervisors : Gildas MAZO (INRAE, MaIAGE), Florence JAFFREZIC, Andrea RAU (INRAE, GABI)

To better understand the relationships between the different objects that make up a biological network (genes, proteins, etc.), biologists observe variables of various types: categorical, ordinal and continuous. Discovering the inter-dependencies in such heterogeneous data (also known as "multi-omics" data in biology) is a genuine challenge in both biology and statistics. The goal of this PhD thesis is to build a statistical model that infers these inter-dependencies. It will model the heterogeneity of the data with copulas, which are functions that can couple variables of different types. The estimation method will be examined both theoretically and numerically, and will be applied to a multi-omics dataset produced by INRAE.

  • Title : Multiscale mathematical modeling of oogenesis in fish
  • Doctoral Student : Louis Fostier (louis.fostier@inrae.fr)
  • Supervisors : Romain Yvinec (INRAE, PRC) / Frédérique Clément (INRIA, MUSCA team); associate supervisor : Violette Thermes (INRAE, PRC)

This PhD thesis is dedicated to understanding and quantifying the ovarian dynamics of model fish species (e.g. medaka, zebrafish). Because of its cyclical nature, the ovary undergoes deep cellular remodelling, whose regulation is essential for gamete maturation and reproductive success. Real-time observations of the cell population distribution (cell counts, cell types, etc.) are lacking, which is a recurrent obstacle to understanding cellular dynamics in the ovary. The aim of the thesis is to build and calibrate a mathematical model that integrates the full dynamics of gamete maturation (and its regulation) throughout the lifetime of the fish. The model will notably draw on new data obtained through recent advances in 3-D imaging. We will pay particular attention to demonstrating how this model improves predictions of the effects of environmental perturbations on reproductive function over the lifetime of the fish. This thesis will lead to new research results in the field of fish reproduction and will develop generic approaches for structured population modelling.

  • Title : Integrative analysis of multiple study data for the identification of common metabolic syndrome phenotypes
  • Doctoral Student : Elfried Salanon (elfried.salanon@inrae.fr)
  • Supervisors : Julien BOCCARD (University of Geneva), Estelle PUJOS-GUILLOT (INRAE, UNH)

 

2021

  • Title : Development of meta-analysis methods for the analysis of GxE interactions in association analysis with applications to plant genetics
  • Doctoral Student : Annaïg de Walsche (annaig.de-walsche@inrae.fr)
  • Supervisors : Tristan Mary-Huard (INRAE, GQE Le Moulon, MIA Paris Saclay), Alain Charcosset (INRAE, GQE Le Moulon)

The purpose of this thesis is to develop statistical methods for detecting QTLs from panels characterised in multi-environment experiments. Meta-analysis (MA) methods, already widely used in human genetics, will serve as the methodological starting point. However, existing MA methods do not take into account the specificities of the envisaged application. The doctoral student will therefore have to adapt these methods to:
i) take into account the dependencies between measurements performed on related panels;
ii) take into account the heterogeneity of the inter-environment effects to be detected;
iii) develop innovative strategies to detect QTLs whose effect is confined to a limited number of environments.

  • Title : Functional-structural modelling of plant-to-plant interactions and C, N budget in legume-based associations.
  • Doctoral Student : Solen Farra-yanguinindje (solen.farra-yanguinindje@inrae.fr)
  • Supervisor : Alexandra Jullien (INRAE, UMR ECOSYS)

Species mixtures that include a legume are an agroecological lever for reducing the dependence of crops on synthetic inputs, particularly nitrogen. Understanding competition and/or complementarity between the two species for resource acquisition is key to designing and improving this type of cover. Competition or complementarity for light interception (carbon) depends on the respective aerial architectures of the associated species. It also conditions biomass allocation to the roots, and therefore root development. Symmetrically, the root architecture and the differential exploration of the soil by the two species determine nitrogen uptake and its allocation to the aerial parts to support growth. The objective of the thesis is to model these interactions and their variability in pure rapeseed canopies and in rapeseed-legume (faba bean) mixtures. The model will then be used to evaluate, by simulation, the added value of the association in terms of carbon and nitrogen acquisition. The aim is to obtain a model of an individual rapeseed plant that responds realistically to competition, treated here as a forcing. This model is intended to be integrated into an individual-centred model of rapeseed-legume association.

Defended theses

2020

  • Title : Development of hybrid models for genome-scale metabolic networks
  • Doctoral Student : Leon Faure (leon.faure@inrae.fr)
  • Supervisors : Jean-Loup Faulon (INRAE, MICALIS), Wolfram Liebermeister (INRAE, MaIAGE)

Over the past two decades, the systems biology community has devoted substantial effort to constructing genome-scale metabolic models (GEMs), which offer detailed representations of an organism’s entire metabolism. GEMs represent metabolism as a network linking metabolic reactions and metabolites. Despite their wealth of information, GEMs have notable limitations. They attempt to encompass all potential metabolic phenotypes, leading to an extensive solution space that can be difficult to explore efficiently. The predominant approach for exploiting GEMs, Flux Balance Analysis (FBA), relies on simplifications and lacks the ability to generalize across diverse conditions. In contrast, Machine Learning (ML) techniques have gained interest for metabolic modeling, notably by harnessing large-scale omics data to predict biological behaviors in various environments. While many approaches combine GEMs and ML, they still keep the metabolic modeling and ML parts separate, limiting their adaptability and reusability. In this Ph.D. thesis, I introduce an innovative approach that tackles this limitation: a hybrid neural-mechanistic model for GEMs, termed Artificial Metabolic Network (AMN). This entails developing FBA surrogate methods compatible with gradient backpropagation. It also requires a mechanistic loss function that aligns AMN predictions with GEM constraints. This dissertation first examines the biological phenomena addressed by AMNs and surveys state-of-the-art methods for using GEMs. It then demonstrates how AMNs outperform FBA in predicting E. coli growth rates across diverse media and genetic conditions, without requiring additional experimental data. The capabilities and limitations of AMNs are then thoroughly examined.
Finally, I summarize the findings and offer insights into ways to pursue the development of hybrid models for GEMs. Such models may help in building high-performance, insightful whole-cell models, an ambitious goal in the realm of systems biology.

Keywords : Metabolic modeling, Machine learning, Hybrid models

  • Title : Integration and analysis of heterogeneous biological data through multilayer graph exploitation to gain deeper insights into feed efficiency variations in growing pigs
  • Doctoral Student : Camille Juigne (camille.juigne@inrae.fr)
  • Supervisors : Florence Gondret (INRAE, UMR PEGASE), Emmanuelle Becker (Univ. Rennes)

Recent technological advances in biological data acquisition have resulted in an explosion of multimodal and multicentric data. This phenomenon raises numerous questions regarding the storage, standardization and analysis of these massive datasets. This thesis focuses on the development of an integrative method for analyzing biological data to extract knowledge from them. To account for their strong interdependencies, this approach integrates different types of biological entities (mRNA, proteins, metabolites, observable traits) that are typically studied independently. The computational solution developed here integrates these heterogeneous data into a multilayer graph, with each layer representing a specific type of entity. The novelty lies in linking elements within a layer or across layers using properties extracted from public knowledge databases through Semantic Web technologies. Based on this graph, the objective is to characterize the relationships among a group of molecules of interest using graph theory metrics. The method was applied to experimental datasets (transcriptomics, metabolomics and animal phenotypes). The aim was to describe and understand the relationships between specific molecules and to determine their importance in feed efficiency variations in growing pigs. Feed efficiency is a key phenotype for sustainable farming, but is recognized as complex. This work provides innovative methods to analyze and integrate various levels of biological organization, facilitating a better understanding of biological processes.

Keywords : Data integration, Feed efficiency, Multilayer graph, Multi-omics, Semantic Web