Thesis by Alexandre Asset (2024-2026)

Machine learning and high-throughput epigenotyping: a new lever to improve phenotype predictions in cattle

Thesis by Alexandre Asset (BREED/MIA-PS, 2024-2026). Building on the work of EPINUM, this thesis investigates which AI approaches are best suited to integrating epigenetic data into phenotype prediction models.

  • Accredited thesis
  • Start date: 01/10/2024
  • Research laboratories: BREED (Biology of Reproduction, Environment, Epigenetics and Development) and MIA Paris Saclay (Applied Mathematics and Computer Science)
  • Thesis directors: Hélène Kiefer (epigenetics), David Makowski (modeling)
  • Metaprogramme axis: axis 2 (Predicting phenotypes and their responses to changes in stress fields)

Summary

Cattle farming is affected by climate change and must adapt to the development of agroecological practices. To face these challenges, new tools for precise, rapid, and minimally invasive phenotyping need to be developed to monitor the suitability of the animal/environment pair. Epigenetic modifications are molecular elements that contribute to phenotypic variability in individuals throughout their lives from the peri-conceptional period onwards. Studying these modifications allows us to understand the effects of the environment on genome function. Epigenetic monitoring of animals could thus help establish recommendations for practices that support the agroecological transition while optimizing the profitability and sustainability of farms. This is the objective of two work packages in the H2020 RUMIGEN program (2021-2026, coordinated by E. Pailhoux, BREED), which focuses particularly on the impact of climate change on ruminant farming.

As part of WP6 of RUMIGEN, which aims to exploit epigenetic biomarkers for precision breeding and farming, we have developed a first-generation "medium-density" epigenotyping chip. This tool will enable standardized and cost-effective measurement of DNA methylation in cattle at 50,000 markers carefully selected for their scientific relevance and diagnostic potential for the animal/environment pair (markers related to thermal and metabolic stress, fertility, immune response, and somatic cell count in milk). It offers strong innovation potential for breeding and farming advisory services and a competitive advantage for European livestock farming. We anticipate a gradual increase in its use by the industry, following the model of genotyping chips. It is crucial to support this deployment from the earliest stages by establishing the necessary digital tools.

The thesis aims to evaluate the potential of machine learning approaches to improve phenotype prediction in cattle. It faces three methodological challenges: selecting the machine learning methods best suited to the generated data, building predictive models that integrate genetic and epigenetic information, and assessing the quality and robustness of the predictions against quantitative genetics models. The work will draw on one of the largest cohorts ever used to generate epigenetic data in animals. The thesis also relies on a new collaboration combining expertise in epigenetics (BREED, ELIANCE), modeling and machine learning (MIA-PS), and access to genetic resources (GABI), initiated within the framework of EPINUM (INRAE DIGIT-BIO metaprogramme, 2024-2025).
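
To make the planned comparison concrete, the sketch below contrasts a genotype-only model with a model that also uses DNA methylation, scoring both by cross-validated correlation between predicted and observed phenotypes. It is only an illustration under strong assumptions: the data are simulated, ridge regression stands in for both the quantitative genetics baseline and the machine learning methods the thesis will actually evaluate, and every name and parameter (`simulate`, `cv_correlation`, `alpha`, the numbers of animals and markers) is hypothetical.

```python
import numpy as np


def ridge_fit_predict(X_train, y_train, X_test, alpha):
    """Fit ridge regression on standardized features and predict X_test."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    Xs = (X_train - mean) / std
    y_mean = y_train.mean()
    # Closed-form ridge solution: (X'X + alpha * I)^-1 X'y
    gram = Xs.T @ Xs + alpha * np.eye(Xs.shape[1])
    beta = np.linalg.solve(gram, Xs.T @ (y_train - y_mean))
    return ((X_test - mean) / std) @ beta + y_mean


def cv_correlation(X, y, alpha=50.0, n_folds=5, seed=0):
    """Mean Pearson correlation between predictions and observations over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        pred = ridge_fit_predict(X[train_mask], y[train_mask], X[test_idx], alpha)
        scores.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return float(np.mean(scores))


def simulate(n_animals=600, n_snps=100, n_cpgs=100, seed=1):
    """Simulate a phenotype driven by both genotypes and methylation (toy data)."""
    rng = np.random.default_rng(seed)
    # SNP genotypes coded as allele counts 0, 1 or 2
    genotypes = rng.integers(0, 3, size=(n_animals, n_snps)).astype(float)
    # Methylation levels as proportions between 0 and 1
    methylation = rng.uniform(0.0, 1.0, size=(n_animals, n_cpgs))
    genetic = genotypes @ rng.normal(0.0, 1.0, n_snps)
    epigenetic = methylation @ rng.normal(0.0, 1.0, n_cpgs)
    # Give both components unit variance, then add environmental noise
    genetic = (genetic - genetic.mean()) / genetic.std()
    epigenetic = (epigenetic - epigenetic.mean()) / epigenetic.std()
    phenotype = genetic + epigenetic + rng.normal(0.0, 0.5, n_animals)
    return genotypes, methylation, phenotype


genotypes, methylation, phenotype = simulate()
r_geno = cv_correlation(genotypes, phenotype)
r_both = cv_correlation(np.hstack([genotypes, methylation]), phenotype)
print(f"genotypes only:          r = {r_geno:.2f}")
print(f"genotypes + methylation: r = {r_both:.2f}")
```

Because the simulated phenotype depends on methylation by construction, the combined model is expected to score higher here; on real cohorts, whether and by how much epigenetic data improve predictions is precisely the open question. Both models are scored on the same folds so that the difference reflects the added features rather than the data split.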
