Exploratory project OBAMA (2024-2025)

Artificial intelligence as a tool for the genetic selection of livestock

The genetic selection of animals has been revolutionised in recent years by the advent of genomics, which has made it easier to select for specific essential phenotypes. Nevertheless, understanding the links between observed genetic variations and phenotypic characteristics of interest remains complex. The interdisciplinary OBAMA project proposes to combine AI with genomics to improve our understanding of how genetic factors influence phenotypes in pigs.

Context and challenges

In recent years, the introduction of genomics has brought about a genuine revolution in the genetic selection of animals. By making genome sequencing possible, it has enabled selection programmes to select for particular essential traits (phenotypes).

Genome-wide association studies (GWAS), in which many genetic variants are analysed across a large number of individuals to investigate their correlation with phenotypic traits, have made it possible to identify thousands of variants associated with complex agronomic traits.
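As a rough illustration of the idea behind such association studies, the sketch below scores each variant by the correlation between genotype dosage and a quantitative trait. All variant names and values are invented for the example, and a plain Pearson correlation is used as a stand-in: real studies typically rely on regression models with corrections for population structure and multiple testing.

```python
from math import sqrt
from typing import Dict, List

# Invented toy data: genotype dosage (0, 1 or 2 copies of the alternative
# allele) for six individuals at two hypothetical variants.
genotypes: Dict[str, List[int]] = {
    "variant_1": [0, 1, 2, 0, 1, 2],
    "variant_2": [1, 1, 0, 2, 0, 2],
}
# Invented quantitative phenotype measured on the same six individuals.
phenotype: List[float] = [1.0, 2.1, 2.9, 1.2, 1.9, 3.1]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Score every variant, then rank by strength of association (absolute value).
scores = {name: pearson(dosage, phenotype) for name, dosage in genotypes.items()}
ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)

for name in ranked:
    print(f"{name}: r = {scores[name]:.2f}")
```

A strong correlation only flags a variant as associated with the trait; it does not show that the variant is causal.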

However, most of these variants have been detected in non-coding genomic regions, which makes it difficult to access the underlying biological mechanisms. To improve our understanding of the role of these non-coding variants, one promising approach is to predict molecular processes from the DNA sequence using deep learning. But classic supervised learning requires very large datasets in which DNA sequences are associated with functional data in order to train the models. A further problem is that the volume of available data is strictly limited by the finite size of the human genome.

To overcome this obstacle, approaches that augment the data through orthology (using the corresponding sequences in other species) have the potential to considerably enrich the training datasets, thereby improving the predictive capabilities of the models in question.
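The principle of augmentation through orthology can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the region names, sequences and labels are invented, and the sketch assumes, purely for the example, that each orthologous sequence inherits the functional label of its annotated counterpart.

```python
from typing import Dict, List, Tuple

# Hypothetical annotated reference sequences with a functional label
# (for example 1 = activity measured, 0 = none). Values are invented.
annotated: Dict[str, Tuple[str, int]] = {
    "region_A": ("ACGTAC", 1),
    "region_B": ("TTGACA", 0),
}

# Hypothetical orthologous sequences of the same regions in other mammals.
# They carry no functional annotation of their own.
orthologs: Dict[str, List[str]] = {
    "region_A": ["ACGTAT", "ACGAAC"],
    "region_B": ["TTGACG"],
}


def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA string as one-hot vectors in the order A, C, G, T.

    Any other character (such as N) becomes an all-zero vector.
    """
    alphabet = "ACGT"
    return [[int(base == letter) for letter in alphabet] for base in seq.upper()]


def augment_by_orthology(
    annotated: Dict[str, Tuple[str, int]],
    orthologs: Dict[str, List[str]],
) -> List[Tuple[str, int]]:
    """Return (sequence, label) pairs; each ortholog inherits its region's label."""
    examples: List[Tuple[str, int]] = []
    for region, (seq, label) in annotated.items():
        examples.append((seq, label))
        for ortho_seq in orthologs.get(region, []):
            examples.append((ortho_seq, label))
    return examples


training_set = augment_by_orthology(annotated, orthologs)
encoded = [(one_hot(seq), label) for seq, label in training_set]
print(len(annotated), "annotated examples ->", len(training_set), "after augmentation")
```

The augmented set keeps the format of an ordinary labelled dataset, so it could be passed to a standard supervised model without changing the training procedure.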

Goals

The OBAMA project proposes a new approach based on data augmentation, a technique that was previously developed for image analysis but has never been used to analyse DNA sequences. This approach has the advantage of allowing the use of classic supervised learning, for which most models have been developed, while exploiting non-annotated data from numerous sequenced mammalian genomes in far greater volumes than the annotated data available (50 to 100 times more), which makes model training far more robust.

Based on pig data, the project will work to achieve two goals:

  • Develop new deep learning approaches that are more precise and go beyond classic supervised models (limited to human data) by processing large quantities of data derived from mammalian genome sequencing and by augmenting the data through orthology.
  • Validate experimentally the phenotypic effects that these models predict for variants affecting a trait of interest.

This project will allow the identification and validation of the causal variant (or variants) implicated in a quantitative phenotype of interest in pigs.

On completion, the project will have allowed validation of a new strategy for the identification of causal variants for complex characteristics in pigs, and possibly in other farm animals.

Contact - Coordination :

Project participants

INRAE structures

Division | Unit | Expertise
MathNum | MIA-T | Deep learning in genomics, Deep learning in transcriptomics
GA | GenPhySE | Genetics and genomics

External partners

Institute | Expertise
CNRS (LISN) | Deep learning in genetics
CNRS (LCQB) | Deep learning in genomics