Artificial intelligence as a tool for the genetic selection of livestock

The genetic selection of animals has been revolutionised in recent years by the advent of genomics, which makes it easier to select for specific traits of interest (phenotypes). Nevertheless, understanding the links between observed genetic variations and phenotypic characteristics of interest remains complex. The interdisciplinary OBAMA project proposes to combine artificial intelligence (AI) with genomics to improve our understanding of how genetic factors influence phenotypes in pigs.

Context and challenges

In recent years, the introduction of genomics has brought a genuine revolution to the genetic selection of animals. Genome sequencing has enabled selection programmes to select for particular key traits, or phenotypes.

Genome-wide association studies (GWAS), in which many genetic variants are analysed across a large number of individuals to test their correlation with phenotypic traits, have identified thousands of variants associated with complex agronomic traits.
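To make the idea of testing a variant's correlation with a trait concrete, here is a deliberately simplified sketch of a single-variant association scan. It assumes genotypes coded as allele dosages (0, 1 or 2 copies of the alternative allele) and a quantitative phenotype; the function names `association_test` and `scan_variants` are hypothetical. Real association studies in livestock usually rely on linear mixed models that account for relatedness and covariates, and the normal approximation used here for the p-value is only reasonable for large samples.

```python
import math
from typing import Dict, List, Tuple


def association_test(genotypes: List[int], phenotypes: List[float]) -> Tuple[float, float, float]:
    """Regress a quantitative phenotype on allele dosage for one variant.

    Returns (slope, t_statistic, p_value). The two-sided p-value uses a normal
    approximation to the t distribution (assumption: large sample).
    """
    n = len(genotypes)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need matching genotypes and phenotypes for at least 3 individuals")
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    if sxx == 0 or syy == 0:
        # Monomorphic variant or constant trait: nothing to test.
        return 0.0, 0.0, 1.0
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    slope = sxy / sxx
    # Residual sum of squares of the least-squares fit: Syy - Sxy^2 / Sxx.
    rss = max(syy - slope * sxy, 0.0)
    if rss == 0:
        # Perfect linear fit: unbounded evidence under this simple model.
        return slope, math.copysign(math.inf, slope), 0.0
    standard_error = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / standard_error
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    return slope, t_stat, p_value


def scan_variants(variants: Dict[str, List[int]], phenotypes: List[float]) -> List[Tuple[str, float, float]]:
    """Test every variant and return (name, slope, p_value), most significant first."""
    results = []
    for name, genotypes in variants.items():
        slope, _, p_value = association_test(genotypes, phenotypes)
        results.append((name, slope, p_value))
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    # Toy data for six individuals (illustrative only).
    phenotypes = [1.0, 1.2, 2.1, 1.9, 3.0, 3.1]
    variants = {"snp1": [0, 0, 1, 1, 2, 2], "snp2": [0, 1, 0, 1, 0, 1]}
    for name, slope, p in scan_variants(variants, phenotypes):
        print(f"{name}\tslope={slope:.3f}\tp={p:.2e}")
```

A genome-wide scan repeats this test for every variant, so the resulting p-values would also need to be corrected for multiple testing.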

However, most of these variants lie in non-coding genomic regions, which makes it difficult to identify the underlying biological mechanisms. One promising approach to better understand the role of these non-coding variants is to predict molecular processes directly from the DNA sequence using deep learning. Yet classic supervised learning requires very large datasets in which DNA sequences are paired with functional data in order to train the models. A further problem is that the finite size of the human genome, on which most of these models are trained, places a strict limit on the volume of data available.
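Sequence-based deep learning models generally take DNA as a numerical matrix. The sketch below shows one common way to turn a sequence into a one-hot matrix and pair it with a functional label to form a supervised training example. The helper names and the binary label (for instance 1 if a region is an experimentally observed regulatory element, 0 otherwise) are illustrative assumptions, not the encoding used by the project.

```python
from typing import List, Tuple

BASES = "ACGT"
BASE_INDEX = {base: i for i, base in enumerate(BASES)}


def one_hot(sequence: str) -> List[List[float]]:
    """Encode a DNA sequence as a (length x 4) matrix with columns A, C, G, T.

    Ambiguous bases such as N become all-zero rows (a common convention,
    assumed here).
    """
    matrix = []
    for base in sequence.upper():
        row = [0.0, 0.0, 0.0, 0.0]
        index = BASE_INDEX.get(base)
        if index is not None:
            row[index] = 1.0
        matrix.append(row)
    return matrix


def make_example(sequence: str, label: int) -> Tuple[List[List[float]], int]:
    """Pair an encoded sequence with its functional label (hypothetical binary label)."""
    if label not in (0, 1):
        raise ValueError("this sketch assumes a binary functional label")
    return one_hot(sequence), label
```

Supervised training then needs a large number of such (matrix, label) pairs, and every one of them requires functional data for the sequence.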

To overcome this obstacle, augmenting the data with orthologous sequences (the corresponding regions, inherited from a common ancestor, in the genomes of other species) could considerably enrich the training datasets and thereby improve the predictive capabilities of the models.
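A minimal sketch of the idea, under strong simplifying assumptions: each labeled region in a reference genome is assumed to keep its functional label in its orthologous regions in other species, and the orthologous sequences are assumed to have been extracted beforehand (for example by whole-genome alignment). The data structures and the functions `augment_with_orthologs` and `split_by_region` are hypothetical; this illustrates label transfer through orthology in general and is not the OBAMA project's pipeline.

```python
from typing import Dict, List, Set, Tuple

# (region_id, species, sequence, label)
Example = Tuple[str, str, str, int]


def augment_with_orthologs(
    labeled: Dict[str, Tuple[str, int]],
    orthologs: Dict[str, List[Tuple[str, str]]],
    reference: str = "reference",
) -> List[Example]:
    """Add orthologous sequences that inherit the label of their reference region.

    labeled:   region_id -> (sequence, label) in the reference genome.
    orthologs: region_id -> list of (species, sequence) for orthologous regions.
    """
    augmented: List[Example] = []
    for region_id, (sequence, label) in labeled.items():
        augmented.append((region_id, reference, sequence, label))
        for species, ortho_sequence in orthologs.get(region_id, []):
            # Assumption: the functional label is conserved across orthologs.
            augmented.append((region_id, species, ortho_sequence, label))
    return augmented


def split_by_region(
    augmented: List[Example], test_regions: Set[str], reference: str = "reference"
) -> Tuple[List[Example], List[Example]]:
    """Split so that no ortholog of a held-out region leaks into training.

    Training keeps every species for the remaining regions; the test set keeps
    only the reference sequences of the held-out regions.
    """
    train = [ex for ex in augmented if ex[0] not in test_regions]
    test = [ex for ex in augmented if ex[0] in test_regions and ex[1] == reference]
    return train, test


if __name__ == "__main__":
    # Toy sequences (illustrative only).
    labeled = {"r1": ("ACGTAC", 1), "r2": ("TTGACA", 0)}
    orthologs = {"r1": [("pig", "ACGTAT"), ("cow", "ACGAAC")], "r2": [("pig", "TTGACG")]}
    train, test = split_by_region(augment_with_orthologs(labeled, orthologs), {"r2"})
    print(len(train), "training examples,", len(test), "test examples")
```

Splitting by region rather than by individual sequence matters here: orthologous sequences are highly similar, so letting them fall on both sides of the split would inflate performance estimates.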

Goals

The OBAMA project proposes a new approach based on data augmentation, a technique previously developed for image analysis but never before used to analyse DNA sequences. This approach has the advantage of allowing the use of classic supervised learning, for which most models have been developed, while exploiting unannotated data from numerous sequenced mammalian genomes in volumes far greater than the annotated data available (50 to 100 times more). This makes model training far more robust.

Based on pig data, the project will work to achieve two goals:

Develop new, more precise deep learning approaches that go beyond classic supervised models (limited to human data) by processing large quantities of data derived from mammalian genome sequencing and by augmenting the data through orthology.
Experimentally validate the phenotypic effects, on a trait of interest, of the variants predicted by these models.

This project will allow the identification and validation of the causal variant (or variants) implicated in a quantitative phenotype of interest in pigs.

On completion, the project will have validated a new strategy for identifying causal variants for complex traits in pigs, and possibly in other farm animals.

Contact - Coordination:

Raphaël Mourad (MIA-T)
Céline Brouard (MIA-T)
Julie Demars (GenPhySE)

Project participants

INRAE structures

Division	Unit	Expertise
MathNum	MIA-T	Deep learning in genomics, Deep learning in transcriptomics
GA	GenPhySE	Genetics and genomics

External partners

Institute	Expertise
CNRS (LISN)	Deep learning in genetics
CNRS (LCQB)	Deep learning in genomics

Publications

Journal article

Han Phan, Céline Brouard, Raphaël Mourad, Semi-supervised learning with pseudo-labeling compares favorably with large language models for regulatory sequence prediction, Briefings in Bioinformatics, Volume 25, Issue 6, November 2024, bbae560, https://doi.org/10.1093/bib/bbae560

Modification date: 18 March 2025 | Publication date: 10 June 2024 | By: Marjorie Domergue