Cross Methodological Insights for Multi-source Data Integration

In biology, as in other scientific fields, the integration of multi-source data is more relevant than ever. The data collected are increasingly complex and growing in volume, driven by the development of analytical platforms and imaging techniques, the rise of omics data, and more.

Background and challenges

This context has stimulated the search for new methods for the joint analysis of several datasets (structured, multi-block or multi-channel data) in many fields, including:

Machine Learning, where several approaches are used to process multi-source data (matrix factorisation, probabilistic approaches).
Chemometrics, where various methods have been proposed to build a chemical map of samples from several analytical techniques (generalisations of canonical analysis, the NIPALS algorithm, tensor decompositions).
Bioinformatics, where integrative methodological approaches make it possible to draw as complete a picture as possible of the dynamics of molecular systems.

To help meet the challenge of analysing and exploiting these multi-source data from both an exploratory and a predictive perspective, it is essential to bring together different viewpoints, practices and paradigms so that these approaches can be reconciled. It is also necessary to encourage collaboration between "method generators" and "data generators" in the various application fields.

This is the challenge that the MIMS consortium proposes to take up, by bringing together an interdisciplinary community working on approaches to the analysis and integration of multi-source data.

Goals


MIMS is a multidisciplinary consortium of more than 60 researchers whose objective is to examine the analysis and exploitation of multi-source data from both an exploratory and a predictive perspective.

The consortium brings together skills in information processing, biological sciences and analytics. This multidisciplinarity will be implemented and managed through the sharing of data, practices and methods between the partners, with the aim of formalising a scientific project around a common challenge: the optimal analysis of multi-source data for exploratory and predictive purposes.

Contacts:

Mohamed Hanafi, StatSC
Jean-Michel Roger, UMR ITAP

Units involved and partners

INRAE participants

Unit	Expertise
Food, bioproducts and waste division
USC StatSC	Sensometry/Chemometrics/Statistics/Multispectral imaging
UR BIA	Chemometrics/computer science
UR QuaPA	Volatolomics/MRI/Chemometrics/Data Analysis/Image Analysis/System & Data Management
UMR SPO	Chemometrics
LBE	biostatistics, machine learning
Mathematics and digital technologies division
UMR TAP	Chemometrics
UMR MAIAGE	mathematical statistics/applied statistics/bioinformatics
Nutrition, Chemical Food Safety and Consumer Behaviour division
CSGA Centre des Sciences du Goût et de l'Alimentation	Chemometrics
UNH Unité Nutrition Humaine	Bioinformatics, metabolomics, chemometrics
PhAN	Perinatal nutrition and metabolic diseases, Bioinformatics, Data analysis, metagenomics and metabolomics
LABERCA	Metabolomics, Chemometrics, Expology, Epidemiology
Microbiology and the food chain division
Micalis	Biology/Microbiota/Data Analysis
Prose
Ecology and Biodiversity division
BioForA	Quantitative Genetics/Modelling
LBLGC	Physiology
Plant Biology and Breeding division
AGAP Institut	Quantitative genetics, Genomics, Biochemistry, Evolutionary genetics, Breeding, Ecophysiology, Biostatistics, Bioinformatics
Animal Physiology and Livestock Systems division
UMR SELMET	Biometrics, Chemometrics, Machine Learning, Agronomy

Partners

Partner	Expertise
Faculté des Sciences, Paris
Centre Borelli	Unsupervised learning, Statistics, Graph networks, Bioinformatics
INRIA
LORIA project team	Knowledge Discovery/Life Sciences
Université de Genève
Sciences Analytiques	Metabolomics, Chemometrics
Université de Toulouse
Institut de mathématique de Toulouse	Statistics, Multi-omics data analysis and integration
ANSES
Laboratoire de Ploufragan-Plouzané	Statistics, Multi-block methods, Epidemiology
CNAM
EPN6 - Mathématiques et Statistique	Analysis of complex heterogeneous data, Clusterwise methods, High dimensional classification
Université Paris-Saclay
Signaux et Statistique	Multi-block data analysis, Tensor analysis (high dimensional), Structural equation models
Université de Montpellier
Institut Montpelliérain Alexander Grothendieck	Supervised component models/Classification
ADLIN (private partner)
ADLIN	Finance, Strategy, Multi-omics, Bioinformatics, Transcriptomics, Visualisation
Institut Français de la Vigne et du Vin
IFV	Chemometrics/Analytical Chemistry
