Data harmonization, creation and use of a database for relevant data from participant partners

Leaders: Geert Opsomer (Universiteit Gent) and Nicolas Gengler (University of Liège)

Task 2.1. Developing trait ontology of key biological traits relating to production efficiency, product quality, health, metabolic status, fertility, environmental footprint and animal welfare status in dairy cows
A detailed ontology of the phenotypes of interest will be developed. Based on existing literature, results from previous projects and data held by the participants, milk parameters will be determined that describe product quality, physiological status with respect to welfare and production diseases, fertility and environmental footprint (methane emission and nitrogen loss). Potential milk biomarkers will be D-lactate, free glucose, isocitrate, beta-hydroxybutyrate, lactate dehydrogenase, progesterone and uric acid. Traditional phenotypic data on efficiency, health, metabolic status, fertility, environmental footprint and animal welfare status will also be collated and entered for cows in WP3 to enable data analyses.

Task 2.2. Harmonization and organisation of collection of mid-infrared (MIR)-spectral data
MIR data can be used to identify easily measured predictors in milk. Storing the spectra in databases preserves a record of the historical composition of milk samples, so that if a new prediction equation is later created (WP3), all recorded data, both current and historical, can be re-analysed. For all production efficiency, health, metabolic status, fertility, environmental footprint and animal welfare traits, the associated MIR spectral data will be acquired and stored for cows in WP3 and WP4. MIR spectral data obtained from commercially available MIR spectrometers will be standardized via spectral corrections adapted to each instrument, and their collection will be organized. Access to these data will be arranged by sub-contracting MIR spectral data acquisition to milk recording organisations (MROs), partially through collaboration with the OptiMIR project.
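As an illustration of what an instrument-specific spectral correction might look like, the following minimal sketch applies one slope and one intercept per wavenumber to map a raw spectrum onto a common reference scale. The linear form, the class `InstrumentCorrection` and the function `standardize_spectrum` are assumptions made for this example only; the correction method actually used in the project is not specified here.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class InstrumentCorrection:
    """Hypothetical per-instrument correction: one slope and one intercept per wavenumber."""
    slopes: Sequence[float]
    intercepts: Sequence[float]


def standardize_spectrum(raw: Sequence[float], corr: InstrumentCorrection) -> List[float]:
    """Map a raw spectrum onto the common reference scale, point by point."""
    if not (len(raw) == len(corr.slopes) == len(corr.intercepts)):
        raise ValueError("spectrum and correction must cover the same wavenumbers")
    return [s * x + b for x, s, b in zip(raw, corr.slopes, corr.intercepts)]


# Made-up numbers, three wavenumbers only, purely to show the mechanics.
corr = InstrumentCorrection(slopes=[1.0, 2.0, 0.5], intercepts=[0.0, 0.5, -1.0])
standardized = standardize_spectrum([0.25, 0.5, 4.0], corr)
```

Storing the raw spectrum alongside the identifier of the correction applied would keep historical data re-analysable if the corrections themselves are later revised.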

Task 2.3. Glycan profiling in milk as a phenotype predictor
Glycan profiles in physiological fluids are altered by physiological state and phenotype, and offer another potential milk-based biomarker to predict phenotypes of interest. UCD (NIBRT) has developed the automated platforms needed to carry out high-throughput glycan profiling of large numbers of milk samples. Glycan profiles for all cows in WP3 and WP4, taken at a number of key stages during the lactation cycle, will be stored in the central GplusE database.

Task 2.4. Development of draft formats and protocols to enable data transfer from each partner and rapid loading into the project pooled database
Data exchange from the partners involved in Tasks 2.1, 2.2 and 2.3 will allow generation of a central GplusE database. Such data will require standardised, precise formats and protocols to ensure reliable transfer and rapid loading.
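A standardised transfer format could, for instance, be a CSV file with a fixed header that is validated before loading. The sketch below is only an assumption of how such a check might work: the column names in `REQUIRED_COLUMNS`, the function `parse_transfer_file` and the example row are hypothetical and do not describe the project's actual exchange format.

```python
import csv
import io
from datetime import date

# Hypothetical column set; the real exchange format is not defined in this text.
REQUIRED_COLUMNS = ("animal_id", "sample_date", "trait", "value", "unit")


def parse_transfer_file(text):
    """Parse a partner CSV file, rejecting files whose header deviates from the agreed format."""
    reader = csv.DictReader(io.StringIO(text))
    if tuple(reader.fieldnames or ()) != REQUIRED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    records = []
    for row in reader:
        records.append({
            "animal_id": row["animal_id"].strip(),
            "sample_date": date.fromisoformat(row["sample_date"]),  # ISO 8601 dates only
            "trait": row["trait"].strip(),
            "value": float(row["value"]),
            "unit": row["unit"].strip(),
        })
    return records


# Made-up example row, for illustration only.
example = (
    "animal_id,sample_date,trait,value,unit\n"
    "COW001,2015-03-14,milk_bhb,0.08,mmol/L\n"
)
records = parse_transfer_file(example)
```

Rejecting a file on the header alone is a deliberately strict choice: it surfaces format drift at transfer time rather than after the data have been loaded.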

Task 2.5. Definition and creation of the database for relevant data from participant partners
The database for relevant data will be defined and physically created, and data transfers will be facilitated. Initially it will be established using phenotype data from existing cow studies from participants (n > 400 cows). New data will be included as they become available from: (i) research farm studies in WP3 (n ~ 200 cows) and (ii) records from commercial farms in WP4 (n ~ 10,000 cows; less detailed data sets).

Task 2.6. Development and use of data integrity checks throughout the project to consolidate incoming data
Integrity checks for incoming data will be developed and implemented in collaboration with data providers. The incoming data will be consolidated into a harmonized, high-quality data set to maximize added value.
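The kind of integrity check meant here might be sketched as follows: each incoming record is tested for a missing identifier, for duplication and for an implausible value, and rejected records are reported back to the data provider. The function `check_records`, the trait names and the ranges in `PLAUSIBLE_RANGES` are hypothetical and chosen only for illustration.

```python
# Hypothetical plausibility ranges, chosen only for illustration.
PLAUSIBLE_RANGES = {"milk_yield_kg": (0.0, 100.0), "fat_pct": (1.0, 9.0)}


def check_records(records):
    """Return (clean_records, issues); issues are (row_index, message) tuples."""
    clean, issues, seen = [], [], set()
    for i, rec in enumerate(records):
        key = (rec["animal_id"], rec["sample_date"], rec["trait"])
        # Traits without a configured range are accepted at any value.
        low, high = PLAUSIBLE_RANGES.get(rec["trait"], (float("-inf"), float("inf")))
        if not rec["animal_id"]:
            issues.append((i, "missing animal_id"))
        elif key in seen:
            issues.append((i, "duplicate record"))
        elif not (low <= rec["value"] <= high):
            issues.append((i, "value out of range"))
        else:
            seen.add(key)
            clean.append(rec)
    return clean, issues


# Made-up rows exercising each check.
rows = [
    {"animal_id": "A1", "sample_date": "2015-03-14", "trait": "fat_pct", "value": 4.1},
    {"animal_id": "A1", "sample_date": "2015-03-14", "trait": "fat_pct", "value": 4.1},
    {"animal_id": "A2", "sample_date": "2015-03-14", "trait": "fat_pct", "value": 42.0},
    {"animal_id": "", "sample_date": "2015-03-14", "trait": "milk_yield_kg", "value": 30.0},
]
clean, issues = check_records(rows)
```

Returning the issues as data rather than raising on the first failure lets a whole batch be reviewed with the provider in one pass.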

Task 2.7. Genomic database
The IGenoP database developed and operated by ICBF will be used to store all genotypes obtained and used in the project. Each genotyped animal will be identified using the Interbull International Identification standard. This database stores and shares genotypes from multiple sources, is continually evolving and currently supports all Illumina SNP genotypes (LD, 50K and HD). It will be extended to support the collection of all partial gene sequence data acquired in the course of the project.
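To illustrate how animal identifiers could be validated before genotypes are loaded, the sketch below splits an identifier under the assumption that it follows the commonly described 19-character international layout: a 3-character breed code, a 3-character country code, a 1-character sex code and a 12-character animal number. The function `parse_international_id` and the example identifier are hypothetical, and the Interbull documentation remains the authority on the exact format.

```python
from typing import NamedTuple


class InternationalId(NamedTuple):
    breed: str    # 3 characters
    country: str  # 3 characters
    sex: str      # "M" or "F"
    animal: str   # 12 characters


def parse_international_id(raw):
    """Split a 19-character identifier into its assumed components."""
    if len(raw) != 19:
        raise ValueError(f"expected 19 characters, got {len(raw)}")
    breed, country, sex, animal = raw[:3], raw[3:6], raw[6], raw[7:]
    if sex not in ("M", "F"):
        raise ValueError(f"unexpected sex code: {sex!r}")
    return InternationalId(breed, country, sex, animal)


# Made-up identifier, for illustration only.
parsed = parse_international_id("HOLIRLF000012345678")
```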

Task 2.8. Providing data for other WP throughout the project
Throughout the project, WP2 will provide the data required by the other WPs. Data extraction procedures will therefore be developed to harmonize and standardize data for the GplusE and IGenoP databases.
