Corentin Molitor – Cranfield University, UK
13 April 2021
Over the past twenty years, genome sequencing has gone from a 10-year international effort with a $300 million bill, the Human Genome Project, to a job that takes a few hours and costs less than $1000.
As a result of this unprecedented technological advance, publicly available genetic data now offers a gold mine of information on genetic variation and its links to diseases and phenotypes. Consequently, a new branch of medicine has emerged, based on personalisation: instead of a “one size fits all” strategy, prevention and cure are tailored to groups and, potentially, to each individual. However, there is still a lot to unravel and understand.
This personalisation is based on the environment, lifestyle and gut microbiome, but also on the genetics of each individual. An important part of the genetic information we can get from an individual is the set of variations between their genome and the human reference genome. Indeed, it is estimated that ~0.6% of the nucleotides (the building blocks of the genome) differ between two people, corresponding to ~20 million variants. Having a variant is not necessarily a bad thing: it can offer protection from a disease, or have no effect at all. Unfortunately, some variants increase the risk of developing one or more diseases, and it is therefore important to identify them.
Despite this wealth of available genetic data, the information is often split between different databases and stored in different formats, making it hard to retrieve quickly and reliably. There is a need for easy-to-use tools that retrieve genetic information across different sources. As part of the NUTRISHIELD project, we have developed VarGen, a publicly available package written in R, which automates the process of retrieving a list of genetic variants linked to a phenotype or disease.
VarGen will support the personalisation of diet within the NUTRISHIELD project by building databases of genetic variants linked to food allergies, obesity and diabetes. By assessing each person's genetic risk of developing these diseases, it will be possible to tailor specific diets to the genetic makeup of each individual.
Manhattan plot produced with VarGen for different studies related to obesity. Each dot is a variant. The x-axis represents the genomic coordinates, split by chromosome; only 5 chromosomes are shown here for the sake of clarity. The y-axis represents the –log10(p-value): a higher value means a stronger statistical association between the variant and the trait.
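For readers curious about how the two axes of such a plot are derived, here is a minimal sketch in Python. It is not VarGen's own code (VarGen is an R package); the `Variant` class, the `manhattan_coordinates` function and the example values are hypothetical, chosen only to illustrate the idea. The sketch assumes each variant comes with a chromosome, a position and a p-value greater than zero, and it approximates each chromosome's width by the largest position observed on it.

```python
import math
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical record for one variant; not a VarGen data structure."""
    chrom: str
    pos: int
    p_value: float  # must be > 0, otherwise log10 is undefined


def manhattan_coordinates(variants, chrom_order):
    """Return one (x, y, chrom) tuple per variant.

    x is the position shifted by the widths of all preceding chromosomes,
    so chromosomes sit side by side; y is -log10(p-value).
    """
    # Approximate each chromosome's width by its largest observed position.
    widths = {chrom: 0 for chrom in chrom_order}
    for v in variants:
        widths[v.chrom] = max(widths[v.chrom], v.pos)

    # Offset of a chromosome = total width of the chromosomes before it.
    offsets = {}
    running_total = 0
    for chrom in chrom_order:
        offsets[chrom] = running_total
        running_total += widths[chrom]

    return [
        (offsets[v.chrom] + v.pos, -math.log10(v.p_value), v.chrom)
        for v in variants
    ]


# Made-up example values, for illustration only.
example = [
    Variant("1", 100, 1e-8),
    Variant("1", 500, 0.01),
    Variant("2", 50, 1e-3),
]
coords = manhattan_coordinates(example, chrom_order=["1", "2"])
```

Because the y-axis is a negative logarithm, a variant with a p-value of 1e-8 sits four times higher than one with a p-value of 0.01, which is why the strongest associations stand out as the tallest points.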
C. Molitor, M. Brember, and F. Mohareb, ‘VarGen: an R package for disease-associated variant discovery and annotation’, Bioinformatics, vol. 36, no. 8, pp. 2626–2627, Apr. 2020, doi: 10.1093/bioinformatics/btz930.