Epigenetics is the study of reversible modifications to the genetic material of cells that affect gene expression. These modifications are partly inherited and partly attributable to environment and lifestyle. The International Human Epigenome Consortium (IHEC), which includes the Encyclopedia of DNA Elements (ENCODE), is a collective effort involving over 10 countries worldwide that aims to understand the extent to which the epigenome has shaped the human genome over generations and in response to the environment. We propose EpiShare as a joint IHEC/ENCODE Driver Project to provide coordinated input into, and adoption of, Global Alliance for Genomics and Health (GA4GH) standards by two large international projects that are both dedicated to providing access to epigenetic and RNA expression reference data to users worldwide.

Epigenomics is a logical extension of the work done in genomics to better understand the mechanisms behind a wide range of clinical conditions, from cancer to rare diseases. IHEC members have so far committed over US$200M to annotate the genome by producing, curating and serving reference epigenomic datasets, as well as providing their uniform processing pipelines, to users worldwide. These data can help explain the effects of genetic variants in an individual genome and inform the interpretation of genome-wide association studies (GWAS). However, accessing epigenomics data, including IHEC and ENCODE data, remains a challenge. To quote an editorial entitled “Sharing epigenomes globally” published in Nature Methods in February 2018: “to have the most impact, easier access to the underlying raw data is needed.” Indeed, one of the main issues for researchers interested in these datasets is that obtaining the original sequence files, which are often stored in controlled-access repositories and bound by different access agreements, can be challenging and time-consuming. We need better mechanisms to facilitate epigenomic data discovery and analysis, while addressing the ethical and privacy aspects associated with data sharing.

GA4GH has been developing tools and standards to deal with some of these issues for genomic data, but there are currently no active projects to serve epigenomic data in similar ways. To address this gap, IHEC and ENCODE have initiated the EpiShare project, which aims to adapt and extend GA4GH resources to make accessing, sharing and analyzing epigenomic data more flexible. Building on existing GA4GH tools and standards, and on online resources such as the IHEC Data Portal (epigenomesportal.ca/ihec) and the ENCODE Portal (encodeproject.org), the EpiShare platform will provide a web resource that makes data more easily discoverable and enables users to launch multi-omics analyses on these controlled-access datasets at their storage location. It will integrate data produced not only by IHEC member consortia, but also by the broader epigenomic research community. All source code for software and tools developed will be made available through the appropriate GA4GH channels.