64. FAIR sharing of cancer GWAS data via the NHGRI-EBI GWAS Catalog

Aly Abdelkareem

Maria Cerezo

Maria completed her PhD in Human Genetics at the University of Santiago de Compostela in Spain, including a research stay at the University of Oxford. Her research focused on mitochondrial DNA variability, with applications mainly in human population genetics but also in clinical and forensic genetics. During her postdoc at the Wellcome Sanger Institute, she was part of the 1000 Genomes Project, where she was responsible for the analysis of mitochondrial DNA data.

In 2015 she moved to EMBL-EBI as a scientific curator for the GWAS Catalog, where she combines her attention to detail, which is essential for standardising the data included in the database, with interpersonal and communication skills suited to outreach and training that promote open science.


Maria Cerezoᵃ, Annalisa Bunielloᵃ, Ala Abidᵃ, Peggy Hallᵇ, James Hayhurstᵃ, Arwa Ibrahimᵃ, Sajo Johnᵃ, Elizabeth Lewisᵃ, Aoife McMahonᵃ, Abayomi Mosakuᵃ, Santhi Ramachandranᵃ, Elliot Sollisᵃ, Fiona Cunninghamᵃ, Paul Flicekᵃ, Lucia Hindorffᵇ, Laura Harrisᵃ, Helen Parkinsonᵃ

ᵃEMBL-EBI, Hinxton, Cambridgeshire, United Kingdom; ᵇNHGRI, Bethesda, MD, USA

The GWAS Catalog is a comprehensive resource of data from genome-wide association studies (GWAS). Top associations and detailed metadata are made available in a standard format alongside full p-value summary statistics. These data are reused by the genomics community, for example in meta-analyses, the generation of polygenic scores and the identification of new drug targets. As of February 2022, the Catalog contains 33,162 GWA studies from 5,595 publications, including 22,675 with summary statistics, and covers more than 5,000 distinct traits. Pre-published datasets are also available. All data are freely available and frequently updated, in line with the principles of open access and data sharing.

Since it began hosting summary statistics in 2017, the Catalog has worked to engage the genomics community and promote open sharing of full datasets. An analysis performed in March 2021 showed that the rate of data sharing differs across research areas, with the lowest submission rate in cancer genetics. A survey revealed that the most common reasons for not sharing were data embargoed for use in future research, privacy or ethics concerns, and lack of awareness of an appropriate data repository. Given the importance of cancer research to human health, we pursued dedicated outreach to the cancer community, including a series of workshops and individual discussions, to identify and address barriers to sharing. This has already had an effect, with a sharp increase in data sharing from cancer studies in recent quarters. Continued open sharing of cancer GWAS data will help pave the way to new and improved therapies and shed light on the molecular mechanisms underlying these complex diseases.