132. Variant analysis for exploration of cancer datasets on the Cancer Genomics Cloud, powered by Seven Bridges

Aly Abdelkareem

Zélia Worman

Dr. Zélia Worman is a Program Manager at Seven Bridges, working on the CGC and other Seven Bridges platforms. Before joining Seven Bridges, Zélia was a scientific program manager for the Translational Research Institute for Space Health (TRISH), a cooperative agreement with the NASA Human Research Program.

Zélia was born in Porto, Portugal, where she received her bachelor's degree in Biochemistry and her PhD in Biodiversity, Genetics and Evolution from the University of Porto. She was a postdoctoral fellow at the University of Pittsburgh and at the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) at the National Institutes of Health (NIH). Zélia’s research has focused on human population genomics, natural selection, and transposable elements.

While a postdoc at NICHD, she chaired the Service and Outreach Subcommittee at FelCom, was a member of the NICHD Fellow’s Advisory Committee, and completed an administrative internship in the NHGRI Education and Outreach Office.

Abstract

Zélia Worman (a), Manisha Ray (a), Nevena Miletic (b), Milan Kovačević (b), Ana Stankovic (b), Nevena Ilic Raicevic (b), Dan Ventre (a), Sai Subramanian (a), Jelena Randjelovic (b), Jack DiGiovanna (a), Brandi Davis-Dusenbery (b)

(a) Seven Bridges, Charlestown, MA, USA; (b) Seven Bridges, Belgrade, Serbia

The Cancer Genomics Cloud (CGC), powered by Seven Bridges, is an NCI-funded cloud platform that enables analysis and secure storage of petabytes of private and public multi-omics cancer datasets (such as TCGA, CPTAC, ICGC, CCLE, and many others) through a user-friendly portal with limited need for programming or cloud expertise. The CGC allows researchers to perform reproducible and interactive analyses without downloading data to local computers and share their results easily and securely.

Using the CGC, we analyzed normal and tumor samples with several variant calling tools publicly available on the platform, including the Genome Analysis Toolkit (GATK). GATK is one of the most widely used gold-standard tools for calling variants from high-throughput sequencing data. We optimized these tools to run on the CGC platform, minimizing cloud cost and run time, and benchmarked their performance on several publicly available datasets, which can be easily accessed and analyzed on the CGC. We used tumor and normal samples from TCGA to detect germline and somatic mutations, both single nucleotide polymorphisms (SNPs) and structural variants. We then calculated summary statistics and benchmarked cost and run time, providing transparent estimates for running an analysis on the CGC.
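As an illustration of the kind of summary statistics mentioned above, the following minimal sketch counts variants by class in VCF-formatted text. It is not the authors' pipeline and does not use any CGC or GATK API; the function names `classify_variant` and `summarize_vcf`, the classification rules, and the example records are all assumptions made for illustration.

```python
from collections import Counter


def classify_variant(ref: str, alt: str, info: str) -> str:
    """Classify one REF/ALT pair as SNV, indel, SV, or other (illustrative rules)."""
    # Missing or spanning-deletion alleles carry no classifiable change.
    if alt in (".", "*"):
        return "other"
    # Symbolic alleles (<DEL>), breakends, or an SVTYPE tag indicate structural variants.
    if alt.startswith("<") or "[" in alt or "]" in alt or "SVTYPE=" in info:
        return "SV"
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # e.g. multi-nucleotide substitutions


def summarize_vcf(lines):
    """Count variant classes over an iterable of VCF text lines."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and header lines (both ## meta lines and the #CHROM line).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts, info = fields[3], fields[4], fields[7]
        # A record may list several comma-separated ALT alleles; count each one.
        for alt in alts.split(","):
            counts[classify_variant(ref, alt, info)] += 1
    return counts


# Hypothetical records, made up for this example (not real TCGA data).
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tAT\tA\t50\tPASS\t.",
    "chr2\t300\t.\tC\tT,CA\t50\tPASS\t.",
    "chr3\t400\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=900",
]
summary = summarize_vcf(example)
print(dict(summary))
```

In practice the same counting could be applied to the VCF output of any caller; a production analysis would more likely rely on an established VCF library than on hand-rolled parsing.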

We demonstrate that the CGC platform is well suited to variant curation and standardization, given its access to data and tools and its ease of use. By harnessing the scale and flexibility of cloud computing, these optimized workflows bring speed, cost efficiency, and reproducibility in multi-omics analysis to any researcher.