29. Tools for functional genomics dataset visualization and analysis

Arpad Danos

Arpad Danos is an editor for CIViC (Clinical Interpretation of Variants in Cancer, www.civicdb.org). He works on training and on developing the CIViC data model so that it can integrate new cancer variant classification guidelines and keep pace with the rapidly evolving field of clinical cancer variant interpretation.


Arpad Danos, Zach Skidmore, Haolin Shen, Christina Gurnett, Josh Rubin, Dustin Baldridge, Barak Cohen, Malachi Griffith, Obi L. Griffith

Washington University School of Medicine, St. Louis, MO, USA

Functional genomics, in which next-generation sequencing is used to assay a large set of variants from a single gene in massively parallel experiments (e.g., saturation mutagenesis of missense mutations), has emerged as an active area that generates large datasets. Visualizing and analyzing these datasets requires specialized tools that can handle large volumes of parameterized data. To address this general problem, we developed a functional genomics data visualizer and applied it to multiple TP53 datasets addressing loss of function and dominant-negative activity. We have curated a validation set of ~100 TP53 variants from the biomedical literature, and this curation is being entered into the CIViC database (www.civicdb.org). We also performed clustering analysis on the TP53 datasets to classify variants by functional type.
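A saturation-mutagenesis assay aims to measure every possible missense substitution, so two basic questions for such a dataset are what fraction of that space was actually assayed and where a given variant's score falls within the whole assay. The sketch below answers both in Python, assuming protein-level one-letter notation such as `R175H`; the function names are hypothetical and this is not the visualizer's own code.

```python
import re
from bisect import bisect_left, bisect_right

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
# ASSUMPTION: variants are written as <ref><position><alt> with one-letter codes, e.g. "R175H"
MISSENSE = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def missense_coverage(variants, protein_length):
    """Fraction of all possible single amino-acid substitutions present in the assay."""
    observed = set()
    for v in variants:
        m = MISSENSE.match(v)
        if m is None:
            continue  # skip nonsense ("*"), indels, or malformed entries
        ref, pos, alt = m.group(1), int(m.group(2)), m.group(3)
        if ref != alt and 1 <= pos <= protein_length:
            observed.add((pos, alt))  # duplicates collapse in the set
    possible = protein_length * (len(AMINO_ACIDS) - 1)  # 19 alternatives per residue
    return len(observed) / possible


def relative_signal(scores, variant):
    """Percentile rank (0-100) of one variant's score among all scores in the assay.

    Ties are split evenly (mid-rank), so identical scores share one percentile.
    """
    values = sorted(scores.values())
    x = scores[variant]
    below = bisect_left(values, x)
    ties = bisect_right(values, x) - below
    return 100.0 * (below + 0.5 * ties) / len(values)
```

For TP53, a 393-residue protein, the denominator is 393 × 19 = 7,467 possible substitutions, so an assay reporting three distinct missense variants would have a coverage of 3/7,467.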

Written in R Shiny, our visualizer provides a convenient way to explore data from these assays, including how completely each assay covers the genomic space, the signal of an individual mutation relative to the entire assay, and comparisons across assays. Setup is straightforward and requires only a YAML configuration file. To build the TP53 validation set, we searched the literature for biomedical assays focusing on individual variants and curated the results as CIViC Functional Evidence Items. Clustering analysis is performed in R, and the resulting classifications are compared with the validation dataset. Thus far, we have clustered 1041 TP53 variants drawn from 2 functional genomics datasets, using 4 experimental variables, into loss-of-function and no-change categories. When compared with a smaller validation set, the classifications were as expected, suggesting possible predictive value.
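The abstract does not name the clustering method or the four experimental variables. As a minimal sketch, assuming plain two-cluster k-means (Lloyd's algorithm) on z-scored features, written in Python rather than the authors' R, and assuming the hypothetical convention that larger values indicate stronger loss of function, classifying variants and comparing them with a curated validation set could look like this. The function names, seeding rule, and labels are illustrative assumptions, not the authors' pipeline.

```python
import math


def standardize(rows):
    """Z-score each column so that no single experimental variable dominates distances."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    # Population SD; fall back to 1.0 for a constant column to avoid dividing by zero
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0 for j in range(d)]
    return [[(r[j] - means[j]) / sds[j] for j in range(d)] for r in rows]


def nearest(row, centroids):
    """Index of the centroid with the smallest squared Euclidean distance to row."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, c)) for c in centroids]
    return dists.index(min(dists))


def classify_variants(scores, max_iter=100):
    """Two-cluster k-means over per-variant feature vectors.

    scores maps a variant name to its list of experimental variables.
    ASSUMPTION: larger values mean stronger loss of function.
    """
    names = list(scores)
    z = standardize([scores[v] for v in names])
    # Deterministic seeds: the variants with the lowest and highest summed z-score
    order = sorted(range(len(z)), key=lambda i: sum(z[i]))
    centroids = [list(z[order[0]]), list(z[order[-1]])]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(r, centroids) for r in z]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(2):
            members = [r for r, lab in zip(z, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # The cluster whose centroid has the larger summed z-score is called loss of function
    lof = max(range(2), key=lambda c: sum(centroids[c]))
    return {v: "loss_of_function" if lab == lof else "no_change" for v, lab in zip(names, labels)}


def agreement(predicted, validation):
    """Fraction of curated validation variants whose cluster label matches."""
    shared = [v for v in validation if v in predicted]
    if not shared:
        raise ValueError("no overlap between clustered and validation variants")
    return sum(predicted[v] == validation[v] for v in shared) / len(shared)
```

Seeding the two centroids at the extremes keeps the result deterministic. Because sign conventions can differ between assays, each variable's direction would need to be aligned before clustering real data.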