Rushank Goyal is a high school senior from India conducting research on machine learning applied to the life sciences, especially cancer genomics and transcriptomics. He has presented his work at numerous fairs and conferences, the most recent being Regeneron’s International Science and Engineering Fair (ISEF) 2022, where he received a third-place award from the American Statistical Association, out of more than 1,800 projects, for excellent use of statistical and data science principles in his project. He conducts independent research as part of Betsos and has previously worked under a mentor at the All India Institute of Medical Sciences, Bhopal. His other interests lie in maternal mortality and indigenous Indian ethnomedicine, and he hopes to investigate how biological and technological principles can be applied to those fields in the near future. In his free time, he enjoys writing, cycling, and playing badminton.
All India Institute of Medical Sciences, Bhopal, Madhya Pradesh, India
Cancer is a broad term for diseases characterized by uncontrolled and abnormal cell growth. It is the second-leading cause of death worldwide, with 9 million deaths each year; early cancer detection remains crucial for improving survival outcomes, especially in developing countries. In this research, a novel three-step framework based on quantum machine learning was developed using transcriptome data. The framework identifies key cancer biomarkers and combines them into mathematical expressions that predict the presence of cancer with high accuracy from the expression levels of five or fewer genes. Instead of relying on traditional black-box machine learning, the framework used a recently developed technology called the quantum lattice to produce transparent and explainable models. The framework was trained and tested on ten datasets covering ten different cancers. For each dataset, after initial filtering through XGBoost and statistical significance testing to identify differentially expressed genes, the quantum lattice was trained for 10 epochs using the Akaike Information Criterion as its loss function. Median accuracies, sensitivities, and specificities of 91%, 92.5%, and 87.5%, respectively, were obtained, with the top three accuracies being 100%, 100%, and 99%. Overall, the models show better accuracies than previous research while using far fewer genes for predictions. In all, 38 biomarkers were identified, 17 of which are novel, including 4 lncRNAs. The results can be applied in practical settings for efficient early cancer detection and provide insights into associations between certain genes and types of cancer.
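As an illustration of the quantities named above, the following minimal sketch computes the Akaike Information Criterion (AIC = 2k − 2 ln L) for a binary classifier and the accuracy, sensitivity, and specificity of its predictions. It is not the framework's implementation: the Bernoulli likelihood, the function names `aic` and `classification_metrics`, and the toy labels are all assumptions made for the example.

```python
import math


def aic(y_true, p_pred, n_params):
    """AIC = 2k - 2 ln(L), assuming a Bernoulli likelihood (an assumption
    of this sketch) over binary labels and predicted probabilities."""
    eps = 1e-12  # clip probabilities so log() never receives 0
    log_lik = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        log_lik += y * math.log(p) + (1 - y) * math.log(1 - p)
    return 2 * n_params - 2 * log_lik


def classification_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary labels,
    where 1 = cancer and 0 = normal."""
    tp = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 1)
    tn = sum(1 for y, yh in zip(y_true, y_pred) if y == 0 and yh == 0)
    fp = sum(1 for y, yh in zip(y_true, y_pred) if y == 0 and yh == 1)
    fn = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 0)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity


if __name__ == "__main__":
    # Toy labels and probabilities, invented purely for illustration.
    labels = [1, 1, 1, 0, 0]
    probs = [0.9, 0.8, 0.4, 0.2, 0.6]
    preds = [1 if p >= 0.5 else 0 for p in probs]
    print("AIC:", aic(labels, probs, n_params=2))
    print("accuracy, sensitivity, specificity:",
          classification_metrics(labels, preds))
```

Because AIC adds a penalty of 2 per parameter, minimizing it favors expressions that fit the data with few terms, which is consistent with the emphasis on models that use five or fewer genes.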