Bozdag lab research diagram

Research Statement Summary

My lab’s research goal is to develop open-source integrative computational tools that analyze high-dimensional biological, clinical, and environmental exposure datasets to infer context-specific gene regulatory interactions and modules and to predict disease-associated genes and patient-specific drug response. With advances in high-throughput technologies in biology, numerous national and international consortia have generated vast amounts of genotype, phenotype, gene expression, and epigenetic data (collectively called multi-omics data) and made them available to the scientific community. Furthermore, ongoing large initiatives such as UK Biobank, Million Records Project, and the All of Us Research Program will bring multi-omics datasets from millions of individuals. Each data modality (e.g., mRNA expression, DNA methylation, mutation, microRNA (miRNA) expression, and copy number alteration) describes one facet of the underlying biology. Consequently, there is a tremendous need for scalable methods that can integrate different layers of multi-omics datasets across millions of individuals from different backgrounds. Such methods would produce valuable insights into human diseases and pave the way for precision medicine. My research program is devoted to developing integrative computational tools that use artificial intelligence, machine learning, and data mining methods to analyze these multi-omics datasets.


MIRA (R35GM133657)

Research Themes

1. Building machine learning models integrating multi-modal biomedical datasets

We have built several machine learning tools to predict gene expression, disease subtype, and drug carcinogenicity. Specifically, we developed a deep learning architecture called PPAD that integrates cross-sectional and longitudinal datasets of Alzheimer’s disease to predict its progression. We developed a k-nearest neighbor-based model to predict gene expression changes based on DNA methylation signal. We employed an ensemble-based learning method to predict in vivo carcinogenicity. We also developed tools to predict the subtypes of cancer patients, namely the epigenetic subtype and molecular subtype of glioblastoma patients. In addition, we developed a tool named NRPreTo to predict nuclear receptor proteins based on sequence- and structure-based features.

Selected Publications

S. S. Madugula, S. Pandey, S. Amalapurapu, and S. Bozdag, “NRPreTo: A Machine Learning-Based Nuclear Receptor and Subfamily Prediction Tool,” ACS Omega, May 2023.

M. Al Olaimat, J. Martinez, F. Saeed, S. Bozdag, and Alzheimer’s Disease Neuroimaging Initiative, “PPAD: a deep learning architecture to predict progression of Alzheimer’s disease,” (ISMB/ECCB 2023) Bioinformatics, vol. 39, no. Supplement_1, pp. i149–i157, Jun. 2023, doi: 10.1093/bioinformatics/btad249.

T. Yang, M. A. Al-Duailij, S. Bozdag, and F. Saeed, “Classification of Autism Spectrum Disorder Using rs-fMRI data and Graph Convolutional Networks,” 2022 IEEE International Conference on Big Data (Big Data), Osaka, Japan, 2022, pp. 3131–3138.

Baur, B., & Bozdag, S. (2016). A Feature Selection Algorithm to Compute Gene Centric Methylation from Probe Level Methylation Data. PLoS One, 11(2), e0148977.

Baysan, M., Bozdag, S., Cam, M. C., Kotliarova, S., Ahn, S., Walling, J., … Fine, H. A. (2012). G-cimp status prediction of glioblastoma samples using mRNA expression data. PLoS One, 7(11), e47839.

2. Graph representation learning methods for network biology

In recent years, we have developed several machine learning tools in network biology to integrate multi-modal biomedical datasets. We developed PhenoGeneRanker, a random walk with restart method that ranks disease-associated genes on multiplex heterogeneous networks of genes and phenotypes (i.e., networks with multiple types of nodes and edges). We developed NECo, a node embedding method for multiplex heterogeneous networks, and used it to identify hypertension-related genes. We also developed SUPREME, which integrates multi-layer networks for downstream tasks such as node classification, and GRAF, which fuses multi-layer networks into a single network and performs downstream tasks on it.
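
As a rough illustration of the random walk with restart idea mentioned above, the following sketch ranks the nodes of a small single-layer toy graph by their proximity to a seed node. It is not the PhenoGeneRanker implementation: the function name, the restart probability of 0.7, and the four-node path graph are assumptions chosen for illustration, and a multiplex heterogeneous network would need a more elaborate transition matrix.

```python
import numpy as np

def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    adj     : square adjacency matrix (hypothetical toy network)
    seeds   : indices of seed nodes, e.g. known disease-associated genes
    restart : probability of jumping back to the seeds at each step (assumed value)
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # avoid division by zero for isolated nodes
    W = adj / col_sums                 # column-normalized transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds) # walker starts uniformly on the seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:  # stop once the L1 change is negligible
            return p_next
        p = p_next
    return p

# Toy path graph 0 - 1 - 2 - 3 with node 0 as the only seed.
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
scores = random_walk_with_restart(adj, seeds=[0])
ranking = np.argsort(-scores)  # nodes ordered from most to least proximal to the seed
print(ranking, scores.round(4))
```

In this toy setting the scores decay with distance from the seed, which is the property a network-based ranking of candidate genes relies on.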

Selected Publications

Zitnik M, Li MM, Wells A, Glass K, Gysi DM, Krishnan A, Murali TM, Radivojac P, Roy S, Baudot A, Bozdag S, Chen DZ, Cowen L, Devkota K, Gitter A, Gosline S, Gu P, Guzzi PH, Huang H, Jiang M, Kesimoglu ZN, Koyuturk M, Ma J, Pico AR, Pržulj N, Przytycka TM, Raphael BJ, Ritz A, Sharan R, Shen Y, Singh M, Slonim DK, Tong H, Yang XH, Yoon BJ, Yu H, Milenković T. Current and future directions in network biology. arXiv; 2023.

Z. N. Kesimoglu and S. Bozdag, “SUPREME: multiomics data integration using graph convolutional networks,” NAR Genomics and Bioinformatics, vol. 5, no. 2, p. lqad063, Mar. 2023, doi: 10.1093/nargab/lqad063.

Z. N. Kesimoglu & S. Bozdag (2023). GRAF: Graph Attention-aware Fusion Networks. arXiv preprint arXiv:2303.16781.

Dursun C, Kwitek A, Bozdag S. PhenoGeneRanker: Gene and Phenotype Prioritization Using Multiplex Heterogeneous Networks. IEEE/ACM Trans Comput Biol Bioinform. 2021 Jul 20;PP. PMID: 34283720

Dursun C, Smith JR, Hayman GT, Kwitek AE, Bozdag S. NECo: A node embedding algorithm for multiplex heterogeneous networks. 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). 2020. p. 146–149.

3. Analysis of high-dimensional biological datasets to characterize diseases

We apply statistical and data mining methods to analyze high-dimensional biological datasets to better characterize diseases. To this end, we have developed CNAReporter, a tool to analyze and report genome-wide genomic alterations; GliomaPredict, a tool that classifies glioma patients into molecular subtypes based on gene expression; and rgsepd, an R/Bioconductor package that performs differential expression analysis, enrichment analysis, and clustering on RNA-seq datasets. We analyzed copy number alteration, DNA methylation, mutation, and gene expression datasets in glioblastoma patients to identify age-specific genomic, genetic, and epigenetic signatures in glioblastoma. In another project, we studied the effect of DNA methylation on mRNA expression in patient-derived glioma stem cells. Most recently, we conducted an extensive pan-cancer study to compute the similarity between cancer cell lines and primary tumor tissues.

Selected Publications

Bose B, Bozdag S. Finding the best cell lines across pan-cancer to use in pre-clinical research as a proxy for patient tumor samples considering immune cells, multi-omics, and cancer pathways. bioRxiv; 2022.

Stamm, K., Tomita-Mitchell, A., & Bozdag, S. (2019). GSEPD: a Bioconductor package for RNA-seq gene set enrichment and projection display. BMC Bioinformatics, 20(1), 115.

Ready, D., Yagiz, K., Amin, P., Yildiz, Y., Funari, V., Bozdag, S., & Cinar, B. (2017). Mapping the STK4/Hippo signaling network in prostate cancer cell. PLoS ONE, 12(9).

Baur, B., & Bozdag, S. (2017). ProcessDriver: A computational pipeline to identify copy number drivers and associated disrupted biological processes in cancer. Genomics, 109(3–4), 233–240.

Bozdag, S., Li, A., Riddick, G., Kotliarov, Y., Baysan, M., Iwamoto, F. M., … Fine, H. A. (2013). Age-Specific Signatures of Glioblastoma at the Genomic, Genetic, and Epigenetic Levels. PLoS ONE, 8(4).

Wuchty, S., Arjona, D., Bozdag, S., & Bauer, P. O. (2012). Involvement of microRNA families in cancer. Nucleic Acids Res, 40(17), 8219–8226.

4. Reverse engineering of gene regulatory networks

Reverse engineering of gene regulatory networks is an important and challenging task in computational biology. We have contributed to this research area by developing several tools. Most recently, we developed Cancerin and Crinet, which infer cancer-specific competing endogenous RNA interactions by integrating various multi-omics datasets, and miRDriver, which infers miRNA-gene regulatory interactions in pan-cancer. We also implemented a canonical correlation analysis-based algorithm that integrates DNA methylation and copy number alteration with gene expression to infer gene regulatory interactions. We also developed FastMEDUSA, a parallelized version of MEDUSA, to infer gene regulatory networks.
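
To make the canonical correlation analysis step more concrete, the sketch below computes canonical correlations between a regulator-side feature block (DNA methylation and copy number) and a gene expression block on synthetic data. It is only a minimal illustration under stated assumptions, not the published algorithm: the variable names, the simulated data, and the QR-plus-SVD formulation are all assumptions, and the sketch requires more samples than features and full-rank blocks.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between two column-wise feature blocks.

    Assumes more samples (rows) than features (columns) and full column rank.
    """
    Xc = X - X.mean(axis=0)            # center each feature
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)           # orthonormal basis of each block's column space
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)        # singular values are the canonical correlations

# Synthetic example (fabricated for illustration only): one shared latent
# signal drives both methylation and expression, copy number is unrelated noise.
rng = np.random.default_rng(0)
n_samples = 200
latent = rng.normal(size=n_samples)
methylation = -latent + 0.1 * rng.normal(size=n_samples)
copy_number = rng.normal(size=n_samples)
expression = latent + 0.1 * rng.normal(size=n_samples)

X = np.column_stack([methylation, copy_number])  # regulator-side features
Y = expression.reshape(-1, 1)                    # target gene expression
corrs = canonical_correlations(X, Y)
print(corrs)
```

A high leading canonical correlation here would suggest that some combination of the methylation and copy number features tracks the target gene's expression, which is the kind of evidence that could serve as a prior when scoring candidate regulatory interactions.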

Selected Publications

Bose B, Moravec M, Bozdag S. Computing microRNA-gene interaction networks in pan-cancer using miRDriver. Sci Rep. Nature Publishing Group; 2022 Mar 8;12(1):3717.

Kesimoglu ZN, Bozdag S. Crinet: A computational tool to infer genome-wide competing endogenous RNA (ceRNA) interactions. PLoS One. 2021;16(5):e0251399. PMCID: PMC8118266

Do D, Bozdag S. CanMod: A computational model to identify co-regulatory modules in cancer. Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. New York, NY, USA: Association for Computing Machinery; 2020. p. 1–10.

Do, D., & Bozdag, S. (2018). Cancerin: A computational pipeline to infer cancer-associated ceRNA interaction networks. PLoS Computational Biology, 14(7), e1006318.

Baur, B., & Bozdag, S. (2015). A canonical correlation analysis-based dynamic bayesian network prior to infer gene regulatory networks from multiple types of biological data. Journal of Computational Biology: A Journal of Computational Molecular Cell Biology, 22(4), 289–299.

Bozdag, S., Li, A., Wuchty, S., & Fine, H. A. (2010). FastMEDUSA: a parallelized tool to infer gene regulatory networks. Bioinformatics, 26(14), 1792–1793.