Yunyun Zhou

Associate Professor, Cancer Signaling & Microenvironment

Educational Background

  • PhD, Electrical Engineering and Computer Science, Washington State University, Pullman, WA, USA, 2012
  • BS & MS, Electrical Engineering, Nanjing University, Nanjing, China, 2007

Memberships

  • American Association for Cancer Research (AACR)
  • American Society of Human Genetics (ASHG)

Honors & Awards

  • Lead Guest Editor, Journal of Frontier in Psychiatry, Session: Methods of natural language processing in psychiatry research, 2021-2022
  • Lead Guest Editor, International Journal of Computational Biology and Drug Design, Session: Integrative Data analysis in System Biology, 2017-2019
  • Lead Guest Editor, Journal of Cancer Informatics, Session: predictive model for cancer biomarker discovery, 2015-2017
  • Co-Chair, IEEE International Conference on Bioinformatics and Biomedicine, Session: 8th Workshop on Integrative Data Analysis in Systems Biology, 2017

Research Interests

My research focuses on applying advanced data science techniques to support both basic and clinical research in cancer, ultimately driving progress in precision medicine. My areas of expertise and interest include:

  • Developing innovative computational strategies and workflows for the analysis of short-read and long-read sequencing data in cancer research, while also constructing comprehensive network models to facilitate cancer immunology investigations.
  • Employing artificial intelligence (AI) methods to uncover connections between functionally significant cancer-related variants/genes, molecular pathways, and phenotype associations, while also determining key biomarkers crucial for cancer diagnosis, prognosis, and treatment outcomes.
  • Utilizing natural language processing (NLP) techniques to extract and mine complex disease phenotypes from unstructured electronic health record (EHR) data, ultimately enhancing disease diagnostics. Additionally, leveraging NLP-powered knowledge graphs to accelerate drug discovery by harnessing real-world evidence.

Lab Overview

We collaborate with researchers and clinicians to design studies, analyze data, and develop data-driven products. We work on a variety of projects, including:

  • Collaborating with researchers to perform in-depth genomics and clinical data analysis, and offering support in preparing manuscripts and grant applications.
  • Partnering with clinicians to design studies, analyze data, develop data-driven products, and validate hypotheses using real-world evidence gathered from diverse consortium sources, such as labs, hospitals, and literature mining.
  • Providing data science consultation to pharmaceutical and biotech companies on new drug development, drawing on short-read and long-read sequencing data analysis and on structured and unstructured patient data for clinical research.

Lab Description

We focus on three primary research areas:

  1. Machine Learning and Deep Learning Methods for Driver Variant and Gene Discovery. We develop advanced deep learning methods, including Transformer models and Generative Adversarial Networks, to enhance the interpretation of genetic variants with unknown functions for cancer patients in clinical settings. This area also includes the creation of machine learning predictive models for cancer patient subtype stratification, survival outcome prediction, and drug treatment response prediction. In addition, we devise statistical feature selection methods and classification models for discovering and validating biomarkers in clinical cancer research. By linking the associations of genotypes and phenotypes, our goal is to better understand the genetic underpinnings of cancer and inform optimized treatment strategies.
  2. Natural Language Processing for EHR Clinical Notes and Literature Mining. We utilize cutting-edge natural language processing (NLP) techniques, such as BERT and GPT, to identify psychiatric and behavioral phenotypes in clinical notes and literature. By leveraging real-world evidence, our goal is to improve early diagnosis and stratify neurodevelopmental disorder patients based on their distinct phenotypes. We have developed representation learning models for recognizing medical concepts and extracting phenotypes from temporal patterns in clinical notes. Furthermore, we have employed NLP knowledge graph techniques for knowledge discovery, connecting molecular variations with phenotypes and pathway alterations, which expedites knowledge acquisition across various domains, including COVID-19.
  3. Integrative Network Approaches for Multi-omics and Clinical Outcomes Association Studies. We develop multi-layer network methods for predicting disease driver genes and functions by integrating mutations, genetic/epigenetic interactions, and functional pathways. This integrative approach allows us to gain insights into the complex interactions driving cancer development and progression. To support this research, we also create user-friendly cancer databases equipped with analytical tools that facilitate complex, dynamic, and high-dimensional cancer research. Our work enables both our team and fellow researchers to efficiently explore and analyze cancer-related data for enhanced understanding.

Selected Publications

Machine Learning and Deep Learning Applications

Li Q, Ren Z, Li M, Cao K, Wang K*, Zhou Y*, CancerVar: an Artificial Intelligence empowered platform for clinical interpretation of somatic mutations in cancer, Science Advances, (8)18, 2022. https://pubmed.ncbi.nlm.nih.gov/35544644/

Lei L, Wang X, Mo Y, Cheng S, Zhou Y*, DGM-CM6: A new model to predict distant recurrence risk in operable endocrine-responsive breast cancer, Frontiers in Oncology, (10)783, 2020. https://pubmed.ncbi.nlm.nih.gov/32528885/

Zhang S, Wang J, Ghoshal T, Wilkins D, Mo YY, Chen Y, Zhou Y*, lncRNA gene signatures for prediction of breast cancer intrinsic subtypes and prognosis, Genes, 9(2):65, 2018. https://pubmed.ncbi.nlm.nih.gov/29373522/

Natural Language Processing on Clinical Notes and Literature Data Mining

Zhao M, Havrilla J, Peng J, Drye M, Fecher M, Guthrie W, Tunc B, Schultz R, Wang K*, Zhou Y*. Development of a phenotype ontology for autism spectrum disorder by natural language processing on Electronic Health Records, Journal of Neurodevelopmental Disorders, 14(1):32, 2022. https://pubmed.ncbi.nlm.nih.gov/35606697/

Wang L, Jiang L, Pan D, Wang Q, Yin Z, Zhou Y*, Xu H*, Novel Approach by Natural Language Processing (NLP) for COVID-19 Knowledge Discovery, Biomedical Journal, 45(3):472-481, 2022. https://pubmed.ncbi.nlm.nih.gov/35367669/

Wang L, Wang Q, Bai H, Liu Q, Liu W, Zhang Y, Jiang L, Xu H, Wang K*, Zhou Y*, EHR2Vec: representation learning of medical concepts from temporal patterns of clinical notes based on self-attention mechanism, Frontiers in Genetics, 11:630, 2020. https://pubmed.ncbi.nlm.nih.gov/32714371/

Network Analysis for Integrative Study on Multi-omics and Clinical Outcomes

Wang J, Wang X, Bhat A, Chen Y, Xu K, Mo Y, Yi S, Zhou Y*, Comprehensive network analysis reveals alternative splicing-related lncRNAs in hepatocellular carcinoma, Frontiers in Genetics, 11:659, 2020. https://pubmed.ncbi.nlm.nih.gov/32760422/

Peng J, Zhou Y, Wang K*. Multiplex gene and phenotype network to characterize shared genetic pathways of epilepsy and autism, Scientific Reports, 11:952, 2021. https://pubmed.ncbi.nlm.nih.gov/33441621/

Zhou Y, Simmons J, Jordan C, Sonbol M, Maihle N, Tang S, Aspirin treatment effect and association with PIK3CA mutation in breast cancer: a biomarker analysis, Clinical Breast Cancer, 19(5):354-362, 2019. https://pubmed.ncbi.nlm.nih.gov/31262687/

 
