Date of Award

2025

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Program

Biomedical Sciences

Track

Cancer and Developmental Biology

Research Advisor

Yong Cheng

Committee

Chunliang Li; Daniel Savic; Tiffany Seagroves; Xinwei Cao

Keywords

Adenine base editor, Cis-regulatory elements, Deep learning model, GATA1 binding motif, RNA Polymerase II, Transcription readthrough

Abstract

Understanding how non-coding variants influence cellular function remains a significant challenge in genetics, largely due to limited knowledge about regulatory sequence grammar, complex chromatin environments, and the transcriptional regulatory networks that link genotype to phenotype. In this study, we combined base editor-mediated perturbations of regulatory elements, CRISPR-mediated gene disruptions, epigenetic profiling, and chromatin organization data with hybrid deep learning models to quantitatively predict the functional consequences of mutations disrupting binding sites of the transcription factor GATA1. Our models achieve high predictive accuracy and reveal key regulatory features underlying these effects. However, we observed that the models consistently underestimate the effects of a small subset of GATA1 binding sites, those with the most substantial functional effects. To investigate this discrepancy, we focused on a critical GATA1 binding site located within the PRPF19 gene. Notably, disruption of this site decreased PRPF19 expression but elevated the expression of its downstream gene, PTGDR2. Further analysis revealed that this GATA1 binding site serves dual roles: as an enhancer of PRPF19 and as a regulator of proper RNA Polymerase II elongation. Its disruption impaired transcription termination, causing readthrough transcription that aberrantly activated PTGDR2. Our findings uncover novel pleiotropic functions of GATA1 binding sites and illustrate the current limitations of deep learning models in predicting rare yet critical regulatory events.

ORCID

https://orcid.org/0000-0003-4179-0371

DOI

10.21007/etd.cghs.2025.0695

Available for download on Wednesday, June 30, 2027
