Salah Eddine El Herrag

PhD in Cell Biology and Pathology | Oncology & Cancer Research

AI-based prediction of functional impacts of colorectal cancer-associated SNPs


Conference


Salah Eddine El Herrag, Noria Harir, Soraya Moulessehoul
1st National Scientific Day, 2025 Nov 11


Cite

APA
Herrag, S. E. E., Harir, N., & Moulessehoul, S. (2025). AI-based prediction of functional impacts of colorectal cancer-associated SNPs. https://doi.org/10.17605/OSF.IO/F3SZU


Chicago/Turabian
Herrag, Salah Eddine El, Noria Harir, and Soraya Moulessehoul. “AI-Based Prediction of Functional Impacts of Colorectal Cancer-Associated SNPs.” 1st National Scientific Day, 2025.


MLA
Herrag, Salah Eddine El, et al. AI-Based Prediction of Functional Impacts of Colorectal Cancer-Associated SNPs. 2025, doi:10.17605/OSF.IO/F3SZU.


BibTeX

@conference{salah2025a,
  title = {AI-based prediction of functional impacts of colorectal cancer-associated SNPs},
  year = {2025},
  month = nov,
  day = {11},
  series = {1st National Scientific Day},
  doi = {10.17605/OSF.IO/F3SZU},
  author = {Herrag, Salah Eddine El and Harir, Noria and Moulessehoul, Soraya},
  month_numeric = {11}
}

Abstract
Introduction: Single nucleotide polymorphisms (SNPs) play a major role in modulating individual susceptibility to colorectal cancer (CRC) and influence treatment response. Artificial intelligence (AI), particularly through machine learning models, offers novel means to integrate multiple genomic annotations to predict the functional impact of these variants.
Methods: Ten SNPs previously associated with CRC were annotated using Ensembl and dbSNP, incorporating bioinformatic predictors (SIFT, PolyPhen-2, CADD, GERP, phyloP, and phastCons) together with allele frequency data from gnomAD. SNPs were assigned to one of two functional classes, pathogenic or benign, according to their ClinVar clinical significance. A Random Forest model was trained on 70% of the data and tested on the remaining 30% to predict variant pathogenicity. Model performance was assessed using out-of-bag (OOB) error, accuracy, Cohen's kappa, and the area under the receiver operating characteristic curve (AUC).
Results: The Random Forest model achieved an OOB error rate of 50%. Despite the small dataset, the model reached 100% accuracy and perfect discrimination (AUC = 1.0) on the test set, indicating excellent classification between benign and pathogenic variants. The most influential predictors included CADD_PHRED, GERP_RS, and SIFT_score, emphasizing the value of integrating functional and conservation-based features for variant prioritization. 
Conclusions: This pilot study demonstrates the feasibility of using AI-based classification to predict the functional impact of colorectal cancer SNPs. Future work will extend the dataset and apply deep learning architectures to improve model generalizability and reliability in variant effect prediction. 
Keywords: Artificial intelligence, machine learning, single nucleotide polymorphisms, susceptibility. 
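
The evaluation metrics named in the Methods (accuracy, Cohen's kappa, and AUC) can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' code: the function names, the labels, and the scores below are hypothetical placeholders chosen for illustration, not data or results from the study, and the AUC is computed with the pairwise (Mann-Whitney) formulation, which is an assumption about one reasonable way to obtain it.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement between labels and predictions, corrected for chance."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    # Chance agreement: product of the marginal class frequencies.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    if p_expected == 1.0:
        return float("nan")  # undefined when only one class is present
    return (p_observed - p_expected) / (1 - p_expected)


def roc_auc(y_true, scores, positive="pathogenic"):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical example labels and predicted pathogenicity scores (not study data).
example_true = ["pathogenic", "benign", "pathogenic", "benign"]
example_pred = ["pathogenic", "pathogenic", "pathogenic", "benign"]
example_scores = [0.9, 0.6, 0.4, 0.1]

if __name__ == "__main__":
    print("accuracy:", accuracy(example_true, example_pred))
    print("kappa:", cohen_kappa(example_true, example_pred))
    print("AUC:", roc_auc(example_true, example_scores))
```

Because each metric is a ratio over the number of test variants (or positive-negative pairs, for AUC), it can only take a few distinct values when the test set is very small.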
