Swett, Rebecca J.; Elias, Angela; Miller, Jeffrey A.; Dyson, Gregory E.; Cisneros, G. Andres
Hypothesis driven single nucleotide polymorphism search (HyDn-SNP-S)
DNA REPAIR, 12:733-740, SEP 2013

The advent of complete-genome genotyping across phenotype cohorts has provided a rich source of information for bioinformaticians. However the search for SNPs from this data is generally performed on a study-by-study case without any specific hypothesis of the location for SNPs that are predictive for the phenotype. We have designed a method whereby very large SNP lists (several gigabytes in size), combining several genotyping studies at once, can be sorted and traced back to their ultimate consequence in protein structure. Given a working hypothesis, researchers are able to easily search whole genome genotyping data for SNPs that link genetic locations to phenotypes. This allows a targeted search for correlations between phenotypes and potentially relevant systems, rather than utilizing statistical methods only. HyDn-SNP-S returns results that are less data dense, allowing more thorough analysis, including haplotype analysis. We have applied our method to correlate DNA polymerases to cancer phenotypes using four of the available cancer databases in dbGaP. Logistic regression and derived haplotype analysis indicates that similar to 80 SNPs, previously overlooked, are statistically significant. Derived haplotypes from this work link POLL to breast cancer and POLG to prostate cancer with an increase in incidence of 3.01- and 9.6-fold, respectively. Molecular dynamics simulations on wild-type and one of the SNP mutants from the haplotype of POLL provide insights at the atomic level on the functional impact of this cancer related SNP. Furthermore, HyDn-SNP-S has been designed to allow application to any system. The program is available upon request from the authors. (c) 2013 Elsevier B.V. All rights reserved.

DOI:10.1016/j.dnarep.2013.06.001

Find full text with Google Scholar.