Mechanistic Phenotypes: An Aggregative Phenotyping Strategy to Identify Disease Mechanisms Using GWAS Data
PLOS ONE
Authors: Mosley, Jonathan D.; Van Driest, Sara L.; Larkin, Emma K.; Weeke, Peter E.; Witte, John S.; Wells, Quinn S.; Karnes, Jason H.; Guo, Yan; Bastarache, Lisa; Olson, Lana M.; McCarty, Catherine A.; Pacheco, Jennifer A.; Jarvik, Gail P.; Carrell, David S.; Larson, Eric B.; Crosslin, David R.; Kullo, Iftikhar J.; Tromp, Gerard; Kuivaniemi, Helena; Carey, David J.; Ritchie, Marylyn D.; Denny, Josh C.; Roden, Dan M.
Abstract
A single mutation can alter cellular and global homeostatic mechanisms and give rise to multiple clinical diseases. We hypothesized that these disease mechanisms could be identified using low minor allele frequency (MAF < 0.1) non-synonymous SNPs (nsSNPs) associated with "mechanistic phenotypes", comprised of collections of related diagnoses. We studied two mechanistic phenotypes: (1) thrombosis, evaluated in a population of 1,655 African Americans; and (2) four groupings of cancer diagnoses, evaluated in 3,009 white European Americans. We tested associations between nsSNPs represented on GWAS platforms and mechanistic phenotypes ascertained from electronic medical records (EMRs), and sought enrichment in functional ontologies across the top-ranked associations. We used a two-step analytic approach whereby nsSNPs were first sorted by the strength of their association with a phenotype. We tested associations using two reverse genetic models and standard additive and recessive models. In the second step, we employed a hypothesis-free ontological enrichment analysis using the sorted nsSNPs to identify functional mechanisms underlying the diagnoses comprising the mechanistic phenotypes. The thrombosis phenotype was solely associated with ontologies related to blood coagulation (Fisher's p = 0.0001, FDR p = 0.03), driven by the F5, P2RY12 and F2RL2 genes. For the cancer phenotypes, the reverse genetics models were enriched in DNA repair functions (p = 2 × 10^-5, FDR p = 0.03) (POLG/FANCI, SLX4/FANCP, XRCC1, BRCA1, FANCA, CHD1L) while the additive model showed enrichment related to chromatid segregation (p = 4 × 10^-6, FDR p = 0.005) (KIF25, PINX1). We were able to replicate nsSNP associations for POLG/FANCI, BRCA1, FANCA and CHD1L in independent data sets. Mechanism-oriented phenotyping using collections of EMR-derived diagnoses can elucidate fundamental disease mechanisms.
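The two-step approach described in this abstract (rank by association strength, then test the top of the ranking for ontology enrichment) can be illustrated with a minimal Python sketch. The abstract does not specify the enrichment software, the rank cutoff, or the sidedness of the Fisher test, so everything here is an assumption: the function names `fisher_upper_tail` and `top_rank_enrichment`, the one-sided test, the gene-level ranking, and the toy gene labels and p-values are hypothetical and are not taken from the study.

```python
from math import comb


def fisher_upper_tail(k, K, n, N):
    """One-sided Fisher's exact p-value (hypergeometric upper tail).

    N: genes tested, K: genes annotated to the ontology term,
    n: size of the top-ranked list, k: annotated genes in that list.
    Returns P(X >= k) under random ranking.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def top_rank_enrichment(assoc_p, annotated, top_n):
    """Step 1: sort genes by association p-value (smallest first).
    Step 2: test whether the top_n genes are enriched for one term."""
    ranked = sorted(assoc_p, key=assoc_p.get)
    top = ranked[:top_n]
    k = sum(g in annotated for g in top)
    K = sum(g in annotated for g in ranked)
    return k, fisher_upper_tail(k, K, top_n, len(ranked))


# Toy input: made-up association p-values for ten placeholder genes.
toy_p = {
    "g1": 0.001, "g2": 0.004, "g3": 0.010, "g4": 0.20, "g5": 0.35,
    "g6": 0.41, "g7": 0.56, "g8": 0.63, "g9": 0.78, "g10": 0.92,
}
toy_term = {"g1", "g2", "g3"}  # hypothetical ontology term membership

k, p = top_rank_enrichment(toy_p, toy_term, top_n=3)
print(k, p)
```

In this toy input all three annotated genes occupy the top three ranks, so the tail probability is 1/120. A real analysis would repeat the test across many ontology terms and then apply a false discovery rate correction, as the FDR p-values in the abstract indicate.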
Single Nucleotide Polymorphism Network: A Combinatorial Paradigm for Risk Prediction
PLOS ONE
Authors: Das Roy, Puspita; Sengupta, Dhriti; Dasgupta, Anjan Kr; Kundu, Sudip; Chaudhuri, Utpal; Thakur, Indranil; Guha, Pradipta; Majumder, Mousumi; Roy, Roshni; Roy, Bidyut
Abstract
Risk prediction for a particular disease in a population through SNP genotyping exploits tests whose primary goal is to rank the SNPs on the basis of their disease association. This manuscript presents a different approach to predicting risk, through a network representation that uses combined genotypic data (instead of a single allele or haplotype). The aim of this study is to classify the diseased group and to predict disease risk by identifying the responsible genotype. Genotypic combinations are chosen from five independent loci on the platelet receptor genes P2RY1 and P2RY12. Genotype sets constructed from combinations of genotypes serve as the network input. The network architecture consists of super-nodes (e.g., case and control) and nodes representing individuals; each individual is described by a set of genotypes containing M markers (M = number of SNPs). The analysis is further enriched by considering a set of networks derived from the parent network. With the super-nodes kept identical, each derived network carries an independent combination of M-1 markers taken from the M markers. Across these networks, the ratio of case-specific to control-specific connections varies, as does the ratio of super-node-specific connections. This network method has also been applied to another case-control study, which includes oral cancer, precancer and control individuals, to check whether it improves the presentation and interpretation of data. The analyses reveal a perfect segregation between super-nodes, with only a fraction in a mixed state connected to both super-nodes (i.e., common genotype sets). This kind of approach is useful for classifying whether an individual in a population with a particular genotypic combination may be in a risk group for developing disease. In addition, we can identify the most important polymorphism, whose presence or absence in a population can make a large difference in the number of case and control individuals.
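One possible reading of the genotype-set network in this abstract is sketched below in Python. The abstract does not define how connections are counted, so this sketch assumes that a genotype set is "case-specific" if it occurs only among cases, "control-specific" if it occurs only among controls, and "mixed" if it occurs in both. The function names, the marker labels `snp1` to `snp3`, and the four toy individuals are hypothetical and are not data from the study.

```python
from itertools import combinations


def classify_genotype_sets(individuals, markers):
    """Group genotype sets by the super-node(s) they connect to.

    individuals: list of (label, {marker: genotype}) with label
    'case' or 'control'. markers: tuple of marker names to use.
    """
    seen = {"case": set(), "control": set()}
    for label, genotypes in individuals:
        seen[label].add(tuple(genotypes[m] for m in markers))
    mixed = seen["case"] & seen["control"]  # shared by both super-nodes
    return {
        "case_only": seen["case"] - mixed,
        "control_only": seen["control"] - mixed,
        "mixed": mixed,
    }


def leave_one_marker_out(individuals, markers):
    """Build one derived network per combination of M-1 of the M markers."""
    m = len(markers)
    return {
        subset: classify_genotype_sets(individuals, subset)
        for subset in combinations(markers, m - 1)
    }


# Toy input: four made-up individuals typed at three placeholder markers.
markers = ("snp1", "snp2", "snp3")
individuals = [
    ("case", {"snp1": "AA", "snp2": "AG", "snp3": "CC"}),
    ("case", {"snp1": "AG", "snp2": "AG", "snp3": "CT"}),
    ("control", {"snp1": "AA", "snp2": "GG", "snp3": "CC"}),
    ("control", {"snp1": "AG", "snp2": "AG", "snp3": "TT"}),
]

parent = classify_genotype_sets(individuals, markers)
derived = leave_one_marker_out(individuals, markers)
for subset, groups in derived.items():
    print(subset, {name: len(sets) for name, sets in groups.items()})
```

In this toy input the parent network separates cases from controls completely, and dropping `snp1` preserves that separation, whereas dropping `snp2` or `snp3` creates a mixed genotype set. Comparing the derived networks in this way is one route to the question the abstract raises about which polymorphism matters most, although the authors' actual scoring may differ.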