
Statistical Methods for the Analysis of ChIP-chip Data

Sunduz Keles
University of Wisconsin Madison, Suite 6401, Madison, WI 53715-1218

Grant 1R01HG003747-01A2 from National Human Genome Research Institute, IRG: GCAT

Abstract: With many genome-sequencing projects coming to an end, the biggest remaining challenge is to comprehend the information encoded in these sequences. Identifying interactions between transcription factors (TFs) and their DNA binding sites is an integral part of this challenge. These interactions control critical steps in cell function, and their dysfunction can contribute significantly to the progression of various diseases. ChIP-chip experiments, which couple chromatin immunoprecipitation with DNA microarray analysis, have become powerful tools for the genome-wide identification and characterization of transcription factor binding sites. These experiments produce massive amounts of noisy data with a small number of replicates and therefore require innovative, robust statistical analysis methods. The objectives of this proposal are to develop, evaluate, and disseminate statistical methods for analyzing data from ChIP-chip experiments. These objectives will be accomplished through four specific aims: (1) Development of robust probabilistic methods for detecting TF-bound regions. These methods will utilize the information common across probes on tiling arrays to increase power in small sample sizes. (2) Extension of the methods in Aim 1 to deal with array designs where probe sequences overlap and observations from nearby probes exhibit long-range spatial dependencies. As a result, we will develop rigorous statistical inference procedures for general tiling array designs. (3) Development of an adaptive framework for incorporating quantitative information from ChIP-chip experiments into motif finding. This will connect the first stage of ChIP-chip data analysis, namely identification of the bound regions, with the downstream sequence analysis, thereby boosting the sensitivity and specificity of the motif-finding task. (4) Implementation of the statistical methods developed as part of this research in statistical packages. The resulting packages will be available to the scientific community both in stand-alone versions and as part of the Bioconductor Project, an open-source, open-development software project for the analysis of genomic data. Successful completion of the proposed research will result in substantially improved statistical methods for the analysis of ChIP-chip experiments.

Keywords: genome, transcription factor, DNA, Drosophilidae, base, binding site, cell, cell line, chromatin immunoprecipitation, community, comprehension, computer program /software, computer system design /evaluation, conditioning, density, element, gait, gene, human, human genetic material tag, microarray technology, model, oligonucleotide, play, role, training

Project start date: 2007-04-26

Project end date: 2011-03-31

1R01HG003747-01A2 (2007): $282,445




Grants awarded to Sunduz Keles

Statistical Methods for the Analysis of ChIP-chip Data

Sunduz Keles
Biostatistics/Med Informatics, University of Wisconsin Madison

Grant 5R01HG003747-02 from National Human Genome Research Institute, IRG: GCAT

Abstract: With many genome-sequencing projects coming to an end, the biggest remaining challenge is to comprehend the information encoded in these sequences. Identifying interactions between transcription factors (TFs) and their DNA binding sites is an integral part of this challenge. These interactions control critical steps in cell function, and their dysfunction can contribute significantly to the progression of various diseases. ChIP-chip experiments, which couple chromatin immunoprecipitation with DNA microarray analysis, have become powerful tools for the genome-wide identification and characterization of transcription factor binding sites. These experiments produce massive amounts of noisy data with a small number of replicates and therefore require innovative, robust statistical analysis methods. The objectives of this proposal are to develop, evaluate, and disseminate statistical methods for analyzing data from ChIP-chip experiments. These objectives will be accomplished through four specific aims: (1) Development of robust probabilistic methods for detecting TF-bound regions. These methods will utilize the information common across probes on tiling arrays to increase power in small sample sizes. (2) Extension of the methods in Aim 1 to deal with array designs where probe sequences overlap and observations from nearby probes exhibit long-range spatial dependencies. As a result, we will develop rigorous statistical inference procedures for general tiling array designs. (3) Development of an adaptive framework for incorporating quantitative information from ChIP-chip experiments into motif finding. This will connect the first stage of ChIP-chip data analysis, namely identification of the bound regions, with the downstream sequence analysis, thereby boosting the sensitivity and specificity of the motif-finding task. (4) Implementation of the statistical methods developed as part of this research in statistical packages. The resulting packages will be available to the scientific community both in stand-alone versions and as part of the Bioconductor Project, an open-source, open-development software project for the analysis of genomic data. Successful completion of the proposed research will result in substantially improved statistical methods for the analysis of ChIP-chip experiments.

Keywords: genome, transcription factor, DNA, Drosophilidae, base, binding site, cell, cell line, chromatin immunoprecipitation, community, comprehension, computer program /software, computer system design /evaluation, conditioning, density, element, gait, gene, human, human genetic material tag, microarray technology, model, oligonucleotide, play, role, training

Project start date: 2007-04-26

Project end date: 2011-03-31



Related Publications

Wozniak RJ, Keles S, Lugus JJ, Young KH, Boyer ME, Tran TM, Choi K, Bresnick EH.
Molecular hallmarks of endogenous chromatin complexes containing master regulators of hematopoiesis. Mol Cell Biol. 2008 Nov; 28(21): 6681-94. Epub 2008 Sep 8. PMID: 18779319

Keleş S, Warren CL, Carlson CD, Ansari AZ.
CSI-Tree: a regression tree approach for modeling binding properties of DNA-binding molecules based on cognate site identification (CSI) data. Nucleic Acids Res. 2008 Jun; 36(10): 3171-84. Epub 2008 Apr 13. PMID: 18411210

Wei H, Kuan PF, Tian S, Yang C, Nie J, Sengupta S, Ruotti V, Jonsdottir GA, Keles S, Thomson JA, Stewart R.
A study of the relationships between oligonucleotide properties and hybridization signal intensities from NimbleGen microarray datasets. Nucleic Acids Res. 2008 May; 36(9): 2926-38. Epub 2008 Apr 1. PMID: 18385155

Jörnsten R, Keleş S.
Mixture models with multiple levels, with application to the analysis of multifactor gene expression data. Biostatistics. 2008 Jul; 9(3): 540-54. Epub 2008 Feb 5. PMID: 18256042

Kuan PF, Chun H, Keleş S.
CMARRT: a tool for the analysis of ChIP-chip data from tiling arrays by incorporating the correlation structure. Pac Symp Biocomput. 2008: 515-26. PMID: 18229712

Isogai Y, Keles S, Prestel M, Hochheimer A, Tjian R.
Transcription of histone gene cluster by differential core-promoter factors. Genes Dev. 2007 Nov 15; 21(22): 2936-49. Epub 2007 Oct 31. PMID: 17978101

Goff LA, Davila J, Jörnsten R, Keles S, Hart RP.
Bioinformatic analysis of neural stem cell differentiation. J Biomol Tech. 2007 Sep; 18(4): 205-12. PMID: 17916793

Sweitzer NK, Shenoy M, Stein JH, Keles S, Palta M, LeCaire T, Mitchell GF.
Increases in central aortic impedance precede alterations in arterial stiffness measures in type 1 diabetes. Diabetes Care. 2007 Nov; 30(11): 2886-91. Epub 2007 Aug 8. PMID: 17686834

Hopkins DR, Keles S, Greenspan DS.
The bone morphogenetic protein 1/Tolloid-like metalloproteinases. Matrix Biol. 2007 Sep; 26(7): 508-23. Epub 2007 May 18. Review. PMID: 17560775

Shim H, Keles S.
Integrating quantitative information from ChIP-chip experiments into motif finding. Biostatistics. 2008 Jan; 9(1): 51-65. Epub 2007 May 28. PMID: 17533175

Keleş S.
Mixture modeling for genome-wide localization of transcription factors. Biometrics. 2007 Mar; 63(1): 10-21. PMID: 17447925

Bembom O, Keles S, van der Laan MJ.
Supervised detection of conserved motifs in DNA sequences with cosmo. Stat Appl Genet Mol Biol. 2007; 6: Article8. Epub 2007 Feb 23. PMID: 17402923

Isogai Y, Takada S, Tjian R, Keleş S.
Novel TRF1/BRF target genes revealed by genome-wide analysis of Drosophila Pol III transcription. EMBO J. 2007 Jan 10; 26(1): 79-89. Epub 2006 Dec 14. PMID: 17170711

Keleş S, van der Laan MJ, Dudoit S, Cawley SE.
Multiple testing methods for ChIP-Chip high density oligonucleotide array data. J Comput Biol. 2006 Apr; 13(3): 579-613. PMID: 16706714

van der Laan MJ, Dudoit S, Keles S.
Asymptotic optimality of likelihood-based cross-validation. Stat Appl Genet Mol Biol. 2004; 3: Article4. Epub 2004 Mar 22. PMID: 16646820

Keles S, van der Laan MJ, Dudoit S, Xing B, Eisen MB.
Supervised detection of regulatory motifs in DNA sequences. Stat Appl Genet Mol Biol. 2003; 2: Article5. Epub 2003 Aug 25. PMID: 16646783

Lu F, Keles S, Wright SJ, Wahba G.
Framework for kernel regularization with application to protein clustering. Proc Natl Acad Sci U S A. 2005 Aug 30; 102(35): 12332-7. Epub 2005 Aug 18. PMID: 16109767

Cinar H, Keles S, Jin Y.
Expression profiling of GABAergic motor neurons in Caenorhabditis elegans. Curr Biol. 2005 Feb 22; 15(4): 340-6. PMID: 15723795

Keles S, van der Laan MJ, Vulpe C.
Regulatory motif finding by logic regression. Bioinformatics. 2004 Nov 1; 20(16): 2799-811. Epub 2004 May 27. PMID: 15166027

De Freitas JM, Kim JH, Poynton H, Su T, Wintz H, Fox T, Holman P, Loguinov A, Keles S, van der Laan M, Vulpe C.
Exploratory and confirmatory gene expression profiling of mac1Delta. J Biol Chem. 2004 Feb 6; 279(6): 4450-8. Epub 2003 Oct 8. PMID: 14534306