Statistical Methods For The Analysis Of ChIP-chip Data
Sunduz Keles
University of Wisconsin Madison, Suite 6401, Madison, WI 53715-1218
Grant 1R01HG003747-01A2 from National Human Genome Research Institute, IRG: GCAT
Abstract: With many genome-sequencing projects coming to an end, the biggest remaining challenge is to comprehend the information encoded in these sequences. Identifying interactions between transcription factors (TFs) and their DNA binding sites is an integral part of this challenge. These interactions control critical steps in cell functions, and their dysfunction can significantly contribute to the progression of various diseases. ChIP-chip experiments, which couple chromatin immunoprecipitation with DNA microarray analysis, have become powerful tools for the genome-wide identification and characterization of transcription factor binding sites. These experiments produce massive amounts of noisy data with a small number of replicates and therefore require innovative, robust statistical analysis methods. The objectives of this proposal are to develop, evaluate, and disseminate statistical methods for analyzing data from ChIP-chip experiments. These objectives will be accomplished through four specific aims: (1) Development of robust probabilistic methods for detecting TF-bound regions. These methods will utilize the information common across probes on tiling arrays to increase power in small sample sizes. (2) Extension of the methods in Aim 1 to deal with array designs where probe sequences overlap and observations from nearby probes exhibit long-range spatial dependencies. As a result, we will develop rigorous statistical inference procedures for general tiling array designs. (3) Development of an adaptive framework for incorporating quantitative information from ChIP-chip experiments into motif finding. This will connect the first stage of ChIP-chip data analysis, namely identification of the bound regions, with the downstream sequence analysis, thereby boosting the sensitivity and specificity of the motif-finding task. (4) Implementation of the statistical methods developed as part of this research in statistical packages.
The resulting packages will be available to the scientific community both in stand-alone versions and as part of the Bioconductor Project, which is an open source and open development software project for the analysis of genomic data. Successful completion of the proposed research will result in substantially improved statistical methods for the analysis of ChIP-chip experiments.
Keywords: genome, transcription factor, DNA, Drosophilidae, base, binding site, cell, cell line, chromatin immunoprecipitation, community, comprehension, computer program /software, computer system design /evaluation, conditioning, density, element, gait, gene, human, human genetic material tag, microarray technology, model, oligonucleotide, play, role, training
Project start date: 2007-04-26
Project end date: 2011-03-31
1R01HG003747-01A2 (2007): $282,445
Grants awarded to Sunduz Keles
Statistical Methods For The Analysis Of ChIP-chip Data
Sunduz Keles
Biostatistics/Med Informatics, University of Wisconsin Madison
Grant 5R01HG003747-02 from National Human Genome Research Institute, IRG: GCAT
Abstract: With many genome-sequencing projects coming to an end, the biggest remaining challenge is to comprehend the information encoded in these sequences. Identifying interactions between transcription factors (TFs) and their DNA binding sites is an integral part of this challenge. These interactions control critical steps in cell functions, and their dysfunction can significantly contribute to the progression of various diseases. ChIP-chip experiments, which couple chromatin immunoprecipitation with DNA microarray analysis, have become powerful tools for the genome-wide identification and characterization of transcription factor binding sites. These experiments produce massive amounts of noisy data with a small number of replicates and therefore require innovative, robust statistical analysis methods. The objectives of this proposal are to develop, evaluate, and disseminate statistical methods for analyzing data from ChIP-chip experiments. These objectives will be accomplished through four specific aims: (1) Development of robust probabilistic methods for detecting TF-bound regions. These methods will utilize the information common across probes on tiling arrays to increase power in small sample sizes. (2) Extension of the methods in Aim 1 to deal with array designs where probe sequences overlap and observations from nearby probes exhibit long-range spatial dependencies. As a result, we will develop rigorous statistical inference procedures for general tiling array designs. (3) Development of an adaptive framework for incorporating quantitative information from ChIP-chip experiments into motif finding. This will connect the first stage of ChIP-chip data analysis, namely identification of the bound regions, with the downstream sequence analysis, thereby boosting the sensitivity and specificity of the motif-finding task. (4) Implementation of the statistical methods developed as part of this research in statistical packages.
The resulting packages will be available to the scientific community both in stand-alone versions and as part of the Bioconductor Project, which is an open source and open development software project for the analysis of genomic data. Successful completion of the proposed research will result in substantially improved statistical methods for the analysis of ChIP-chip experiments.
Keywords: genome, transcription factor, DNA, Drosophilidae, base, binding site, cell, cell line, chromatin immunoprecipitation, community, comprehension, computer program /software, computer system design /evaluation, conditioning, density, element, gait, gene, human, human genetic material tag, microarray technology, model, oligonucleotide, play, role, training
Project start date: 2007-04-26
Project end date: 2011-03-31
Related Publications
Molecular hallmarks of endogenous chromatin complexes containing master regulators of hematopoiesis. Mol Cell Biol. 2008 Nov; 28(21): 6681-94. Epub 2008 Sep 8. PMID: 18779319
CSI-Tree: a regression tree approach for modeling binding properties of DNA-binding molecules based on cognate site identification (CSI) data. Nucleic Acids Res. 2008 Jun; 36(10): 3171-84. Epub 2008 Apr 13. PMID: 18411210
A study of the relationships between oligonucleotide properties and hybridization signal intensities from NimbleGen microarray datasets. Nucleic Acids Res. 2008 May; 36(9): 2926-38. Epub 2008 Apr 1. PMID: 18385155
Mixture models with multiple levels, with application to the analysis of multifactor gene expression data. Biostatistics. 2008 Jul; 9(3): 540-54. Epub 2008 Feb 5. PMID: 18256042
CMARRT: a tool for the analysis of ChIP-chip data from tiling arrays by incorporating the correlation structure. Pac Symp Biocomput. 2008: 515-26. PMID: 18229712
Transcription of histone gene cluster by differential core-promoter factors. Genes Dev. 2007 Nov 15; 21(22): 2936-49. Epub 2007 Oct 31. PMID: 17978101
Bioinformatic analysis of neural stem cell differentiation. J Biomol Tech. 2007 Sep; 18(4): 205-12. PMID: 17916793
Increases in central aortic impedance precede alterations in arterial stiffness measures in type 1 diabetes. Diabetes Care. 2007 Nov; 30(11): 2886-91. Epub 2007 Aug 8. PMID: 17686834
The bone morphogenetic protein 1/Tolloid-like metalloproteinases. Matrix Biol. 2007 Sep; 26(7): 508-23. Epub 2007 May 18. Review. PMID: 17560775
Integrating quantitative information from ChIP-chip experiments into motif finding. Biostatistics. 2008 Jan; 9(1): 51-65. Epub 2007 May 28. PMID: 17533175
Mixture modeling for genome-wide localization of transcription factors. Biometrics. 2007 Mar; 63(1): 10-21. PMID: 17447925
Supervised detection of conserved motifs in DNA sequences with cosmo. Stat Appl Genet Mol Biol. 2007; 6: Article8. Epub 2007 Feb 23. PMID: 17402923
Novel TRF1/BRF target genes revealed by genome-wide analysis of Drosophila Pol III transcription. EMBO J. 2007 Jan 10; 26(1): 79-89. Epub 2006 Dec 14. PMID: 17170711
Multiple testing methods for ChIP-Chip high density oligonucleotide array data. J Comput Biol. 2006 Apr; 13(3): 579-613. PMID: 16706714
Asymptotic optimality of likelihood-based cross-validation. Stat Appl Genet Mol Biol. 2004; 3: Article4. Epub 2004 Mar 22. PMID: 16646820
Supervised detection of regulatory motifs in DNA sequences. Stat Appl Genet Mol Biol. 2003; 2: Article5. Epub 2003 Aug 25. PMID: 16646783
Framework for kernel regularization with application to protein clustering. Proc Natl Acad Sci U S A. 2005 Aug 30; 102(35): 12332-7. Epub 2005 Aug 18. PMID: 16109767
Expression profiling of GABAergic motor neurons in Caenorhabditis elegans. Curr Biol. 2005 Feb 22; 15(4): 340-6. PMID: 15723795
Regulatory motif finding by logic regression. Bioinformatics. 2004 Nov 1; 20(16): 2799-811. Epub 2004 May 27. PMID: 15166027
Exploratory and confirmatory gene expression profiling of mac1Delta. J Biol Chem. 2004 Feb 6; 279(6): 4450-8. Epub 2003 Oct 8. PMID: 14534306