Statistical Methods For The Analysis Of ChIP-chip Data
Sunduz Keles
University Of Wisconsin Madison, Suite 6401, Madison, WI 53715-1218
Grant 1R01HG003747-01A2 from National Human Genome Research Institute IRG: GCAT
Abstract: With many genome-sequencing projects coming to an end, the biggest remaining challenge is to comprehend the information encoded in these sequences. Identifying interactions between transcription factors (TFs) and their DNA binding sites is an integral part of this challenge. These interactions control critical steps in cell functions, and their dysfunction can significantly contribute to the progression of various diseases. ChIP-chip experiments that couple chromatin immunoprecipitation with DNA microarray analysis have become powerful tools for the genome-wide identification and characterization of transcription factor binding sites. These experiments produce massive amounts of noisy data with a small number of replicates and therefore require innovative, robust statistical analysis methods. The objectives of this proposal are to develop, evaluate, and disseminate statistical methods for analyzing data from ChIP-chip experiments. These objectives will be accomplished through four specific aims: (1) Development of robust probabilistic methods for detecting TF-bound regions. These methods will utilize the information common across probes on tiling arrays to increase power in small sample sizes. (2) Extension of the methods in Aim 1 to deal with array designs where probe sequences overlap and observations from nearby probes exhibit long-range spatial dependencies. As a result, we will develop rigorous statistical inference procedures for general tiling array designs. (3) Development of an adaptive framework for incorporating quantitative information from ChIP-chip experiments into motif finding. This will connect the first stage of the ChIP-chip data analysis, namely identification of the bound regions, with the downstream sequence analysis, thereby boosting the sensitivity and specificity of the motif finding task. (4) Implementation of the statistical methods developed as part of this research in statistical packages. The resulting packages will be available to the scientific community both in stand-alone versions and as part of the Bioconductor Project, which is an open source and open development software project for the analysis of genomic data. Successful completion of the proposed research will result in substantially improved statistical methods for the analysis of ChIP-chip experiments.
Keywords: genome, transcription factor, DNA, Drosophilidae, base, binding site, cell, cell line, chromatin immunoprecipitation, community, comprehension, computer program /software, computer system design /evaluation, conditioning, density, element, gait, gene, human, human genetic material tag, microarray technology, model, oligonucleotide, play, role, training
Project start date: 2007-04-26
Project end date: 2011-03-31
1R01HG003747-01A2 (2007): $282445
Grants awarded to Sunduz Keles
STATISTICAL METHODS FOR THE ANALYSIS OF CHIP-CHIP DATA
Sunduz Keles
University Of Wisconsin Madison, 21 N. Park Street, Suite 6401, Madison, WI 53715-1218
Grant 5R01HG003747-04 from National Human Genome Research Institute
Abstract: With many genome-sequencing projects coming to an end, the biggest remaining challenge is to comprehend the information encoded in these sequences. Identifying interactions between transcription factors (TFs) and their DNA binding sites is an integral part of this challenge. These interactions control critical steps in cell functions, and their dysfunction can significantly contribute to the progression of various diseases. ChIP-chip experiments that couple chromatin immunoprecipitation with DNA microarray analysis have become powerful tools for the genome-wide identification and characterization of transcription factor binding sites. These experiments produce massive amounts of noisy data with a small number of replicates and therefore require innovative, robust statistical analysis methods. The objectives of this proposal are to develop, evaluate, and disseminate statistical methods for analyzing data from ChIP-chip experiments. These objectives will be accomplished through four specific aims: (1) Development of robust probabilistic methods for detecting TF-bound regions. These methods will utilize the information common across probes on tiling arrays to increase power in small sample sizes. (2) Extension of the methods in Aim 1 to deal with array designs where probe sequences overlap and observations from nearby probes exhibit long-range spatial dependencies. As a result, we will develop rigorous statistical inference procedures for general tiling array designs. (3) Development of an adaptive framework for incorporating quantitative information from ChIP-chip experiments into motif finding. This will connect the first stage of the ChIP-chip data analysis, namely identification of the bound regions, with the downstream sequence analysis, thereby boosting the sensitivity and specificity of the motif finding task. (4) Implementation of the statistical methods developed as part of this research in statistical packages. The resulting packages will be available to the scientific community both in stand-alone versions and as part of the Bioconductor Project, which is an open source and open development software project for the analysis of genomic data. Successful completion of the proposed research will result in substantially improved statistical methods for the analysis of ChIP-chip experiments.
Keywords: Affinity; Algorithms; Analysis, Data; Basal Transcription Factor; Base Pairing; Binding; Binding (Molecular Function); Binding Sites; Bioconductor; Biologic Characteristic; Biological; Biological Characteristics; CHIP assay; Cell Function; Cell Process; Cell physiology; Cellular Function; Cellular Physiology; Cellular Process; ChIP (chromatin immunoprecipitation); Characteristic, Biologic; Characteristics; Cmyc Staining Method; Collection; Combining Site; Common Rat Strains; Communities; Comprehension; DNA; DNA Binding; DNA Binding Interaction; DNA Chips; DNA Microarray; DNA Microarray Chip; DNA Microchips; Data; Data Analyses; Data Set; Dataset; Deoxyribonucleic Acid; Dependency; Dependency (Psychology); Development; Disease; Disorder; Drosophila; Drosophila genus; Dysfunction; ERYF1; ERYF1 protein, human; Erythroid Transcription Factor; Exhibits; Fruit Fly, Drosophila; Functional RNA; Functional disorder; GATA Binding Protein 1; GATA binding protein 1, human; GATA-1; GATA1; GATA1 protein, human; GF1; GWAS; General Transcription Factors; Genome; Genome, Human; Genomics; Human; Human Genome; Human, General; In element; Indium; Internet; Investigators; JUN Family Gene; JUN Proto-oncogene Family; JUN gene; Length; Link; Location; MYC; Mammals, Mice; Mammals, Rats; Man (Taxonomy); Man, Modern; Method LOINC Axis 6; Methodology; Methods; Mice; Microarray Analysis; Microarray-Based Analysis; Modeling; Molecular Interaction; Murine; Mus; NFE1; Nature; Non-Coding; Non-Coding RNA; Oligonucleotide Array; Oligonucleotide Microarrays; Physiopathology; Play; Procedures; Programs (PT); Programs [Publication Type]; RNA Polymerase II Transcription Factor D; RT-PCR; RTPCR; Rat; Rattus; Reactive Site; Research; Research Personnel; Researchers; Resolution; Reverse Transcriptase Polymerase Chain Reaction; Role; Running; SEQ-AN; Sample Size; Sensitivity and Specificity; Sequence Analyses; Sequence Analysis; Simulate; Staging; Statistical Methods; Structure; Subcellular Process; TF2D; TFIID; Technology; Time; Training; Transcription Factor GATA1; Transcription Factor IID; Transcription Factor TFIID; Transcription Factors, General; Validation; WWW; c jun; c-Myc Staining Method; c-jun Gene; chromatin immunoprecipitation; cmyc; cost; density; design; designing; develop software; developing computer software; disease/disorder; emotional dependency; erythroid transcription factor 1; experiment; experimental research; experimental study; fruit fly; genome sequencing; genome wide association scan; genome wide association studies; genome wide association study; genome-wide; genome-wide scan; genomewide association scan; genomewide association studies; genomewide association study; genomewide scan; globin transcription factor 1; globin transcription factor 1, human; human ES cell lines; human GATA1 protein; human embryonic stem cell line; improved; innovate; innovation; innovative; microarray technology; novel; open source; pathophysiology; programs; research study; response; reverse transcriptase PCR; serial analysis of gene expression; social role; software development; tool; transcription factor; web; web site; whole genome association studies; whole genome association study; world wide web
Project start date: 2007-04-26
Project end date: 2011-03-31
Budget start date: 2010-04-01
Budget end date: 2011-03-31
PFA/PA: PAR-06-410
5R01HG003747-04 (2010): $281878
5R01HG003747-03 (2009): $284725