SOFTWARE TO FACILITATE ADOPTION OF THE SEQUENCE ONTOLOGY FOR GENOME MANAGEMENT
Karen Louise Eilbeck
University of Utah, 75 South 2000 East, Salt Lake City, UT 84112
Grant 5R01HG004341-04 from National Human Genome Research Institute
Abstract: Genome annotations combine sequence, the results of bioinformatics analyses, and the knowledge of human curators into models of gene structure. These annotations provide a basic resource for investigations into the genetic causes of human disease. Despite their potential as a resource for such studies, genome annotations have proven difficult to use. A major reason for this has been the lack of community standards for describing them, which has resulted in the proliferation of arbitrary file formats and database schemas. To solve this problem, the Gene Ontology Consortium has developed the Sequence Ontology (SO). The purpose of SO is to unify the description of genome annotations. Many model organism databases, such as SGD, WormBase and FlyBase, have now adopted SO and release their annotations in SO-compliant formats. Many other genome databases are attempting to follow suit, but are finding it difficult to do so. One reason for their difficulties is the lack of publicly available software for managing and distributing SO-compliant genome annotations. The goal of this proposal is to further develop, improve and consolidate existing software tools that will help the broader genomics community to use the Sequence Ontology as a tool to produce, manage, and disseminate SO-compliant genome annotations. Our proposed data adapters and converters will help bring old annotation data and software forward; our SO-based quality control pipelines will ensure that the data produced by different databases is indeed interoperable; and our navigation and database search tools will help human curators to produce higher-quality SO-compliant genome annotations.
Keywords: Adopted; Adoption; Algorithms; Alternate Splicing; Alternative Splicing; Animal Model; Animal Models and Related Studies; Base Sequence; Bio-Informatics; Bioinformatics; Biological; Code; Coding System; Codon, Stop; Codon, Termination; Codon, Terminator; Communities; Computer Programs; Computer Software Tools; Computer software; Controlled Vocabulary; Data; Data Banks; Data Bases; Data Storage and Retrieval; Databank, Electronic; Databanks; Database, Electronic; Databases; Documentation; Drugs, Nonproprietary; Ensure; Exons; Future; Gene Organization; Gene Splicing; Gene Structure; Gene Structure/Organization; Generations; Generic Drugs; Genes; Genetic; Genome; Genomics; Goals; Human; Human, General; Intervening Sequences; Introns; Investigation; Jobs; Knowledge; Label; Libraries; Location; Man (Taxonomy); Man, Modern; Manuals; Maps; Miso; Modeling; Nucleic Acid Regulatory Sequences; Nucleotide Sequence; Occupations; On-Line Systems; Online Systems; Ontology; Output; PROV; Problem Solving; Process; Professional Positions; Provider; Quality Control; RNA Splicing; RNA Splicing, Alternative; Regulator Regions, Nucleic Acid; Regulatory Regions; Regulatory Regions, Nucleic Acid (Genetics); Regulatory Sequences, Nucleic Acid; Research Resources; Resources; Scientist; Semantic; Semantics; Services; Software; Software Tools; Software Validation; Software Verification; Spliced Genes; Splicing; Stop Signal, Translation; Terminator Codon; Terminology; Testing; Tools, Software; Transcript; Update; Validation; base; clinical data repository; clinical data warehouse; computer program/software; data modeling; data repository; data retrieval; data storage; file format; generic; genetic regulatory element; genome database; genome sequencing; human disease; improved; interoperability; model organism; model organisms databases; nucleic acid sequence; online computer; online tutorial; preference; relational database; structural genomics; syntactic; syntax; tool; web based
Project start date: 2007-08-15
Project end date: 2011-06-30
Budget start date: 2010-07-01
Budget end date: 2011-06-30
PFA/PA: PAR-05-057
5R01HG004341-04 (2010): $181517
SOFTWARE TO FACILITATE ADOPTION OF THE SEQUENCE ONTOLOGY FOR GENOME MANAGEMENT
Karen Louise Eilbeck
University of Utah, 75 South 2000 East, Salt Lake City, UT 84112
Grant 5R01HG004341-03 from National Human Genome Research Institute
Keywords: Adopted; Adoption; Algorithms; Alternate Splicing; Alternative Splicing; Animal Model; Animal Models and Related Studies; Base Sequence; Bio-Informatics; Bioinformatics; Biological; Code; Coding System; Codon, Stop; Codon, Termination; Codon, Terminator; Communities; Computer Programs; Computer Software Tools; Computer software; Controlled Vocabulary; Data; Data Banks; Data Bases; Data Storage and Retrieval; Databank, Electronic; Databanks; Database, Electronic; Databases; Documentation; Drugs, Nonproprietary; Ensure; Exons; Future; Gene Organization; Gene Splicing; Gene Structure; Gene Structure/Organization; Generations; Generic Drugs; Genes; Genetic; Genome; Genomics; Goals; Human; Human, General; Intervening Sequences; Introns; Investigation; Jobs; Knowledge; Label; Libraries; Location; Man (Taxonomy); Man, Modern; Manuals; Maps; Miso; Modeling; Nucleic Acid Regulatory Sequences; Nucleotide Sequence; Occupations; On-Line Systems; Online Systems; Ontology; Output; PROV; Problem Solving; Process; Professional Positions; Provider; Quality Control; RNA Splicing; RNA Splicing, Alternative; Regulator Regions, Nucleic Acid; Regulatory Regions; Regulatory Regions, Nucleic Acid (Genetics); Regulatory Sequences, Nucleic Acid; Research Resources; Resources; Scientist; Semantic; Semantics; Services; Software; Software Tools; Software Validation; Software Verification; Spliced Genes; Splicing; Stop Signal, Translation; Terminator Codon; Terminology; Testing; Tools, Software; Transcript; Update; Validation; Vocabulary, Controlled; base; clinical data repository; clinical data warehouse; computer program/software; data modeling; data repository; data retrieval; data storage; databases, model organisms; file format; generic; genetic regulatory element; genome database; genome sequencing; human disease; improved; interoperability; model organism; model organisms databases; nucleic acid sequence; online computer; online tutorial; preference; relational database; structural genomics; syntactic; syntax; tool; web based
Project start date: 2007-08-15
Project end date: 2011-06-30
Budget start date: 2009-07-01
Budget end date: 2010-06-30
PFA/PA: PAR-05-057
5R01HG004341-03 (2009): $178010
5R01HG004341-02 (2008): $172825