Creating a Model Organism Database: The Importance and Process
In the realm of bioinformatics, the creation of Model Organism Databases (MODs) plays a crucial role in advancing genomic research and understanding the complexities of various organisms. MODs facilitate pathway analyses, omics data analysis, and the development of metabolic models, serving as valuable resources for the scientific community. These databases are curated by experts and integrate genetic and biochemical information essential for in-depth studies. The rationale behind MODs lies in addressing incomplete genome information and inaccurate gene function assignments, enabling global analyses and characterization of metabolic and genetic networks. Continuous curation of MODs ensures accuracy by updating gene functions, pathways, and regulatory networks, incorporating experimental findings, and refining predictions. However, challenges such as project scoping, community engagement, funding, and technical setup need to be addressed to establish effective public MODs.
Creating a Community Database
Organism-Specific Database
Model-Organism Database
SRI International Bioinformatics
Why Create a PGDB (Pathway/Genome Database)?
- Perform pathway analyses as part of a genome project
- Analyze omics data
- Create a central public information resource for the organism, updated on an ongoing basis
- Create a metabolic model
- Perform comparative analyses
Model Organism Databases
- DBs that describe the genome and other information about an organism
- Curated by experts for that organism
- No one group can curate all the world's genomes
- Distribute workload across a community of experts to create a community resource
- Every sequenced organism with an active experimental community requires a MOD
- Integrate genome data with information about the biochemical and genetic network of the organism
- Integrate literature-based information with computational predictions
Rationale for MODs
- Each complete genome is incomplete in several respects:
  - 40%-60% of genes have no assigned function
  - Roughly 7% of the assigned functions are incorrect
  - Many assigned functions are non-specific
- MODs are platforms for global analyses of an organism
  - Interpret omics data in a pathway context
  - In silico prediction of essential genes
  - Characterize systems properties of metabolic and genetic networks
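To make the percentages above concrete, here is a small arithmetic sketch. The genome size of 4,000 genes and the function name `annotation_gaps` are assumptions chosen for illustration; only the 40%-60% and 7% figures come from the slide.

```python
def annotation_gaps(total_genes, unassigned_fraction, error_rate=0.07):
    """Estimate unassigned and likely mis-assigned genes for one genome."""
    unassigned = round(total_genes * unassigned_fraction)
    assigned = total_genes - unassigned
    # The slide's ~7% error rate applies only to genes that have a function.
    likely_wrong = round(assigned * error_rate)
    return {
        "unassigned": unassigned,
        "assigned": assigned,
        "likely_wrong": likely_wrong,
    }


# Hypothetical genome of 4,000 genes at both ends of the 40%-60% range.
low = annotation_gaps(4000, 0.40)
high = annotation_gaps(4000, 0.60)
print(low)
print(high)
```

Under these assumptions, between 1,600 and 2,400 genes would lack any assigned function, and a further 112 to 168 would carry an incorrect one, which is the gap that curation is meant to close.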
What is Curation?
- Ongoing updating and refinement of a PGDB
- Correct false-positive and false-negative predictions
- Incorporate information from experimental literature
- Update genome sequence
- Update gene functions, gene positions, gene names
- Author comments and citations
- Add new pathways, modify existing pathways
- Enter information about regulatory networks
Issues in Creating Public MODs
- Scope/prioritize the project
- Identify user community
- Obtain buy-in and help from scientific community
- Obtain funding
- IT: Set up database server, Web server
- Hire and train curators
New Pathway Tools Releases
- Major releases = external software releases
  - Twice per year
  - Announced on ptools-users mailing list
- Minor releases twice per year affect only our BioCyc.org Web site and flatfile distributions
- We support one prior release only
- Releases announced on ptools-users@ai.sri.com
- Read release notes at http://brg.ai.sri.com/ptools/release-notes.html
- Install process: Upgrade schema of your DB (software assisted)
PGDB Storage: File or Relational Database
- File storage:
  - Advantages:
    - No RDBMS installation and configuration
  - Disadvantages:
    - Must be loaded and saved in its entirety
    - No transaction history
    - No concurrent access for multiple users
- MySQL storage:
  - Advantages:
    - Faster read access, faster saves
    - Concurrent update access for multiple users
    - Stores transaction history of all PGDB updates
  - Disadvantages:
    - RDBMS must be installed and configured
Multiuser Access to PGDBs
- PGDB stored within one MySQL server
- Each curator installs PTools on their computer
- Curator computers query RDBMS server via internet
- For each frame access, PTools queries: in-memory cache, disk cache, RDBMS server
- After curator saves changes, all changes made by other users are loaded into curator's session
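The cache hierarchy on this slide can be illustrated with a generic tiered lookup. This is a minimal sketch, not Pathway Tools code: the class `TieredFrameStore`, its method names, the frame IDs, and the assumption that the three tiers are consulted in the listed order are all hypothetical, and a plain dict stands in for both the disk cache and the MySQL server.

```python
class TieredFrameStore:
    """Look up frames in memory first, then a disk cache, then the server."""

    def __init__(self, disk_cache, fetch_from_server):
        self.memory = {}                             # fastest tier
        self.disk = disk_cache                       # dict-like stand-in for a disk cache
        self.fetch_from_server = fetch_from_server   # callable: frame_id -> frame
        self.server_hits = 0                         # counts trips to the slowest tier

    def get(self, frame_id):
        if frame_id in self.memory:
            return self.memory[frame_id]
        if frame_id in self.disk:
            frame = self.disk[frame_id]
        else:
            frame = self.fetch_from_server(frame_id)
            self.server_hits += 1
            self.disk[frame_id] = frame              # populate the middle tier
        self.memory[frame_id] = frame                # populate the fast tier
        return frame

    def refresh(self, changed_frame_ids):
        """Drop cached copies of frames that other curators changed."""
        for frame_id in changed_frame_ids:
            self.memory.pop(frame_id, None)
            self.disk.pop(frame_id, None)


# Hypothetical server contents; a dict plays the role of the RDBMS.
server = {"GENE-1": {"name": "exampleA"}}
store = TieredFrameStore({}, server.__getitem__)

store.get("GENE-1")            # first access goes to the server
store.get("GENE-1")            # second access is served from memory
server["GENE-1"] = {"name": "exampleA-revised"}   # another curator's edit
store.refresh(["GENE-1"])      # simplified stand-in for reloading after a save
latest = store.get("GENE-1")
```

Here `refresh` simply invalidates stale entries so the next access refetches them. The slide says that other users' changes are loaded into the session after a save, so the real mechanism may push updates rather than invalidate.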
How to Release a PGDB?
- Decide on release frequency and schedule
  - Don't wait until it's perfect to release it!
- Quality assurance
  - Run consistency checker: Tools -> Consistency Checker
  - Also updates organism-summary statistics
- Update publications, authors in organism frame
  - Update via Organism editor
- Create new version of PGDB
  - ptools-local/pgdbs/yeastcyc/1.0/kb/yeastbase.ocelot
  - Edit against the new version, release the old version
- Author release notes
- Register PGDB in SRI PGDB registry
  - Will allow SRI to include it in BioCyc
Pathway Tools Data Import/Export
- File->Export, File->Import
- Export/import to/from tab-delimited files
- Export to GenBank, GFF3 (soon), SBML, BioPAX
- Export to attribute-value files
- Attribute-value files can be imported into BioWarehouse
  - Relational database system for bioinformatics database integration
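As a rough illustration of working with exported attribute-value files, the sketch below parses a small sample. It assumes a layout in which each line reads `ATTRIBUTE - value`, a line containing only `//` ends a record, and lines starting with `#` are comments. Check that assumption against the files your installation actually produces. The function name `parse_attribute_value` and the sample records are hypothetical, and continuation lines are not handled.

```python
from collections import defaultdict


def parse_attribute_value(text):
    """Parse attribute-value text into a list of {attribute: [values]} dicts."""
    records = []
    current = defaultdict(list)
    for line in text.splitlines():
        line = line.rstrip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        if line == "//":                  # assumed record terminator
            if current:
                records.append(dict(current))
            current = defaultdict(list)
            continue
        attribute, sep, value = line.partition(" - ")
        if sep:                           # ignore lines without the separator
            current[attribute].append(value)
    if current:                           # tolerate a missing final terminator
        records.append(dict(current))
    return records


# Hypothetical sample, not real PGDB content.
SAMPLE = """\
# illustrative records only
UNIQUE-ID - GENE-1
COMMON-NAME - exampleA
SYNONYMS - exA
SYNONYMS - ex-1
//
UNIQUE-ID - GENE-2
COMMON-NAME - exampleB
//
"""

records = parse_attribute_value(SAMPLE)
for record in records:
    print(record["UNIQUE-ID"][0], record.get("SYNONYMS", []))
```

Values are collected into lists because an attribute such as `SYNONYMS` may appear more than once within a record.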
Registry: Public PGDB Sharing
- PGDB registry maintained by SRI at URL http://biocyc.org/registry.html
- Registry operations
  - List contents of registry
  - Download PGDBs listed in the registry
  - Register PGDBs you have created
Registry Details
- Why register your PGDB?
  - Facilitate its download by other scientists
  - Facilitate its inclusion in BioCyc.org
- Why download a PGDB?
  - Desktop Navigator provides faster/more functionality than Web
  - Comparative operations
  - Programmatic querying and processing of PGDB
Changes Planned for BioCyc.org
- BioCyc will be starting a subscription model July 1
Why?
- Government funding for databases shrinking
- BioCyc funding cut 27% as number of genomes climbed 5X in 5 years
- No other foreseeable sources of funding for "Big Knowledge" in life sciences
- Goals:
  - Create high-quality curated EcoCyc-like DBs for many organisms
  - Couple with extensive user-friendly bioinformatics tools
How?
- Subscription access to BioCyc.org by institutions, individuals
- Subscription rates will depend on usage levels from previous year
- EcoCyc and MetaCyc will remain free
- Pathway Tools will remain free