An Overview of Biological Databases in Bioinformatics


Biological databases play a crucial role in bioinformatics, storing vast amounts of data related to nucleotide sequences, protein sequences, and more. These databases are publicly accessible and essential for research in biological fields. Primary databases, such as GenBank, EMBL, and DDBJ, contain original biological data and are key resources for scientists and researchers worldwide.


Uploaded on Jul 22, 2024 | 0 Views


Presentation Transcript


  1. Biological databases Pinaki Kr. Rabha

  2. A database is a vast collection of data pertaining to a specific topic, e.g., nucleotide sequence, protein sequence, etc., in an electronic environment. Databases are at the heart of bioinformatics. There is a very large number of databases, which is growing rapidly. The biological information can be stored in different databases. Each database has its own website with unique navigation tools. The biological databases are, in general, publicly accessible.

  3. Types of databases: There are three types on the basis of source: primary, secondary, and composite. Primary databases: Primary databases contain original biological data. They are archives of raw sequence or structural data submitted by the scientific community. A primary database can also be called an archival database, since it archives the experimental results submitted by scientists. The primary database is populated with experimentally derived data such as genome sequence, macromolecular structure, etc. The data are given accession numbers when they are entered into the database. The same data can later be retrieved using the accession number. The accession number uniquely identifies each entry and never changes.
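The role of the accession number described above can be illustrated with a minimal sketch. The `PrimaryArchive` class, the `DEMO` prefix, and the example sequences are all hypothetical and chosen only for illustration; real archives such as GenBank use their own accession formats and submission procedures.

```python
from dataclasses import dataclass, field


@dataclass
class PrimaryArchive:
    """Toy archival store: each submission gets a permanent accession number."""

    prefix: str = "DEMO"  # hypothetical prefix, not a real accession format
    _records: dict = field(default_factory=dict)
    _next_id: int = 1

    def submit(self, sequence: str) -> str:
        """Store a submitted sequence and return its newly assigned accession."""
        accession = f"{self.prefix}{self._next_id:06d}"
        self._records[accession] = sequence
        self._next_id += 1  # accessions are never reused or reassigned
        return accession

    def fetch(self, accession: str) -> str:
        """Retrieve the record stored under the given accession number."""
        return self._records[accession]


archive = PrimaryArchive()
acc1 = archive.submit("ATGGCCATTGTAATGGGCCGC")  # made-up example sequence
acc2 = archive.submit("ATGAAACGCATTAGCACCACC")  # made-up example sequence
```

Calling `archive.fetch(acc1)` at any later point returns the sequence that was originally submitted, which is the behaviour the slide attributes to accession numbers.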

  4. Examples. Nucleotide sequence databases: the European Molecular Biology Laboratory (EMBL) database, GenBank, and DDBJ. Protein databases: PDB, PIR, MetaCyc, etc.

  5. GenBank. GenBank is physically located in the USA and is accessible through the NCBI portal over the internet. The GenBank sequence database is an open access, annotated collection of all publicly available nucleotide sequences and their protein translations. This database is produced and maintained by the National Center for Biotechnology Information (NCBI) as part of the International Nucleotide Sequence Database Collaboration (INSDC). GenBank has become an important database for research in biological fields and has grown in recent years at an exponential rate, doubling roughly every 18 months.
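To make the "doubling roughly every 18 months" figure concrete, the sketch below projects a size forward under the assumption that the doubling rate stays constant. The function name `projected_size` and the starting value of 100 are illustrative assumptions, not actual GenBank statistics.

```python
def projected_size(current_size: float, months_ahead: float,
                   doubling_months: float = 18.0) -> float:
    """Project a size forward assuming steady doubling every `doubling_months`."""
    return current_size * 2 ** (months_ahead / doubling_months)


# The starting size is an arbitrary illustrative figure, not a real record count.
size_after_3_years = projected_size(100.0, 36)
```

Under this assumption, 36 months corresponds to two doublings, so the size grows fourfold.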

  6. EMBL (European Molecular Biology Laboratory) Nucleotide Sequence Database. The EMBL Nucleotide Sequence Database is a comprehensive collection of primary nucleotide sequences maintained at the European Bioinformatics Institute (EBI), which is located in the UK. Data are received from genome sequencing centers, individual scientists, and patent offices.

  7. DDBJ (DNA Data Bank of Japan). DDBJ is located at the National Institute of Genetics (NIG) in the Shizuoka prefecture of Japan. It is the only nucleotide sequence data bank in Asia. Although DDBJ mainly receives its data from Japanese researchers, it can accept data from contributors in any other country.

  8. The three nucleotide sequence databases collaborate closely and exchange new data daily. Together they constitute the International Nucleotide Sequence Database Collaboration. This means that by connecting to any one of the three databases, one should have access to the same nucleotide sequence data.
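The effect of this exchange can be sketched as a simple union of records. This is only a toy model under stated assumptions: the `exchange` function, the accession labels, and the sequences are hypothetical, and the actual INSDC exchange mechanism is not described in the slides.

```python
from __future__ import annotations


def exchange(databases: dict[str, dict[str, str]]) -> None:
    """Give every database each record held by any of the others (in place)."""
    merged: dict[str, str] = {}
    for records in databases.values():
        merged.update(records)
    for records in databases.values():
        records.update(merged)


# Hypothetical accessions and sequences, one new submission per database.
insdc = {
    "GenBank": {"X0001": "ATGC"},
    "EMBL": {"X0002": "GGCC"},
    "DDBJ": {"X0003": "TTAA"},
}
exchange(insdc)
```

After the exchange, all three dictionaries hold the same three records, so a query against any one of them sees the same data.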

  9. Secondary Database: Sequence annotation information in the primary database is often minimal. To turn the raw sequence information into more sophisticated biological knowledge, much post-processing of the sequence information is needed. This creates the need for secondary databases, which contain computationally processed sequence information derived from the primary databases. Secondary databases comprise data derived from the results of analysing primary data. They often draw upon information from numerous sources, including other databases (primary and secondary), controlled vocabularies, and the scientific literature.

  10. Computational algorithms are applied to the primary database, and meaningful and informative data is stored inside the secondary database. Secondary databases are highly curated, often using a complex combination of computational algorithms and manual analysis and interpretation to derive new knowledge from the public record of science.
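As a rough illustration of deriving secondary data from primary records, the sketch below computes a few simple annotations from a raw sequence. The `annotate` function, the chosen annotations, and the example record are assumptions made for illustration; real secondary databases rely on far richer algorithms and on manual curation.

```python
def annotate(accession: str, sequence: str) -> dict:
    """Derive simple computed annotations from one raw nucleotide sequence."""
    seq = sequence.upper()
    gc_count = sum(seq.count(base) for base in "GC")
    return {
        "accession": accession,  # link back to the primary record
        "length": len(seq),
        "gc_fraction": gc_count / len(seq) if seq else 0.0,
        "starts_with_atg": seq.startswith("ATG"),
    }


# Hypothetical primary record: accession -> raw sequence.
primary = {"DEMO000001": "ATGGCGCCTA"}

# The derived "secondary" view keeps the accession so entries stay traceable.
secondary = {acc: annotate(acc, seq) for acc, seq in primary.items()}
```

Keeping the accession in each derived entry preserves the link between the processed annotation and the original submission.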

  11. Examples. A prominent example of a secondary database is SWISS-PROT, which provides detailed sequence annotation that includes structure, function, and protein family assignment. The sequence data are mainly derived from TrEMBL, a database of translated nucleic acid sequences stored in the EMBL database. Other examples of secondary databases are: InterPro (protein families, motifs, and domains); UniProt Knowledgebase (sequence and functional information on proteins); Ensembl (variation, function, regulation, and more, layered onto whole genome sequences).

  12. Composite Databases: A composite database amalgamates a variety of different primary database sources. Different composite databases use different primary databases and different criteria in their search algorithms. Various search options have also been incorporated in composite databases. The initial data are taken from the primary databases, compared, filtered based on the desired criteria, and then merged together based on certain conditions. This helps in searching sequences rapidly. Composite databases contain non-redundant data. Examples of composite databases are OWL, NRDB, and SWISS-PROT + TrEMBL.
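The compare, filter, and merge steps described above can be sketched as follows. This is a simplified illustration that treats two entries as redundant only when their sequences are identical; the `build_composite` function, the source names, and the sequences are hypothetical, and actual composite databases apply their own redundancy criteria.

```python
from __future__ import annotations


def build_composite(sources: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Merge several sources, keeping one entry per distinct sequence.

    Returns a mapping from each sequence to the 'source:identifier'
    labels of every original record that carried it.
    """
    composite: dict[str, list[str]] = {}
    for source_name, records in sources.items():
        for identifier, sequence in records.items():
            key = sequence.upper()  # compare case-insensitively
            composite.setdefault(key, []).append(f"{source_name}:{identifier}")
    return composite


# Hypothetical sources; A1 and B7 carry the same sequence.
sources = {
    "source_a": {"A1": "MKTAYIAK", "A2": "MSLLTEVE"},
    "source_b": {"B7": "mktayiak", "B9": "MGSSHHHH"},
}
composite = build_composite(sources)
```

Four input records collapse to three distinct sequences, and the duplicated one still records both of its origins, so a search only has to scan each sequence once.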

  13. Importance of biological databases: Databases act as a storehouse of information. They are used to store and organize data in such a way that information can be retrieved easily via a variety of search criteria. They allow knowledge discovery, which refers to the identification of connections between pieces of information that were not known when the information was first entered. This facilitates the discovery of new biological insights from raw data. Secondary databases have become the molecular biologist's reference library over the past decade or so, providing a wealth of information on just about any gene or gene product that has been investigated by the research community. Databases also help in cases where many users want to access the same data entries, and they allow the indexing of data.
