An Overview of Biological Databases in Bioinformatics

Biological databases
Pinaki Kr. Rabha
 
 
A database is a large, organized collection of data pertaining to a specific topic (e.g., nucleotide sequences or protein sequences) held in an electronic environment. Databases are at the heart of bioinformatics. A very large number of databases exist, and the number is growing rapidly. Biological information can be stored in different databases, and each database has its own website with its own navigation tools. Biological databases are, in general, publicly accessible.
 
Types of databases: there are three types, classified by the source of their data: primary, secondary, and composite.
Primary databases:
Primary databases contain original biological data. They are archives of raw sequence or structural data submitted by the scientific community, and are also called archival databases because they archive the experimental results submitted by scientists. A primary database is populated with experimentally derived data such as genome sequences and macromolecular structures.
Each entry is given an accession number when it is entered into the database, and the same entry can later be retrieved using that number. The accession number identifies each entry uniquely and never changes.
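
The accession-number idea can be illustrated with a small sketch. This is a toy in-memory archive, not how any real primary database is implemented; the class name `PrimaryArchive` and the `DEMO` prefix are hypothetical and chosen only for illustration.

```python
import itertools


class PrimaryArchive:
    """Toy archival store: each submission gets a permanent accession number."""

    def __init__(self, prefix="DEMO"):
        self._prefix = prefix              # hypothetical prefix, not a real INSDC one
        self._counter = itertools.count(1)
        self._records = {}

    def submit(self, sequence, description=""):
        """Store a new record and return its newly assigned accession number."""
        accession = f"{self._prefix}{next(self._counter):06d}"
        self._records[accession] = {"sequence": sequence, "description": description}
        return accession

    def fetch(self, accession):
        """Retrieve the record stored under an accession number."""
        return self._records[accession]


archive = PrimaryArchive()
acc1 = archive.submit("ATGGCCATTGTAATGGGCCGC", "example coding fragment")
acc2 = archive.submit("ATGGCCATTGTAATGGGCCGC", "same sequence, separate submission")
```

Note that two submissions of the same sequence still receive two different accession numbers: the number identifies the entry, not the sequence.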
 
Examples:
Nucleotide sequence databases: the European Molecular Biology Laboratory (EMBL) database, GenBank, and DDBJ.
Protein databases: PDB, PIR, MetaCyc, etc.
 
GenBank
GenBank is physically located in the USA and is accessible through the NCBI portal over the internet. The GenBank sequence database is an open-access, annotated collection of all publicly available nucleotide sequences and their protein translations. It is produced and maintained by the National Center for Biotechnology Information (NCBI) as part of the International Nucleotide Sequence Database Collaboration (INSDC). GenBank has become an important database for research in biological fields and has grown exponentially in recent years, doubling roughly every 18 months.
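
As a rough illustration of retrieval by accession number over the internet, the sketch below builds a request URL for NCBI's E-utilities `efetch` service. The slides mention only the NCBI portal; the choice of E-utilities, the helper names `efetch_url` and `fetch_record`, and the sample accession `U49845` are assumptions made for this example. The download function needs network access and is not called here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the E-utilities efetch service (assumed access route).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, rettype="fasta"):
    """Build an efetch URL for one nucleotide record."""
    params = {"db": "nuccore", "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def fetch_record(accession, rettype="fasta"):
    """Download the record as text (requires network access)."""
    with urlopen(efetch_url(accession, rettype)) as response:
        return response.read().decode("utf-8")


url = efetch_url("U49845")
```

Calling `fetch_record("U49845")` would, under these assumptions, return the record in FASTA format; passing `rettype="gb"` asks for the annotated flat-file form instead.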
 
EMBL (European Molecular Biology Laboratory)
The EMBL Nucleotide Sequence Database is hosted in the UK. It is a comprehensive collection of primary nucleotide sequences maintained at the European Bioinformatics Institute (EBI). Data are received from genome sequencing centers, individual scientists, and patent offices.
 
DDBJ (DNA Data Bank of Japan)
DDBJ is located at the National Institute of Genetics (NIG) in Shizuoka Prefecture, Japan. It is the only nucleotide sequence data bank in Asia that is a member of the INSDC. Although DDBJ receives its data mainly from Japanese researchers, it accepts data from contributors in any country.
 
The three nucleotide sequence databases collaborate closely and exchange new data daily. Together they constitute the International Nucleotide Sequence Database Collaboration (INSDC). This means that by connecting to any one of the three databases, one should have access to the same nucleotide sequence data.
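
The effect of the daily exchange can be sketched with three dictionaries keyed by accession number. This is a deliberately simplified model (the real exchange protocol is not described in the slides); the function name `exchange` and the `DEMO` accessions are hypothetical.

```python
def exchange(*databases):
    """Give every database the union of all records, keyed by accession number."""
    merged = {}
    for db in databases:
        merged.update(db)      # accession numbers are unique, so keys never clash
    for db in databases:
        db.update(merged)


# Each archive starts with only the record submitted to it directly.
genbank = {"DEMO000001": "ATGGCC"}
embl = {"DEMO000002": "TTGACA"}
ddbj = {"DEMO000003": "GGCTAA"}

exchange(genbank, embl, ddbj)
```

After the exchange, all three dictionaries hold the same three records, which is why a query to any one of the databases should return the same data.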
 
Secondary databases:

Sequence annotation in the primary databases is often minimal. Turning raw sequence information into more sophisticated biological knowledge requires much post-processing. This creates the need for secondary databases, which contain computationally processed sequence information derived from the primary databases.
 
Secondary databases comprise data derived from the results of analysing primary data. They often draw upon information from numerous sources, including other databases (primary and secondary), controlled vocabularies, and the scientific literature.

Computational algorithms are applied to the primary data, and the meaningful, informative results are stored in the secondary database.

Secondary databases are highly curated, often using a complex combination of computational algorithms and manual analysis and interpretation to derive new knowledge from the public record of science.
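
A minimal sketch of the "compute on primary data, store the result" step is shown below. The derived fields (length, GC content, whether the sequence starts with ATG) are simple stand-ins chosen for illustration; real secondary databases derive far richer annotation and add manual curation. All names and records here are hypothetical.

```python
def gc_content(sequence):
    """Fraction of G and C bases in a nucleotide sequence."""
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def derive_annotations(primary):
    """Build derived records from raw primary entries (accession -> sequence)."""
    return {
        accession: {
            "length": len(seq),
            "gc_content": round(gc_content(seq), 3),
            "starts_with_atg": seq.upper().startswith("ATG"),
        }
        for accession, seq in primary.items()
    }


primary = {"DEMO000001": "ATGGCCATTG", "DEMO000002": "ttgacagcta"}
secondary = derive_annotations(primary)
```

The primary entries are left untouched; the secondary store holds only information computed from them, keyed by the same accession numbers.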
 
Examples:

A prominent example of a secondary database is SWISS-PROT, which provides detailed sequence annotation that includes structure, function, and protein family assignment. The sequence data are mainly derived from TrEMBL, a database of translated nucleic acid sequences stored in the EMBL database.

Other examples of secondary databases:
InterPro (protein families, motifs, and domains)
UniProt Knowledgebase (sequence and functional information on proteins)
Ensembl (variation, function, regulation, and more, layered onto whole genome sequences)
 
 
 
Composite databases:
A composite database amalgamates a variety of different primary database sources. Different composite databases use different primary databases and different criteria in their search algorithms, and they offer various search options.
The entries from the primary databases are first compared, then filtered according to the desired criteria, and finally merged under certain conditions. Because the result contains non-redundant data, a composite database allows sequences to be searched rapidly.
Examples:
OWL, NRD, and SWISS-PROT + TrEMBL.
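
The compare, filter, and merge steps can be sketched as follows. The redundancy criterion used here (exact, case-insensitive sequence identity, with earlier sources taking priority) is an assumption for illustration; actual composite databases each apply their own criteria. The source names and sequences are made up.

```python
def build_composite(sources):
    """Merge sources in priority order, keeping one entry per distinct sequence."""
    composite = {}
    for source_name, records in sources:
        for identifier, sequence in records.items():
            key = sequence.upper()           # compare sequences case-insensitively
            if key not in composite:         # filter out sequences already present
                composite[key] = {"id": identifier, "source": source_name}
    return composite


sources = [
    ("source_a", {"A1": "MKTAYIAK", "A2": "MSLLTEVE"}),
    ("source_b", {"B1": "mktayiak", "B2": "MADEEKLP"}),
]
composite = build_composite(sources)
```

Entry `B1` duplicates `A1`, so the composite keeps only three of the four input records, and a search has fewer entries to scan.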
 
 
Importance of biological databases:
Databases act as a storehouse of information.
Databases store and organize data in such a way that information can be retrieved easily via a variety of search criteria.
Databases allow knowledge discovery: the identification of connections between pieces of information that were not known when the information was first entered. This facilitates the discovery of new biological insights from raw data.
Secondary databases have become the molecular biologist's reference library over the past decade or so, providing a wealth of information on just about any gene or gene product that has been investigated by the research community.
Databases make it practical for many users to access the same entries.
Databases allow data to be indexed.
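
Indexing and retrieval by several search criteria can be illustrated with a small sketch. It assumes records are plain dictionaries and builds one lookup table per field; the field names, records, and the helper `build_index` are hypothetical.

```python
from collections import defaultdict


def build_index(records, field):
    """Map each value of `field` to the accession numbers that carry it."""
    index = defaultdict(set)
    for accession, record in records.items():
        index[record[field]].add(accession)
    return index


records = {
    "DEMO000001": {"organism": "Homo sapiens", "molecule": "DNA"},
    "DEMO000002": {"organism": "Mus musculus", "molecule": "mRNA"},
    "DEMO000003": {"organism": "Homo sapiens", "molecule": "mRNA"},
}
by_organism = build_index(records, "organism")
by_molecule = build_index(records, "molecule")

# Combine two search criteria by intersecting the matching accession sets.
human_mrna = by_organism["Homo sapiens"] & by_molecule["mRNA"]
```

Once the indexes exist, a query on either criterion is a direct lookup rather than a scan over every record.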
 

Biological databases play a crucial role in bioinformatics, storing vast amounts of data related to nucleotide sequences, protein sequences, and more. These databases are publicly accessible and essential for research in biological fields. Primary databases, such as GenBank, EMBL, and DDBJ, contain original biological data and are key resources for scientists and researchers worldwide.


Uploaded on Jul 22, 2024



