
Nikolaj Blom
Center for Biological Sequence Analysis
BioCentrum-DTU, Technical University of Denmark
nikob@cbs.dtu.dk
"Resources of Biomolecular Data: Sequences, Structures and Functionality"
PhD course #27803


Outline

- Magnitudes and Scales
- Resources: Data Sources & Tools
- Primary DNA sources
- Sequence Repositories
- Structure Repositories
- Functional Categorization
- Integration of Databases
- The Human Genome
- Genome Browsers
- Prediction Tools
- Evaluation of Prediction Servers
- Starting points
- Link collections

Resources: Sources & Tools

- There are a lot of biomolecular databases/sources
- A lot of overlapping information (redundancy)
- A lot of tools
- Personal picks/preferences, based on:
  - User-friendliness
  - Update intervals
  - Curation efforts / error correction
  - Linkage to other DBs

Faster than Moore’s law...

Human Genome Published
- HUGO: Nature, 15 Feb 2001
- Celera: Science, 16 Feb 2001

Magnitudes and Scales

- Human genome: 3,200,000,000 bp
- Single basepair → full genome spans 9 orders of magnitude
- Genome = football field: ~3 billion leaves of grass
- Single base A, T, G, C (or SNP) = 1 leaf of grass
- Genome browsing: zooming from the whole stadium to a single leaf
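The "9 orders of magnitude" figure is the base-10 logarithm of the genome length in basepairs. The minimal check below uses the 3.2 Gbp size quoted above. The function name is illustrative only.

```python
import math

GENOME_BP = 3_200_000_000  # human genome size quoted on the slide


def orders_of_magnitude(large: float, small: float = 1.0) -> float:
    """Number of decimal orders of magnitude between two scales: log10(large / small)."""
    return math.log10(large / small)


# From a single basepair (1 bp) up to the full genome
span = orders_of_magnitude(GENOME_BP)
print(f"{span:.2f}")  # 9.51
```

The result is about 9.5, which the slide rounds down to 9 orders of magnitude.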

How we got the sequence

Sanger chain termination method
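The chain termination principle can be sketched as a toy model. In each of four reactions (one per ddNTP), synthesis stops wherever that dideoxy base is incorporated, so fragment length marks the position of the base. Sorting all fragments by length, as gel electrophoresis does, then reads out the newly synthesized strand.

This is a simplified illustration under idealized assumptions: one fragment per position and no errors. The function names and the example strand are hypothetical.

```python
def sanger_lanes(synthesized: str) -> dict[str, list[int]]:
    """Toy model: for each ddNTP reaction ("lane"), record the lengths of fragments terminated by that base."""
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for position, base in enumerate(synthesized, start=1):
        # Fragment length equals the position of the terminating ddNTP
        lanes[base].append(position)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Order all bands by fragment length (shortest first) and read off their terminal bases."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


lanes = sanger_lanes("ATGCGTAC")  # hypothetical newly synthesized strand
print(lanes["G"])       # [3, 5]
print(read_gel(lanes))  # ATGCGTAC
```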

Primary DNA sources

Trace files repositories
- Single read: 500-1000 bp (~golf ball size / jigsaw puzzle piece)
- Variable quality
- WashU-Merck Human EST Project / trace files
- "Base-calling" is non-trivial
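A common way to express per-base quality from base-calling is the Phred scale, Q = -10 * log10(P_error). The sketch below assumes Phred-scaled qualities and shows two things:

- how those qualities translate into an expected number of errors in a read;
- a simple trim of a low-quality read end.

The read, the quality values and the threshold of 20 are all hypothetical.

```python
def phred_to_error_prob(q: int) -> float:
    """Phred scale: Q = -10 * log10(P_error), hence P_error = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def expected_errors(quals: list[int]) -> float:
    """Expected number of miscalled bases in a read: the sum of per-base error probabilities."""
    return sum(phred_to_error_prob(q) for q in quals)


def trim_low_quality_tail(seq: str, quals: list[int], min_q: int = 20) -> str:
    """Drop trailing bases whose quality is below min_q (a deliberately simple trimming rule)."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end]


read = "ACGTACGTAA"                              # hypothetical read
quals = [30, 35, 40, 38, 30, 25, 20, 12, 8, 5]  # hypothetical qualities, worse towards the end
print(phred_to_error_prob(20))                  # 0.01
print(round(expected_errors(quals), 2))         # 0.55
print(trim_low_quality_tail(read, quals))       # ACGTACG
```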

Assembly is Non-trivial!
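To see why assembly is hard, consider a toy greedy overlap assembler. It repeatedly merges the pair of reads with the longest suffix/prefix overlap. This is a teaching sketch only, not how production assemblers work. It assumes:

- error-free reads;
- exact overlaps;
- a minimum overlap of 3 bases (a hypothetical choice).

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap; return the resulting contigs."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: keep the remaining contigs separate
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)] + [merged]
    return contigs


reads = ["ATGCGTAC", "GTACCTTA", "CTTAGGCA"]  # hypothetical error-free reads
print(greedy_assemble(reads))  # ['ATGCGTACCTTAGGCA']
```

Greedy merging is easily misled by repeats longer than the overlaps it can see. That is one reason real genome assembly remains non-trivial.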

Sequence repositories - GenBank et al.

GenBank / EMBL / DDBJ
- Highly redundant (many versions of the same gene)
- Cross-updated daily
- Version history is recorded; previous sequence records can be retrieved
- Contigs/HTGS (100-200 kb) finishing at different stages: draft → finished
- Includes genomic DNA, cDNA, ESTs, translated peptides
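Because version history is recorded, GenBank-style identifiers take the form accession.version. An updated sequence keeps its accession while its version number increases.

The sketch below picks the current version of each accession from a list of records. The identifiers are hypothetical, and the strict ACCESSION.VERSION check is a simplifying assumption.

```python
def parse_accession_version(acc_ver: str) -> tuple[str, int]:
    """Split an 'ACCESSION.VERSION' identifier, e.g. 'AB000001.2' -> ('AB000001', 2)."""
    accession, sep, version = acc_ver.partition(".")
    if not sep or not version.isdigit():
        raise ValueError(f"expected ACCESSION.VERSION, got {acc_ver!r}")
    return accession, int(version)


def latest_versions(ids: list[str]) -> dict[str, str]:
    """Map each accession to its highest-numbered (current) version."""
    latest: dict[str, int] = {}
    for acc_ver in ids:
        accession, version = parse_accession_version(acc_ver)
        latest[accession] = max(version, latest.get(accession, 0))
    return {acc: f"{acc}.{ver}" for acc, ver in latest.items()}


history = ["AB000001.1", "AB000001.3", "AB000001.2", "XY999999.1"]  # hypothetical identifiers
print(latest_versions(history))  # {'AB000001': 'AB000001.3', 'XY999999': 'XY999999.1'}
```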

Non-redundant and Curated databases

- Non-redundant
- Manual or automatic curation
- DNA
  - RefSeq (NCBI; semi-automated)
  - Ensembl gene index (automated)
- Protein
  - RefSeq (NCBI; semi-automated)
  - TrEMBL (EMBL; automated)
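In its simplest form, "non-redundant" means collapsing identical sequences into one representative entry. The sketch below does only exact-identity clustering (case-insensitive) on hypothetical records. Curated resources such as RefSeq go further, with manual or semi-automated review that is not modelled here.

```python
def make_nonredundant(records: list[tuple[str, str]]) -> list[tuple[str, str, list[str]]]:
    """Cluster records with identical sequences (case-insensitive).

    Returns (representative_id, sequence, member_ids) tuples; the first record seen represents each cluster.
    """
    clusters: dict[str, list[str]] = {}  # dicts preserve insertion order, so output order is stable
    for rec_id, seq in records:
        clusters.setdefault(seq.upper(), []).append(rec_id)
    return [(ids[0], seq, ids) for seq, ids in clusters.items()]


records = [("seq1", "MKTAYIAK"), ("seq2", "mktayiak"), ("seq3", "MKTAYIAR")]  # hypothetical entries
for rep, seq, members in make_nonredundant(records):
    print(rep, seq, members)
# seq1 MKTAYIAK ['seq1', 'seq2']
# seq3 MKTAYIAR ['seq3']
```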

Curated database: UniProt/SwissProt

- SIB - Swiss Institute of Bioinformatics
- Protein Knowledgebase / Sequence Database
- Highly curated
- Experimental evidence evaluated (e.g. modifications)
- All 80,000 entries checked by Amos Bairoch himself ;-)
- ExPASy - Expert Protein Analysis System
  - Proteomics tools: links + local servers

Structure databases / Protein Data Bank (PDB)

- X-ray, NMR biomolecular structures
- Protein Data Bank (PDB): >22,000 structures (April 2003)
- http://www.rcsb.org/pdb/
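PDB entries have traditionally been distributed as fixed-column text files, in which each ATOM/HETATM line holds one atom's coordinates. Below is a minimal parser for that legacy column layout, assuming well-formed records.

The Atom type and the example line are illustrative. Newer entries are also distributed as mmCIF, which this sketch does not handle.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_record(line: str) -> Atom:
    """Parse one ATOM/HETATM line using the fixed column positions of the legacy PDB format."""
    if not line.startswith(("ATOM  ", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return Atom(
        name=line[12:16].strip(),      # columns 13-16: atom name
        res_name=line[17:20].strip(),  # columns 18-20: residue name
        chain=line[21],                # column 22: chain identifier
        res_seq=int(line[22:26]),      # columns 23-26: residue sequence number
        x=float(line[30:38]),          # columns 31-38: x coordinate (Å)
        y=float(line[38:46]),          # columns 39-46: y coordinate (Å)
        z=float(line[46:54]),          # columns 47-54: z coordinate (Å)
    )


line = "ATOM      1  N   MET A   1      38.198  19.582  28.995"  # illustrative record
print(parse_atom_record(line))
```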

Functional Categorization

Gene Ontology (GO)
- Hierarchical
- Controlled vocabulary

Functional Categorization

Gene Ontology (GO): http://www.geneontology.org/
- Molecular Function: the tasks performed by individual gene products; examples are transcription factor and DNA helicase
- Biological Process: broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions
- Cellular Component: subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and origin recognition complex
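GO terms form a hierarchy: a directed acyclic graph in which a term may have several parents. An annotation to a specific term therefore implies annotation to all of its ancestors, which GO calls the true path rule.

The sketch below works on a small, hypothetical and heavily simplified fragment of the Cellular Component branch. The edges are illustrative, not the real GO graph, and "geneX" is a made-up gene.

```python
# Hypothetical, heavily simplified fragment of the GO Cellular Component branch:
# each term maps to its parent terms (a DAG, so a term may have several parents).
PARENTS: dict[str, list[str]] = {
    "cellular component": [],
    "organelle": ["cellular component"],
    "protein complex": ["cellular component"],
    "nucleus": ["organelle"],
    "chromosome": ["nucleus"],
    "telomere": ["chromosome"],
    "origin recognition complex": ["protein complex", "nucleus"],
}


def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """All terms reachable by following parent links upwards (excluding the term itself)."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(annotations: dict[str, set[str]], parents: dict[str, list[str]]) -> dict[str, set[str]]:
    """True path rule: annotate each gene with its direct terms plus all of their ancestors."""
    return {
        gene: set().union(*(ancestors(t, parents) | {t} for t in terms))
        for gene, terms in annotations.items()
    }


print(sorted(ancestors("origin recognition complex", PARENTS)))
# ['cellular component', 'nucleus', 'organelle', 'protein complex']
print(sorted(propagate({"geneX": {"telomere"}}, PARENTS)["geneX"]))
# ['cellular component', 'chromosome', 'nucleus', 'organelle', 'telomere']
```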

Integration of databases - Webs of web-sites

- http://srs.ebi.ac.uk/
- Links, links, links...
- SRS = Sequence Retrieval System
  - Powerful, complex query language
- BioDAS: Distributed Annotation System

For ’my gene’, how do I:

- Get an overview of the known sequence information? (GeneCards)
- Examine the 'genome neighbourhood'? (Genome Browsers)
- Predict protein post-translational modifications (PTMs)? (Prediction servers)
- (Evaluate the value of predicted features)
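Evaluating the value of predicted features, such as PTM sites from a prediction server, usually means comparing predictions against experimentally verified sites and summarizing the confusion matrix. The sketch computes four common measures:

- sensitivity;
- specificity;
- precision;
- the Matthews correlation coefficient (MCC).

The counts in the usage example are hypothetical.

```python
import math


def evaluate(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard binary-classification measures from confusion-matrix counts (0.0 when undefined)."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # fraction of true sites that were predicted
    specificity = tn / (tn + fp) if tn + fp else 0.0  # fraction of non-sites correctly rejected
    precision = tp / (tp + fp) if tp + fp else 0.0    # fraction of predicted sites that are real
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "precision": precision, "mcc": mcc}


scores = evaluate(tp=40, fp=10, tn=140, fn=10)  # hypothetical counts
print({name: round(value, 3) for name, value in scores.items()})
# {'sensitivity': 0.8, 'specificity': 0.933, 'precision': 0.8, 'mcc': 0.733}
```

MCC is often preferred when true sites are rare, because it stays informative even when the classes are heavily unbalanced.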

GeneCards http://nciarray.nci.nih.gov/cards/

GeneCards-II

GeneCards-III


Author: Lars Juhl Jensen
Company: CBS
Created: 11/1/2000 10:47:23 AM
Slides: 54