Genomics: Genomic information resources video tutorials

Video tutorials guiding the use of online genomic information resources

The videos below and the corresponding descriptions (including timelines of contents) were found by Kyle Peppler, an undergraduate microbiology major who worked with the Health Science Center Library on a bioinformatics internship funded by the Smathers Libraries' internship program. For questions regarding the videos, contact Kyle at pepplerk@ufl.edu.

Gene sequence retrieval using NCBI web and EDirect tools

This video demonstrates the use of NCBI web resources and the EDirect Unix command line interface for retrieving gene sequence information.

Timeline of contents

  • Accessing sequences on the web through NCBI’s Entrez system (1:26)
  • Distinguishing between the GenBank and FASTA formats (2:39)
  • Downloading a sequence in GenBank or FASTA format from the NCBI Nucleotide database (3:38)
  • Obtaining a sequence through a direct NCBI URL and bookmark for easy access (4:50)
  • Provides a link to a video tutorial on the use of EDirect (7:36). The link to the EDirect video is https://youtu.be/BLNnYW33Mtb0
  • Provides a link to additional online help resources for EDirect, which can be found at https://github.com/NCBI-Hackathons/EDirectCookbook (9:16)
  • Live search on NCBI’s web interface using human adenine phosphoribosyltransferase as an example (9:38)
  • Live search using NCBI’s Unix command line interface (EDirect) with human adenine phosphoribosyltransferase as an example (15:39)
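
The direct-URL retrieval and the GenBank/FASTA distinction listed above can be sketched in Python. This is not taken from the video: it assumes NCBI's E-utilities efetch endpoint as one way to request a record by URL, and the function names and the accession NM_000485 (assumed here to be a human APRT mRNA record; verify it in the Nucleotide database) are illustrative only.

```python
from urllib.parse import urlencode

# Public E-utilities endpoint for fetching records by identifier.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, rettype="fasta", db="nuccore"):
    """Build an efetch URL for one record.

    rettype "fasta" requests FASTA; rettype "gb" requests the GenBank flat file.
    """
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EFETCH_BASE}?{urlencode(params)}"


def looks_like_fasta(text):
    """A FASTA record starts with a '>' header line followed by sequence lines."""
    return text.lstrip().startswith(">")


def looks_like_genbank(text):
    """A GenBank flat file starts with a LOCUS line, followed by annotation."""
    return text.lstrip().startswith("LOCUS")


if __name__ == "__main__":
    # Assumed example accession; replace it with the one found in your search.
    print(efetch_url("NM_000485", rettype="fasta"))
    print(efetch_url("NM_000485", rettype="gb"))
```

Pasting either printed URL into a browser, or bookmarking it, should return the record as plain text, which is the same idea as the bookmarkable URL shown in the video.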

Organism(s) from which the information in the database was derived: Humans, plants, insects, marine life, etc.

Retrieving gene information using the NCBI Gene database

The NCBI Gene database is a one-stop location for obtaining gene information including sequence, transcripts, proteins, and expression levels by tissue. It links to other NCBI resources as well as external resources for identifying sequence similarity, genetic variations, population genetics, etc.

Timeline of contents:

Using the EME1 gene (involved in DNA repair) as an example, this video demonstrates

  • How to determine the directionality of a gene (0:35)
  • How to select the portion of the gene for which we would like to obtain the sequence (1:17)
  • How to obtain exon coordinates from the gene table display (1:52)
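
Directionality matters once exon coordinates are used to pull out sequence: a gene on the minus strand must be read as the reverse complement of the reference. The following is a minimal sketch, assuming 1-based inclusive coordinates on the forward strand; the function names and the toy sequence are hypothetical and are not part of the NCBI Gene interface.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_region(reference, start, end, strand):
    """Return reference[start..end] (1-based, inclusive) read in gene direction.

    strand is "+" for a gene on the forward strand and "-" for the reverse.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if start < 1 or end > len(reference) or start > end:
        raise ValueError("coordinates fall outside the reference sequence")
    region = reference[start - 1:end]
    return reverse_complement(region) if strand == "-" else region


if __name__ == "__main__":
    toy_reference = "AAACCCGT"  # made-up sequence for illustration
    print(extract_region(toy_reference, 1, 4, "+"))  # forward-strand gene
    print(extract_region(toy_reference, 1, 4, "-"))  # reverse-strand gene
```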

Organism(s) from which the information in the database was derived: Humans, plants, insects, marine life, etc. Refer to https://www.ncbi.nlm.nih.gov/gene/statistics/ for a complete list.

Viral genome information from NCBI

This video covers several viral genome information resources from NCBI, including the Viral genomes database, the Viral Variation tool, and the Retrovirus database, and shows how to

  • Access a list of viral genomes for a range of species in the Viral genomes database (4:46)
  • Retrieve reference genome sequences for all viruses that infect a particular host in the Viral genomes database (6:17)
  • Use the Viral Variation tool to identify all projects where a particular viral gene has been sequenced (12:09)
  • Use the Viral Variation tool to trace the evolutionary history of viruses using alignment or tree viewer (21:21)
  • Retrieve information regarding the interactions between HIV-1 and human proteins using the Retrovirus database (25:26)

Organism(s) from which the information in the database was derived: This database houses genetic information on numerous viruses, including DNA viruses and retroviruses. For a complete list, please refer to https://www.ncbi.nlm.nih.gov/genomes/GenomesGroup.cgi?taxid=10239

Introduction to Ensembl genome browser

Ensembl houses genomic information for a variety of organisms. Ensembl offers many of the tools found in the NCBI Gene database, such as viewing of gene structure (i.e. introns and exons). In Ensembl, sequences can be downloaded in two different file formats: FASTA and RTF (rich text format). Ensembl is hosted by the European Bioinformatics Institute.

This video uses the human BRCA2 gene (a tumor suppressor gene) as an example search and demonstrates

  • How to navigate through the Ensembl website from genome to transcript with ease using the tabs feature of Ensembl (0:45)
  • How to access information on an individual transcript via the transcript table (1:24)
  • How to easily identify whether a record pertains to a gene, transcript, or protein by the prefix of its accession number: ENSG, ENST, or ENSP, respectively (2:05)
  • How to download the gene sequences of interest in FASTA or rich text format (3:43)
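
The prefix rule from the list above (ENSG for genes, ENST for transcripts, ENSP for proteins) can be expressed as a small check. This is a sketch under assumptions: the function name is hypothetical, the pattern assumes the usual Ensembl stable ID layout of a prefix followed by 11 digits and an optional version suffix, and it also accepts an optional species code between "ENS" and the type letter (for example ENSMUSG for mouse genes).

```python
import re

# "ENS", optional species letters, one type letter, 11 digits, optional ".version".
_ENSEMBL_ID = re.compile(r"^ENS[A-Z]*?([GTP])\d{11}(?:\.\d+)?$")

_RECORD_TYPES = {"G": "gene", "T": "transcript", "P": "protein"}


def ensembl_record_type(stable_id):
    """Return "gene", "transcript", or "protein", or None if the ID does not match."""
    match = _ENSEMBL_ID.match(stable_id)
    if match is None:
        return None
    return _RECORD_TYPES[match.group(1)]


if __name__ == "__main__":
    # ENSG00000139618 is the Ensembl gene ID for human BRCA2.
    for stable_id in ("ENSG00000139618", "BRCA2"):
        print(stable_id, ensembl_record_type(stable_id))
```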

Organism(s) from which the information in Ensembl was derived include

  • Humans
  • Birds
  • Reptiles
  • Rodents

A full list of available organisms can be found in the dropdown box on the Ensembl site at the following link: https://useast.ensembl.org/info/about/species.html

Introduction to EuPathDB: a pathogen genomic database

EuPathDB is a family of databases that house genetic information (sequence, variation, orthologs) on various types of pathogens.

This video uses EuPathDB’s CryptoDB database as an example.  Other databases housed within EuPathDB share the same interface.  The video discusses

  • How files are organized in the downloads tab (0:28)
  • How to download the sequence of an entire genome in FASTA format and general feature format (GFF), using Cryptosporidium parvum Iowa II (a protozoan that infects the mammalian intestinal tract and causes abdominal pain as well as diarrhea) as an example. A GFF file contains the annotated information of a genome, such as the location of genes, transcripts, coding sequences, introns, and exons. For an example of a GFF file refer to this link (0:38)
  • How to download a sequence in FASTA format (1:07)
  • How to download a GFF file (1:48)
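
To make the description of a GFF file concrete, the sketch below parses one feature line. It assumes the nine-column, tab-delimited GFF3 layout (seqid, source, type, start, end, score, strand, phase, attributes); the files offered by CryptoDB may differ in detail, and the sample line and function name are made up for illustration.

```python
GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict; return None for comments or blanks."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != len(GFF3_COLUMNS):
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    # Coordinates are 1-based and inclusive in GFF3.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Column 9 holds semicolon-separated key=value pairs.
    attributes = {}
    for pair in record["attributes"].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key] = value
    record["attributes"] = attributes
    return record


if __name__ == "__main__":
    # Hypothetical feature line, not taken from a real CryptoDB download.
    sample = "chr1\texample\tgene\t100\t900\t.\t+\t.\tID=gene0001;Name=demo"
    feature = parse_gff3_line(sample)
    print(feature["type"], feature["start"], feature["end"], feature["attributes"]["ID"])
```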

Organism(s) from which the information in the database was derived: This database houses genetic information on various pathogens such as the following:

  • Amoeba
  • Cryptosporidium
  • Giardia
  • Plasmodium
  • Trichomonas
  • Kinetoplastid (TriTryp)
  • Microsporidia
  • Fungi
  • Piroplasma
  • Toxoplasma

Guide for using viral genome information resource - viruSITE

viruSITE houses genomic and proteomic information for a range of viruses. Data in viruSITE are extracted from NCBI RefSeq, the UniProt Knowledgebase (UniProtKB), ViralZone, and PubMed. This database is maintained by the Institute of Molecular Biology at the Slovak Academy of Sciences. You can access viruSITE through its URL http://www.virusite.org/.

The video above introduces the general layout of a viruSITE record and the type of information that can be found in it, using human herpesvirus 7 as an example.

  • This video points to the main entry for human herpesvirus 7, which contains taxonomic and genomic information. In addition, the main entry page lists the host that this virus infects and provides external links to additional information (0:35).
  • This video points to additional features in the human herpesvirus 7 record, including links to the genome browser, the genomic sequence, and viruses that are closely related to human herpesvirus 7. BLAST is used to help infer which viruses are closely related to human herpesvirus 7 (0:42).
  • The author of the video gives a breakdown of the genomic information found in the human herpesvirus 7 record, including genomic length (i.e. the number of nucleotides in the genome), the percentage of the genome that codes for protein, and the number of proteins that are encoded by the genome (3:43).
  • The author also shows the information contained in a typical protein record found in viruSITE. Here, he uses the envelope glycoprotein UL37 from human herpesvirus 7 as an example and points to information such as the link to the amino acid sequence, the link to homologous proteins, the genomic position that encodes this protein, the length of the amino acid sequence, the molecular weight, and the isoelectric point (5:30).
  • This video was published on 4/8/2016 and is 8 minutes 50 seconds in duration.
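
One of the summary figures mentioned above, the percentage of the genome that codes for protein, can be estimated from coding-sequence coordinates. How viruSITE computes its value is not stated in the video description, so the following is only a sketch of one plausible approach: merge overlapping coding intervals so shared bases are counted once, then divide by the genome length. The function name and the numbers in the example are made up.

```python
def coding_percentage(genome_length, cds_intervals):
    """Percentage of genome positions covered by at least one coding interval.

    cds_intervals is an iterable of (start, end) pairs, 1-based and inclusive.
    Overlapping intervals are merged so that no base is counted twice.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    current_start = current_end = None
    for start, end in sorted(cds_intervals):
        if current_end is None or start > current_end:
            # Close the previous merged interval and start a new one.
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlaps the current merged interval: extend it if needed.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start + 1
    return 100.0 * covered / genome_length


if __name__ == "__main__":
    # Toy genome of 100 bases with three coding regions, two of which overlap.
    print(coding_percentage(100, [(1, 10), (5, 20), (31, 40)]))
```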

The following two videos focus on viruSITE features that allow for the identification of sequence similarity and protein domains. Protein domains are blocks of conserved amino acids among similar proteins that are essential to protein function. 

The video above shows how to identify similar sequences in viruSITE from a query nucleotide or amino acid sequence. viruSITE presents results by listing either the proteins that are similar to the query sequence or the viruses whose genomes contain the query sequence. This video was published on 3/29/2016 and is 3 minutes 23 seconds in duration.

The above video addresses methods to

  • Retrieve genome or protein sequences (0:20).
  • Identify sequence similarity using the built-in BLAST by searching either the NCBI databases or the viruSITE database (2:44).
  • Obtain genome alignment as illustrated by a Circos plot (4:06).
  • Obtain multiple sequence alignment for proteins (4:38).
  • Find protein domains from information derived from the Pfam database (4:53).
  • This video was published on 3/31/2016 and is 7 minutes 10 seconds in duration.

 

viruSITE also allows browsing by taxon with a taxonomy browser; the video below demonstrates how to do this.

Similar to NCBI and Ensembl, viruSITE contains an interactive genome browser; the video below shows how to use this feature of the viruSITE database.

Guides for using the UCSC Genome Browser

The UCSC Genome Browser is a comprehensive genome database that allows users to access graphical models of genes as well as gene expression, epigenetic (e.g. histone modification), and genetic variant information. The following videos were published by Katherine West at the University of Glasgow in March 2018 and thus represent relatively recent tutorials for the use of the UCSC Genome Browser.

The introductory video above discusses the following

  • Access of mirror sites (0:17)
  • Reference genome assembly (0:34)
  • Searching methods (using human hemoglobin β as an example) (1:28)
  • Graphical display of genes (zooming, panning, etc.) (2:15)
  • Creation of personal account (3:09)

The second video in the series, listed above, shows how to

  • Customize the gene graphical display to show only the desired information (0:05)
  • Determine direction of transcription, exons, and introns from the gene graphical display (1:36)
  • View clinically relevant SNPs for a gene (2:20)

The third video in the series found above demonstrates how to view gene expression data.

The last video in this series is found above and demonstrates how to view epigenetic information such as histone modifications in genes.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.