Genomics: NCBI Gene database

NCBI Gene database

  • NCBI Gene is a convenient database for obtaining gene information including sequence, exon coordinates, expression, etc. 
  • NCBI Gene can be accessed via its URL (https://www.ncbi.nlm.nih.gov/gene).  The Gene database is well curated, well annotated, and non-redundant (i.e. one record per gene per organism). 
  • A user guide for Gene is provided by NCBI and can be found here (https://www.ncbi.nlm.nih.gov/books/NBK3841/) or by clicking on the “help” tab on the top right corner of the Gene page (Figure 1 – a).  

Figure 1

  • We can perform basic keyword searches (Figure 1 – b).
  • We can also perform advanced searches if we click the “advanced” tab (Figure 1 – c).  
  • The search strategy in Gene is similar to that of other NCBI databases.  In advanced search we use Boolean operators such as “AND” and “OR”.  
  • We also have the option to use search fields.  

Figure 2 - search fields for NCBI Gene

In the advanced search interface, we can see the search fields available in Gene if we click on the “all fields” drop-down (Figure 2 – a).  Among the list of options are gene name and organism.

Figure 3 - sample search in Gene

As an example, let’s search for the human adenosine deaminase (ADA) gene.  To do this, we can use Gene’s advanced search to make our search more specific. 

  • We set our search fields to gene name (ADA) and organism (human). 
  • We will tie these two search fields together with the Boolean AND since we want to find the human ADA gene.
  • Note that the syntax for the advanced search appears in the search builder (Figure 3 – a).
  • This search strategy takes us directly to the human ADA gene record (Figure 4).
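
The same search can also be written as a query string. The sketch below is only an illustration: it assumes the field tags appear as [Gene Name] and [Organism] in the search builder, and it uses NCBI's E-utilities ESearch service as one possible way to run the query outside the browser. The function names build_gene_query and build_esearch_url are hypothetical helpers, not part of any NCBI tool. The code only builds the query and the URL; it does not contact NCBI.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_gene_query(gene_name, organism):
    """Combine a gene-name term and an organism term with Boolean AND.

    Assumption: the field tags are written as [Gene Name] and [Organism].
    """
    return f"{gene_name}[Gene Name] AND {organism}[Organism]"


def build_esearch_url(query):
    """Return an ESearch URL that would run the query against the Gene database."""
    params = {"db": "gene", "term": query, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


query = build_gene_query("ADA", "human")
print(query)
print(build_esearch_url(query))
```

Keeping the query builder separate from the URL builder makes it easy to paste the query text straight into the Gene search box instead.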

Figure 4 - typical record for a gene

  • The first thing that we will see in any record housed within the NCBI Gene database is the gene summary. 
  • There is also a sidebar containing quick links to information such as differential gene expression by tissue, variations for a particular gene, and molecular biology pathways of which the gene is a part.

Figure 5 - gene location and coordinates

NCBI Gene also provides the location and chromosomal coordinates of genes. 

  • The human ADA gene is located on chromosome 20’s q-arm (Figure 5 – a). 
  • Its most up-to-date chromosomal coordinates are also provided (Figure 5 – b).   
  • The direction of transcription is denoted by the red arrow (Figure 5 – c).  

 

Figure 6 - gene model viewer - organization

Each gene record within NCBI’s Gene database contains a gene model viewer showing models of transcripts, location of variants, etc.  Information in the gene model viewer is organized into tracks.  For instance, there is a transcript model track (Figure 6 – a).  There is also a track indicating the location of the single nucleotide variants for the human ADA gene (Figure 6 – b).  We can custom configure the tracks that are shown in the gene model viewer (more information in Figure 9).

Figure 7 - gene model viewer - tools & features

  • The gene model viewer provides a search box to find specific features of the gene (Figure 7 – a).
  • We also have arrows for panning left and right in the gene model viewer as well as zoom functions (Figure 7 – b).
  • The mRNA transcript models are indicated by feature c in Figure 7.  Note that NCBI reports four transcript variants for the human ADA gene. 
  1. Transcripts have accession numbers that begin with NM or NR: NM transcripts are translated into proteins, while NR transcripts are non-protein-coding. 
  2. The transcript accession numbers are listed on the left side of each transcript model.  The arrows on the transcript models point in the direction of transcription. 
  3. The accession numbers for the proteins encoded by each transcript are listed on the right side of the transcript model; these numbers begin with NP.
  4. Transcript and protein accession numbers may also begin with XM and XP, denoting predicted transcripts or proteins.
  • Locations of single nucleotide variants in the human ADA gene can be found on the tracks pointed to by feature d in Figure 7.                 
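
The accession prefixes listed above can be summarized as a small lookup. This is a minimal sketch, not an NCBI tool: the function name describe_accession is hypothetical, it covers only the five prefixes mentioned in this guide, and the accessions in the usage loop are made-up placeholders rather than real records.

```python
# Accession prefixes described in this guide, mapped to what they denote.
REFSEQ_PREFIXES = {
    "NM": "transcript translated to protein",
    "NR": "non-protein-coding transcript",
    "NP": "protein",
    "XM": "predicted transcript",
    "XP": "predicted protein",
}


def describe_accession(accession):
    """Describe an accession using the two-letter prefix before the underscore."""
    prefix = accession.split("_", 1)[0].upper()
    return REFSEQ_PREFIXES.get(prefix, "prefix not covered in this guide")


# Placeholder accessions for illustration only; they are not real records.
for acc in ["NM_123456.1", "NP_123456.1", "XM_123456.1"]:
    print(acc, "->", describe_accession(acc))
```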

Figure 8 - gene model viewer - go to gene location

Clicking on the “tools” tab (Figure 8 – a) will reveal an option to go to any location within the gene.

Figure 9 - gene model viewer - configure tracks shown

Recall from Figure 6 that the information in the gene model viewer is organized into tracks (e.g. transcript model track; variation track).  We can customize which tracks we want to see in the gene model viewer.  To do this:

  • Click on the “tracks” tab (Figure 9 – a).  Then select “configure tracks” from the list of options (Figure 9 – b)  and the "configure page" box will appear.
  • In the “configure page” box we can check to select a track that we want to view or uncheck if we do not want to view a particular track.

Figure 10 - superimposing sequence on top of mRNA models

If we click on the “ATG” tab (Figure 10 – a) in the gene model viewer, we can see the gene sequence overlaid on top of the transcript models (Figure 10 – b).

Figure 11 - viewing six-frame amino acid translation

A very useful feature of the gene model viewer is the ability to view the amino acid translation of a particular gene in all six reading frames.  We can configure the gene model viewer to show the six-frame translation in the “configure page” dialogue box (refer to Figure 9 for instructions on accessing this box).  In the “configure page” box, select “sequence” (Figure 11 – a).  From there, scroll down the list of options and check the box that says “six-frame translations” (Figure 11 – b).  Finally, click the “configure” tab (Figure 11 – c).

Figure 12 - identifying start codon from six frame translation

The six-frame amino acid translation can help us locate open reading frames (e.g. the methionine indicated by the red arrow in Figure 12).  Note that this methionine corresponds to the ATG start codon (red oval).
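
To make the idea of a six-frame translation concrete, here is a minimal sketch of how one could be computed by hand. It assumes the standard genetic code and a sequence containing only A, C, G, and T. The function names are hypothetical, and the input is a toy sequence, not the ADA sequence. Three frames are read from the given strand and three from its reverse complement.

```python
from itertools import product

BASES = "TCAG"
# Amino acids of the standard genetic code, ordered to match codons
# generated from BASES (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from the start of seq; unknown codons become X."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to translations."""
    frames = {}
    for label, strand in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{label}{offset + 1}"] = translate(strand[offset:])
    return frames


# Toy sequence for illustration only; it is not taken from the ADA record.
for frame, protein in six_frame_translation("ATGGCCTAA").items():
    print(frame, protein)
```

In the output, an M marks a methionine (a possible start codon) and a * marks a stop codon, which is how candidate open reading frames can be spotted in each frame.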

Figure 13 - downloading gene sequence

 

To download the sequence for a gene of interest, click on the “FASTA” tab at the top of the gene model viewer (Figure 13 – a).  We will be directed to a page containing the gene sequence, in this example that of the human ADA gene (Figure 14).

Figure 14 - selecting region of the gene sequence to download

Figure 15 - downloading FASTA sequence

After we have selected the region of the sequence that we want to download, we can click on the “send to” tab (Figure 15 – a).  From there, select “complete record” (Figure 15 – b), choose “file” under “select destination” (Figure 15 – c), and select “FASTA” under format (Figure 15 – d).  Finally, we can hit “create file” (Figure 15 – e) and a file containing the sequence of our gene of interest will be downloaded.  We can view this gene sequence in offline software such as UGENE (ugene.net).
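
A downloaded FASTA file can also be read with a few lines of code. The sketch below is a minimal, hypothetical parser (parse_fasta is not part of any NCBI or UGENE tool) that assumes the usual FASTA layout: a header line starting with ">" followed by one or more lines of sequence. The record in the example is made up; for a real download, the text would be read from the saved file.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading ">"
            records[header] = []
        elif header is not None:
            records[header].append(line)
    # Join the sequence lines of each record into one string.
    return {name: "".join(chunks) for name, chunks in records.items()}


# Made-up record for illustration; a real download would be read from the file.
example = ">example_record hypothetical region\nATGGCC\nCAGACG\n"
for name, seq in parse_fasta(example).items():
    print(name, len(seq), seq)
```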

Figure 16 - obtaining exon coordinates

To obtain exon coordinates from a gene record in the NCBI Gene database, click on the “full report” tab in the top left-hand corner of the gene record (Figure 16 – a).  From the list of options, select “gene table” (Figure 16 – b) and we will be directed to a page containing exon tables for all of the documented transcript variants (Figure 17).

Figure 17 - exon coordinate table

In the exon table, note that there is a column labeled “exon”.  This column shows the coordinates of the exons appearing in the particular mRNA transcript.  The column labeled “coding” shows the coordinates of the exon portions that are translated into protein.
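
Once exon coordinates have been copied out of the table, simple arithmetic gives exon lengths. The sketch below assumes each exon is given as a (start, end) pair in which both positions belong to the exon, so the length is the difference plus one. The function name exon_lengths is hypothetical, and the coordinates are invented for illustration; they are not the ADA values.

```python
def exon_lengths(exons):
    """Return the length of each exon given (start, end) coordinate pairs.

    Assumes inclusive coordinates, so both start and end are part of the exon.
    abs() lets the same code handle pairs listed in descending order.
    """
    return [abs(end - start) + 1 for start, end in exons]


# Hypothetical coordinates for illustration; not copied from the ADA gene table.
exons = [(101, 250), (400, 520), (900, 1000)]
lengths = exon_lengths(exons)
print(lengths)
print("total exonic length:", sum(lengths))
```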


Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.