Genomics: Databases and Tools


 NCBI Genome Resource Guides: Provides access to genome resource guides for selected organisms.

 The database of Genotypes and Phenotypes (dbGaP) contains studies that have investigated the interaction of genotype and phenotype.

Gemma: meta-analysis of functional genomics data

Gemma is a website, database, and set of tools for the meta-analysis, reuse, and sharing of genomics data, currently targeted primarily at the analysis of gene expression profiles.

Wellcome Sanger Cancer Genome

The Genomics of Drug Sensitivity in Cancer project is an academic research program to identify molecular features of cancers that predict response to anti-cancer drugs.

 Catalog of Somatic Mutations in Cancer (COSMIC). This database stores and displays information about somatic mutations in cancer.

Toxicogenomics databases

Comparative Toxicogenomics Database (CTD): Supports a better understanding of the effects of environmental chemicals on human health. Contains curated data on chemical-disease and gene-disease relationships.



Pharmacogenomics Knowledge Base: This curated database collects, encodes, and disseminates knowledge about the impact of human genetic variation on drug response. Here you will find genotype and phenotype data, annotated gene variants, gene-drug-disease relationships, and drug pathways.

 List of FDA-approved drugs with pharmacogenomic information in their labels.

Population-focused databases

 Y Chromosome Haplotype Reference Database (YHRD): A collaborative effort to collect population data on Y-chromosomal sequences and to create a sufficiently large reference database for use in forensic genetics, paternal lineage, and genealogical studies.

 The Allele Frequency Database (ALFRED) is a curated compilation of allele frequency data on DNA sequence polymorphisms in anthropologically defined human populations. It generally includes "frequency estimates only for polymorphisms that have been studied in at least six distinct population samples".

 Ethnic & National Variation Databases. A list of databases created by the Human Genome Variation Society. It includes the Catalogue for Transmission Genetics in Arabs (CTGA) Database and others.

UF Research Computing - Galaxy

 "UF researchers now have access to an instance of Galaxy at the HPC Center - a web-based framework for accessible, reproducible and transparent biological computing. Galaxy has arisen as the most popular framework for providing biological researchers access to the tools they need for the analysis of their data"

Training sessions on Galaxy can be seen here. Please contact Oleksandr Moskalenko or Matt Gitzendanner with further questions about Galaxy.



 International Sequence Consortium Database

 Mammalian Gene Collection (MGC) provides researchers with unrestricted access to sequence-validated full-length protein-coding (FL-CDS) cDNA clones for all RefSeq human and mouse genes, and at least 6200 rat genes.

 BioCyc contains over 2000 Pathway/Genome Databases, including a number of complete bacterial genomes from the Human Microbiome Project. Each pathway/genome database in the BioCyc collection describes the genome and metabolic pathways of a single organism.

 The ORFeome Collaboration is an unrestricted source of fully sequence-validated full-ORF human cDNA clones in a format allowing easy transfer of the ORF sequences into virtually any type of expression vector.

GenomeNet Database is a Japanese network of database and computational services for genome research and related research areas in biomedical sciences, operated by the Kyoto University Bioinformatics Center.

 This web page includes a list of model organism databases supported by the National Human Genome Research Institute. It includes FlyBase (database of Drosophila genes and genomes), Mouse Genome Informatics, the Rat Genome Database, Saccharomyces Genome Database, WormBase (The C. elegans Genome Database), and the Zebrafish Information Network.

Developed at the Crown Human Genome Center, Department of Molecular Genetics, Weizmann Institute of Science, the GeneCards human gene database integrates a subset of gene-related transcriptomic, genetic, proteomic, functional, and disease information.

 Plant Comparative Genomics Database (PlantGDB) is an NSF-funded project to develop plant species-specific EST and GSS databases, to provide web-accessible tools and inter-species query capabilities, and to provide genome browsing and annotation capabilities.

 Gramene is a curated, open-source data resource for comparative genome analysis in the grasses.

 Mouse Genome Informatics (MGI) is the international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human health and disease.

 Rat Genome Database is a repository of rat genetic and genomic data, as well as mapping, strain, and physiological information.

Microbial Genome Database for Comparative Analysis (MBGD) is a database for comparative analysis of completely sequenced microbial genomes, maintained by the National Institute for Basic Biology, National Institutes of Natural Sciences, Japan.

 The Human Genome Variation Society maintains comprehensive lists of variation-related databases such as Locus Specific Mutation Databases, Disease Centered Central Mutations Databases, Chromosomal Variation Databases, and Clinical and Patient Aspects Databases, among others. Click here for the complete list.

 The Mammalian Gene Mutation Database (MGMD) compiles mutagen-induced gene mutations following analysis of the literature on mutagen-induced mutational spectra in mammalian tissues. It is a reference source for mutagen-induced mutational spectra of interest to researchers in genetic toxicology.


 The Connectivity Map (cmap) is a collection of genome-wide transcriptional expression data from cultured human cells treated with bioactive small molecules, together with simple pattern-matching algorithms. It enables the discovery of functional connections between drugs, genes, and diseases through the transitory feature of common gene-expression changes. Supported by the Broad Institute, a research collaboration between MIT, Harvard, and the Whitehead Institute.

 The Gene Expression Omnibus is a public functional genomics data repository with tools to help users query and download experiments and curated gene expression profiles.

 Genevestigator biomedical is a "high performance search engine for gene expression. It elegantly integrates thousands of manually curated public microarray experiments and nicely visualizes gene expression across different biological contexts (diseases, drugs, tissues, cancers, genotypes, etc.)."

 The ArrayExpress Archive is a database of functional genomics experiments, including gene expression. It includes a Gene Expression Atlas that contains a "subset of curated and re-annotated Archive data which can be queried for individual gene expression under different biological conditions across experiments".

The Cancer Genome Atlas Browser

The Cancer Genome Atlas Data Portal (TCGA) allows you to search, download, and analyze data sets generated by TCGA. It contains clinical information, genomic characterization data, and high-throughput sequencing analysis of the tumor genomes.

Nucleosome Prediction

The Online Nucleosomes Position Prediction by Genomic Sequence allows you to submit a genomic sequence and to receive a prediction of the nucleosome positions on it.

HuGE Navigator

 The Human Genome Epidemiology Network (HuGENet) was established by the Office of Public Health Genomics in 1998. It is a knowledge base of genetic associations and human genome epidemiology, including population prevalence of genetic variants, genetic associations, gene-gene and gene-environment interactions, and evaluation of genetic tests. These are some of its tools:

Phenopedia: genetic associations and human genome epidemiology summaries by disease.

Genopedia: genetic associations and human genome epidemiology summaries by gene.

GWAS Integrator: a tool that provides lookup and analytic functions for all published GWAS available in the GWAS catalog curated by the National Human Genome Research Institute.

HuGE Literature Finder: a search engine for finding published literature on genetic associations and other human genome epidemiology.

