Guides @ UF: Genomics & Bioinformatics Dictionary: Genomics

The enzymatic process of adding an acetyl group to a lysine residue on histone tails or on other proteins.

A chromosome with a centromere near to one end so that one arm is very short.

Evolution as a result of selection.

An evolutionary process that is directed by natural selection, which makes a population better adapted to live in an environment.

Adaptive walks

A metaphor used to describe the sequence of fixation of beneficial mutations that transform a low-fitness genotype into a genotype that is well-adapted to its environment.

Admixed

An admixed population contains hybrids or offspring of individuals originating from genetically divergent parental populations.

Admixed population

A population formed recently from the mixing of two or more groups whose ancestors had long been separated.

Admixture

The mixture of two or more genetically distinct populations.

(Hybridized)

Affymetrix gene-chip analysis

The examination of gene-expression profiles using a high-density array of single-stranded DNA nucleotides.

Allelic heterogeneity

When multiple variants in the same gene affect the same disease.

Allozymes

Co-dominant nuclear DNA markers that consist of enzymes that differ in their mobility on a charged gel.

Alu

An interspersed DNA sequence of 300 bp that belongs to the short interspersed element (SINE) family and is found in the genome of primates. Alu elements are composed of a head-to-tail dimer in which the first monomer is 140 bp long and the second is 170 bp long. In humans, there are ∼1.1 million copies of Alu elements, of which ∼500,000 copies are located in introns.

Alu element

An interspersed DNA sequence of ∼300 base pairs (bp) that is found in the genomes of primates, which can be cleaved by the restriction enzyme AluI. They are composed of a head-to-tail dimer, with the first monomer ∼140-bp long and the second ∼170-bp long. In humans, there are 300,000–600,000 copies of Alu elements.

Amplification

Gene amplification refers to an increase in the number of copies of a gene in a genome.

(Gene amplification)

Amplified fragment length polymorphism

A DNA fragment-length polymorphism that is revealed by a PCR-based DNA fingerprinting technique that generates dozens of polymorphic marker bands (presence or absence of a restriction enzyme site) in a single gel lane. The marker bands are usually dominant in that we generally cannot see the difference between a heterozygote and homozygote.

(AFLP)

Ancestry-informative markers

Genetic markers ascertained for large differences in allele frequency between subpopulations that are genotyped to infer genetic ancestry in new samples.

Aneuploidy

The presence of an abnormal number of chromosomes, either more or less than the diploid number. It is associated with cell and organismal inviability, birth defects and cancer.

Anneal

In molecular biology, the process by which two single strands of DNA hydrogen bond at complementary nucleotides to form a double-stranded molecule.

Aptamer

Oligonucleic acids that bind to a specific target molecule, such as a small molecule, protein or nucleic acid. Nucleic acid aptamers are typically developed through in vitro selection schemes but are also found naturally (for example, RNA aptamers in riboswitches).

Array capture

A method for enriching whole genomic DNA for many regions of interest by hybridization to an array containing RNA or DNA sequences complementary to the regions of interest.

Ascertainment bias

The bias in patterns of variation that results from using pre-ascertained SNPs.

Assay for transposase accessible chromatin sequencing

A method that uses the activity of a hyperactive transposase to cleave exposed DNA and add sequencing adapters. Regions that cannot be sequenced are inferred to be chromatin interacting.

(ATAC-seq, ATAC-sequencing, ATAC sequencing)

Associated interval

A stretch of sequence surrounding a polymorphism that has been associated with a phenotype, in which linkage disequilibrium levels between polymorphisms and the associated marker might be sufficiently high to drive the originally observed association.

Association studies

A set of methods that are used to correlate polymorphisms in genotype to polymorphisms in phenotype in populations.

Autosomal

“Autosomal” means that the gene in question is located on one of the numbered, or non-sex, chromosomes.

Axial elements

Linear structures that assemble along the length of meiotic chromosomes. Axial elements become the lateral elements of the mature synaptonemal complex.

BAC-by-BAC sequencing

A sequencing method where a physical map is generated from overlapping bacterial artificial chromosome (BAC) clones tiled across a chromosome. Each BAC is then fragmented and sequenced. The sequenced fragments are aligned with the knowledge of the originating BAC.

Backcross

Originally, backcross referred to the mating of an offspring with one of its parents, in which the offspring is heterozygous, with the parent being homozygous for one of the alleles in the offspring's genotype. Nowadays, backcross simply refers to a mating between individuals with those two genotypes.

Bacterial artificial chromosome

Bacterial artificial chromosomes (BACs) are DNA molecules assembled in vitro from defined constituents and are stably maintained as one large DNA fragment in Escherichia coli. Artificial chromosomes are useful for genome sequencing programs, for transduction of DNA segments into eukaryotic cells, and for functional characterization of genomic regions and entire viral genomes such as cytomegalovirus (CMV) genomes.

(BAC)

Balancing selection

A form of selection in which multiple phenotypes (or alleles) are maintained in a population.

Barcodes

A series of known bases added to a template molecule either through ligation or amplification. After sequencing, these barcodes can be used to identify which sample a particular read is derived from.

(DNA barcode)
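
As a minimal sketch of how barcodes can be used to identify which sample a read came from, the Python below matches the first bases of each read against a table of known barcodes. The barcode sequences, the sample names, and the assumptions that the barcode sits at the start of the read and must match exactly are all hypothetical choices made for illustration.

```python
def demultiplex(reads, barcode_to_sample):
    """Group reads by sample using an exact-match barcode at the read start."""
    # Assumption: every barcode in the table has the same length.
    barcode_len = len(next(iter(barcode_to_sample)))
    by_sample = {}
    for read in reads:
        sample = barcode_to_sample.get(read[:barcode_len], "unassigned")
        # Store the read with its barcode trimmed off.
        by_sample.setdefault(sample, []).append(read[barcode_len:])
    return by_sample


# Hypothetical barcodes and reads.
barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
reads = ["ACGTGGGTTT", "TGCACCCAAA", "ACGTAAACCC", "NNNNGGGGGG"]
demuxed = demultiplex(reads, barcodes)
```

Reads whose leading bases match no known barcode are collected under "unassigned" rather than discarded, so that the loss can be inspected.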

Barcoding

DNA barcoding is a tool for rapid species identification based on DNA sequences. DNA barcodes consist of a standardized short sequence of DNA (400–800 bp) that in principle should be easily generated and characterized for all species on the planet. DNA barcoding aims to use the information of one or a few gene regions to identify all species of life.

(DNA barcoding)

Basal splicing

A conserved mRNA splicing mechanism. It is composed of the splicing signals and the core of the machinery is formed by five spliceosomal small nuclear ribonucleoproteins and an unknown number of proteins.

Base excision repair

A cellular mechanism that repairs damaged DNA and is initiated by the activity of DNA glycosylases.

Base-space

A system used by most next-generation sequencing platforms. When a one-base-encoded probe or a sequencing-by-synthesis approach is used, each signal is correctly correlated to a base.

Bayesian

A framework of statistical inference in which previous beliefs (or data) and likelihoods are combined to estimate a parameter of interest given the observed data.

Bayesian approach

A statistical perspective that focuses on the probability distribution of parameters before and after observing the data.

Best-guess genotype

Most imputation methods provide a probabilistic prediction of the missing genotypes. The best guess genotype is that genotype which has the largest probability.
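
A small illustration, assuming the imputation output for one missing genotype is available as a mapping from genotype label to probability (the labels and numbers here are made up):

```python
def best_guess_genotype(probabilities):
    """Return the genotype label with the largest imputed probability."""
    return max(probabilities, key=probabilities.get)


# Hypothetical imputation output for one individual at one marker.
imputed = {"AA": 0.10, "AG": 0.75, "GG": 0.15}
call = best_guess_genotype(imputed)
```

Taking the best guess discards the uncertainty carried by the probabilities, so a call made at 0.75 and one made at 0.99 look identical afterwards.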

Biallelic

Refers to two (possibly different) variants located on both alleles of the same gene.

Biomarker

An individual protein that is uniquely produced in a diseased state.

Biotechnology

The use of artificial methods to modify the genetic material of living organisms or cells to produce novel compounds or to perform new functions.

Bisulphite sequencing

A technique in which the treatment of DNA with bisulphite, which converts cytosines into uracils but does not modify methylated cytosines, is used to determine the DNA methylation pattern.

Bivalent

Two paired or synapsed homologous chromosomes, each formed of two sister chromatids.

Bonferroni correction

When n statistical tests are carried out, each has the potential (probability, p, the significance level) to return a false-positive result. If tests are independent of each other, the so-called experiment-wise probability that one or more tests show a false-positive result is approximately np. So, to achieve an experiment-wise false-positive rate of p, each individual test must only be allowed a false-positive error rate of p/n, which is referred to as the Bonferroni correction.
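
The arithmetic can be checked with a short Python sketch. The values of p and n are arbitrary examples. For independent tests the exact experiment-wise rate is 1 − (1 − p)^n; np is a close approximation only when np is small.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under the Bonferroni correction (p/n)."""
    return alpha / n_tests


def experiment_wise_error(per_test_alpha, n_tests):
    """Exact probability of at least one false positive in independent tests."""
    return 1.0 - (1.0 - per_test_alpha) ** n_tests


alpha = 0.05   # desired experiment-wise false-positive rate (example value)
n_tests = 20   # number of tests carried out (example value)

per_test = bonferroni_threshold(alpha, n_tests)
uncorrected = experiment_wise_error(alpha, n_tests)     # each test run at alpha
corrected = experiment_wise_error(per_test, n_tests)    # each test run at alpha/n
```

With these example values the uncorrected experiment-wise rate is well above 0.05, whereas testing each hypothesis at p/n keeps it just under 0.05.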

Bootstrap

A statistical approach that is often used to generate confidence intervals (measures of variation) around parameter estimates in which the data are re-sampled repeatedly (with replacement) using computer Monte Carlo simulations.
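
One simple variant, the percentile bootstrap, can be sketched in Python as follows. The data values, the number of resamples and the fixed random seed are assumptions made for illustration, and other bootstrap interval constructions exist.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)
    # Re-sample the data with replacement and recompute the statistic each time.
    estimates = sorted(
        stat(rng.choices(data, k=len(data))) for _ in range(n_resamples)
    )
    lower_idx = int((1 - level) / 2 * n_resamples)
    upper_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lower_idx], estimates[upper_idx]


# Hypothetical measurements.
heights = [4.1, 5.3, 4.8, 5.9, 5.0, 4.4, 5.6, 5.2]
low, high = bootstrap_ci(heights)
```

The interval is read off the sorted resampled estimates, so no distributional formula for the statistic is needed.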

Bottleneck

A marked reduction in population size that often results in the loss of genetic variation and more frequent matings among closely related individuals.

Breakage-fusion-bridge cycles

A mechanism of chromosomal instability caused by a cycle of telomere breaks and dicentric chromosome formation.

(BFB cycles, BFB's)

Bromodomain

A conserved structural domain of ~40–50 amino acids that is commonly found in proteins associated with chromatin remodeling and with proteins that bind to acetylated lysine residues in histones.

Cajal bodies

Distinct sub-nuclear structures present in eukaryotic cells associated with RNA metabolism and ribonucleoprotein biogenesis.

Cap analysis of gene expression

The high-throughput sequencing of concatamers of DNA tags that are derived from the initial nucleotides of 5′ mRNA.

(CAGE)

Capsid

The proteinaceous shell that packages the genetic material of the virus. Its structure is important in determining viral stability, delivery and host interactions.

(Viral envelope)

Cargo gene

Any gene (or genes) harboured on the sequence of an extrachromosomal DNA (ecDNA) element.

Catenations

Topological linkages between duplex DNA. Catenations between sister chromatids arise during replication.

Causal variant

A genetic marker that is functionally responsible for altering the severity of the phenotype.

Cell-autonomously

A mode of gene effect that is restricted to the cell in which the gene is expressed.

Census population size

Actual population size (total number of individuals) as compared to the theoretical effective population size.

CentiMorgan

A centimorgan (abbreviated cM) is a unit of measure for the frequency of genetic recombination, used to express the relative distance between genes on a chromosome. One centimorgan corresponds to a crossover value of 1%; that is, a 1% chance that two markers on a chromosome will become separated from one another due to a recombination event during meiosis (which occurs during the formation of egg and sperm cells).

(cM)

Centromere

Repetitive region of the chromosome that attaches to the mitotic spindle and is responsible for ensuring accurate transmission of the genome during cell division.

Checkpoint

A mechanism that monitors the fidelity of cellular events and triggers cell cycle arrest and possibly apoptosis when errors are not corrected. In meiosis, unrepaired DNA damage and synapsis failure trigger checkpoints that can halt meiotic progression.

Chimaera assay

A technique that assesses the mode of action of gene products by generating animals from a mixture of cells that are derived from two or more genetically distinct animals.

Chromatid

The product of chromosome replication in meiosis I. Chromatids are distinguished from chromosomes by the fact that the two daughter chromatids of one chromosome remain attached at their centromeres through meiosis I cell division.

Chromatin

A complex of DNA and histone proteins. The basic unit of chromatin is the nucleosome.

Chromatin accessibility

The extent to which proteins are able to interact with chromatinized DNA, which is regulated through nucleosome occupancy and other factors occluding access to DNA.

Chromatin immunoprecipitation

A technique that is used to identify the location of DNA-binding proteins and epigenetic marks in the genome. Genomic sequences containing the mark of interest are enriched by binding soluble DNA chromatin extracts (complexes of DNA and protein) to an antibody that recognizes the mark. Related techniques — such as methylated DNA immunoprecipitation — use antibodies to recognize DNA modifications directly.

(ChIP)

Chromatin immunoprecipitation followed by sequencing

A method used to analyse protein interactions with DNA by combining ChIP with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins.

(ChIP-seq, ChIP-sequencing)

Chromatin loop extrusion

A motor-driven process in which a loop-extruding factor translocates along the chromatin fibre in opposite directions, thereby growing a chromatin loop.

Chromatin remodeling

An ATP-dependent enzymatic process that alters histone–DNA interactions or regulates the position of nucleosomes. Chromatin remodelling can also be ATP-independent in the case of the facilitates chromatin transcription (FACT) complex.

Chromatin sensitivity

The propensity of transcription factors (TFs) to be inhibited from binding motifs due to reversible chromatin features such as modification of DNA (CpG methylation) or presence and actions of chromatin proteins (such as nucleosomes).

Chromodomain

A conserved structural domain of ~40–50 amino acids that is commonly found in proteins associated with chromatin remodelling and with proteins that bind to methylated lysine residues in histones.

Chromosomal instability

A condition in which the rate of chromosome mis-segregation is elevated.

(CIN)

Chromosome conformation capture

A technique used to study the long-distance interactions between genomic regions. These interactions can be used to study the three-dimensional architecture of chromosomes in a cell nucleus.

Chromosome territories

Specific, largely non-overlapping areas in the nucleus that each chromosome occupies.

Chromosome-associated regulatory RNAs

Regulatory RNAs associated with the chromatin.

(carRNAs)

Chromothripsis

A massive chromosomal rearrangement resulting from a chromosome shattering event, characterized by more than 20 DNA fragments stitched together in an abnormal order.

Circular dichroism

An absorption spectroscopy method that detects the differential absorption of left- and right-handed circularly polarized light for rapid evaluation of the secondary structures of macromolecules such as protein and DNA.

Cis-regulatory elements

Non-coding DNA sequences that regulate transcription of genes located on the same chromosome. They include enhancers, promoters, insulators, silencing elements and tethering elements. Different classes of CREs can be identified using a combination of molecular markers, including chromatin accessibility and epigenetic modifications.

(CRE, CREs)

Cline

A gradient of variation across space. It usually refers to increased differences among populations in the frequency of an allele or trait with increased geographic distance.

Clock genes

Core clock genes are directly involved in the primary transcriptional–translational feedback loops. By contrast, clock-controlled genes are those genes whose expression is driven by the transcriptional–translational feedback loops within cells and tissues, resulting in circadian oscillations in their function.

Clones

Cells that originate from a common cell ancestor (progenitor) with identical genetic identity.

Cloning

The production of an exact copy—specifically, an exact genetic copy—of a gene, cell, or organism.

Cluster analysis

A mathematical algorithm that organizes a set of items according to their similarity. For example, genes can be clustered according to their similarity in pattern of expression.

Clusters

Groups of DNA templates in close spatial proximity, generated either through bead-based amplification or by solid-phase amplification. Bead-based approaches rely on emulsions to maintain template isolation during amplification. Solid-phase approaches rely on the template-to-bound-adapter ratio to probabilistically bind template molecules at a sufficient distance from each other.

Coalescence

The joining of genetic lineages to common ancestors when they are traced backwards in time.

Coalescent

Relating to the mathematical and statistical properties of genealogies. A modelling framework in which two DNA sequence lineages converge in a common ancestral sequence, going backwards in time.

Coding region

The portion of a gene or an mRNA which actually codes for a protein.

(Coding sequence, CDS)

Co-dominant markers

Genetic markers that allow the determination of both alleles at a diploid locus (for example, microsatellites, allozymes and single nucleotide polymorphisms); these differ from dominant markers in which the determination of heterozygotes is not always possible (for example, RAPDs and AFLPs).

Codon

A codon is a sequence of three DNA or RNA nucleotides that corresponds with a specific amino acid or stop signal during protein synthesis.

Cohesion complex

A multisubunit protein complex that mediates sister-chromatid cohesion in mitosis and is essential for topologically associating domain (TAD) formation.

Common neutral mutation

A non-synonymous SNP present in at least 1% of the human population that is either overtly neutral or not known to influence disease in appreciable ways.

Comparative anchor-tag sequences

Exon sequences that are conserved across taxa allowing the design of primers that amplify in divergent species (for example, across mammal orders). CATS-like primers speed the discovery of SNPs (in exons or introns) and comparative genome mapping across taxa.

(CATS)

Complementation

Complementation occurs when two mutations together result in a wild-type phenotype.

Compound heterozygote

When an individual inherits two different recessive mutations, one from each parent, in the same gene that cause the same phenotype. An example would be a single-nucleotide variant causing a codon for an amino acid to be changed into a stop codon in one allele and a 4-bp deletion in the other allele: each of these variants knocks out its respective allele, resulting in neither copy functioning.

Compound heterozygous

The existence of distinct mutations on opposite alleles of a single gene located on an autosomal chromosome.

Confounder

A spurious association between a risk factor (a gene, exposure or interaction) and disease induced by the joint associations of some other variable with the risk factor and the disease that are independent of the risk factor. Confounding can also distort the magnitude of the association of a true risk factor with disease or mask it.

Conjugation

The transfer of genetic information from a donor to a recipient cell by a conjugative or mobile genetic element, often a conjugative plasmid.

Consensus sequence

In next-generation sequencing (NGS) routines that allow multiple overlapping reads from a single molecule of DNA, all related reads are aligned to each other and the most likely base at each position is determined. This process helps to overcome high, single-pass error rates. A high-quality consensus sequence derived from the circular template from Pacific Biosciences (PacBio) is called a circular consensus sequence (CCS).
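
As a rough sketch of the idea, the Python below takes reads that are assumed to be already aligned to equal length and picks the most frequent base in each column. A plain majority vote stands in here for "the most likely base"; production tools typically also weigh base qualities, which this sketch ignores. The read sequences are invented.

```python
from collections import Counter


def consensus(aligned_reads):
    """Majority base per column; reads must already be aligned to equal length."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


# Hypothetical repeated passes over the same molecule, each with one error at most.
passes = ["ACGTTA", "ACGATA", "ACCTTA", "ACGTTA"]
result = consensus(passes)
```

Because the errors in individual passes fall at different positions, each is outvoted by the other reads in its column.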

Conserved sequence

A base sequence in a DNA molecule (or an amino acid sequence in a protein) that has remained relatively unchanged throughout evolution.

Constitutive heterochromatin

A permanently condensed chromatin conformation that is repressive for transcription and is commonly found at repetitive regions of the genome, such as centromeres and telomeres.

Convergent evolution

Independent evolution from different ancestors that leads to similar characteristics.

Convolutional neural networks

Algorithms designed to learn from the data to uncover connections. CNNs are frequently used in image recognition and have been increasingly used to uncover relationships in biological data.

(CNN, CNNs)

Copy number variants

Copy number variants (CNVs) are regions of the genome that vary in integer copy number.

(CNV, CNVs)

Co-segregation

In the pedigree of a family with a condition, the segregation pattern shows how often the putative causal variant is found to coincide with the condition. When a variant coincides with the condition in a family, the condition and the variant are said to co-segregate.

Cosmid

A bacterial recombination vector that contains long inserted DNA sequences.

Coverage

The number of sequence reads that have alignments that overlap a certain position. Because current sequencing strategies produce random reads, resulting in an uneven distribution of reads across the genome, a high average coverage is required to assure that most bases in the genome are covered by multiple reads.
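
A minimal Python sketch of per-position coverage, assuming each aligned read is summarised as a 0-based, end-exclusive (start, end) interval on a reference of known length. The intervals and the reference length are made-up example values.

```python
def per_base_coverage(read_intervals, genome_length):
    """Count reads overlapping each position; intervals are 0-based, end-exclusive."""
    depth = [0] * genome_length
    for start, end in read_intervals:
        # Clip each interval to the reference before counting.
        for pos in range(max(start, 0), min(end, genome_length)):
            depth[pos] += 1
    return depth


# Hypothetical alignments on a 10-base reference.
reads = [(0, 5), (3, 8), (4, 10)]
depth = per_base_coverage(reads, genome_length=10)
mean_depth = sum(depth) / len(depth)
```

Even in this tiny example the depth ranges from 1 to 3 around a mean of 1.6, which illustrates why average coverage alone does not guarantee that every base is covered by multiple reads.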

CpG island

A sequence of at least 200 bp with a greater number of CpG sites than expected for its GC content. These regions are often GC rich, typically undermethylated, and are found upstream of many mammalian genes.
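
One common way to make "more CpG sites than expected" concrete is the observed-to-expected CpG ratio, where the expected count is (number of C × number of G) / sequence length. The Python sketch below computes that ratio; the definition above does not state a threshold, so none is applied here, and the toy sequences are far shorter than a real island.

```python
def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("C") + seq.count("G")) / len(seq)


def cpg_obs_exp(seq):
    """Observed CpG count divided by the count expected from C and G content."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    observed = seq.count("CG")
    expected = c * g / len(seq)
    return observed / expected


# Two toy sequences with identical GC content but different CpG density.
rich = "CGCGCGCGCG"
poor = "CCCCCGGGGG"
rich_ratio = cpg_obs_exp(rich)
poor_ratio = cpg_obs_exp(poor)
```

The two sequences have the same base composition, so the difference in the ratio reflects only how often C is immediately followed by G.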

Crossing over

Integration, excision, and inversion of defined DNA segments commonly occur through site-specific recombination, a process of DNA breakage and reunion that requires no DNA synthesis or high-energy cofactor.

(Recombination)

Crossover

A reciprocal exchange of DNA along chromatids such that the proximal end of one homologue becomes attached to the distal end of the other.

CTCF

The CCCTC binding factor (CTCF) is a zinc-finger transcription factor that is enriched at the boundaries of TADs.

Cybrid

Created by introducing a donor nucleus into a cytoplast. Because cybrids contain the nuclear genes from one cell and the mitochondrial genes from another, they can be used to assess the contributions of mitochondrial genes and nuclear genes independently.

Degenerate mutation

A mutation that does not affect fitness but is damaging to gene function.

Discretization

The conversion of a continuous signal to a discrete signal.

Discriminant functions

Classical statistical pattern-recognition methods that are used to categorize samples into two classes of data.

Disjunction

The separation of chromosomes or chromatids during anaphase of mitosis or meiosis.

Disruptive selection

A form of selection in which extreme phenotypes are more fit than intermediate forms.

DNA barcoding

The addition of a unique molecular tag to each fragment of an individual's DNA so that after pooling with other DNA samples, the genotype of each individual in the pool can be reconstructed.

(DNA bar-coding, DNA-barcoding, barcoding)

DNA gyrase

A type II DNA topoisomerase that catalyses the ATP-dependent supercoiling of closed-circular dsDNA by strand breakage and rejoining reactions. Control of chromosomal topological transitions is essential for DNA replication and transcription in bacteria, making gyrase an effective target for antimicrobial agents.

DNA helicases

A class of motor proteins that move along DNA and transiently separate duplexes into two single strands using energy from ATP hydrolysis.

DNA looping

Physical DNA–DNA interaction in the genome within 3D nuclear space.

DNA restriction

The destruction of foreign dsDNA by a restriction endonuclease. The protection of self DNA from restriction is achieved by DNA methylation.

DNA scars

Irreversible and unintended DNA changes caused mainly by off-target activity of DNA-targeting modules with functional nucleases.

DNase I hypersensitive site

A chromatin region with a high rate of cleavage by DNase I due to its preference for open chromatin. DNase I hypersensitivity generally reflects transcription factor (TF) binding and a local reduction in nucleosome occupancy.

DNase I hypersensitivity site footprinting

An assay that identifies regions of the genome that lack nucleosome structure and are therefore readily degraded by the enzyme DNase I. Such regions tend to be associated with transcriptional activity. When coupled with sequencing, the ends of DNA fragments generated by treatment of chromatin with DNase I are sequenced.

Domestication

The process of genetically adapting an animal or plant to better suit the needs of human beings (for example, breeding cattle for milk production).

Dosage compensation

The phenomenon whereby the expression levels of sex-linked genes are made equal in males and females of heterogametic species.

Dot plot matrix

A visualization technique that allows the easy identification of matching nucleotides or amino acids (letters) between two sequences. For example, for two sequences X and Y, each letter has a unique coordinate on the x axis and the y axis respectively. When two letters are the same at a specified coordinate, a dot is plotted in the matrix at that position.
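
The construction can be sketched in a few lines of Python. The two sequences are arbitrary examples, and the text rendering with "*" and "." is simply a convenient stand-in for a graphical plot.

```python
def dot_plot(seq_x, seq_y):
    """Boolean matrix: rows follow seq_y, columns follow seq_x, True where letters match."""
    return [[x == y for x in seq_x] for y in seq_y]


def render(matrix):
    """Draw the matrix as text, one row per line, '*' for a dot and '.' otherwise."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in matrix)


# Sequence X runs along the x axis, sequence Y along the y axis.
matrix = dot_plot("GATTACA", "GATCA")
picture = render(matrix)
```

Runs of matching letters appear as diagonal lines of dots, which is what makes shared segments between the two sequences easy to spot by eye.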

Double-strand break

A serious form of DNA damage that is created enzymatically during meiosis and that stimulates repair by crossover or non-crossover recombination.

Dyad symmetry

A twofold rotational symmetry relationship (in this case, a DNA arrangement in which a 5′→3′ sequence on one strand is juxtaposed with the same 5′→3′ sequence on the opposite strand). Transcripts from such regions have the capacity to form stem–loop structures.

Ecotype

A genetically distinct population within a widely spread species.

Ectopic recombination

Recombination between nonhomologous sequences.

Effect size

The increase in risk (or proportion of population variation) that is conferred by a given causal variant.

Effective population size

The size of the ideal constant-size population, in which the effects of random drift would be the same as those seen in the actual population.

(Ne)

Embryonic-stem-cell-mediated transgenesis

A method in which DNA is introduced into embryonic stem (ES) cells and integrates randomly, or through gene targeting, into the genome. Transgenic ES cells are delivered to the germline through the generation of (ES cell↔embryo) chimaeras.

Endogenous retrovirus RNAs

The prevalent endogenous viral elements that are derived from retroviruses that have become integrated into the genome.

(ERV RNAs, ERVs, ERV RNA)

Endophenotype

An intermediate phenotype that is heritable and associated with a disease but is not itself a symptom of the disease. Although there is little evidence to support the theory, it has been argued that endophenotypes would be a more tractable target for genetic analysis than the relevant disease state itself.

Enhancer hijacking

A process in which a somatic structural genomic rearrangement brings an enhancer into physical proximity of a gene it does not normally interact with, and activates it ectopically.

Enthalpic

Interactions driven by the binding energy between molecules, such as homotypic interactions among chromatin states.

Entropic

Changes that increase the number of accessible microstates in the system and do not require input energy.

Epidermal differentiation complex

A gene complex of >50 genes that encode proteins involved in terminal differentiation and cornification of skin epidermal keratinocytes.

(EDC)

Epigenetic

Literally means 'outside conventional genetics'; this term describes any heritable change in gene expression that is not caused by a change in DNA sequence.

Epigenetic events

Heritable phenotypic changes that are independent of changes to the DNA sequence.

Epigenetic modifications

Chemical additions to DNA and histones that are associated with changes in gene expression and are heritable but do not alter the primary DNA sequence.

Epigenome

The combined features that enable stable propagation of different gene expression patterns from the same genome sequence. These include methylation of DNA at cytosine bases (mC), chemical modification of the histone proteins, chromatin accessibility and higher-order chromatin structures.

Epigenotype

The state of those mechanisms that regulate gene expression and are transmitted to daughter cells.

Episomal

In the context of transient transfection, this term refers to a plasmid target that is extra chromosomal.

Episomes

Circular DNA that is not integrated in the genome.

Epistatic miniarray profiles

These are created by screening the fitness of double mutants in a high-throughput manner. The results, when analysed as a whole, can reveal both positive and negative genetic interactions between genes and provide insights into biological pathways and protein–protein complexes in the cell.

Euchromatin

Non-condensed chromatin state that is enriched in genes and permissive for transcription.

Exaptations

Features (such as feathers) that evolved by selection for one purpose (such as warmth) and were later adapted to a new purpose (such as flight).

Exome

The exome is the collection of known exons in our genome: this is the portion of the genome that is translated into proteins. As exons comprise only 1% of the genome and contain the most easily understood, functionally relevant information, sequencing of only the exome is a cheaper method of identifying most of the variants that are most likely to affect a trait.

Exon-primed intron-crossing PCR

EPIC primers are designed in conserved exons and amplify intron sequences that are generally more polymorphic than exons, which are therefore useful for the development of SNP or RFLP markers.

(EPIC-PCR, EPIC PCR)

Expected heterozygosity

The probability for a locus that two alleles drawn from its allele-frequency distribution are distinct.
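
When the two alleles are drawn independently from allele frequencies p_1, ..., p_k, this probability is 1 minus the sum of the squared frequencies. A short Python sketch, with made-up frequencies:

```python
def expected_heterozygosity(allele_freqs):
    """Probability that two independently drawn alleles differ: 1 - sum(p_i^2)."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# Hypothetical loci: two equally common alleles, and three alleles of unequal frequency.
h_two = expected_heterozygosity([0.5, 0.5])
h_three = expected_heterozygosity([0.7, 0.2, 0.1])
```

The value is highest when alleles are equally frequent and falls as one allele comes to dominate the locus.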

Expressed mutation rate

The rate of phenotypic change that results from the continuing accumulation of new mutations (expressed mutation rate = total mutation rate − neutral mutation rate).

Expressed sequence tags

Short DNA sequences (several hundred base pairs) that are produced by reverse transcription of mRNA into DNA. ESTs are cDNAs that consist of exons and the sequences that flank exons. The sequencing of ESTs allows rapid identification ('tagging') of genes and can expedite DNA marker (SNP) development in coding genes.

(EST, ESTs)

Extrachromosomal DNA concatenation

A structure in which two or more closed circular DNAs are interlinked.

(ecDNA concatenation)

Facultative heterochromatin

Reversibly condensed chromatin conformation that is transcriptionally silent.

FAIRE-seq

This technique isolates nucleosome-free regions of DNA from chromatin during phenol:chloroform extraction.

(Formaldehyde-assisted isolation of regulatory elements followed by sequencing)

False discovery rate

The proportion of false-positive test results out of all positive (significant) tests (note that the FDR is conceptually different to the significance level).

(FDR)

Family studies

A study design in which many members of a family across several generations are sequenced. These studies are used to understand how phenotypes manifest within a particular genotype background.

Fiducial markers

Markers used to correct for drift that may occur during an experiment. These can be fluorescent beads or labels on the DNA that remain constant throughout the imaging experiment.

Flow cells

Disposable parts of a next-generation sequencing routine. Template DNA is immobilized within the flow cell where fluid reagents can be streamed into the cell and flushed away.

Fluorescence resonance energy transfer

A system in which energy can be transferred from one light-sensitive molecule to another. When the two molecules are in close proximity (≤30 nm), energy transferred between the two molecules modulates the intensity of a fluorescence signal.

(FRET, Förster resonance energy transfer, Forster resonance energy transfer)

Focal amplification

A DNA region that spans only part of a chromosome arm and is amplified at a high level; that is, to more than eight copies.

Four-gamete test

If all four possible gametes are observed for two bi-allelic loci then this test infers that a recombination event must have occurred between them (under an infinite sites mutation model).

(FGT)
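
A minimal sketch of the test, assuming phased haplotypes with alleles coded 0 and 1 at each of the two loci (the function name and the coding are illustrative choices, not part of the definition):

```python
def four_gamete_test(haplotypes):
    """haplotypes: iterable of (allele_at_locus_1, allele_at_locus_2) pairs.

    Returns True when all four possible gametes are present, which under the
    infinite sites model implies at least one recombination event.
    """
    observed = set(tuple(h) for h in haplotypes)
    all_gametes = {(0, 0), (0, 1), (1, 0), (1, 1)}
    return all_gametes <= observed
```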

Fourier transform infrared spectroscopy

A spectroscopy method that simultaneously collects the absorption, emission and photoconductivity of a wide spectral range at high resolution to measure the intensity and wavelength of light required to vibrate molecules in a sample.

Fragmentation

The process of breaking large DNA fragments into smaller fragments. This can be achieved mechanically (by passing the DNA through a narrow passage), by sonication or enzymatically.

Functional sequence

A genomic sequence that provides a function that is under selection and tends to be conserved between species. For example, a protein-coding region or transcription-factor binding site.

Gap penalty

Alignment programs deal with insertions and deletions (indels) by introducing a 'gap' in the sequence that contains the deletion. The introduction of gaps and their extension decreases the overall alignment score by a certain value. This value is defined by a gap-opening penalty and a gap-extension penalty, both of which are used as parameters in alignment programs.
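
As a rough illustration, an affine gap cost can be sketched as follows. The default values and the convention that the opening penalty covers the first gap position are assumptions; alignment programs differ on both.

```python
def affine_gap_penalty(gap_length, gap_open=10.0, gap_extend=0.5):
    """Score deducted for one gap of the given length (affine model)."""
    if gap_length <= 0:
        return 0.0
    # Opening penalty for the first gap position, extension penalty for each
    # additional position (an assumed convention).
    return gap_open + gap_extend * (gap_length - 1)
```

Under these assumed values a single five-base gap costs 12.0, whereas five separate one-base gaps cost 50.0, so the alignment favours fewer, longer gaps.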

Gel electrophoresis

A technique used to separate molecules on the basis of their ability to migrate through a semisolid gel in response to an electric current.

Gene amplification

The emergence of a non-heritable extra copy of a gene in a somatic tissue. In microorganisms this term can be used interchangeably with gene duplication.

Gene conversion

Originally coined to describe non-Mendelian segregation of alleles obtained from a single meiosis, this typically (but not always) refers to a non-reciprocal form of non-crossover recombination that results in the alteration of the sequence of a gene (or DNA sequence) to that of its homologue. In ectopic gene conversion, the donor and recipient DNA strands are not allelic copies of the same locus.

Gene dosage

The amount of product produced from a gene; broadly equivalent to gene expression.

Gene duplication

The emergence of a heritable copy of a gene.

Gene flow

The movement of genes among populations. Often expressed as the proportion of gene copies (or breeding individuals) that are immigrants from a different population.

Gene sharing

An early term describing situations in which a gene has more than one function. Modern studies describe such genes as multifunctional.

Gene therapy

A technique used to treat heritable diseases by replacing mutant genes with functional copies.

Gene-environment independence

The independent distribution of genotype and environment in the source population.

Gene-environment-wide interaction study

A scan of the entire genome for interactions with various environmental exposures.

Genetic anticipation

A phenomenon observed in autosomal dominant diseases in which some clinical manifestations develop earlier and are more severe with successive generations.

Genetic drift

The random fluctuations in allele frequencies over time that are due to chance alone.

Genetic engineering

Alteration of the genetic makeup of an organism using the molecular methods of biotechnology.

Genetic interference

The presence of a recombinational event in one region that affects the occurrence of recombinational events in adjacent regions. Positive interference, which is seen in eukaryotes, reduces the probability of using nearby hot spots in the same meiosis and causes a more even spacing of crossovers than would occur by chance.

Genetic map

An outline of genes and their location on a chromosome that is based on recombination frequencies between markers.

Genetic mosaics

Animals in which homozygous mutations are carried by only a small clone of cells.

Genetic pleiotropy

The effect of a single gene on multiple phenotypic traits. The underlying mechanism is related to the effects of the gene product on various targets.

Genetic testing

Identifying gene variants in an individual that may lead to a genetic disease in that individual.

Genetically modified organism

An organism whose genome has been artificially changed.

(GMO)

Genome phasing

A method to identify which chromosome a DNA sequence is derived from. By examining polymorphisms, the chromosome of origin can be inferred by matching the reads that share the same variation.

Genome typing

The simultaneous genotyping of hundreds of loci from across the genome, which ideally includes mapped loci and different classes of loci such as allozymes, microsatellites and AFLPs, or synonymous (non-coding) and non-synonymous nucleotide polymorphisms.

Genome-wide association study

An examination of common genetic variation across the genome that is designed to identify associations with traits, such as common diseases.

(GWAS)

Genomic footprinting

The use of an ectopically supplied enzyme that adds chemical groups to DNA but is itself sensitive to the factors binding DNA, such as transcription factors (TFs) or nucleosomes. Its activity can subsequently be read out by sequencing.

Genomic imprinting

Epigenetic marks that are differentially established during male and female gametogenesis and lead to allele-specific gene expression after fertilization.

Genomics

The study of entire genomes, including the complete set of genes, their nucleotide sequence and organization, and their interactions within a species and with other species.

G-quadruplexes

Four-stranded DNA secondary structures formed in G-rich sequences in which four guanines form a planar array via Hoogsteen base-pairing. These structures can cause replication stress.

Group selection

Selection on traits that increase the relative fitness of populations or lineages of organisms at some fitness cost to individuals. All of the feasible mechanisms require selection on lineages or small interbreeding groups of related individuals in subdivided populations.

Haploinsufficiency

This occurs when a diploid organism has only one functional copy of a gene and both copies are required for correct function. This is one way that a protein-truncating mutation can influence predisposition to a disease.

Haplotype

A set of genetic markers that are present on a single chromosome and that show complete or nearly complete linkage disequilibrium — that is, they are inherited through generations without being changed by crossing over or other recombination mechanisms.

Haplotype blocks

Long stretches (tens of megabases) along a chromosome that have low recombination rates (and relatively few haplotypes). Adjacent blocks are separated by recombination hot spots (short regions with high recombination rates).

Haplotype-based approach

An approach to association studies in which the co-inheritance of phenotypes and haplotypes — as opposed to single markers — is statistically analysed.

Helicos Genetic Analysis System

A sequencing technology based on single nucleotide addition. Each nucleotide contains a ‘virtual terminator’ that prevents the incorporation of multiple nucleotides per cycle.

Hemimethylation

Methylation of a residue on one strand within a palindromic target sequence but not of the corresponding residue within the palindromic target sequence on the complementary DNA strand. Not to be confused with monoallelic methylation, in which one allele of a locus is methylated in a diploid organism.

Hemizygote

An animal with a transgene insertion on one chromosome of a homologous pair, rather than on each of the two homologous chromosomes (homozygote).

Hemizygous

The type of zygosity in which only one allele contains a gene or mutation.

Heritability

The proportion of total phenotypic variation that can be attributed to genetic effects (broad sense) or purely additive genetic effects (narrow sense). Narrow-sense heritability predicts the initial response of a population to selection and decreases over the course of selection.
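
The two ratios can be sketched from variance components as below. This assumes genetic variance is the sum of additive and dominance variance only (no epistasis, genotype-by-environment interaction or covariance terms). The response function uses the breeder's equation, R = h² × S, which is a standard result and is not stated in the definition above. All names are illustrative.

```python
def heritability(additive_var, dominance_var, environmental_var):
    """Return (broad_sense, narrow_sense) heritability from variance components."""
    genetic_var = additive_var + dominance_var          # assumed: V_G = V_A + V_D
    phenotypic_var = genetic_var + environmental_var    # assumed: V_P = V_G + V_E
    broad_sense = genetic_var / phenotypic_var
    narrow_sense = additive_var / phenotypic_var
    return broad_sense, narrow_sense


def response_to_selection(narrow_sense_h2, selection_differential):
    """Breeder's equation: predicted change in the trait mean after one generation."""
    return narrow_sense_h2 * selection_differential
```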

Heritable

A phenotype that is at least partially transmitted genetically from parents to offspring.

Heterochromatin

A densely packaged form of chromatin that is associated with repressive histone modifications, DNA methylation and gene silencing.

Heteroduplex

Double-stranded DNA in which the sequences of the strands are not perfectly complementary.

Heteroplasmy

The co-existence of mutant and wild-type mitochondrial DNA molecules within the same mitochondrion or within a cell.

Hidden Markov Model

A probabilistic model that is applied to protein- and DNA-sequence pattern recognition. HMMs represent a system as a set of discrete states and as transitions between those states. Each transition has an associated probability. HMMs are valuable because they enable a search or alignment algorithm to be built on firm probabilistic bases, and the parameters (transition probabilities) can be easily trained on a known data set.

(HMM)
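
As an illustration, the forward algorithm below computes the probability of a DNA sequence under a toy two-state HMM. The state names and all probabilities are made up for the example. In addition to the transition probabilities mentioned above, the sketch uses start and emission probabilities.

```python
# Hypothetical toy model: a GC-rich "island" state and a "background" state.
STATES = ("island", "background")
START = {"island": 0.5, "background": 0.5}
TRANSITION = {
    "island": {"island": 0.9, "background": 0.1},
    "background": {"island": 0.2, "background": 0.8},
}
EMISSION = {
    "island": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "background": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def forward_probability(observations, states=STATES, start=START,
                        transition=TRANSITION, emission=EMISSION):
    """Probability of the observed sequence, summed over all state paths."""
    # alpha[s]: probability of the symbols seen so far, ending in state s.
    alpha = {s: start[s] * emission[s][observations[0]] for s in states}
    for symbol in observations[1:]:
        alpha = {
            s: emission[s][symbol] * sum(alpha[r] * transition[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())
```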

Histone

A family of small, highly conserved basic proteins that are found in the chromatin of all eukaryotic cells and that associate with DNA to form a nucleosome. Two each of the core histones H2A, H2B, H3 and H4 make up an octameric nucleosome, around which DNA winds.

Histone modifications

Covalent modifications to histone proteins, such as methylation, acetylation, phosphorylation, ubiquitylation and sumoylation, that take place at lysine, serine, threonine, arginine and other residues. Histone modifications are catalysed by a diverse panel of enzymes referred to as writers, removed by a different set of proteins known as erasers, and recognized by chromatin-binding proteins known as readers. Activity of CREs is directly linked to distinct histone modifications due to the activities of writers, erasers and readers.

Histone octamer lateral surface

The positively charged outer surface of the histone octamer around which DNA is wrapped.

Histone variants

Structurally distinct, non-typical versions of histone proteins. They are encoded by independent genes and are often subject to regulation that is distinct from that of the canonical histones.

HITS-CLIP

A technique similar to ChIP–seq in which proteins bound to RNA — such as splicing factors — are immunoprecipitated and the RNA fragments are sequenced.

Holliday junction

The point at which the strands of two dsDNA molecules exchange partners as an intermediate step in crossing over. Typically, two Holliday junctions are formed in the recombination pathway that gives rise to crossovers.

Homogeneously staining regions

Chromosomal regions with DNA amplification that present a uniform staining pattern with Giemsa nucleic acid stain.

(HSR, HSRs)

Homologous recombination

A template-based mechanism for accurate repair of double-stranded breaks in DNA.

Homopolymer

A sequence run of identical bases.
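
A short sketch for locating the longest homopolymer in a read (the function name is illustrative):

```python
from itertools import groupby


def longest_homopolymer(sequence):
    """Return (base, run_length) for the longest run of identical bases."""
    best_base, best_length = "", 0
    # groupby yields one group per run of consecutive identical characters.
    for base, run in groupby(sequence.upper()):
        length = sum(1 for _ in run)
        if length > best_length:
            best_base, best_length = base, length
    return best_base, best_length
```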

Hoogsteen base pairing

An alternative base pairing in which the purine is flipped and forms different hydrogen bonds with partner bases. For adenines, the second hydrogen bond with the pyrimidine base is formed with N7 rather than N1. These alternative base pairs allow for additional structures beyond the double helix, including triplexes and quadruplexes.

Hox clusters

A group of linked regulatory homeobox genes that are involved in patterning the animal body axis during development. Homeobox genes are defined as those that contain a 180-base-pair sequence that encodes a DNA-binding helix–turn–helix motif (a homeodomain).

Hybrids

Offspring that are produced by crossing two different populations within a single species.

Hypomorph

Forms of a gene that have low activity.

Hypomorphic

Refers to a variant that results in reduced but not eliminated function of the gene product.

Identical by descent

Two or more alleles are identical by descent if they are identical copies of the same ancestral allele.

Identical by state

Two or more alleles are identical by state if they are identical. Alleles which are identical by state may or may not be identical by descent owing to the possibility of multiple mutation events.

Illegitimate recombination

Nonhomologous sequence recombination at the genomic DNA level.

Imprinted

A locus with monoallelic expression determined by the parental origin of the allele.

Imprinted genes

Genes in which one allele is expressed in a parent-of-origin-specific manner.

Imprinting

The epigenetic marking of a gene on the basis of parental origin, which in somatic tissues results in monoallelic expression.

Incomplete penetrance

Refers to the phenomenon in which some individuals who carry a pathogenic variant do not exhibit clinical signs.

Indel

A small insertion or deletion of nucleotides. If it occurs in an exon and is not a multiple of three in length, it results in a frameshift and usually the loss of gene function.

Infinite sites mutation model

A model that assumes that there are an infinite number of nucleotide sites and consequently that each new mutation occurs at a different locus.

Insulator

A genomic element that acts as a barrier, preventing interactions between contiguous regions of the genome.

Integrated

The tendency of different traits to vary jointly in a coordinated manner throughout a morphological structure or even a whole organism.

Interaction odds ratio

The ratio of odds ratios for the relationship of one factor (for example, a gene) with disease across the levels of another factor (for example, an environmental exposure); as such, it is a measure of departure from a multiplicative joint effect.
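
A minimal sketch of the calculation, assuming the data are summarised as two 2×2 tables, one for exposed and one for unexposed individuals (the table layout and function names are illustrative):

```python
def odds_ratio(cases_with, cases_without, controls_with, controls_without):
    """Odds of carrying the variant in cases divided by the odds in controls."""
    return (cases_with / cases_without) / (controls_with / controls_without)


def interaction_odds_ratio(exposed_table, unexposed_table):
    """Ratio of the gene-disease odds ratios in exposed versus unexposed people.

    Each table is (cases_with, cases_without, controls_with, controls_without).
    A value of 1 corresponds to a purely multiplicative joint effect.
    """
    return odds_ratio(*exposed_table) / odds_ratio(*unexposed_table)
```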

Interactome

A set of molecular components of the cell, such as proteins, and the interactions between them. The interactions can be physical (protein A binds protein B) or correlative (perturbing protein A alters protein B's activity).

Interference

A phenomenon in which the occurrence of a crossover recombination at one position on a chromosome suppresses the frequency of additional, nearby crossovers; inhibition decreases with physical distance.

Introgression

The transfer of genetic material from one species to another by hybridization and repeated backcrossing.

Intron phase

The relative position of an intron within or between codons. Phase zero, one and two are defined by the position of an intron between two codons or after the first or second nucleotide of a codon, respectively.
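
In code, the phase is the number of coding nucleotides that precede the intron, taken modulo three. The sketch assumes the count starts at the first base of the start codon; the function name is illustrative.

```python
def intron_phase(coding_nucleotides_before_intron):
    """Return 0, 1 or 2 depending on where the intron interrupts the reading frame."""
    # 0: between two codons; 1: after the first base of a codon;
    # 2: after the second base of a codon.
    return coding_nucleotides_before_intron % 3
```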

Isodicentric chromosome

A cytogenetically anomalous chromosome characterized by the presence of two centromeres, with additional, identical copies of DNA segments joined end to end.

Isoforms

Proteins produced from the same genetic locus but which differ in exon order or combination.

Isoschizomers

Pairs of structurally distinct restriction enzymes with the same recognition sequence and the same cleavage positions.

Kin

Individuals that share some of their genes by recent common descent.

L1 retro-element

A member of the long interspersed transposable element (LINE) family, which is a type of large repetitive DNA sequence that inserts itself throughout the genome through retroposition. L1 retro-elements are ∼6,400 base pairs long and are abundant in the human genome.

Lamina-associated domains

Megabase-scale regions of the genome that interact with the nuclear lamina, are gene-poor, late-replicating and that correspond to heterochromatin and the B compartment.

Lateral gene transfer

The transfer of DNA, frequently cassettes of genes, between organisms.

Linkage disequilibrium

The non-random association of alleles. For example, alleles of SNPs that reside near one another on a chromosome often occur in non-random combinations owing to infrequent recombination. Linkage disequilibrium is useful in genome-wide association studies as it reduces the number of SNPs that must be interrogated to determine genotypes across the genome. Conversely, strong linkage disequilibrium can complicate the identification of functional variants.

(LD)

Linked reads

Reads derived from the 10X Genomics synthetic long-read platform. These are discontinuous reads each sharing the same barcode, thus they are derived from the same original long molecule.

Locus heterogeneity

This occurs when a phenotype is caused by mutations at more than one gene locus, which suggests that the products of the genes belong to the same metabolic pathway.

(Genetic heterogeneity)

LOD score

The logarithm of the likelihood ratio (odds) for genetic linkage versus no linkage at a given value of the recombination fraction.

(Logarithm of odds score)
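
For the simplest case of phase-known meioses, with k recombinants among n informative meioses, the LOD score at recombination fraction θ is log10[θ^k (1 − θ)^(n − k) / 0.5^n]. The sketch below assumes that setting and 0 < θ ≤ 0.5; the function name is illustrative.

```python
import math


def lod_score(recombinants, meioses, theta):
    """LOD score for linkage at recombination fraction theta (0 < theta <= 0.5)."""
    non_recombinants = meioses - recombinants
    # log10 likelihood of the data under linkage at the given theta.
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1.0 - theta))
    # log10 likelihood under no linkage (theta = 0.5).
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked
```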

Logistic regression model

A statistical model for the dependency of a binomial (two-class) phenotype on a number of risk factors. The probability, p, for one of the two phenotype states is expressed in the form of its logit, log(p/(1 – p)), which is assumed to be predicted by the linear combination (weighted sum) of the risk factors.
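
The logit and its inverse can be sketched as follows. The intercept term and the parameter names are assumptions added for illustration, and the weights would normally be estimated from data rather than supplied by hand.

```python
import math


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def predicted_probability(intercept, weights, risk_factors):
    """Probability of the phenotype given a weighted sum of risk factors."""
    linear_predictor = intercept + sum(w * x for w, x in zip(weights, risk_factors))
    # Inverse of the logit (the logistic function).
    return 1.0 / (1.0 + math.exp(-linear_predictor))
```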

Long non-coding RNAs

Non-coding RNAs longer than 200 nucleotides.

(lncRNA, lncRNAs)

Loop extrusion

A model of how CTCF and cohesin are thought to form topologically associating domains (TADs), whereby cohesin is loaded onto the DNA and extrudes a loop until it is blocked by CTCF bound at the base of the loop.

Map based

An approach to genetic association studies that is focused on putatively functional SNPs, for example, identified by re-sequencing exons and other functional regions in relatively large samples, or directly in patients. This approach is also sometimes called direct.

Marginal effects

The effects of a specific risk factor (gene or exposure) in the population as a whole, averaging over all other variables.

Marginal genealogy

The part of a genealogical graph that corresponds to a single locus or stretch of DNA that is inherited without recombination.

Marginal penetrance

In epistatic interactions between two loci associated with disease, each with three genotypes, the nine genotype pairs might each be associated with a certain penetrance — that is, the probability that the genotype pair leads to disease. From these penetrances and the genotype frequencies, (marginal) penetrances might be computed — that is, penetrances that are associated with the genotypes at one of the two loci.
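
The computation can be sketched as a weighted average over the genotypes at the other locus. This assumes that genotypes at the two loci are independent in the population (no linkage disequilibrium); the names are illustrative.

```python
def marginal_penetrances(penetrance, other_locus_genotype_freqs):
    """Marginal penetrance for each genotype at locus 1.

    penetrance[i][j] is the probability of disease given genotype i at locus 1
    and genotype j at locus 2; other_locus_genotype_freqs[j] is the population
    frequency of genotype j at locus 2.
    """
    return [
        sum(freq * pen for freq, pen in zip(other_locus_genotype_freqs, row))
        for row in penetrance
    ]
```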

Marker ascertainment

The process by which new genetic markers are obtained — for example, by re-sequencing a subset of chromosomes in a population sample. If those markers are population-specific then inferences that are based on them in other populations might be biased through so-called ascertainment bias.

Markov chain Monte Carlo

A computational technique for the efficient numerical calculation of likelihoods.

(MCMC)

Maximum-likelihood

A method that selects the phylogenetic tree that has the highest probability of explaining the sequence data, under a specific model of substitution (changes in the nucleotide or amino-acid sequence).

McDonald-Kreitman test

A statistical test that is commonly used for the comparison of between-species divergence and within-species polymorphism at replacement and synonymous sites to infer adaptive protein evolution.

Mediator complex

A multisubunit protein complex that bridges transcription factors and the basal RNA polymerase II transcriptional machinery.

MeDIP–seq

Methylated DNA is immunoprecipitated with an antibody against methylated cytosine and then sequenced by next-generation sequencing.

Mendelian disease

A disease that is carried in families in either a dominant or recessive manner and that is typically controlled by variants of large effect in a single gene.

Mendelian randomization

A technique for studying the relationship between a biomarker and disease indirectly by studying the relationship of each to a gene that influences the biomarker.

Metagenomics

Ordinary genomics studies the genome of a single organism. Metagenomics is the simultaneous study of a collection of many different species’ genomes in a single sample, typically that of microbial communities.

Methylation

The enzymatic process of adding a methyl group to a lysine or an arginine residue on histone tails or other proteins. Alternatively, methyl groups can be added to DNA itself on cytosine bases.

Methylation-preferred

Refers to transcription factor (TF) motifs that are bound with higher affinity when CpG dinucleotides within the motif are methylated.

Methylation-sensitive

Refers to transcription factor (TF) motifs that are bound with lower affinity when CpG dinucleotides within the motif are methylated.

MethylC–seq

Methylated DNA is identified by shotgun sequencing of bisulphite-converted DNA.

(Bisulphite conversion followed by sequencing, BS–seq, BS sequencing)

Micrococcal nuclease

An enzyme that generates cuts preferentially within linker DNA between nucleosomes and in nucleosome-depleted regions. Coupling MNase digestion of chromatin with next-generation sequencing generates maps of nucleosome position and density.

(MNase)

Microevolution

Evolutionary processes or changes over relatively short time periods — such as change in allele frequencies, genotypic composition or gene expression — within or between populations.

Micronuclei

The small nuclear structures that reside in the cytoplasm and contain damaged DNA fragments which were not incorporated into the main nucleus after mitosis.

Microsatellite

A type of genetic marker in which individuals vary in their number of tandemly repeated copies of a short DNA unit.

Migration-drift genetic equilibrium

The balance between the loss of alleles through genetic drift and the gain of alleles through migration.

Minimum-description length approaches

A concept from information theory, in which all of the information contained in a system (for example, a sample of DNA sequences) is described in the most compact form possible.

Minisatellite

A region of DNA in which repeat units of 10–50 bp are tandemly arranged in arrays 0.5–30 kb in length.

Minor allele frequency

Ranging from 0 to 50%, this is the proportion of alleles at a locus that consists of the less frequent allele. This number does not take genotype into account.

(MAF)
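
A small sketch for a bi-allelic locus in a diploid sample, starting from genotype counts (the argument names are illustrative):

```python
def minor_allele_frequency(n_AA, n_Aa, n_aa):
    """Frequency of the less common allele, from diploid genotype counts."""
    total_alleles = 2 * (n_AA + n_Aa + n_aa)
    # Each aa individual carries two copies of allele a, each Aa carries one.
    freq_a = (2 * n_aa + n_Aa) / total_alleles
    return min(freq_a, 1.0 - freq_a)
```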

miRNA microprocessor complex

A protein complex involved in the early stages of processing microRNA (miRNA) and RNA interference in animal cells.

Mismatch repair

A DNA-repair pathway that removes mismatched bases and corrects the insertion or deletion of short stretches of (repeated) DNA.

Mitochondrial RNA granule

A heterogeneous complex composed of mitochondrial RNA and proteins involved in RNA regulation.

(MRG, MRGs)

Molecular clock

Molecular mechanism driving circadian rhythms, consisting of transcriptional–translational feedback loops of core clock genes.

Molecular typing

The use of molecular genetic techniques — for example, multiplex PCR, pulse-field gel electrophoresis, Southern blotting and multilocus sequence typing — to genetically compare and characterize bacterial genomes.

Mono-adducts

A form of DNA lesion induced by DNA damaging agents, such as ultraviolet radiation, which on longer exposure can be converted into covalent crosslinks in the DNA. Mono-adducts can, to an extent, induce recombination in yeast, mammalian and bacterial cells.

Monoallelic

Refers to one genetic variant located on one allele of a gene.

Morphotypes

Distinctive phenotypes. Organisms that are classified together on the basis of similar physical features without knowledge of their genetic relationships.

Mosaic

An organism that consists of cells of more than one genotype. The strict definition requires that the genotypically different cells all derive from a single zygote. The term mosaic is also used more broadly to describe any organism comprised of cells of different genotypes.

Mosaicism

A condition in which an animal contains multiple cell lineages with different genotypes.

Multi-locus genetic approaches

Genetic methods that make use of information from many loci; such approaches use nuclear loci because mitochondrial genes are typically inherited as one locus.

Mutation load

The accumulated deleterious alleles that are carried by a population at any given time.

Mutational hotspots

A region in which the frequency of mutation is greater than expected, owing to specific structural and/or functional features of the protein or gene.

Negative supercoiling

A segment of underwound DNA in which the two strands wind around the helical axis less than 360° every 10.5 bp and retain twist strain (free energy).

Neofunctionalization

The random acquisition of a new function in the course of the accumulation of neutral mutations in duplicated genes.

Neoschizomers

Pairs of structurally distinct restriction enzymes with the same recognition sequence but with different cleavage positions.

Neutral drift

The process by which a DNA sequence acquires many mutations over time that have no phenotypic effect, and are not acted on by Darwinian selection.

Neutral loci

Loci that are not evolving directly in response to selection, the dynamics of which are controlled mainly by genetic drift and migration. These loci can, however, be influenced by selection on nearby (linked) loci.

New gene

A gene that has originated recently in the relevant evolutionary timescale.

Next-generation sequencing

Here, we define this as the use of established sequencing platforms, including the Illumina/Solexa Genome Analyzer, Roche/454 Genome Sequencer and Applied Biosystems SOLiD platforms, as well as newer platforms, such as the Helicos and Pacific Biosciences platforms.

(Next generation sequencing, NGS)

Nickase Cas9

Cas9 that has either its HNH or RuvC nuclease domain catalytically inactivated, resulting in a Cas9 enzyme that can only cut one strand of targeted double-stranded DNA.

(nCas9)

Non-functionalization

The process of the accumulation of neutral mutations in a duplicated gene that renders the gene copy non-functional.

(Pseudogenization)

Non-homologous end-joining

An error-prone mechanism for repairing double-stranded breaks in DNA involving the ligation of two free DNA ends.

(NHEJ)

Non-synonymous variant

A genetic variant that changes a codon for one amino acid to another amino acid. Many non-synonymous variants are well-tolerated, but others can cause a disease.

Nuclear periphery

The area at the edge of the nucleus. It is normally associated with gene silencing.

Nuclear run-on

An assay that directly measures the transcriptional activity of a gene by incorporation of labelled UTP into its mRNA.

Nucleoid

A complex composed of mitochondrial DNA and its associated proteins that regulate the organization and expression of the mitochondrial genome.

Nucleosome

The basic unit of chromatin, containing ∼147 bp of DNA wrapped around a histone octamer (which is composed of two copies each of histone 3 (H3), H4, H2A and H2B).

Null distribution

The distribution (or range) of values across which we expect to observe the value of the test statistic if the null hypothesis is true (for example, neutrality). When conducting a standard t-test, t is the test statistic and the null distribution is Student's t distribution with the appropriate number of degrees of freedom.

(Neutral distribution)

Odds ratio

The odds of carrying a genetic variant (or other hazard exposure) in cases compared with controls. It can be used as a measure of effect size in case–control association studies. An odds ratio significantly different from one suggests that the genetic variant is associated with the disease or trait.
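
A minimal sketch from the four cells of a case–control table (the argument names are illustrative; no confidence interval or significance test is included):

```python
def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Odds of carrying the variant in cases divided by the odds in controls."""
    odds_in_cases = case_carriers / case_noncarriers
    odds_in_controls = control_carriers / control_noncarriers
    return odds_in_cases / odds_in_controls
```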

Off-targeting

The effects arising due to non-specific and unintended targeting of DNA targeting modules such as zinc fingers, transcription activator-like effector (TALE) and CRISPR in the genome.

Okazaki fragments

Short fragments of DNA produced by discontinuous replication on the lagging strand during DNA replication. Because the template for lagging strand synthesis is exposed in the 5′–3′ direction at the progressing replication fork, the nascent strand is composed of sequential Okazaki fragments created by DNA polymerase working backwards from the replication fork.

One-base-encoded probes

Oligonucleotides that contain a single interrogation base in a known position. The base corresponds to a fluorescent label on each probe. The remaining bases are either degenerate (any of the four bases) or universal (unnatural bases with nonspecific hybridization), allowing the probe to interact with many different possible template sequences.

Ontology

A formal system for organizing knowledge, here used in the context of biological pathways as a means of synthesizing information about the function of genes and exposures and their joint roles in disease causation.

Orphan genes

Genes that do not share any homology with genes from other species.

Orthogonal environment

A cellular environment or host into which genetic material is transplanted to avoid undesired native host interference or regulation. Orthogonal hosts are often organisms with sufficient evolutionary distance from the native host.

Orthologues

Sequences, or genes, that have originated from a common ancestral sequence, or gene, by a speciation event.

Outlier loci

Genome locations (or markers or base pairs) that show behaviour or patterns of variation that are extremely divergent from the rest of the genome (locus-specific effects), as revealed by simulations or statistical tests.

Padlock capture

A method for simultaneously capturing and amplifying large numbers of regions of interest from whole genomic DNA. Each padlock probe has two complementary oligonucleotide sequences that flank a region of interest. The sequences are joined by a loop of DNA that ensures efficient joint hybridization and contains sequences for PCR with universal primers.

Paired-end sequencing

In paired-end sequencing, a DNA template is sequenced from both sides; the forward and reverse reads may or may not overlap. A deviation in the expected genome alignment between two ends of a paired-end read can indicate a structural variation.

Pair-rule gene

A class of segmentation gene that determines segments along the anterior–posterior axis. The expression of pair-rule genes in a pattern of seven stripes that are perpendicular to the axis is regulated by another class of segmentation genes: the gap genes.

Pairwise linkage disequilibrium

The strength of association between alleles at two different markers.

(Pairwise LD)
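
Two common measures of this strength are the coefficient D and the squared correlation r². The sketch below computes both from counts of the four haplotypes at two bi-allelic markers. It assumes phased data and that both markers are polymorphic; otherwise the r² denominator is zero.

```python
def pairwise_ld(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) from counts of the four two-locus haplotypes."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    # D: difference between the observed haplotype frequency and the
    # frequency expected if the alleles were independent.
    d = p_AB - p_A * p_B
    r_squared = d * d / (p_A * (1.0 - p_A) * p_B * (1.0 - p_B))
    return d, r_squared
```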

Paralogues

Sequences, or genes, that have originated from a common ancestral sequence, or gene, by a duplication event.

Parsimony

As applied to phylogenetic reconstruction, a criterion for estimating historical changes by minimizing the number of substitution events that are required to explain how one DNA sequence evolves into another.

Partial epigenetic reprogramming

Delivery of factors that can de-differentiate cells into induced pluripotent stem cells, typically short term, to de-age the epigenetic state of cells.

Pathogenicity island

Genomic islands that contain genes that are required for virulence. These islands are usually absent from non-pathogenic organisms and are acquired by horizontal gene transfer.

Pharmacogenomics

The study of drug interactions with the genome or proteome; also called toxicogenomics.

Phase separation

A process through which polymer chains (or segments of a polymer chain) spontaneously de-mix and segregate through the formation of immiscible phases.

Phasing

Determining the haplotype phase (the arrangement of alleles at two loci on homologous chromosomes) from genotype data using statistical methods.

Phenocopy

The production of a phenotype as a result of environmental factors, such as stress, which closely resembles a phenotype that normally results from specific gene expression or from gene mutation.

Phylogeography

The study of the geographic distribution of phylogenetic lineages, usually within species, to reconstruct the origins and diffusion of lineages.

Physical map

A representation of the physical distance between genes or genetic markers.

Plasmid

A small circular molecule of DNA found in bacteria that replicates independently of the main bacterial chromosome; plasmids code for some important traits for bacteria and can be used as vectors to transport DNA into bacteria in genetic engineering applications.

Pleiotropy

A phenomenon in which a gene can influence two or more independent characteristics.

Polycomb-associating domains

Self-associating compartment-like structures marked by histone 3 lysine 27 trimethylation (H3K27me3).

(PADs)

Polygenic diseases

Diseases that are mediated by numerous genetic variants that each individually contribute small effects.

Polygenic score

Also known as polygenic risk score. A score that summarizes genetic liability to a trait or disease and is typically calculated by aggregating the weighted effect of many trait-associated genetic variants.
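
The aggregation described above can be sketched as a weighted sum of allele dosages. This is a minimal illustration: the variant names, weights and dosages are hypothetical, and real pipelines add steps (variant selection, strand alignment, normalization) that are not shown.

```python
def polygenic_score(dosages, weights):
    """Sum of per-variant effect weights multiplied by allele dosage.

    dosages: dict of variant id -> number of effect alleles carried (0, 1 or 2).
    weights: dict of variant id -> effect size for that variant.
    Variants without a dosage for this individual are skipped.
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


# Hypothetical weights and one individual's genotype.
weights = {"variant_1": 0.5, "variant_2": 1.0, "variant_3": -0.25}
dosages = {"variant_1": 2, "variant_2": 0, "variant_3": 1}
score = polygenic_score(dosages, weights)  # 0.5*2 + 1.0*0 + (-0.25)*1
```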

Polymerase chain reaction

A technique used to make multiple copies of DNA.

(PCR)

Polymerase reads

Contiguous sequences of nucleotides incorporated by the DNA polymerase while reading a template. These reads include sequences from adapters and can represent sequences from multiple passes around a circular template.

Polytene chromosomes

DNA structures containing many paired sister chromatids, which are produced by multiple rounds of DNA replication without cell division.

Population bottleneck

A marked reduction in population size followed by the survival and expansion of a small random sample of the original population. It often results in the loss of genetic variation and more frequent matings among closely related individuals.

Population genetic analysis

The process of making inferences about the evolutionary and demographic history of a gene (or organism) on the basis of data on genetic variation in a species.

Population parameters

Parameters that characterize populations such as gene flow, migration rates, effective size, change in size, relatedness and phylogeny.

Population stratification

The phenomenon of an apparently homogeneous population that is actually composed of subgroups of individuals with distinct ancestral origins and differing allele frequencies at many loci. This leads to bias in the assessment of the significance of associations of a trait with particular loci.

Population structure

Genetic differences between individuals as a consequence of the distribution of individuals in partially isolated populations.

Position effect variegation

Variegated expression patterns that arise owing to intercellular differences in epigenetic gene silencing, typically observed when reporter genes are brought into proximity with heterochromatin.

Positive selection

A process by which natural selection favours a single beneficial genotype over other genotypes and may drive this genotype to a high frequency in a population.

Pre-ascertained single nucleotide polymorphisms

SNPs that have already been detected in previous studies, usually from an extremely small sample of chromosomes.

(Pre-ascertained SNPs)

Private SNPs

SNPs that are confined to a single population.

Proteoform

A proteoform is a defined form of a protein derived from a given gene with a specific amino acid sequence and localized post-translational modifications.

Proteomics

Study of the function of proteomes.

Proto-oncogene

A gene that promotes the specialization and division of cells; however, when it is mutated or expressed at high levels, it causes abnormal cellular growth.

Protospacer

Phage or plasmid sequences that match one or more clustered, regularly interspaced short palindromic repeat (CRISPR) spacer sequences and are targeted during CRISPR interference.

Pseudoautosomal

A region on a sex chromosome that is homologous between the X chromosome and the Y chromosome. Successful meiosis in males requires a crossover in this pseudoautosomal region.

Psoralen

A photosensitizing chemical that is used for determining RNA–DNA structures in cells, and intercalates between two strands in duplex DNA. When attached to an oligonucleotide, psoralen forms interstrand crosslinks. When exposed to ultraviolet light, it forms photoadducts, crosslinked chemical bonds within adjacent bases.

Pseudoknot

A non-nested structural RNA motif formed upon base-pairing between the loop of a secondary structure element (such as a stem-loop (SL)) and any complementary region along the RNA.

Purifying selection

Selection against deleterious alleles that arise in a population, preventing their increase in frequency and assuring their eventual disappearance from the gene pool.

Quantitative trait locus

A locus that controls a quantitative phenotypic trait, identified by showing a statistical association between genetic markers surrounding the locus and phenotypic measurements.

(QTL)

Qualitative trait

Qualitative traits consist of a discrete number of classes, such as 'affected' and 'unaffected'.

(Simple trait)

Quantitative trait

Quantitative traits occur with a continuous distribution.

(Complex trait)

R loop

A three-stranded nucleic acid structure that contains a DNA:RNA hybrid and a displaced strand of DNA.

Radiation hybrid

Radiation-induced interspecific cell hybrids.

(RH)

Radiation hybrid mapping

Radiation hybrid (RH) mapping, a somatic cell genetic technique, was developed as a general approach for constructing long-range maps of mammalian chromosomes. This statistical method depends on X-ray breakage of chromosomes to determine the distances between DNA markers, as well as their order on the chromosome.

(RH mapping)

Random genetic drift

Random fluctuations in allele frequencies between generations owing to sampling effects. It increases as the effective population size decreases.
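
The sampling effect can be illustrated with a simple Wright-Fisher style simulation, a common textbook model that the definition above does not name. The sketch assumes a diploid population of constant size with no selection, mutation or migration, and treats the census size as the effective size; the function and parameter names are hypothetical.

```python
import random


def wright_fisher(p0, n_individuals, generations, seed=0):
    """Simulate the frequency of one allele under pure drift.

    Each generation, 2 * n_individuals gene copies are drawn at random
    according to the allele frequency in the previous generation.
    Returns the list of frequencies, starting with p0.
    """
    rng = random.Random(seed)
    n_copies = 2 * n_individuals
    p = p0
    freqs = [p]
    for _ in range(generations):
        # Binomial sampling of gene copies for the next generation.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count / n_copies
        freqs.append(p)
    return freqs


small_population = wright_fisher(0.5, n_individuals=10, generations=50)
large_population = wright_fisher(0.5, n_individuals=1000, generations=50)
```

Plotting the two trajectories typically shows much larger swings in the small population, in line with drift increasing as effective population size decreases.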

Rate matrix

Denotes the probability of mutation from one amino acid to another (or from one nucleotide to another) for a given period of evolution. The best-known rate matrices are BLOSUM and PAM.

Read

The sequence of bases from a single molecule of DNA (or RNA).

Read cloud

The means by which the 10X Genomics platform determines a synthetic long read. Discontinuous linked reads from the same genomic region are aligned to each other. No single linked read contains the entire long sequence; however, when they are stacked, full coverage is achieved.

Read of insert

The highest-quality single sequence for an insert, regardless of the number of passes.

Real-time sequencing

A sequencing strategy used in the Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) platforms. In these approaches there is no pause after the detection of a base or series of bases, thus the sequence is derived in real-time.

Recombinant DNA

A combination of DNA fragments generated by molecular cloning that does not exist in nature.

Recombinant protein

A protein that is expressed from recombinant DNA molecules.

Recombination fraction

The proportion of offspring that receives a recombinant haplotype from a parent, or the probability that recombination occurs between two loci.
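
As a worked example of the first sense of the definition, the fraction can be estimated from counts of recombinant and non-recombinant (parental) offspring. The counts below are hypothetical.

```python
def recombination_fraction(n_recombinant, n_parental):
    """Proportion of offspring carrying a recombinant haplotype."""
    total = n_recombinant + n_parental
    if total == 0:
        raise ValueError("at least one offspring is required")
    return n_recombinant / total


# Hypothetical cross: 20 recombinant and 80 parental offspring.
theta = recombination_fraction(20, 80)
```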

Recombination hot spot

A region of the genome in which the per-generation recombination rate is substantially elevated above the genome-wide average.

Recombination nodules

The early, visible manifestations of sites of chiasmata and crossing over. They are recognized by immunochemical staining, typically for the protein MutL homologue 1, which is a component of late recombination nodules.

Recursive partitioning

A process for identifying complex relationships in large sets by dividing them into a hierarchy of smaller and more homogeneous subgroups on the basis of the most statistically significant indicators.

Reduced representation bisulphite sequencing

This technique cuts genomic DNA with restriction enzymes to enrich for CG-rich regions, which are then converted through bisulphite treatment and sequenced with next-generation sequencing. Bisulphite treatment converts unmethylated C to uracil — which appears as T in sequencing reads — while leaving methylated C intact.
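
The conversion rule in the last sentence can be sketched in silico for a single strand. This is a simplified illustration: it models only what appears in the reads (unmethylated C read as T, methylated C unchanged) and takes the methylated positions as a given, hypothetical input.

```python
def bisulphite_convert(seq, methylated_positions=frozenset()):
    """Return the sequence as it would be read after bisulphite treatment.

    seq: DNA sequence of one strand (upper-case A, C, G, T).
    methylated_positions: 0-based indices of methylated cytosines.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


# Hypothetical fragment in which only the C at index 1 is methylated.
converted = bisulphite_convert("ACGTCC", methylated_positions={1})
```

Comparing the converted read with the reference shows which cytosines were protected: a C that is still read as C was methylated.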

Regulatory element

Region in genomic DNA that can contribute to gene regulation.

Regulon

A group of transcriptional units or operons that are coordinately controlled by a regulator.

Repeat addition processivity

The ability of telomerase to synthesize multiple telomeric repeats without dissociating from the telomere.

(RAP)

Reproductive cloning

Cloning of entire organisms.

Reproductive fitness

The relative ability of a genotype to pass on its genetic material to the next generation. Often measured as the proportion of offspring generated relative to other genotypes in the population.

Resection

In the context of recombination, strand-biased enzymatic removal of nucleotides at the site of a double-strand break. In most recombination models, resection occurs in the 5′ to 3′ direction.

Restriction enzyme

An enzyme that recognizes a specific nucleotide sequence in DNA and cuts the DNA double strand at that recognition site, often with a staggered cut leaving short single strands or “sticky” ends.

(RE)
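
A minimal sketch of an in silico digest follows. It assumes a linear molecule, looks at the top strand only, and uses EcoRI (recognition site GAATTC, cut after the first G) as the default; the function and parameter names are hypothetical.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Cut a linear sequence at every occurrence of a recognition site.

    cut_offset is the number of bases of the site that stay on the
    left-hand fragment (1 for EcoRI, which cuts G^AATTC).
    Returns the fragments in order along the top strand.
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + cut_offset])
        start = pos + cut_offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


# Hypothetical sequence containing two EcoRI sites.
fragments = digest("AAGAATTCTTGAATTCGG")
```

The AATT at the start of the second and third fragments corresponds to the single-stranded overhang left by the staggered cut.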

Restriction fragment length polymorphism

A fragment length variant in DNA sequences that is generated through the gain or loss of a restriction site owing to a DNA substitution.

(RFLP)

Restriction-modification system

A set of enzymes found in many bacteria and archaea that protects the host genome from genomic parasites. Restriction–modification systems consist of sequence-specific restriction endonucleases, which target invading DNA, and associated DNA methyltransferases with similar recognition sequences, which protect the host genome from the action of the endonucleases.

Retroelement

A mobile genetic element. Its DNA is transcribed into RNA, which is reverse-transcribed into DNA and then inserted into a new location in the genome.

Reverse genetics

A form of genetic analysis that manipulates DNA to disrupt or affect the product of a gene to analyze the gene’s function.

RNA structurome

The full range of RNA structures formed by the transcriptome of an organism.

RNA-sequencing

A method of sequencing cDNA derived from RNA. This approach can be used to sequence both coding and non-coding RNA.

(RNA-seq, RNA sequencing)

Rolling circle amplification

A method of DNA amplification using a circular template. Briefly, DNA polymerase binds to a primed section of a circular DNA template. As the polymerase traverses the template, a new strand is synthesized. When the polymerase completes a full circle and encounters the double-stranded DNA (dsDNA) template, it displaces the template without degradation, thus creating a long ssDNA fragment composed of many copies of the template sequence.

(RCA)

Sanger sequencing

An approach in which dye-labeled normal deoxynucleotides (dNTPs) and dideoxy-modified dNTPs are mixed. A standard PCR reaction is carried out and, as elongation occurs, some strands incorporate a dideoxy-dNTP, thus terminating elongation. The strands are then separated on a gel and the terminal base label of each strand is identified by laser excitation and spectral emission analysis.

Satellites

A subfraction of genomic DNA consisting of short repetitive nucleotide sequences that are repeated a large number of times. These non-coding repeats are important for centromere and heterochromatin construction and separate from the rest of the genomic DNA on a density gradient because of their higher content of AT base pairs.

Scaffolds

Sets of ordered and oriented contigs, with the approximate distances between contigs estimated by traversing paired-end sequences that anchor to different contigs. Scaffolds consist of both sequence contigs and gaps.

Seeds

A short exact, or nearly exact, matching string of characters aligning between two sequences.
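
Exact seeds of a fixed length k can be found by indexing the k-mers of one sequence and looking up the k-mers of the other. The sketch below covers exact matches only (not the nearly exact case) and its names are hypothetical.

```python
def find_seeds(a, b, k):
    """Return (i, j) pairs such that a[i:i+k] == b[j:j+k]."""
    # Index every k-mer of the first sequence by its start position.
    index = {}
    for i in range(len(a) - k + 1):
        index.setdefault(a[i:i + k], []).append(i)
    # Look up each k-mer of the second sequence in the index.
    seeds = []
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], []):
            seeds.append((i, j))
    return seeds


seeds = find_seeds("ACGTAC", "GTACG", k=3)
```

Aligners that use this kind of strategy usually go on to extend each seed into a longer alignment, a step not shown here.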

Segment-polarity genes

Segmentation genes that are required for patterning the body along the anterior–posterior axis. They are expressed in a pattern of 14 stripes at the onset of gastrulation and following the expression of pair-rule genes.

Selection coefficient

The average proportional reduction in fitness of one genotype relative to another owing to selection.

Selection signature

The molecular footprint of a selection event from the recent past (for example, an excess of rare alleles at a locus relative to the abundance of rare alleles at loci across the rest of the genome).

Selective sweep

The increase in frequency of an allele (and closely linked chromosomal segments) that is caused by selection for the allele. Sweeps initially reduce variation and subsequently lead to a local excess of rare alleles (homozygosity excess) as new unique mutations accumulate.

Separase

A cysteine protease that cleaves the α-kleisin subunit of cohesin at the onset of anaphase to allow sister chromatid disjunction.

Sequence based

An approach to genetic association studies that is focused on a set of genetic markers, often now called tagging SNPs, which are statistically associated with whichever variants influence the phenotype.

Sequence capture

This uses oligonucleotide microarrays or oligonucleotide-coupled beads to select for regions of the genome, such as all exons (exome sequencing) for targeted sequencing.

Sex-biased genes

Genes that are transcribed at different levels in males and females. Often thought to be a major underlying mechanism for sexually dimorphic phenotypes.

Single nucleotide polymorphism

SNVs that occur in > 1% of individuals in a sampled population are usually referred to as single-nucleotide polymorphisms (SNPs).

(SNP)

Single nucleotide variant

Sequence variations that include insertions and deletions in addition to base substitutions (which are known as SNPs).

(SNV)

Single-end sequencing

In single-end sequencing, a DNA template is sequenced only in one direction.

Single-guide RNA

A single-guide RNA molecule, composed of a CRISPR RNA (crRNA) fused to its corresponding trans-activating CRISPR RNA (tracrRNA) scaffold sequence, that directs the binding and nuclease activity of Cas9 enzymes.

(sgRNA)

Single-pass

The single-molecule real-time (SMRT) sequencing approach from Pacific Biosciences (PacBio) enables a single molecule of DNA to be sequenced multiple times. A single pass is one single iteration through a molecule.

Solution hybrid selection

A method for enriching whole genomic DNA for many regions of interest by hybridization to a complex library of RNA or DNA sequences in solution, followed by retrieval of the annealed hybrids.

Somatic cell nuclear transfer

The process by which the nucleus from an adult cell is transferred into a previously enucleated cell; the reconstructed oocyte is activated, which initiates subsequent development.

(SCNT)

Somatic genetic rescue

In Mendelian disorders, an in vivo somatic genetic event that partially or totally counteracts the deleterious effect of the pathogenic germline mutation and provides a selective advantage over non-somatically modified cells.

SOS response

A complex global response to DNA damage identified in bacteria that includes activation of multiple factors, leading to the stalling of cell division and alteration of DNA replication, recombination and repair to promote genome integrity and cell survival, at the cost of increased mutagenesis.

Spandrels

Features that arise as an unselected byproduct of selectively adaptive features, which are therefore easily co-opted to a new function.

Specialization

A process of improvement of different aspects of gene function in each gene copy, which is driven by positive selection.

Spectral karyotyping

A cytogenetic technique used to simultaneously visualize all chromosomes in a cell by using different fluorescently labelled probes for each chromosome.

(SKY)

Spliceosome

A large RNA–protein complex that catalyses the removal of introns from nuclear pre-mRNA.

Splice-site variant

A variant, usually found at the intron–exon boundary, that alters the splicing of an exon to its surrounding exons.

Stabilizing selection

Selection that favours intermediate phenotypes over extreme phenotypes.

Stepwise regression

The step-by-step build-up of a regression model, which represents a dependent variable as a weighted sum (linear combination) of independent (risk) variables.

Stretching tension

When both ends of a segment of DNA are anchored (for example, by proteins) and the DNA is pulled mechanically, it carries stretching tension coupled with twisting torsion along the helix and can be elongated by up to 70% without disrupting base pairs.

Structural variant

A variation larger than single-nucleotide polymorphisms (SNPs). This can include the insertion or deletion of blocks of DNA, inversions or translocations of DNA segments, and copy-number differences.

Subfunctionalization

The process of the accumulation of degenerate mutations in gene copies that subdivides gene function among the duplicated genes. This term has been introduced to describe the mechanism of the duplication–degeneration–complementation model, but it is often used indiscriminately to describe any subdivision of function.

Subreads

The sequences derived from a single pass as a polymerase traverses a DNA molecule multiple times. A subread is trimmed to exclude any adapter sequence.

Substitutions

Changes in the nucleotide sequences of coding genes that result in changes in the peptide sequence (that is, the replacement of an amino acid). These contrast with silent (or synonymous) changes in coding sequences, which do not result in changes in the peptide.

(Replacement changes)

Supercoils

Twists applied to DNA that can occur in the same (positive) or opposite (negative) orientation to the double helix.

Super-enhancers

Multi-kilobase stretches of regulatory DNA that exhibit unusually strong occupancy of transcription factors and co-factors.

Supergenes

Chromosomal regions that encompass multiple genes that are inherited together because of close genetic linkage. Often supergenes are associated with chromosomal inversions, which prevent recombination with the alternative allele.

Surface exclusion

A process that bars conjugative transfer of a plasmid into recipient cells that already harbour a related plasmid.

Sympatric speciation

Genetic divergence that leads to species formation in the same habitat.

Synaptonemal complex

A proteinaceous structure that forms between pairs of homologous chromosomes during synapsis and facilitates crossover recombination.

Syntenic anchors

Short aligned segments between genome sequences from two species, which are believed to define an orthologous relationship.

Syntenic region

A genomic region that is collinear in the order of genes (or of other DNA sequences) in a chromosomal region of two species.

Synthetic aneuploidy effect

Increased transcriptional activity of regions of the genome where extrachromosomal DNAs (ecDNAs) make a physical connection, similar to the effect of aneuploidy.

Synthetic lethal

A genetic interaction in which the deletion of two genes at the same time results in lethality. An organism in which one gene is deleted and the other gene is present will still be viable.

Systems-based approach

An approach that investigates a biological phenomenon by assaying a wide range of levels of biological organization, from individual proteins to entire cellular networks.

Tag SNP

A SNP chosen from a larger set of available SNPs for use in an association study. Tag SNPs are generally selected on the basis of favourable linkage disequilibrium properties.

Tagging approach

Identifying sub-sets of markers ('tags') that describe patterns of association or haplotypes among larger marker sets.

Tagging SNP

A genetic marker that is correlated to a number of neighbouring variants such that the genetic information it contains is representative of these variants.
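
The correlation between a tagging SNP and a neighbouring variant is commonly summarized by the linkage disequilibrium statistic r², which the definition above does not name explicitly. The sketch assumes phased haplotypes coded as (allele at marker A, allele at marker B) with alleles 0 or 1; the data are hypothetical.

```python
def r_squared(haplotypes):
    """Squared allelic correlation (r^2) between two biallelic markers.

    haplotypes: list of (a, b) tuples, with a and b each 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both markers must be polymorphic")
    return d * d / denom


# Perfectly correlated markers: one tags the other completely.
perfect = r_squared([(1, 1), (1, 1), (0, 0), (0, 0)])
# Independent markers: neither carries information about the other.
independent = r_squared([(1, 1), (1, 0), (0, 1), (0, 0)])
```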

Tagmentation

The process by which double-stranded DNA is cleaved by the transposase Tn5, creating short DNA fragments that are simultaneously tagged with PCR adapters. Tagmentation using Tn5 preferentially occurs at accessible or open chromatin and this property is used in ATAC-seq and other related assays.

Tag-SNP portability

The utility of SNPs chosen as tags in one population for use as tags in another population.

T-circles

Extrachromosomal circular DNA molecules that contain telomeric repeat sequences.

Telomere

A short repeat sequence of DNA at the end of chromosomes, which both protects and ensures the complete replication of chromosome ends.

Template

A DNA fragment to be sequenced. The DNA is typically ligated to one or more adapter sequences where DNA sequencing will be initiated.

Template switching

The process by which RNA templates are switched between viral genomes during reverse transcription.

Test statistic

The summary value (often a summary statistic) of a data set that is compared with a statistical distribution to determine whether the data set differs from that expected under a null hypothesis.

Tethers

These are cis-regulatory elements that function to bring together distal DNA elements.

Threshold traits

Quantitative traits that are discretely expressed in a limited number of phenotypes (usually two), but which are based on an assumed continuous distribution of factors that contribute to the trait (underlying liability).

Topoisomerases

A class of enzymes that are able to cleave one or both strands of DNA to release topological stress on the DNA duplex, and to link or unlink, knot or unknot associated DNA molecules.

Topologically associating domains

These are defined on population-level contact-frequency maps as domains of higher interaction frequency within a region than between regions.

(TAD, TADs)

Transcription factories

Molecular complexes consisting of extrachromosomal DNAs (ecDNAs) and transcription machinery components, with high transcriptional activity of ecDNA sequences.

Transcriptional consistency

The uniformity of gene expression in a cell population, defined as a low variance in expression when scaled to the average level of expression.

Transcriptionally quiescent

Describes a cellular state in which very low to no active gene expression is observed, for example, in fully differentiated gametes.

Transduction

The transfer of genetic information from one bacterial or archaeal cell to another by a phage particle containing chromosomal DNA.

Transfection

Transfection is a procedure that introduces foreign nucleic acids into cells to produce genetically modified cells.

Transformation

Genetic alteration of a cell resulting from the acquisition of genes from free DNA molecules in the surrounding environment.

Translesion synthesis polymerases

Polymerases that can catalyze DNA polymerization at damaged templates during replication and/or repair, although often with lower fidelity than replicative polymerases.

Transposable elements

DNA sequences in the genome that replicate and insert themselves into various positions in the genome.

(TE, transposons, mobile elements)

Transvection

The ability of a gene on one chromosome to influence the activity of an allele on the opposite chromosome when the chromosomes are paired.

TUNEL staining

A terminal deoxyuridine 5′-triphosphate nick-end-labelling assay. It involves the enzymatic labelling of the 3′ ends of partially degraded DNA in a cell undergoing apoptosis (and some other forms of cell death).

Two hybrid assay

An assay system in which one protein is fused to an activation domain and the other to a DNA-binding domain, and both fusion proteins are expressed in cells. Expression of a reporter gene indicates that the two proteins physically interact.

Two-base-encoded probes

Oligonucleotides that contain two adjacent interrogation bases in a known position. The bases correspond to a fluorescent label on each probe. The remaining bases are either degenerate (any of the four bases) or universal (unnatural bases with nonspecific hybridization) allowing the probe to interact with many different possible template sequences.

Two-fluorophore system

A system in which bases are discriminated by labelling Cs and Ts with a red or green fluorophore, respectively. Each A base is labelled with either a red or green fluorophore, but the two populations are mixed. During base discrimination, clusters that are either red or green are called either C or T, whereas clusters with a red and green mixed signal are called A. The G base is unlabelled, thus any cluster without a fluorophore signal is called G.
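
The calling rule described above amounts to a lookup on two signals. The sketch below assumes the red and green channels have already been reduced to present or absent for each cluster; the function name is hypothetical.

```python
def call_base(red, green):
    """Call a base from the presence of red and green signal in a cluster."""
    if red and green:
        return "A"  # mixed red and green signal
    if red:
        return "C"  # red only
    if green:
        return "T"  # green only
    return "G"      # no fluorophore signal


# Hypothetical (red, green) signals for four consecutive cycles.
signals = [(True, False), (True, True), (False, False), (False, True)]
called = "".join(call_base(r, g) for r, g in signals)
```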

Unequal sister chromatid exchange

A mitotic crossover event that leads to the unequal exchange of genetic material between sister chromatids and is also a major repair pathway for double-strand breaks.

Uniparental disomes

Both copies of a chromosome derived from one parent.

Uniparental isodisomy

Refers to both copies of a chromosome originating from one parent (maternal or paternal) and the chromosome from the other parent being absent. Segmental uniparental isodisomy occurs when only part of a chromosome is affected.

Univalent

An unpaired chromosome at metaphase I: usually one that has failed to synapse or recombine with its homologue.

Unphased diploid data

Sequence data in which the phase of double heterozygotes was not determined.

Upstream open reading frame

Upstream open reading frames (uORFs) are cis-acting elements located within the 5'-leader sequence of transcripts and are defined by an initiation codon in-frame with a termination codon located upstream or downstream of its main ORF (mORF) initiation codon.

(uORF, uORFs)

Vegetative nucleus

The nucleus of a terminally differentiated vegetative cell. It does not contribute genetic information to subsequent generations.

Virulence factor

A gene responsible for the production of a molecule that contributes to the establishment of disease by bacterial pathogens.

Virulence plasmid

A plasmid that carries virulence factor genes or pathogenicity islands.

Whole-exome and targeted sequencing

Sequencing of only exons or other selected regions. A system of capture or amplification is used to isolate or enrich for only exons or target regions. This is done by designing probes or primers for the regions of interest.

Whole-genome sequencing

Sequencing of the entire genome without using methods for sequence selection.

(WGS)

Zygotic genome activation

The stage of development, which can vary widely between species, at which expression of the embryonic genome is strongly activated and thus control of development transfers from the maternal to the embryonic contribution.
