Construct Design

Some general guidelines and useful tips for the design of transgenic constructs and gene targeting constructs.


Transgenes are linear pieces of DNA, usually cloned in a plasmid vector. A transgene generally contains a promoter, a cDNA, an intron, and a polyadenylation signal. To minimize interference from the bacterial vector sequences, the transgene is usually excised from the plasmid vector before injection. When injected into the pronuclei of fertilized mouse eggs, transgenes are incorporated into random loci, usually as head-to-tail concatemers consisting of varying numbers of copies (from 1 to more than 1000). Most of the time, integration takes place at a single locus and the transgene is present in all cells of the resulting mouse. In a small percentage of cases, mice may have multiple integration sites, or may contain the transgene in only a portion of their cells.

When a cDNA is used, an intron of some sort is added to stimulate the transport of mRNA out of the nucleus, since this process is coupled to the splicing process. The artificial intron should be placed at the 3-prime end of the cDNA.

Some variations on this theme include:

  • BAC transgenes – these allow insertion of entire genes, including all introns and possibly unknown promoter and enhancer elements, as well as “insulator” sequences that allow expression of the gene even if the transgene is integrated into heterochromatin.
  • Promoterless transgenes – these can act as gene-trap or promoter-trap vectors and generally contain cDNA for a reporter gene such as lacZ or GFP.
  • IRES – Internal Ribosome Entry Sites are added, along with a second cDNA, to achieve expression of two proteins under the control of the same promoter. The second protein could be a reporter such as beta-galactosidase or GFP. In the TMF's experience, ribosomal pausing sequences (e.g. 2A) produce better expression of the downstream coding sequence compared with IRES sequences.
  • Fusion proteins – to allow easier detection of the protein product, by fusion with either a reporter gene or an affinity tag.


The choice of which promoter is best for a given transgene will depend on the exact aims of the research.

Most transgenes are designed to result in over-expression of a protein. Thus, constitutive promoters are often used to drive expression in all tissues. Common choices are the promoter from the X-linked phosphoglycerate kinase-1 locus (PGK) and the CAG promoter, in which the human cytomegalovirus (CMV) immediate-early enhancer is fused to a minimal chicken beta-actin gene promoter.

Tissue-specific promoters can be used to limit the spatial expression pattern, while inducible promoters can be used to control the timing of expression. Developmentally regulated promoters can be used for control of timing as well.

The most commonly used inducible promoters are turned on or off by tetracycline or its analog, doxycycline. The so-called Tet-On and Tet-Off vector systems are commercially available from Clontech. These systems consist of two transgenes that are injected separately, and the complete system is reconstituted by mating the two lines of transgenic mice. Another popular system for induction of gene expression uses the tamoxifen-inducible ROSA26-CRE-ERt2 line of mice. In these mice, administering tamoxifen activates Cre recombinase, which then removes or inverts sequences located between two loxP sites.

Gene Targeting Constructs

Gene targeting constructs are designed to undergo homologous recombination into a specific locus chosen by the investigator, usually with the aim of disrupting the gene to prevent transcription of a functional mRNA (a knock-out), or mutating the gene (a knock-in).

The number of ways in which gene targeting constructs can be designed to produce knock-out or knock-in mice is almost limitless. Thus, we can only offer some general guidelines here, and urge people making their first construct to read the literature and consult with the TMF or other experienced investigators before commencing a project.

The simplest targeting construct consists of 2 long segments of genomic DNA (gDNA), called homology arms, flanking a selection cassette. The most commonly used selection cassette consists of the cDNA and control elements for the aminoglycoside phosphotransferase (G418 or kanamycin resistance) gene (others include resistance genes for puromycin and hygromycin).

When introduced into mouse embryonic stem (ES) cells by electroporation, the gDNA homology arms can undergo a double-reciprocal recombination event with their matching sequences on one chromosome, carrying the selection cassette with them. The gDNA between the regions of homology on the chromosome is thereby replaced by the selection cassette. Where a complete knockout is desired, the sequence being replaced usually includes the TATA box, the start codon, and one or more of the initial exons.  However, for various reasons it is often desirable to delete downstream exons (e.g., to avoid disrupting the promoter).

In many of the electroporated cells, the targeting construct will integrate into a random locus. Any integration event, random or specific, can confer drug resistance to the cell. After growing the transfected cells under selection, the challenge is to screen sufficient clones to find the rare homologous recombination events in a background of frequent random integrants.

We recommend the use of both positive and negative selection cassettes in all targeting constructs. One commonly used negative selection cassette contains the gene for a viral thymidine kinase (tk). The viral tk gene product allows growing cells to incorporate a toxic nucleotide analog into their DNA, thus selecting against those cells. The tk cassette is cloned into the targeting construct outside of the homology arms, so that it will not be incorporated during homologous recombination. It will be incorporated during most random integrations and help to select against those clones. Another negatively selectable marker is the gene for diphtheria toxin A (DTA). The A subunit inhibits protein synthesis but cannot be taken up by other cells. The advantage of DTA over tk is that it works without having to add a second drug to the culture medium.

Homology Arms

The degree to which the homology arm sequences match the locus of interest will help determine the frequency of homologous recombination. The three most important characteristics of homology arms are:

  1. Length – we recommend an overall length of about 7 kilobases, with one arm being 5-6 kb and the other being 1-2 kb. Longer is better, but one is usually limited by the capacity of the cloning vector and the need to maintain a unique restriction enzyme site that can be used to linearize the construct prior to transfection into ES cells.
  2. Sequence homology – whenever possible, clone the homology arms from the genome of the ES cells that will be targeted, or from the mouse strain they were derived from. Long-range PCR with a high-fidelity polymerase can be an effective method for subcloning the homology arms, if done correctly. Longer homology arms can compensate for sequence mismatches. Partly for this reason, one partner in the KOMP consortium, Regeneron Pharmaceuticals, does all of its high-throughput targeting with BAC clones prepared by recombineering.
  3. Limit repetitive sequences – we recommend using the on-line program, RepeatMasker, to search for repetitive sequences in the homology arms. Large regions of repetitive DNA should be avoided, because these will result in a lower frequency of homologous recombination.  Regions of repetitive DNA can also be displayed in Ensembl.  The best targeting constructs contain no repetitive DNA.
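
As a rough illustration of the repeat check in point 3, the sketch below estimates how much of a candidate homology arm is repetitive. It assumes the arm sequence is available in "soft-masked" form, with repeats in lowercase and unique sequence in uppercase (a convention that repeat-annotation tools and genome browsers commonly offer). The function names and the example sequence are hypothetical, and no threshold is implied beyond the advice above to keep repeats to a minimum.

```python
def repeat_fraction(soft_masked: str) -> float:
    """Fraction of bases that are lowercase (assumed to mean repeat-masked)."""
    bases = [b for b in soft_masked if b.isalpha()]
    if not bases:
        raise ValueError("sequence contains no bases")
    return sum(b.islower() for b in bases) / len(bases)


def longest_repeat_run(soft_masked: str) -> int:
    """Length of the longest uninterrupted lowercase (repeat) stretch."""
    longest = current = 0
    for base in soft_masked:
        current = current + 1 if base.islower() else 0
        longest = max(longest, current)
    return longest


# Hypothetical toy "arm"; a real arm would be several kilobases long.
arm = "ACGTacgtACGTacAC"
print(f"repeat fraction: {repeat_fraction(arm):.3f}")
print(f"longest repeat run: {longest_repeat_run(arm)} bp")
```

Reporting the longest run alongside the overall fraction is a deliberate choice: the advice above is to avoid large regions of repetitive DNA, which a single average would hide.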

Conditional Targeting

One problem with a simple, constitutive targeting strategy is that many genes have multiple functions, or are active in multiple tissues and/or at multiple stages of development. Many knockouts of genes with no known roles in development have resulted in embryonic lethality, preventing the study of the gene’s role in the phenotype of an adult animal. To get around this problem, techniques have been developed to allow the investigator to determine when and where the knockout occurs. Conditional targeting constructs employ recombinase recognition sequences, tissue-specific promoters, developmentally-specific promoters, or inducible promoters (or a combination of these) to limit and control the spatial and temporal expression of the knockout or knock-in phenotype.

The TMF strongly endorses making conditional mutant alleles as these offer considerably more flexibility than a conventional non-conditional null allele.

Similar to restriction enzymes, Cre, Dre, Flp, and other recombinases act on unique DNA recognition sequences.  For Cre, a recombinase from the bacteriophage P1, the recognition sequence, loxP, is 34 base pairs and consists of a "core" sequence of 8bp, flanked by 13bp inverted repeats.  The core sequence is non-palindromic, so loxP sites have directionality and are usually diagrammed as arrowheads.  Illustrated below are two kinds of reactions catalyzed by Cre, which are distinguished by the orientation of the loxP sites to each other.
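
The structure of the loxP site described above can be checked with a few lines of Python. This is only a minimal sketch: it uses the wild-type loxP sequence written 5' to 3' on one strand (the opposite strand reads with the core reversed and complemented), and the helper names are illustrative rather than taken from any library.

```python
# Wild-type loxP, 5'->3' on one strand (34 bp).
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_loxp(site: str):
    """Split a 34 bp site into (13 bp repeat, 8 bp core, 13 bp repeat)."""
    if len(site) != 34:
        raise ValueError("expected a 34 bp site")
    return site[:13], site[13:21], site[21:]


left, core, right = split_loxp(LOXP)

# The flanking 13 bp elements are inverted repeats of each other.
arms_are_inverted_repeats = reverse_complement(left) == right

# The 8 bp core is not its own reverse complement, which gives the site a direction.
core_is_palindromic = reverse_complement(core) == core

print(left, core, right)
print("inverted repeats:", arms_are_inverted_repeats)
print("palindromic core:", core_is_palindromic)
```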

I. Excision reaction:

When the loxP sites are oriented in the same direction, Cre brings the two loxP sites together and removes the intervening DNA along with one loxP site, in the form of a circular molecule.  The circular DNA product is then degraded by the cell, so this reaction is essentially irreversible.  However, driving the reaction in reverse is possible and allows the insertion of circular DNA molecules containing a loxP site into a locus containing a single loxP site.

The product of the excision reaction above is an allele lacking exon 2, which could produce a partially active protein product.  However, if splicing between exons 1 and 3 results in a frameshift, the resulting mRNA is likely to be destroyed by the nonsense-mediated decay (NMD) pathway, resulting in a null phenotype.
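
Whether removing an exon shifts the reading frame comes down to simple arithmetic: an internal exon made up entirely of coding sequence preserves the frame only if its length is a multiple of three. The sketch below makes that check explicit. The exon lengths are hypothetical, and the function ignores complications such as exons containing the start or stop codon, or alternative splicing.

```python
def deletion_causes_frameshift(exon_length_bp: int) -> bool:
    """True if deleting a fully coding internal exon shifts the reading frame."""
    if exon_length_bp <= 0:
        raise ValueError("exon length must be positive")
    # Codons are 3 bp long, so only multiples of 3 keep downstream codons in frame.
    return exon_length_bp % 3 != 0


# Hypothetical exon lengths, for illustration only.
for length in (120, 121):
    outcome = "frameshift" if deletion_causes_frameshift(length) else "in-frame deletion"
    print(f"{length} bp exon: {outcome}")
```

An in-frame deletion is the case to watch for when a null allele is the goal, since the shortened transcript may still encode a partially active protein.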

II.  Inversion reaction:

When the loxP sites are oriented in opposite directions, Cre will invert the intervening sequence and leave both loxP sites intact.  With wildtype loxP sites, this reaction is reversible.  To make the reaction irreversible, mutant loxP sites have been developed such that, after Cre acts on them, they will be inactive.  This strategy is made possible by the fact that Cre's mechanism involves pairing the loxP sites with each other, cleaving them within the 8bp core sequence, and causing a crossover at the cleavage site.  When mutant loxP sites are used, the crossover produces an 8bp core that is no longer recognized by Cre.
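
The two outcomes can be modeled on plain sequence strings. The following sketch assumes a single strand of uppercase DNA containing exactly two wild-type loxP sites. It reports excision when the sites point the same way and inversion when they point in opposite directions. It models only the net product, not the strand-exchange mechanism, and it does not handle mutant loxP sites. The function names and the toy sequences are hypothetical.

```python
LOXP_FWD = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"  # wild-type loxP, 5'->3'
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


LOXP_REV = revcomp(LOXP_FWD)  # the same site pointing the other way


def find_sites(seq: str):
    """Return sorted (start, orientation) pairs for every loxP site in seq."""
    sites = []
    for pattern, orientation in ((LOXP_FWD, "+"), (LOXP_REV, "-")):
        start = seq.find(pattern)
        while start != -1:
            sites.append((start, orientation))
            start = seq.find(pattern, start + 1)
    return sorted(sites)


def apply_cre(seq: str):
    """Return (reaction, product) for a sequence with exactly two loxP sites."""
    sites = find_sites(seq)
    if len(sites) != 2:
        raise ValueError("this sketch handles exactly two loxP sites")
    (first, o1), (second, o2) = sites
    n = len(LOXP_FWD)
    if o1 == o2:
        # Same orientation: the intervening DNA and one site leave as a circle.
        return "excision", seq[:first + n] + seq[second + n:]
    # Opposite orientation: the intervening DNA is inverted; both sites remain.
    return "inversion", seq[:first + n] + revcomp(seq[first + n:second]) + seq[second:]


# Toy loci: short flanks around a 7 bp stand-in for the floxed sequence.
same = "AAAA" + LOXP_FWD + "GATTACA" + LOXP_FWD + "CCCC"
opposite = "AAAA" + LOXP_FWD + "GATTACA" + LOXP_REV + "CCCC"

print(apply_cre(same)[0])
print(apply_cre(opposite)[0])
```

Applying `apply_cre` a second time to the inversion product restores the starting sequence, which mirrors the reversibility of the reaction with wildtype sites.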

Most modern targeting constructs, including many of those used by members of the International Knockout Mouse Consortium (KOMP, EUCOMM, NorCOMM, etc.), use both Cre and Flp recognition sites.  One common motif of such constructs is illustrated in the following diagram, where the neomycin resistance cassette is flanked by frt sites while exon 2 is flanked by loxP sites.

The reason for removing the neomycin resistance cassette is that, even though it resides in an intron, it can have unpredictable effects on the expression of the gene of interest and other nearby genes. For example, it may stimulate inappropriate mRNA splicing events, or its promoter may affect downstream genes.  (These effects can also be mitigated by inserting the neo cassette in a reverse orientation to the gene of interest.)

Normally, chimeric mice are made by injecting blastocysts with ES cells that have been targeted with the first allele shown above (containing 2 frt sites and 2 loxP sites).  If the chimeras are crossed with Flp-expressing transgenic mice, some of their offspring will have the second allele pictured above (one frt site and 2 loxP sites).  These animals are said to have a floxed gene, meaning the gene of interest has one or more exons flanked by loxP sites (flox = flanked by loxP).  (The Flp reaction can also be accomplished in vitro, by electroporating a Flp-expressing plasmid into the ES cells.  However, we do not recommend this strategy because it subjects the ES cells to additional time in culture that may compromise their ability to contribute to the germline of chimeras.)

Except for the presence of the frt site and 2 loxP sites, the floxed mice have an otherwise completely wildtype allele that can be knocked out by the expression of Cre inside their cells (assuming removal of the second exon results in a knockout). Breeding a floxed mouse with a mouse that expresses Cre from a transgene will produce some offspring that inherit both the floxed allele and the transgene. These mice will lack the second exon of the gene of interest on one chromosome, due to the action of Cre on the two loxP sites, as illustrated in the third allele above. Further crosses between floxed mice and flox/Cre mice will result in a homozygous knockout.
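
To get a feel for how many offspring of each cross carry the desired genotype, the expected Mendelian proportions can be enumerated. The sketch below assumes that the Cre transgene is hemizygous, that it assorts independently of the floxed locus, and that all genotypes are equally viable. None of this is guaranteed for a particular line. The allele labels ("flox", "+", "Cre", and "-" for absence of the transgene) and the function names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def gametes(parent):
    """All equally likely gametes: one allele from each locus."""
    return list(product(*parent))


def cross(mother, father):
    """Expected offspring genotype fractions for two parents."""
    counts = Counter()
    for egg, sperm in product(gametes(mother), gametes(father)):
        # Sort each allele pair so that flox/+ and +/flox count as one genotype.
        genotype = tuple(tuple(sorted(pair)) for pair in zip(egg, sperm))
        counts[genotype] += 1
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Each parent is (target-gene locus, Cre-transgene locus).
floxed_het = (("flox", "+"), ("-", "-"))      # flox/+, no Cre
cre_mouse = (("+", "+"), ("Cre", "-"))        # wildtype gene, hemizygous Cre
flox_cre = (("flox", "+"), ("Cre", "-"))      # flox/+ with Cre

first_cross = cross(floxed_het, cre_mouse)
second_cross = cross(floxed_het, flox_cre)

print("flox/+ with Cre:", first_cross[(("+", "flox"), ("-", "Cre"))])
print("flox/flox with Cre:", second_cross[(("flox", "flox"), ("-", "Cre"))])
```

Under these assumptions, only a fraction of the pups from the second cross are homozygous floxed and Cre-positive, which is worth factoring into breeding plans.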

The real power of this system becomes apparent when one considers the fact that multiple lines of Cre-expressing mice are available, with different promoters driving expression of Cre. If the promoter is neuron-specific, for example, the knockout will occur only in neurons. Cre expression itself can be made conditional (e.g., by using a tetracycline-responsive promoter), allowing the exact time at which the gene is knocked out to be determined by the investigator.

The above examples are only meant as a brief introduction to the concept of conditional targeting.  Many variations on the same themes can be found in the literature and on the websites of members of the International Knockout Mouse Consortium.  Please see our listing of online resources for more information.