Data

What is a Microsatellite?

Microsatellites consist of short, tandem repeats of DNA sequence. The sequence CAGCAGCAGCAG is a tri-nucleotide microsatellite with the motif CAG repeated four times.

Variation in the number of repeats in a microsatellite arises due to DNA-polymerase errors during the DNA replication process. During DNA replication, DNA-polymerase moves along a DNA sequence and adds complementary bases to the template strand. When this template is highly repetitive, as in a microsatellite sequence, the DNA-polymerase may "hiccup" and move forward or backward one full repeat before continuing replication.

For example, if we begin with the microsatellite CAGCAGCAGCAG and DNA-polymerase moves back three nucleotide positions before it continues replication, the result is a microsatellite with five repeats, CAGCAGCAGCAGCAG or (CAG)5. If DNA-polymerase instead moves forward three nucleotide positions, the result is a microsatellite with three repeats instead of four.

This process of gaining or losing a single repeat of a motif is called the stepwise mutation process. DNA-polymerase replication errors in highly repetitive sequences are fairly common, so the mutation rate in microsatellites, particularly microsatellites with large numbers of repeats, is very high.
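The stepwise mutation process can be sketched in a few lines of Python. This is only an illustration: the function names `repeat_count` and `stepwise_mutation` are hypothetical, and the sketch assumes a pure tandem repeat with no flanking sequence.

```python
def repeat_count(sequence: str, motif: str) -> int:
    """Return how many times `motif` is tandemly repeated in a pure repeat."""
    count, remainder = divmod(len(sequence), len(motif))
    if remainder or motif * count != sequence:
        raise ValueError("sequence is not a pure tandem repeat of the motif")
    return count


def stepwise_mutation(sequence: str, motif: str, step: int) -> str:
    """Apply one slippage event: step=+1 gains a repeat, step=-1 loses one."""
    if step not in (1, -1):
        raise ValueError("a stepwise mutation changes the repeat count by exactly one")
    new_count = repeat_count(sequence, motif) + step
    if new_count < 1:
        raise ValueError("cannot remove the last repeat")
    return motif * new_count


original = "CAGCAGCAGCAG"                        # (CAG)4
gained = stepwise_mutation(original, "CAG", +1)  # polymerase moves back one repeat
lost = stepwise_mutation(original, "CAG", -1)    # polymerase moves forward one repeat
print(gained, repeat_count(gained, "CAG"))
print(lost, repeat_count(lost, "CAG"))
```

Starting from (CAG)4, one event in either direction gives the five-repeat and three-repeat alleles described in the example above.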

Versions of a particular microsatellite that differ in the number of motif repeats are called alleles. Unlike a single nucleotide polymorphism (SNP) allele, which can only be one of four possible states (A, C, G or T), a microsatellite can have many different repeat numbers, which means there are many possible alleles.

Microsatellites are considered to be neutral markers because, unlike other parts of the genome, they do not code for proteins, and thus we can assume they are not under selection. Their high mutation rate and large number of alleles make microsatellites especially useful molecular markers for population genetic and parentage assignment studies.

Microsatellites are characterized in individuals using the polymerase chain reaction (PCR) to amplify the desired genome region from a DNA sample. PCR makes thousands of copies of the microsatellite, which are labeled with fluorescent markers. These fragments can then be visualized by a DNA analyzing machine, which separates them according to length. The process is conceptually similar to an agarose gel, where shorter DNA fragments move faster through the gel matrix than longer fragments. Since microsatellites with different numbers of repeats differ in total sequence length, the alleles are actually scored according to length polymorphism (figure 1).


Figure 1: When characterizing microsatellites, the DNA analyzing machine reports fluorescent peaks corresponding to a particular sequence length. Though the actual DNA sequence remains unknown we can infer the number of repeats, and thus the alleles present, based on the total sequence length.
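The inference from fragment length to repeat number can be sketched as follows. This is a minimal illustration, not the scoring procedure of any particular analyzer: the function name and the 100 bp flanking length (primers plus non-repeat sequence) are assumptions, and real peak sizes are noisy and usually need to be binned before this arithmetic applies.

```python
def repeats_from_fragment_length(fragment_bp: int, flank_bp: int, motif_bp: int) -> int:
    """Infer the number of repeats from total fragment length.

    fragment_bp: observed fragment length in base pairs
    flank_bp:    assumed combined length of the non-repeat flanking sequence
    motif_bp:    length of the repeat motif (3 for CAG)
    """
    repeat_bp = fragment_bp - flank_bp
    if repeat_bp < 0 or repeat_bp % motif_bp != 0:
        raise ValueError("fragment length is not consistent with a whole number of repeats")
    return repeat_bp // motif_bp


# Hypothetical heterozygous individual with peaks at 109 bp and 112 bp.
peaks = [109, 112]
genotype = [repeats_from_fragment_length(bp, flank_bp=100, motif_bp=3) for bp in peaks]
print(genotype)
```

With these assumed values, the two peaks correspond to the alleles (CAG)3 and (CAG)4.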

 

QUESTIONS:

1.      What are some of the advantages to using microsatellite markers?

2.      Can you think of any potential disadvantages?

3.      For which sorts of studies are microsatellites most appropriate? Why?

 

ANSWERS:

1.      Microsatellites are very polymorphic because of their rapid mutation rate, so they can provide a lot of genetic information. Microsatellites are also useful because they are neutral markers, so selection should not interfere with our ability to infer population history. Finally, microsatellites are relatively inexpensive to genotype, even in non-model organisms, so they can be applied to a wide variety of systems.

2.      Microsatellites are limited in that their mutations sometimes violate the stepwise mutation process, a basic assumption of many population genetic analyses. Furthermore, since mutations can either add or remove tandem repeat units, homoplasy between alleles is possible (two alleles can have the same length without sharing a common origin), particularly when comparing distantly related groups.

3.      Due to the potential for homoplasy between alleles, microsatellites are most appropriate for studies on recent evolutionary timescales such as recent population subdivision or paternity assignment studies.

 

 

 

 

 

 

What is a Heterozygote?

Diploid organisms have two copies of each chromosome, one from each parent. If each parent contributes a chromosome with the same allele for a particular microsatellite, the offspring will have two identical copies of this allele and is called a homozygote. A heterozygote is an individual with two different alleles for the same microsatellite and arises when the two parents contribute different alleles (figure 2).


Figure 2: Formation of homozygous and heterozygous diploid offspring.

 

If we know the frequency of two alleles in a population (where p is the frequency of allele one and q is the frequency of allele two), we can estimate the expected frequency of heterozygotes and homozygotes in the population using the Hardy-Weinberg relationship:

p^2 + 2pq + q^2 = 1

p^2 is the proportion of the population homozygous for allele one

2pq is the proportion of heterozygotes in the population

q^2 is the proportion of the population homozygous for allele two

 

 

 

For example, if allele (CAG)3 (or "p") has a frequency of 0.7 and (CAG)4 (or "q") has a frequency of 0.3, we simply plug in the numbers:

(0.7)^2 + 2(0.7)(0.3) + (0.3)^2 = 1

As long as the assumptions for Hardy-Weinberg equilibrium are met (random mating, infinite population size, no selection, no new mutations, and no migration), we expect 49% of the population to be homozygous (CAG)3/(CAG)3, 42% of the population to be heterozygous (CAG)3/(CAG)4, and 9% to be homozygous (CAG)4/(CAG)4.
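The same arithmetic can be written as a short Python sketch for a two-allele locus. The function name `hardy_weinberg` and the dictionary keys are hypothetical, chosen only for this illustration.

```python
def hardy_weinberg(p: float) -> dict:
    """Expected genotype proportions for a two-allele locus with allele frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    q = 1.0 - p
    return {
        "homozygous_p": p * p,      # p^2
        "heterozygous": 2 * p * q,  # 2pq
        "homozygous_q": q * q,      # q^2
    }


expected = hardy_weinberg(0.7)  # p = frequency of (CAG)3, q = frequency of (CAG)4
for genotype, proportion in expected.items():
    print(f"{genotype}: {proportion:.2f}")
```

The three proportions always sum to 1 (up to floating-point rounding), which is a quick check that the frequencies were entered correctly.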

 

QUESTION:

1.      Which allele frequencies for p and q will maximize the Hardy-Weinberg expected proportion of heterozygotes? Which allele frequencies will minimize the expected proportion of heterozygotes?

 

ANSWER:

1.      Heterozygosity is maximized when the allele frequencies p and q are both 0.5 (2pq = 2(0.5)(0.5) = 0.50) and approaches zero as either p or q approaches zero. For example, if the frequency of p is 0.99 and the frequency of q is 0.01, the expected proportion of heterozygotes is approximately 2%.
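One way to check this answer numerically is to evaluate 2pq over a grid of allele frequencies. The sketch below assumes a two-allele locus (so q = 1 - p); the function name and the grid spacing of 0.01 are arbitrary choices for illustration.

```python
def expected_heterozygotes(p: float) -> float:
    """Hardy-Weinberg expected proportion of heterozygotes, 2pq with q = 1 - p."""
    return 2 * p * (1.0 - p)


# Evaluate 2pq at p = 0.00, 0.01, ..., 1.00 and find where it peaks.
grid = [i / 100 for i in range(101)]
best_p = max(grid, key=expected_heterozygotes)

print("p that maximizes 2pq:", best_p)
print("2pq at that p:", expected_heterozygotes(best_p))
print("2pq at p = 0.99:", round(expected_heterozygotes(0.99), 4))
```

The expected proportion of heterozygotes falls to exactly zero at p = 0 or p = 1, where only one allele remains in the population.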

 

 

What is a Population Bottleneck?

When a population is under strong natural or artificial selection, only a subset of individuals reproduces, so relatively few individuals contribute alleles to subsequent generations. Alleles at gene regions that are not under selection are present in the post-selection population as a random subset of the original allelic diversity. The probability that an allele is present in subsequent generations increases with its frequency in the original population, so high-frequency alleles are more likely to be present in the post-selection population than low-frequency alleles.
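To make the link between allele frequency and survival concrete, here is a minimal sketch under a simplifying assumption that is not stated in the text: the breeders' gene copies are treated as independent random draws from the original population, so an allele at frequency f is missed by all 2N copies with probability (1 - f)^(2N). The function name and the example values are hypothetical.

```python
def retention_probability(freq: float, n_breeders: int) -> float:
    """Chance that an allele at frequency `freq` appears at least once
    among the 2 * n_breeders gene copies of diploid breeders (assumed
    to be independent random draws from the original population)."""
    gene_copies = 2 * n_breeders
    return 1.0 - (1.0 - freq) ** gene_copies


# Compare a rare, an intermediate, and a common allele with 10 breeders.
for freq in (0.01, 0.10, 0.50):
    print(freq, round(retention_probability(freq, n_breeders=10), 3))
```

Under this assumption, rare alleles are the ones most likely to be missing after a single generation with few breeders, while common alleles are almost always retained.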

If selection pressure lasts for many generations, rare alleles will be lost simply by chance, resulting in a post-selection population with fewer alleles and lower heterozygosity than the original population. This process is referred to as a population bottleneck (figure 3). The overall loss of genetic diversity increases with the strength and duration of the selection event.
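The loss of alleles over several generations can be illustrated with a small simulation. This is a sketch under assumptions not taken from the text: each generation is formed by drawing 2N gene copies at random, with replacement, from the previous generation (a simple neutral drift model with no new mutation), and the starting allele counts, function names, and seed are all hypothetical.

```python
import random
from collections import Counter


def bottleneck(allele_counts: dict, n_breeders: int, generations: int, seed: int = 0) -> list:
    """Return one Counter of allele counts per generation, founders first."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    pool = [allele for allele, n in allele_counts.items() for _ in range(n)]
    history = [Counter(pool)]
    for _ in range(generations):
        # Each new gene copy is drawn at random from the previous generation.
        pool = rng.choices(pool, k=2 * n_breeders)
        history.append(Counter(pool))
    return history


def gene_diversity(counts: Counter) -> float:
    """Expected heterozygosity, 1 minus the sum of squared allele frequencies."""
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


founders = {"(CAG)3": 50, "(CAG)4": 30, "(CAG)5": 15, "(CAG)6": 4, "(CAG)7": 1}
history = bottleneck(founders, n_breeders=5, generations=20, seed=1)
for generation in (0, 5, 10, 20):
    counts = history[generation]
    print(generation, len(counts), round(gene_diversity(counts), 3))
```

Because no new alleles arise in this model, the number of alleles can only stay the same or fall from one generation to the next. Changing `n_breeders` or `generations` gives a feel for how the strength and duration of the bottleneck affect the outcome.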

We can detect population bottlenecks using molecular markers such as microsatellites to observe changes in allelic diversity and heterozygosity within populations. In particular, we expect fewer alleles, more monomorphic loci, smaller allelic size ranges, lower genetic diversity, and less heterozygosity in populations that have undergone severe, prolonged bottlenecks.


Figure 3: Strong selection in a population leads to dramatic shifts in allele frequencies in the post-selection population.

QUESTION:

1.      What processes other than artificial or natural selection could lead to a population bottleneck?

ANSWER:

1.      Demographic processes, such as a population that is founded by a small number of individuals (e.g. an island colonization event) or a population that experiences a strong reduction in population size due to a natural phenomenon (e.g. any random mass mortality event), may result in a loss of low-frequency alleles and a decrease in heterozygosity as a result of genetic drift.

 

Population bottlenecks in Dogs:

Dogs were probably domesticated from Eurasian wolves approximately 15,000-40,000 years ago. From the time of domestication, strong artificial selection for specific traits has resulted in the hundreds of dog breeds we recognize today.

 

Unfortunately it is difficult to understand the process of domestication and the genetics of breed formation without knowing anything about the genetic composition of the original domesticated, or ancestral, dog population. Populations of semi-feral, village or indigenous dogs from across the world may provide the best representation of the ancestral gene pool. In contrast with a population that is simply a mix of multiple dog breeds, we expect a true indigenous dog population would show genetic variation that is not present in any of the current dog breed populations.

 

Since we expect that indigenous dog populations have not undergone artificial selection, and we know that dog breeds have undergone extreme artificial selection, we can compare genetic diversity across populations to check for the presence of severe population bottlenecks in breed dogs. We can also compare indigenous dog populations of different sizes to check whether there is evidence of minor population bottlenecks in smaller populations due to demographic processes.

 

 

QUESTION:

1.      Based on the evolutionary histories of the following three dog populations, which do you expect will have the lowest genetic diversity? Which will have the highest genetic diversity? Why?

 

1.      Namibia-North Population: a large population of village dogs in Namibia

2.      Egypt-Kharga Population: a small, isolated population of village dogs in the Kharga Oasis of Saharan Egypt

3.      Basenji Breed: one of the most ancient dog breeds, originates from Central Africa

 

ANSWER:

1.      We expect Namibia-North to have the greatest genetic diversity because it is a large population of indigenous dogs that has probably been minimally affected by artificial selection and genetic drift. Egypt-Kharga should have an intermediate level of genetic diversity because, though it has not undergone artificial selection, it has probably experienced genetic drift because of its degree of isolation and small population size. Finally, we expect the Basenji breed to exhibit the least genetic diversity because it has sustained strong selection and small population size.

 

Putting it all together:

Now let's test the predictions with real data. The attached input file consists of 20 microsatellite markers, which have been typed for five individuals from each of the three populations. Using the population genetic program Arlequin, obtain estimates of the following parameters to recreate the evolutionary history of each of these three dog populations.

1.      Mean number of alleles

2.      Number of polymorphic loci

3.      Allelic size range

4.      Average gene diversity

5.      Observed heterozygosity

Note: If you do not have access to this program you can also download the attached output file from Arlequin and retrieve the relevant results.
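For readers who want to see what these five parameters measure, the sketch below computes them from genotypes written as pairs of repeat numbers. It is an illustration only and is not Arlequin's implementation: the function names and the example genotypes are hypothetical, gene diversity is computed here as n/(n-1) times (1 minus the sum of squared allele frequencies) for n gene copies, and all averages are taken over every locus. Arlequin's exact definitions (for example, which loci enter each average) may differ, so the numbers need not match its output.

```python
from collections import Counter


def locus_summary(genotypes: list) -> dict:
    """Summarize one locus; genotypes is a list of (allele_a, allele_b) repeat numbers."""
    alleles = [allele for genotype in genotypes for allele in genotype]
    counts = Counter(alleles)
    n = len(alleles)  # number of gene copies sampled
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return {
        "n_alleles": len(counts),
        "polymorphic": len(counts) > 1,
        "size_range": max(alleles) - min(alleles),       # in repeat units
        "gene_diversity": n / (n - 1) * (1.0 - sum_sq),  # expected heterozygosity
        "observed_het": sum(a != b for a, b in genotypes) / len(genotypes),
    }


def population_summary(loci: dict) -> dict:
    """Average the per-locus summaries over all loci in one population."""
    per_locus = [locus_summary(genotypes) for genotypes in loci.values()]
    k = len(per_locus)
    return {
        "mean_n_alleles": sum(s["n_alleles"] for s in per_locus) / k,
        "n_polymorphic_loci": sum(s["polymorphic"] for s in per_locus),
        "mean_size_range": sum(s["size_range"] for s in per_locus) / k,
        "mean_gene_diversity": sum(s["gene_diversity"] for s in per_locus) / k,
        "mean_observed_het": sum(s["observed_het"] for s in per_locus) / k,
    }


# Hypothetical data: five individuals typed at two loci.
loci = {
    "locus_A": [(3, 4), (4, 4), (3, 5), (4, 4), (3, 4)],
    "locus_B": [(4, 4)] * 5,  # monomorphic locus
}
summary = population_summary(loci)
for name, value in summary.items():
    print(name, value)
```

A monomorphic locus such as `locus_B` contributes one allele, a size range of zero, and zero heterozygosity, which is why bottlenecked populations are expected to score lower on all five parameters.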

QUESTIONS:

1.      Do the results satisfy your predictions? Why or why not?

2.      Is there evidence of a strong population bottleneck in any of the populations?

 

 

(FYI) Relevant Results:

Population      Mean # Alleles   # Polymorphic Loci   Allelic Size Range   Avg. Gene Diversity   Observed Heterozygosity
Namibia-North   3.65             20                   10.5                 0.62111               0.61000
Egypt-Kharga    2.85             20                   7.75                 0.53778               0.56000
Basenji         2.05             15                   8.867                0.35667               0.45333

 

 

© Carlos D. Bustamante Lab. All Rights Reserved.