What is a Microsatellite?
Microsatellites consist of short DNA sequence motifs repeated in tandem (one directly after another). The sequence CAGCAGCAGCAG is a trinucleotide microsatellite with the motif CAG repeated four times.
Variation in the number of repeats in a microsatellite arises from DNA polymerase errors during DNA replication. During replication, DNA polymerase moves along a DNA sequence and adds complementary bases to the template strand. When the template is highly repetitive, as in a microsatellite, DNA polymerase may "hiccup" and move forward or backward by one full repeat before continuing replication.
For example, if we begin with the microsatellite CAGCAGCAGCAG, or (CAG)4, and DNA polymerase moves back three nucleotide positions before it continues replication, the result is a microsatellite with five repeats: CAGCAGCAGCAGCAG, or (CAG)5. If DNA polymerase instead moves forward three nucleotide positions, the result is a microsatellite with three repeats instead of four.
This process of gaining or losing a single repeat of a motif is called the stepwise mutation process. Because DNA polymerase replication errors are fairly common in highly repetitive sequences, the mutation rate of microsatellites, particularly microsatellites with large numbers of repeats, is very high compared with the rest of the genome.
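The stepwise mutation process can be sketched as a short simulation. This is a minimal illustration, assuming an allele is recorded only by its repeat count and that each replication either copies it faithfully or slips by exactly one repeat (up or down with equal probability) at a hypothetical rate `slip_rate`. The rate used here is deliberately exaggerated, and the one-repeat floor is also an assumption; the function names are illustrative only.

```python
import random

def replicate(repeats: int, slip_rate: float, rng: random.Random) -> int:
    """Copy one allele, stored as its repeat count.

    With probability slip_rate the polymerase slips by one full repeat,
    gaining or losing a single motif copy with equal probability.
    """
    if rng.random() < slip_rate:
        step = rng.choice((-1, 1))
        return max(1, repeats + step)  # assumption: an allele keeps at least one repeat
    return repeats

def simulate_lineage(start: int, generations: int, slip_rate: float, seed: int = 1) -> list[int]:
    """Follow one allele through repeated replications; returns its repeat counts."""
    rng = random.Random(seed)
    history = [start]
    for _ in range(generations):
        history.append(replicate(history[-1], slip_rate, rng))
    return history

def motif_string(motif: str, repeats: int) -> str:
    """Spell out a microsatellite, e.g. ("CAG", 4) -> CAGCAGCAGCAG."""
    return motif * repeats

print(motif_string("CAG", 4))  # CAGCAGCAGCAG
# slip_rate=0.2 is exaggerated (hypothetical) so changes appear within 20 replications
lineage = simulate_lineage(start=4, generations=20, slip_rate=0.2)
print(lineage)
```

Every step in the printed lineage differs from the previous one by at most one repeat, which is exactly the "gain or lose a single repeat" behavior described above.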
The different versions of a microsatellite, distinguished by their number of motif repeats, are called alleles. A single nucleotide polymorphism (SNP) allele can be only one of four possible states (A, C, G or T), but a microsatellite can carry many different numbers of motif repeats, so a single microsatellite can have a large number of possible alleles.
Microsatellites are considered to be neutral markers because they usually lie in non-coding regions of the genome, so we generally assume they are not under selection. Their high mutation rate and large number of alleles make microsatellites especially useful molecular markers for population genetic and parentage assignment studies.
Microsatellites are characterized in individuals using the polymerase chain reaction (PCR) to amplify the desired genome region from a DNA sample. PCR makes millions of copies of the microsatellite, which are labeled with fluorescent markers. These fragments can then be visualized by a DNA analyzing machine, which separates fragments according to length. The process is conceptually similar to agarose gel electrophoresis, in which shorter DNA fragments move through the gel matrix faster than longer fragments. Because alleles with different numbers of repeats differ in total sequence length, the alleles are actually scored as length polymorphisms (figure 1).

Figure 1: When characterizing microsatellites, the DNA analyzing machine reports fluorescent peaks corresponding to particular sequence lengths. Though the actual DNA sequence remains unknown, we can infer the number of repeats, and thus the alleles present, from the total sequence length.
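Scoring alleles by length comes down to simple arithmetic. The sketch below assumes a hypothetical locus where every amplified fragment consists of `flank_bp` base pairs of primer and non-repetitive flanking sequence plus `motif_len` base pairs per repeat; the 100 bp flank, the peak sizes, and the function names are illustrative and not taken from any genotyping software. Real scoring must also handle sizing error and artifacts such as stutter peaks, which this ignores.

```python
def repeats_from_length(fragment_bp: float, flank_bp: int, motif_len: int) -> int:
    """Infer the repeat count from a sized PCR fragment.

    Assumes fragment = flank_bp + motif_len * repeats. Sizing is slightly
    imprecise, so the estimate is rounded to the nearest whole repeat.
    """
    repeats = round((fragment_bp - flank_bp) / motif_len)
    if repeats < 1:
        raise ValueError("fragment is not longer than the flanking region")
    return repeats

def score_genotype(peaks_bp: list[float], flank_bp: int, motif_len: int) -> tuple[int, int]:
    """Turn one or two fluorescent peaks into a diploid genotype.

    One peak is read as a homozygote (both alleles the same length);
    two peaks are read as a heterozygote.
    """
    alleles = sorted(repeats_from_length(p, flank_bp, motif_len) for p in peaks_bp)
    if len(alleles) == 1:
        return (alleles[0], alleles[0])
    if len(alleles) == 2:
        return (alleles[0], alleles[1])
    raise ValueError("expected one or two peaks for a diploid individual")

# Hypothetical CAG locus with 100 bp of flanking sequence
print(score_genotype([112.2, 114.9], flank_bp=100, motif_len=3))  # (4, 5)
print(score_genotype([109.1], flank_bp=100, motif_len=3))         # (3, 3)
```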
QUESTIONS:
1.
What are some of the advantages of using microsatellite markers?
2.
Can you think of any potential disadvantages?
3.
For which sorts of studies are microsatellites most appropriate? Why?
ANSWERS:
1.
Microsatellites are highly polymorphic because of their rapid mutation rate, so they can provide a lot of genetic information. Microsatellites are also useful because they are neutral markers, so selection should not interfere with our ability to infer population history. Finally, microsatellites are relatively inexpensive to genotype, even in non-model organisms, so they can be applied to a wide variety of systems.
2.
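Microsatellites are limited in that they sometimes violate the stepwise mutation process (for example, when a single mutation adds or removes several repeats at once), which is a basic assumption of many population genetic analyses. Furthermore, because mutations can either add or remove tandem repeat units, two alleles can be identical in length without sharing a common origin (homoplasy): a (CAG)4 allele that mutates to (CAG)5 and then back to (CAG)4 is indistinguishable from an unmutated (CAG)4 allele. Homoplasy is most likely when comparing distantly related groups.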
3.
Due to the potential for homoplasy between alleles, microsatellites are most appropriate for studies on recent evolutionary timescales, such as studies of recent population subdivision or paternity assignment.
What is a Heterozygote?
Diploid organisms have two copies of each chromosome, one from each parent. If each parent contributes a chromosome with the same allele for a particular microsatellite, the offspring will have two identical copies of this allele and is called a homozygote. A heterozygote is an individual with two different alleles for the same microsatellite; it arises when the two parents contribute different alleles (figure 2).

Figure 2: Formation of homozygous and heterozygous diploid
offspring.
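The homozygote/heterozygote distinction is easy to express in code once each genotype is recorded as a pair of alleles (here, repeat counts). This is an illustrative sketch with a made-up sample; the fraction it computes is the observed heterozygosity at a single locus, one of the statistics used later in this exercise.

```python
Genotype = tuple[int, int]  # the two alleles an individual carries, as repeat counts

def is_heterozygote(genotype: Genotype) -> bool:
    """True if the two alleles differ, False for a homozygote."""
    first, second = genotype
    return first != second

def observed_heterozygosity(genotypes: list[Genotype]) -> float:
    """Fraction of sampled individuals that are heterozygous at one locus."""
    if not genotypes:
        raise ValueError("need at least one genotype")
    return sum(is_heterozygote(g) for g in genotypes) / len(genotypes)

# Made-up sample of five offspring at a CAG locus
sample = [(3, 3), (3, 4), (4, 4), (3, 4), (4, 5)]
print(observed_heterozygosity(sample))  # 0.6
```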
If we know the frequencies of the two alleles at a two-allele locus in a population (where p is the frequency of allele one, q is the frequency of allele two, and p + q = 1), we can estimate the expected frequencies of heterozygotes and homozygotes in the population using the Hardy-Weinberg relationship:
$$p^2 + 2pq + q^2 = 1$$
$p^2$ is the proportion of the population homozygous for allele one
$2pq$ is the proportion of heterozygotes in the population
$q^2$ is the proportion of the population homozygous for allele two
For example, if allele (CAG)3 (or "p") has a frequency of 0.7 and (CAG)4 (or "q") has a frequency of 0.3, we simply plug in the numbers:
$$(0.7)^2 + 2(0.7)(0.3) + (0.3)^2 = 0.49 + 0.42 + 0.09 = 1$$
As long as the assumptions of Hardy-Weinberg equilibrium are met (random mating, infinite population size, no selection, no new mutations, and no migration), we expect 49% of the population to be homozygous (CAG)3/(CAG)3, 42% of the population to be heterozygous (CAG)3/(CAG)4 and 9% to be homozygous (CAG)4/(CAG)4.
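The Hardy-Weinberg calculation above can be checked with a few lines of Python. The second function uses the standard multi-allele generalization, expected heterozygosity = 1 - (sum of squared allele frequencies), which reduces to 2pq when there are only two alleles; since microsatellites usually have more than two alleles, this is the form that applies to them in practice. Function names are illustrative.

```python
def hardy_weinberg_two_alleles(p: float) -> dict[str, float]:
    """Expected genotype proportions for a two-allele locus, with q = 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    q = 1.0 - p
    return {"p^2": p * p, "2pq": 2 * p * q, "q^2": q * q}

def expected_heterozygosity(freqs: list[float]) -> float:
    """1 - sum of squared allele frequencies; equals 2pq when there are two alleles."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(f * f for f in freqs)

hw = hardy_weinberg_two_alleles(0.7)
print({k: round(v, 2) for k, v in hw.items()})  # {'p^2': 0.49, '2pq': 0.42, 'q^2': 0.09}
print(round(expected_heterozygosity([0.7, 0.3]), 2))  # 0.42
```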
QUESTION:
1.
Which allele frequencies for p and q will maximize the Hardy-Weinberg expected
proportion of heterozygotes? Which allele frequencies will minimize the
expected proportion of heterozygotes?
ANSWER:
1.
Heterozygosity is maximized when the allele frequencies p and q are both 0.5 ($2pq = 2(0.5)(0.5) = 0.50$) and minimized when either p or q is at a low frequency (it reaches zero when one allele is fixed). For example, if p is 0.99 and q is 0.01, the expected proportion of heterozygotes is $2(0.99)(0.01) = 0.0198$, or approximately 2%.
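A short derivation shows why 0.5 is the maximum. Writing expected heterozygosity as a function of p alone (substituting q = 1 - p):

$$
\begin{aligned}
H(p) &= 2p(1-p) = 2p - 2p^2 \\
\frac{dH}{dp} &= 2 - 4p = 0 \quad\Rightarrow\quad p = q = 0.5 \\
\frac{d^2H}{dp^2} &= -4 < 0 \quad\Rightarrow\quad H_{\max} = 2(0.5)(0.5) = 0.5
\end{aligned}
$$

Because $H(p) \ge 0$ for $0 \le p \le 1$, with $H = 0$ only at $p = 0$ or $p = 1$, expected heterozygosity falls toward zero as either allele approaches fixation.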
What is a Population Bottleneck?
When populations are under strong natural or artificial selection, only a subset of individuals in the population will reproduce, so relatively few individuals contribute alleles to subsequent generations. Alleles at loci that are not under selection are carried into the post-selection population as a random subset of the original allelic diversity. The chance that a given allele is passed on increases with its frequency in the original population, so high-frequency alleles are more likely to be present in the post-selection population than low-frequency alleles.
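The link between an allele's frequency and its chance of being passed on can be made concrete with a simple probability. This sketch assumes the next generation's gene copies are drawn independently (with replacement) from the original gene pool, a simplification; under that assumption an allele at frequency p appears at least once among n drawn copies with probability 1 - (1 - p)^n. The number of breeders is hypothetical.

```python
def prob_allele_retained(freq: float, gene_copies: int) -> float:
    """Chance that an allele at frequency `freq` appears at least once among
    `gene_copies` independently drawn gene copies: 1 - (1 - freq) ** gene_copies."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must be between 0 and 1")
    return 1.0 - (1.0 - freq) ** gene_copies

breeders = 10            # hypothetical number of diploid individuals that reproduce
copies = 2 * breeders    # each diploid breeder passes on gene copies from two chromosomes
for freq in (0.5, 0.05, 0.01):
    print(f"p = {freq:<4}: P(retained) = {prob_allele_retained(freq, copies):.3f}")
# p = 0.5 : P(retained) = 1.000
# p = 0.05: P(retained) = 0.642
# p = 0.01: P(retained) = 0.182
```

A common allele is almost certain to survive, while an allele at 1% frequency is lost more than 80% of the time in a single round of sampling from 10 breeders.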
If selection pressure lasts for many generations, rare alleles will be lost simply by chance, resulting in a post-selection population with fewer alleles and lower heterozygosity than the original population. This process is referred to as a population bottleneck (figure 3). The overall loss of genetic diversity increases with the strength and duration of the selection event.
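The chance loss of rare alleles over many generations can be illustrated with a small simulation of a neutral locus. This is a sketch under simplifying assumptions, using a Wright-Fisher-style sampling model as a stand-in for the selection scenario in the text: each generation only a hypothetical `n_breeders` diploid individuals reproduce, their gene copies are drawn at random from the previous generation, and there is no mutation or migration. The starting allele pool is invented for illustration.

```python
import random
from collections import Counter

def next_generation(gene_pool: list[int], n_breeders: int, rng: random.Random) -> list[int]:
    """Draw the 2 * n_breeders gene copies that reach the next generation.

    Simplified Wright-Fisher sampling: copies are drawn at random, with
    replacement, from the current pool. No mutation, migration or selection
    acts on this locus, so any change is due to chance alone.
    """
    return rng.choices(gene_pool, k=2 * n_breeders)

def gene_diversity(gene_pool: list[int]) -> float:
    """Expected heterozygosity, 1 - sum of squared allele frequencies."""
    n = len(gene_pool)
    return 1.0 - sum((c / n) ** 2 for c in Counter(gene_pool).values())

rng = random.Random(42)
# Invented starting pool: 8 alleles (5 to 12 repeats), 25 copies of each
pool = [repeats for repeats in range(5, 13) for _ in range(25)]
print("gen 0:", len(set(pool)), "alleles, diversity", round(gene_diversity(pool), 3))
for gen in range(1, 31):
    pool = next_generation(pool, n_breeders=10, rng=rng)  # hypothetical: 10 breeders
    if gen % 10 == 0:
        print(f"gen {gen}:", len(set(pool)), "alleles, diversity", round(gene_diversity(pool), 3))
```

Because this model has no mutation, the number of alleles can only stay the same or fall from one generation to the next; lost alleles never return.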
We can detect population bottlenecks using molecular markers such as microsatellites to observe changes in allelic diversity and heterozygosity within populations. In particular, we expect fewer alleles, more monomorphic loci, smaller allelic size ranges, lower genetic diversity and less heterozygosity in populations that have undergone severe, prolonged bottlenecks.

Figure 3: Strong selection in a population leads to dramatic
shifts in allele frequencies in the post-selection population.
QUESTION:
1. What
processes other than artificial or natural selection could lead to a population
bottleneck?
ANSWER:
1. Demographic processes, such as the founding of a population by a small number of individuals (e.g., an island colonization event) or a strong reduction in population size due to a natural phenomenon (e.g., any random, mass mortality event), may result in a loss of low-frequency alleles and a decrease in heterozygosity as a result of genetic drift.
Population bottlenecks in Dogs:
Dogs were
probably domesticated from Eurasian wolves approximately 15,000-40,000 years
ago. From the time of domestication, strong artificial selection for specific
traits has resulted in the hundreds of dog breeds we recognize today.
Unfortunately, it is difficult to understand the process of domestication and the genetics of breed formation without knowing the genetic composition of the original domesticated, or ancestral, dog population. Populations of semi-feral, village or indigenous dogs from across the world may provide the best representation of the ancestral gene pool. Unlike a population that is simply a mix of multiple dog breeds, a truly indigenous dog population should show genetic variation that is not present in any of the current dog breed populations.
Since we expect that indigenous dog populations have not undergone artificial selection, and we know that dog breeds have undergone extreme artificial selection, we can compare genetic diversity across populations to check for severe population bottlenecks in breed dogs. We can also compare indigenous dog populations of different sizes to check whether there is evidence of minor population bottlenecks in smaller populations due to demographic processes.
QUESTION:
1.
Based on the
evolutionary histories of the following three dog populations, which do you
expect will have the lowest genetic diversity? Which will have the highest
genetic diversity? Why?
1.
Namibia-North
Population: a large population of village dogs in Namibia
2.
Egypt-Kharga
Population: a small, isolated population of village dogs in the Kharga Oasis of
Saharan Egypt
3.
Basenji
Breed: one of the most ancient dog breeds, originating in Central Africa
ANSWER:
1. We expect Namibia-North to have the greatest genetic diversity because it is a large population of indigenous dogs that has probably been minimally affected by artificial selection and genetic drift. Egypt-Kharga should have an intermediate level of genetic diversity: although it has not undergone artificial selection, it has probably experienced genetic drift because of its isolation and small population size. Finally, we expect the Basenji breed to exhibit the least genetic diversity because it has experienced strong artificial selection and a small population size.
Putting it all together:
Now let's test the predictions with real data. The attached input file consists of 20 microsatellite markers, which have been typed for five individuals from each of the three populations. Using the population genetics program Arlequin, obtain estimates of the following parameters to reconstruct the evolutionary history of each of these three dog populations.
1.
Mean number of alleles
2.
Number of polymorphic loci
3.
Allelic size range
4.
Average gene diversity
5.
Observed heterozygosity
Note: If you do not have access to this program, you can instead download the attached Arlequin output file and retrieve the relevant results from it.
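To show what each of the five parameters measures, they can be sketched in Python. This is not Arlequin's implementation: it assumes genotypes are stored as pairs of allele sizes per locus, uses Nei's unbiased estimator for gene diversity (n/(n-1) times 1 minus the sum of squared allele frequencies, with n gene copies), takes allelic size range as the largest minus the smallest allele at a locus, and averages every statistic over all loci. Arlequin's conventions (for example, which loci enter each average) may differ, so its output should be used for the exercise. The three-locus dataset and function names are invented.

```python
from collections import Counter
from statistics import mean

Genotype = tuple[int, int]  # two allele sizes at one locus for one individual

def locus_stats(genotypes: list[Genotype]) -> dict[str, float]:
    """Per-locus statistics for one population sample."""
    copies = [allele for genotype in genotypes for allele in genotype]  # all gene copies
    n = len(copies)
    counts = Counter(copies)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return {
        "n_alleles": len(counts),
        "polymorphic": len(counts) > 1,
        "size_range": max(copies) - min(copies),
        # Nei's unbiased gene diversity: n/(n-1) * (1 - sum of squared allele frequencies)
        "gene_diversity": n / (n - 1) * (1.0 - sum_sq),
        # fraction of individuals carrying two different alleles
        "obs_het": sum(a != b for a, b in genotypes) / len(genotypes),
    }

def population_summary(data: dict[str, list[Genotype]]) -> dict[str, float]:
    """Average per-locus statistics over all loci; count the polymorphic loci."""
    per_locus = [locus_stats(genotypes) for genotypes in data.values()]
    return {
        "mean_alleles": mean(s["n_alleles"] for s in per_locus),
        "polymorphic_loci": sum(s["polymorphic"] for s in per_locus),
        "mean_size_range": mean(s["size_range"] for s in per_locus),
        "mean_gene_diversity": mean(s["gene_diversity"] for s in per_locus),
        "mean_obs_het": mean(s["obs_het"] for s in per_locus),
    }

# Invented sample: 5 individuals typed at 3 loci (allele sizes as repeat counts)
example = {
    "locusA": [(10, 12), (10, 10), (12, 14), (10, 12), (14, 14)],
    "locusB": [(8, 8)] * 5,
    "locusC": [(20, 21), (20, 21), (21, 21), (20, 20), (20, 21)],
}
for name, value in population_summary(example).items():
    print(name, round(value, 3))
```

The monomorphic locus B pulls every average down and is not counted as polymorphic, which is the same pattern expected at loci that have lost all but one allele in a bottleneck.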
QUESTIONS:
1. Do the results support your predictions? Why or why not?
2. Is
there evidence of a strong population bottleneck in any of the populations?
(FYI) Relevant Results:
| Population | Mean # Alleles | # Polymorphic Loci | Allelic Size Range | Avg. Gene Diversity | Observed Heterozygosity |
|---|---|---|---|---|---|
| Namibia-North | 3.65 | 20 | 10.5 | 0.62111 | 0.61000 |
| Egypt-Kharga | 2.85 | 20 | 7.75 | 0.53778 | 0.56000 |
| Basenji | 2.05 | 15 | 8.867 | 0.35667 | 0.45333 |
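To compare the populations on a common scale, the arithmetic below expresses three of the reported statistics as a fraction of the Namibia-North value, treating that large indigenous population as the least-bottlenecked reference (the assumption made in the predictions above). The numbers are copied from the results table; the function name is illustrative.

```python
# Values copied from the results table (mean # alleles, avg. gene diversity, observed heterozygosity)
results = {
    "Namibia-North": {"mean_alleles": 3.65, "gene_diversity": 0.62111, "obs_het": 0.61000},
    "Egypt-Kharga": {"mean_alleles": 2.85, "gene_diversity": 0.53778, "obs_het": 0.56000},
    "Basenji": {"mean_alleles": 2.05, "gene_diversity": 0.35667, "obs_het": 0.45333},
}

def relative_to(table: dict[str, dict[str, float]], reference: str) -> dict[str, dict[str, float]]:
    """Divide every statistic by the same statistic in the reference population."""
    ref = table[reference]
    return {pop: {k: v / ref[k] for k, v in stats.items()} for pop, stats in table.items()}

for pop, stats in relative_to(results, "Namibia-North").items():
    print(pop, {k: round(v, 2) for k, v in stats.items()})
```

With this scaling, Basenji retains roughly 56% of Namibia-North's mean number of alleles and 57% of its gene diversity, while Egypt-Kharga retains about 78% and 87%.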