population structure (population subdivision)
when a population is divided into smaller subpops, within which most mating occurs
Fis
average deviation in heterozygosity within subpops
can be due to selection, drift, mating system, etc
can be considered an "avg" of inbreeding coefficients
= (Hs - Hi) / Hs
Fst
the average deviation in heterozygosity in subpops relative to the total
varies b/t 0 and 1
**can determine if there is subdivision in a pop, since it's due only to the effect of subdivision. it is influenced by anything that affects migration b/t subpops.
Fit
the average deviation in heterozygosity in the total population (individual relative to total)
varies between -1 and 1
**due to all effects
= (Ht - Hi) / Ht
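A minimal sketch of the three F-statistics computed from heterozygosities. The Fis and Fit formulas are the ones in these notes; Fst = (Ht - Hs) / Ht is the standard definition and is not written out in the notes. The function name and the input values are made up for illustration.

```python
def f_statistics(h_i, h_s, h_t):
    """Return (Fis, Fst, Fit).

    h_i: average observed heterozygosity of individuals within subpops (Hi)
    h_s: average expected heterozygosity within subpops (Hs)
    h_t: expected heterozygosity of the total population (Ht)
    """
    f_is = (h_s - h_i) / h_s  # individual relative to subpop
    f_st = (h_t - h_s) / h_t  # subpop relative to total (standard definition)
    f_it = (h_t - h_i) / h_t  # individual relative to total
    return f_is, f_st, f_it


# Hypothetical heterozygosities, chosen only to show the arithmetic.
f_is, f_st, f_it = f_statistics(h_i=0.3, h_s=0.4, h_t=0.5)
print(f_is, f_st, f_it)
```

The three values are linked by (1 - Fit) = (1 - Fis)(1 - Fst), which is a quick check on any hand calculation.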
how do you interpret the measures of Fst?
if Fst = 0.09, we can say that 9% of the total genetic variation is attributable to genetic differences between subpopulations (population structure)
what influences Fst values estimated from a single locus?
-chance sampling
-selection
-close linkage to a selected locus
can you get better Fst estimates from one or multiple loci? why?
better estimates are obtained by averaging over many loci. this is because Fst values estimated from a single locus are influenced by chance sampling, selection, and close linkage to a selected locus.
However, we expect all of multiple loci to be affected
what methods are used to look at population structure?
-algorithm STRUCTURE
-visualize population structure
-principal components analysis (PCA)
STRUCTURE
an algorithm that aims to identify hidden structure in data by taking many unlinked markers (SNPs, microsatellites) and grouping individual genotypes into random/hidden possible populations
-estimate the K of hidden pop allele frequencies
-initially make a
visualizing population structure
a recent method to look at population structure through visualization
-vertical lines represent individuals
-inferred pops are each a different color
-some indvs may have more than one color, this is to express that they could be from 1 of those pops (adm
PCA (principal components analysis)
a way to look at population structure by comparing variation among SNPs. a data analysis that looks for correlation among SNPs and shows how similar SNPs are. dots closer together are more similar
drift causes populations to....
while gene flow....
drift causes pops to diverge and increases ibd within populations
gene flow prevents divergence and decreases ibd within populations
drift migration equilibrium Fst
Feq = 1 / (1 + 4Nm)
decreases as migrant number increases
-decrease is very rapid
-5 migrants/ generation is enough to ensure that subpops are almost identical
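A quick sketch of how fast equilibrium Fst drops as Nm rises, using the formula on this card. The function name and the Nm values tried are just illustrative choices.

```python
def fst_equilibrium(nm):
    """Drift-migration equilibrium Fst for Nm effective migrants per generation."""
    return 1.0 / (1.0 + 4.0 * nm)


# Try a few values of Nm to see how quickly Fst falls.
for nm in (0, 0.25, 1, 5, 10):
    print(nm, round(fst_equilibrium(nm), 3))
```

With Nm = 1 this gives Fst = 0.2, and with Nm = 5 it gives about 0.048, which is why a handful of migrants per generation is enough to keep subpops nearly identical.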
a _____ (small / large) amount of gene flow is necessary to make 2 pops behave as a single evolutionary lineage
small
**dependent on the effective number of migrants (Nm)
Nm
the absolute number of migrant organisms coming into each subpop each generation, the effective number of migrants
why is differentiation between 2 subpops dependent only on effective number of migrants...?
-opposing forces of drift and migration
-drift/divergence is slow in large populations (big N), so small amounts of gene flow (m) counterbalance divergence
-drift is stronger in small populations (small N) and larger rates of gene flow (m) are needed to
drift-migration equilibrium heterozygosity (Heq)
0
what are the effects of gene flow?
brings new alleles into a population
-has a similar effect to mutation
-increases variation within a population
-the rate of gene flow is much greater than the rate of mutation
decreases variation between populations
why should we estimate lvls of gene flow? what can we apply it to?
-help us understand the possible effects of releasing GMOs and the risk of passing modified genes to natural/non-GMO pops
-gene flow plays a role in spread of resistance to organophosphate insecticides in mosquito
-determine risk of extinction for isolate
history of natural selection
-conceptualized by Darwin and Wallace
-only evolutionary process that can lead to adaptation
-only process that Darwin thought drove evolution
fitness
the contribution of a genotype to the next generation
what are the components of fitness?
viability
mating success
fecundity
viability
ability to survive in the environment, part of one's fitness
mating success
ability to find a mate, part of one's fitness
fecundity
ability to produce viable gametes, part of one's fitness
absolute fitness (Wij)
average reproductive rate of individuals with a given genotype
given by the ratio of Nij after selection to Nij before selection
if Wij is >1, the genotype has ______ (increased/decreased) in number
increased
relative fitness (wij)
fitness value expressed relative to another genotype, the more important measure of fitness
-choose one genotype as the standard (Wstd) and designate it as w = 1
-scale w of other genotypes: wij = Wij/Wstd
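A small sketch of going from genotype counts to absolute fitness (Wij) and then to relative fitness (wij = Wij/Wstd). The counts and the choice of A1A1 as the standard are hypothetical.

```python
def absolute_fitness(n_before, n_after):
    """Wij = Nij after selection / Nij before selection, per genotype."""
    return {g: n_after[g] / n_before[g] for g in n_before}


def relative_fitness(abs_fit, standard):
    """Scale every Wij by the fitness of the chosen standard genotype."""
    w_std = abs_fit[standard]
    return {g: w / w_std for g, w in abs_fit.items()}


# Hypothetical counts before and after selection.
before = {"A1A1": 100, "A1A2": 200, "A2A2": 100}
after = {"A1A1": 120, "A1A2": 200, "A2A2": 50}

W = absolute_fitness(before, after)
w = relative_fitness(W, standard="A1A1")
print(W)
print(w)
```

Here only A1A1 has Wij > 1, so it is the only genotype that increased in number, and it gets w = 1 as the standard.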
peppered moths and natural selection
-one of the first empirical examples of natural selection in nature
-peppered moths studied by EB Ford, PM Sheppard, and HBD Kettlewell
-two forms: melanic and non melanic moths in GB
-melanic was black and had low numbers since non melanic would blend in
w11 p^2 : w12 2pq : w22 q^2 meaning
the ratio of genotype frequencies among surviving adults
marginal fitness
allele fitness, the average fitness of genotypes containing a given allele
any allele will increase in frequency only if its marginal fitness is _____ (less than, greater than, equal to) the population average fitness
greater than
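A sketch of one generation of selection at a single locus with two alleles, showing marginal fitness next to the population average fitness. It assumes random mating (Hardy-Weinberg frequencies before selection); the fitness values are made up.

```python
def one_generation(p, w11, w12, w22):
    """Return (w1, w2, w_bar, p_next) for allele A1 at frequency p."""
    q = 1.0 - p
    w1 = p * w11 + q * w12      # marginal fitness of A1
    w2 = p * w12 + q * w22      # marginal fitness of A2
    w_bar = p * w1 + q * w2     # population average fitness
    p_next = p * w1 / w_bar     # frequency of A1 after selection
    return w1, w2, w_bar, p_next


# Hypothetical fitnesses favoring A1.
w1, w2, w_bar, p_next = one_generation(p=0.1, w11=1.0, w12=0.9, w22=0.8)
print(w1, w2, w_bar, p_next)
```

Because w1 is greater than w_bar in this example, p_next comes out greater than p, matching the card above.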
3 characteristics of natural selection
1. how natural selection acts depends on allele frequencies
2. natural selection is not necessarily "survival of the fittest"
3. natural selection is opportunistic
T/F: there is population structure in humans
true, the population structure is mostly correlated with geography
the human genome
23 pairs of chromosomes (23 in the haploid genome)
~3 billion base pairs for haploid genome
~21000 genes
θw and π for humans are around 0.001 (1 SNP every 1000 bp)
on avg, humans differ in about 3-4mil SNPs
humans from 2 different pops may differ at ~300,000 more SNPs
race
a social categorization of a quantitative phenotype
on average, humans differ in about ____ million SNPs
3-4
humans from 2 different pops may differ at ~_______
300,000 more SNPs
Angelica Dass "humanae" project
-people from different ethnic backgrounds can sometimes have the same pantone color
____ (few/many) genes affect skin color, and not all groups carry the same variants
many
there is _____ (more/less) variation within than between human pops
more
when does natural selection equilibrium occur?
when Δp = 0 and allele frequencies stop changing
can happen when...
p = 0
q = 0
heterozygote fitness is highest (w12 > w22 & w12 > w11)
heterozygote fitness is lowest (w12 < w22 & w12 < w11)
internal equilibrium
natural selection equilibrium in which some polymorphism is maintained in the population
*occurs when heterozygote has the highest or lowest fitness
local equilibrium
equilibrium that is approached only from a certain region of parameter space
global equilibrium
equilibrium approached regardless of initial conditions
directional selection
also positive selection, occurs when natural selection favors one of the extreme variations of a trait
what kind of equilibrium occurs with directional selection
stable global equilibrium
overdominance
aka heterozygote superiority, an inheritance pattern in which a heterozygote is more vigorous than either of the corresponding homozygotes
what type of equilibrium occurs with overdominance?
global and stable
balanced polymorphism
situation in which selection maintains two or more phenotypes for a specific gene in a population
greater difference b/t w12 and w22 leads to higher equilibrium frequency of A1
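A sketch of the internal equilibrium under overdominance. The formula p_eq = (w12 - w22) / (2*w12 - w11 - w22) is the standard one-locus result and is not written out in these notes; the fitness values are hypothetical.

```python
def overdominance_equilibrium(w11, w12, w22):
    """Equilibrium frequency of A1 when the heterozygote is fittest."""
    if not (w12 > w11 and w12 > w22):
        raise ValueError("requires w12 > w11 and w12 > w22")
    return (w12 - w22) / (2.0 * w12 - w11 - w22)


# Same w11 and w12, but a bigger gap between w12 and w22 in the second case.
p_small_gap = overdominance_equilibrium(w11=0.8, w12=1.0, w22=0.6)
p_large_gap = overdominance_equilibrium(w11=0.8, w12=1.0, w22=0.4)
print(p_small_gap, p_large_gap)
```

The second case has the larger w12 - w22 difference and the higher equilibrium frequency of A1.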
example of balanced polymorphism in sickle cell
-A and S are a balanced polymorphism
-AS is only advantageous where malaria is common
Warfarin resistance in rats
-Warfarin prevents blood clotting by preventing restoration of vitamin K
-mutation in vitamin K epoxide reductase gene >> rats less sensitive, but increases vitamin K requirements
underdominance
aka heterozygote inferiority, selection in which the heterozygote has lower fitness than that of either homozygote
heterozygote inferiority in African butterfly
-the orange and blue homozygotes each resemble a local toxic species
-the heterozygote resembles nothing in particular and is attractive to predators
-underdominant loci are usually observed in crosses b/t varieties. within one pop they are rapidly fixed
outcrossing species
-tend to contain a lot of variation in form of rare, harmful, recessive alleles
-alleles that cause inbreeding depression
why doesn't natural selection usually eliminate rare, harmful recessive alleles from outcrossing species?
-the rare alleles are present mostly in heterozygotes
-
s
selection coefficient against homozygous recessive
h
degree of dominance of the A2 allele
h = 0 means recessive
h = 1 means dominant
h = 0.5 means semi-dominant
h = 0 means ______ (recessive/dominant/semi-dominant)
recessive A2 allele
h = 1 means ______ (recessive/dominant/semi-dominant)
dominant A2 allele
h = 0.5 means ______ (recessive/dominant/semi-dominant)
semi-dominance in A2 allele
hs
strength of selection against heterozygotes
when h = 0, mutation-selection balance/equilibrium depends on ______ and _______
mutation rate and strength of selection against homozygotes (s)
when h > 0, mutation-selection balance/equilibrium depends on _______ and ________
mutation rate and strength of selection against heterozygotes (hs)
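A sketch of the two mutation-selection equilibria these cards point to. The approximations q ≈ sqrt(mu/s) for h = 0 and q ≈ mu/(hs) for h > 0 are the standard textbook results and are not written out in the notes; mu here is the mutation rate toward the deleterious A2 allele, and the numbers are made up.

```python
import math


def equilibrium_q(mu, s, h=0.0):
    """Approximate equilibrium frequency of a deleterious allele A2."""
    if h == 0:
        # Fully recessive: selection only sees A2A2 homozygotes.
        return math.sqrt(mu / s)
    # Partly dominant: selection against heterozygotes (hs) dominates.
    return mu / (h * s)


q_recessive = equilibrium_q(mu=1e-6, s=1.0)           # recessive lethal
q_semidominant = equilibrium_q(mu=1e-6, s=0.1, h=0.5)
print(q_recessive, q_semidominant)
```

A fully recessive allele sits at a much higher equilibrium frequency than a partly dominant one, which fits the point that rare recessives hide in heterozygotes.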
for a large population, the probability of fixation of a new favorable mutation depends primarily on the ______ rather than population size
strength of selection
for a small population, the probability of fixation depends primarily on the _______
population size
if 4Ns <<< 1, ________ is the most important aspect acting on a new mutation
genetic drift (pop size)
if 4Ns >>> 1, ______ is the most important process acting on new mutations
natural selection
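A sketch of why 4Ns is the dividing line. It uses Kimura's diffusion approximation for the fixation probability of a single new mutation (initial frequency 1/(2N)), which is not given in these notes. It assumes the effective size equals N and that s is the advantage in heterozygotes with additive effects; treat it as an illustration only.

```python
import math


def fixation_probability(n, s):
    """Fixation probability of one new mutant copy in a diploid pop of size n."""
    if s == 0:
        return 1.0 / (2 * n)  # neutral case: just the initial frequency
    # (1 - e^{-2s}) / (1 - e^{-4Ns}), written with expm1 for numerical accuracy
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * n * s)


p_small_pop = fixation_probability(n=10, s=1e-4)      # 4Ns << 1
p_large_pop = fixation_probability(n=10**6, s=0.01)   # 4Ns >> 1
print(p_small_pop, p_large_pop)
```

In the small population the result is close to 1/(2N) = 0.05 (drift, set by pop size); in the large one it is close to 2s = 0.02 (set by the strength of selection).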
some models for selection
sexual selection
frequency dependent selection
spatially heterogeneous environments
temporally heterogeneous environments
epistasis
kin selection
fecundity selection
meiotic drive
neutral theory
theory that suggests that most polymorphism at the molecular level is selectively neutral
most mutations do have an effect on fitness, but...
-deleterious ones are rapidly eliminated
-advantageous ones should be rapidly fixed
-most remaining mutations are neutral
polymorphism
most often used for genetic variation within populations or species
divergence
genetic differences among (between) species
substitution
a change of one amino acid or nucleotide by another
replacement
a change in amino acid
T/F: the number of differences b/t sequences is always the same as the number of substitutions/replacements
false, may or may not be
K
proportion of replacements that have occurred
Dt
proportion of amino acid differences at time t
the difference in magnitude between K and D becomes _____ (smaller/larger) with time
larger
why does the difference in size b/t K and D become larger with time?
-multiple replacements in the same site
assumptions for derivations of the relationship between D and K
-a constant substitution rate
-all amino acid replacements occur with equal likelihood (not always actually true)
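A sketch of the D and K relationship under the two assumptions above. It uses the standard Poisson correction, D = 1 - e^(-K) and K = -ln(1 - D), which is not written out in these notes. K is replacements per site and D is the proportion of sites that differ.

```python
import math


def differences_from_replacements(k):
    """Expected proportion of differing sites (D) after k replacements per site."""
    return 1.0 - math.exp(-k)


def replacements_from_differences(d):
    """Estimated replacements per site (K) from observed proportion d."""
    return -math.log(1.0 - d)


# The gap K - D grows as sequences diverge (multiple hits at the same site).
for d in (0.05, 0.2, 0.5):
    k = replacements_from_differences(d)
    print(d, round(k, 3), round(k - d, 3))
```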
is the assumption that all amino acid replacements occur with equal likelihood true? why or why not?
-no, some amino acid changes are more likely to occur than others
-a purine going to a purine or pyrimidine to pyrimidine is more likely than a purine going to a pyrimidine (or vice versa)
Jukes-Cantor model
simplest model for looking at rates of nucleotide substitution
-constant rate
-each nucleotide as likely to mutate to any other
Kimura 2-parameter model
model for looking at the rates of nucleotide substitution
-assumes there is a different rate of substitution for transitions (purine to purine or pyrimidine to pyrimidine) vs transversions (purine to pyrimidine and vice versa)
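A sketch of the distance corrections that go with these two models. The formulas are the standard Jukes-Cantor and Kimura 2-parameter distances and are not written out in these notes. p is the proportion of sites that differ; P and Q are the proportions of transitional and transversional differences.

```python
import math


def jukes_cantor(p):
    """Substitutions per site under Jukes-Cantor, from proportion p of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kimura_2p(p_ts, q_tv):
    """Substitutions per site under K2P, from transition (P) and transversion (Q) proportions."""
    return -0.5 * math.log(1.0 - 2.0 * p_ts - q_tv) - 0.25 * math.log(1.0 - 2.0 * q_tv)


k_jc = jukes_cantor(0.1)
# Same 10% total difference, but mostly transitions.
k_k2p = kimura_2p(p_ts=0.08, q_tv=0.02)
print(k_jc, k_k2p)
```

When one third of the differences are transitions (P = p/3, Q = 2p/3), the two functions return the same value, since K2P reduces to Jukes-Cantor in that case.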
AA and nucleotide substitution models allow us to...
-use observed differences to estimate how much evolutionary change (subs/replacements) has occurred
-use this info to figure out mutation rate for a given protein/gene
-figure out divergence time between two lineages
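A sketch of the divergence-time arithmetic. It assumes a constant rate and uses the usual relation K = 2 * rate * T (both lineages accumulate changes after the split), which is not written out in these notes. The numbers are hypothetical.

```python
def divergence_time(k, rate):
    """Time since two lineages split, given distance k and per-lineage rate."""
    return k / (2.0 * rate)


def substitution_rate(k, time):
    """Per-lineage rate, given distance k and a known divergence time."""
    return k / (2.0 * time)


# Hypothetical: 0.1 substitutions per site, 1e-9 substitutions/site/year.
t_years = divergence_time(k=0.1, rate=1e-9)
print(t_years)
```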
the molecular clock
model that uses DNA comparisons to estimate the length of time that two species have been evolving independently
-avg molecular evolution for any given gene is sometimes uniform for long periods of time
-the rate depends on the neutral mutation rate, whic
the rate of the molecular clock depends on...
the neutral mutation rate
what are issues with the neutral theory in regards to molecular data?
the molecular clock is overdispersed
-variance is larger than the mean
under neutrality H = θ/(1 + θ)
-but the relationship b/t H and N is poorly predicted by the neutral theory especially for a large N
there is a generation-time effect
over-dispersion of clock and the neutral theory
-the variance of the clock across loci is larger than its mean
-in a perfect clock, the variance should equal the mean
-this is a property of the Poisson distribution
heterozygosity and population size and the neutral theory
-under neutrality, H = θ/(1 + θ)
-in the neutral theory, the relationship b/t H and N is poorly predicted, especially for a large N
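A sketch of the neutral prediction for heterozygosity. It takes θ = 4Nμ (the usual definition for a diploid population, not spelled out in these notes); the population sizes and mutation rate are made up.

```python
def expected_heterozygosity(n, mu):
    """Neutral expectation H = theta / (1 + theta), with theta = 4 * N * mu."""
    theta = 4.0 * n * mu
    return theta / (1.0 + theta)


# Same mutation rate, increasing population size.
for n in (1e3, 1e4, 1e6, 1e9):
    print(int(n), round(expected_heterozygosity(n, mu=2.5e-5), 4))
```

The predicted H climbs toward 1 as N grows, which is the pattern real data fail to show for large N.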
the generation-time effect and the neutral theory
the neutral theory predicts a constant substitution rate when time is measured in generations, but the experimental data shows that the molecular clock rate is ~constant in years
Tomoko Ohta
-examined proportions of synonymous and non synonymous substitutions in DNA separately
-worked on neutral theory of evolution and developed the nearly neutral theory
nearly-neutral theory
hypothesis that most mildly deleterious/advantageous substitutions (small s) are effectively neutral. focuses on new mutations that are acted on by the combination of relatively weak natural selection and genetic drift.
-cases where s ~ or < 1/(2N)
overdispersed clock and nearly neutral theory
presence of slightly deleterious mutations can increase clock variance
-in small pops, these mutations behave as neutral and clocklike
-in large pops, they are selected against, departing from clocklike behavior
difference in behavior in different species
H and N relationship and nearly neutral theory
under neutrality, H = θ/(1 + θ) should approach 1 when N is large
-but, slightly deleterious mutations will be removed more efficiently in large pops
-reduces overall H in large pops
generation time effect and the nearly neutral theory
synonymous substitutions are effectively neutral, and behave as predicted by the neutral model
-clock on generation time scale, dependent only on neutral mutation rate > generation time effect seen
-species with shorter generation times have faster clock
synonymous substitution
a silent substitution, a change of one nucleotide in a sequence to another that does not lead to an amino acid change
nonsynonymous substitution
a change in a gene from one nucleotide to another that leads to an amino acid change
synonymous substitutions ____ (are/are not) effectively neutral and behave as predicted by the neutral model
are
species with shorter generation times have _____ (slower/faster) clock in years
faster
organisms with _____ (shorter/longer) generation time tend to be big and have smaller N
longer
organisms with ______ (shorter/longer) generation time tend to be small and have larger N
shorter
what are some ways we can test for selection?
comparing Ka to Ks
-usually b/t species
-usually Ks > Ka is expected (purifying selection)
signature of selective sweeps
-relies on polymorphism within species/pops
Fst outliers
-compares 2 pops for local adaptation
frequency of variants within population
purifying selection
selection against amino acid changes and deleterious changes
positive selection
natural selection that increases the frequency of a favorable allele, favors amino acid changes
Ka
nonsynonymous substitution rate
Ks
synonymous substitution rate
what type of selection occurs when Ks (synonymous) = Ka (non synonymous)
strict neutrality, no selection
what type of selections occurs when Ks (synonymous) > Ka (non synonymous)
purifying selection (against AA changes)
what type of selection occurs when Ks (synonymous) < Ka (non synonymous)
positive selection (favors AA changes); uncommon, since most genes show Ks > Ka
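A sketch that turns the three Ka vs Ks cards into one function. The tolerance for calling the ratio "equal to 1" is an arbitrary choice for illustration; a real analysis would use a statistical test rather than a fixed cutoff.

```python
def classify_ka_ks(ka, ks, tol=0.05):
    """Return (Ka/Ks ratio, interpretation). tol is a hypothetical cutoff."""
    if ks <= 0:
        raise ValueError("Ks must be positive")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:
        return ratio, "neutral"
    if ratio < 1.0:
        return ratio, "purifying selection"
    return ratio, "positive selection"


# Hypothetical rates.
print(classify_ka_ks(0.02, 0.20))
print(classify_ka_ks(0.30, 0.10))
print(classify_ka_ks(0.10, 0.10))
```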
lysozyme
enzyme that cleaves eubacterial cell walls
found in saliva, serum, tears, mammalian milk
defense against bacterial invasion
found in GI tract of foregut fermenters, used to free nutrients from bacterial cell walls
compared to non fermenters, foregut fermenters have a ____ (high/low) Ka/Ks
high
dN/dS comparisons in chimps and humans
733 genes had dN/dS > 1
-genes involved in olfaction, immune defenses, tumor suppression, apoptosis, X-linked genes, spermatogenesis
brain genes had dN/dS < 1
-very conserved among chimps and humans
genetic hitchhiking
the process by which an allele is able to "ride along" with a nearby favorable allele/selected mutation to which it is physically linked and thus increase in frequency, occurs during a selective sweep
**the linked alleles can be neutral or even deleterious
how can you tell genetic hitchhiking has occurred?
-loss of variation at selected locus
-loss of variation at linked loci
-a large genomic area has lost variation
selective sweep
recent, rapid spread of a favorable mutation
how can genetic variation be introduced after/during a sweep?
mutation or recombination, but it takes time
with selective sweeps, genetic variation _____ (increases/decreases) with distance from the selected locus
increases
amylase
enzyme that breaks down starch
amylases in humans
-amylases are expressed in saliva and pancreas by genes of the AMY family
-AMY genes occur in a 200 kb region on chromosome 1, but humans vary in number of AMY genes
-many more AMY genes in humans than other primates
-also there are many more AMY genes in
selective sweep in humans for AMY locus
-there is low variation in area around AMY locus across all humans
-this is likely due to a selective sweep in humans after a split from the Neanderthals
-selection was before agriculture, but could be tied to when we started cooking/grinding/processing f
Fst for measuring selection
-compares polymorphism across pops
-Fst can vary across loci in the genome b/c all genes in the genome will experience gene flow the same way (reduces Fst) while natural selection just affects the genes and linked loci that underlie local adaptation
-popu
mouse coat color Fst example
-the Fst for the gene responsible for mouse coat color was high b/c there was selection for different colors in different pops
-the mt gene had no differentiation, and therefore it had no relationship with mouse coat color
using Fst to determine adaptive loci in Tibetans
Tibetans live at 3500-4000m and have the ability to use smaller quantities of oxygen more efficiently
procedure to find the loci responsible...
-compare Tibetans and Han Chinese b/c they split ~3000-5000 years ago
-scan entire genome of 40 people from ea
EPAS1 gene
a gene that regulates aerobic and anaerobic metabolism and allows hemoglobin to carry more oxygen
**found in 87% of Tibetans but 9% of Han Chinese studied
using frequency of variants within populations
use nucleotide diversity (pi) and nucleotide polymorphism (theta W) to test for neutrality via Tajima's D test
D ~ 0: no selection/neutrality (drift is acting, not selection)
D > 0: balancing selection, two variants are being favored at a locus
D < 0: positive/directional selection
π
nucleotide diversity, tells us how different a pair of sequences are (on average)
*affected by polymorphism (SNP) frequency (SNPs at low freq give a smaller value)
θw
nucleotide polymorphism, tells us how polymorphic a sequence is
*indifferent to allele frequencies
Tajima's D test
-can be used to tell us the selection that is going on using π and θw
-when D ~ 0: neutrality
-when D < 0: directional selection, pop growth (many rare SNPs)
-when D > 0: balancing selection, admixture (many intermediate-frequency SNPs)
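A sketch of Tajima's D computed from a small set of aligned sequences. The variance constants follow Tajima's standard formula, which is not written out in these notes. Here π and θw are per sequence (not divided by length), and the toy alignments are made up to show the sign of D.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Return (pi, theta_w, D) for equal-length aligned sequences (n >= 4)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    # S: number of segregating (polymorphic) sites
    s = sum(1 for site in zip(*seqs) if len(set(site)) > 1)
    if s == 0:
        raise ValueError("no segregating sites")
    # pi: average number of pairwise differences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    theta_w = s / a1  # Watterson's estimator
    d = (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))
    return pi, theta_w, d


# Toy data: every SNP is a singleton (rare variants).
rare = ["AAAA", "TAAA", "ATAA", "AATA"]
# Toy data: every SNP is at 50% frequency (intermediate variants).
intermediate = ["AAA", "AAA", "TTT", "TTT"]
print(tajimas_d(rare))
print(tajimas_d(intermediate))
```

The all-singleton alignment gives a negative D and the intermediate-frequency alignment gives a positive D, matching the interpretations on this card.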
what type of selection is going on when Tajima's D ~ 0?
neutrality
-a mix of rare and common SNPs occur in the pop
-older alleles tend to be at higher frequency, younger at lower frequency
-drift is acting!
what type of selection is going on when Tajima's D < 0?
directional selection
-many low frequency SNPs are in the population (usually young)
-usually b/c an allele is favored, and only recently has variation started to accumulate again (positive selection)
*could be due to population growth sometimes too
what type of selection is going on when Tajima's D > 0?
balancing selection
-many intermediate-frequency alleles in population
-usually means that two types of variants are being favored at locus
*could be due to admixture sometimes too
tb1 branching gene in maize
reduced branching in maize due to increased apical dominance
-main stem inhibits growth of lateral meristems, resulting in more seeds in fewer branches
-caused by gene tb1, which is a transcription factor. its D value is -2.14, which means this is under
tb1 gene
a transcription factor (controls expression of others) that causes the reduced branching in maize
the D value is -2.41, indicating that this gene is under directional/positive selection