Pop Gen Final

population structure (population subdivision)

when a population is divided into smaller subpops within which mating primarily occurs

Fis

average deviation in heterozygosity within subpops
can be due to selection, drift, mating system, etc
can be considered an "avg" of inbreeding coefficients
= (Hs - Hi) / Hs

Fst

the average deviation in heterozygosity in subpops relative to the total
varies b/t 0 and 1
**can determine if there is subdivision in a pop since it's due only to the effect of subdivision. it is influenced by anything that affects migration b/t subpops.
= (Ht - Hs) / Ht

Fit

the average deviation in heterozygosity in the total population (individual relative to total)
varies between -1 and 1
**due to all effects
= (Ht - Hi) / Ht

how do you interpret the measures of Fst?

if Fst = 0.09, we can say that 9% of the total genetic variation is attributable to genetic differences between subpopulations (population structure)
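
A minimal Python sketch of how the three F-statistics relate, assuming we already know the average observed heterozygosity of individuals (Hi), the average expected heterozygosity within subpops (Hs), and the expected heterozygosity of the pooled population (Ht). The function names, the equal-subpop-size assumption, and the example numbers are illustrative and not from the notes.

```python
def expected_het(p):
    """Hardy-Weinberg expected heterozygosity 2pq for a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def hs_ht_from_freqs(subpop_freqs):
    """Hs and Ht from allele frequencies, assuming equally sized subpops."""
    hs = sum(expected_het(p) for p in subpop_freqs) / len(subpop_freqs)
    p_bar = sum(subpop_freqs) / len(subpop_freqs)
    ht = expected_het(p_bar)
    return hs, ht


def f_statistics(hi, hs, ht):
    """Return (Fis, Fst, Fit) from the three heterozygosities."""
    fis = (hs - hi) / hs
    fst = (ht - hs) / ht
    fit = (ht - hi) / ht
    return fis, fst, fit


# Two subpops with allele frequencies 0.2 and 0.8 (made-up numbers).
hs, ht = hs_ht_from_freqs([0.2, 0.8])
fis, fst, fit = f_statistics(hi=0.30, hs=hs, ht=ht)
print(f"Hs={hs:.3f} Ht={ht:.3f} Fis={fis:.4f} Fst={fst:.2f} Fit={fit:.2f}")
```

A quick sanity check on any set of values is the identity (1 - Fit) = (1 - Fis)(1 - Fst), which follows directly from the three definitions.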

what influences Fst values estimated from a single locus?

-chance sampling
-selection
-close linkage to a selected locus

can you get better Fst estimates from one or multiple loci? why?

better estimates are obtained by averaging over many loci. this is because Fst values estimated from a single locus are influenced by chance sampling, selection, and close linkage to a selected locus.
However, we expect all of multiple loci to be affected the same way by gene flow, so the multilocus average reflects subdivision

what methods are used to look at population structure?

-algorithm STRUCTURE
-visualize population structure
-principal components analysis (PCA)

STRUCTURE

an algorithm that aims to identify hidden structure in data by taking many unlinked markers (SNPs, microsatellites) and grouping individual genotypes into random/hidden possible populations
-estimates the allele frequencies of the K hidden pops
-initially make a

visualizing population structure

a recent method to look at population structure through visualization
-vertical lines represent individuals
-inferred pops are each a different color
-some indvs may have more than one color; this expresses that they could be from any of those pops (admixture)

PCA (principal components analysis)

a way to look at population structure by comparing variation among SNPs. a data analysis that looks for correlations among SNPs and shows how genetically similar indvs are. dots (indvs) closer together are more similar

drift causes populations to....
while gene flow....

drift causes pops to diverge and increases ibd (identity by descent) within populations
gene flow prevents divergence and decreases ibd within populations

drift migration equilibrium Fst

Feq = 1 / (1 + 4Nm)
decreases as migrant number increases
-decrease is very rapid
-5 migrants/ generation is enough to ensure that subpops are almost identical
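
A small Python sketch of the equilibrium formula, to show how quickly Feq drops as Nm grows. The function name and the Nm values are chosen here for illustration; the formula itself is the one on this card.

```python
def fst_equilibrium(nm):
    """Drift-migration equilibrium Fst for Nm effective migrants per generation."""
    if nm < 0:
        raise ValueError("Nm cannot be negative")
    return 1.0 / (1.0 + 4.0 * nm)


# Feq for a handful of migrant numbers (illustrative values).
for nm in (0, 0.25, 1, 5, 10):
    print(f"Nm={nm:<5} Feq={fst_equilibrium(nm):.3f}")
```

With Nm = 5 the formula gives Feq = 1/21, so under 5% of the variation is between subpops, which matches the "almost identical" statement above.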

a _____ (small / large) amount of gene flow is necessary to make 2 pops behave as a single evolutionary lineage

small
**dependent on the effective number of migrants (Nm)

Nm

the absolute number of migrant organisms coming into each subpop each generation, the effective number of migrants

why is differentiation between 2 subpops dependent only on effective number of migrants...?

-opposing forces of drift and migration
-drift/divergence is slow in large populations (big N), so small amounts of gene flow (m) counterbalance divergence
-drift is stronger in small populations (small N) and larger rates of gene flow (m) are needed to counterbalance divergence

drift-migration equilibrium heterozygosity (Heq)

0

what are the effects of gene flow?

brings new alleles into a population
-has a similar effect to mutation
-increases variation within a population
-the rate of gene flow is much greater than the rate of mutation
decreases variation between populations

why should we estimate lvls of gene flow? what can we apply it to?

-help us understand the possible effects of releasing GMOs and the risk of passing modified genes to natural/non-GMO pops
-gene flow plays a role in spread of resistance to organophosphate insecticides in mosquito
-determine risk of extinction for isolate

history of natural selection

-conceptualized by Darwin and Wallace
-only evolutionary process that can lead to adaptation
-only process that Darwin thought drove evolution

fitness

the contribution of a genotype to the next generation

what are the components of fitness?

viability
mating success
fecundity

viability

ability to survive in the environment, part of one's fitness

mating success

ability to find a mate, part of one's fitness

fecundity

ability to produce viable gametes, part of one's fitness

absolute fitness (Wij)

average reproductive rate of individuals with a given genotype
given by the ratio of Nij after selection to Nij before selection

if Wij is >1, the genotype has ______ (increased/decreased) in number

increased

relative fitness (wij)

fitness value expressed relative to another genotype, the more important measure of fitness
-choose one genotype as the standard (Wstd) and designate it as w = 1
-scale w of other genotypes: wij = Wij/Wstd
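
A short Python sketch of the two fitness measures, assuming we have genotype counts before and after selection. The counts, genotype labels, and function names are made up for illustration.

```python
def absolute_fitness(n_before, n_after):
    """Wij: ratio of genotype count after selection to count before selection."""
    return n_after / n_before


def relative_fitness(abs_fit, standard):
    """wij = Wij / Wstd, so the standard genotype gets w = 1."""
    w_std = abs_fit[standard]
    return {genotype: w / w_std for genotype, w in abs_fit.items()}


# Hypothetical counts before and after one round of selection.
counts_before = {"A1A1": 100, "A1A2": 200, "A2A2": 100}
counts_after = {"A1A1": 120, "A1A2": 180, "A2A2": 60}

W = {g: absolute_fitness(counts_before[g], counts_after[g]) for g in counts_before}
w = relative_fitness(W, standard="A1A1")
print(W)
print(w)
```

Here only A1A1 has W > 1 (it increased in number), and scaling by A1A1 puts the other two genotypes below 1.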

peppered moths and natural selection

-one of the first empirical examples of natural selection in nature
-peppered moths studied by EB Ford, PM Sheppard, and HBD Kettlewell
-two forms: melanic and non melanic moths in GB
-the melanic form was black and occurred in low numbers because the non melanic form blended in better

w11 p^2 : w12 2pq : w22 q^2 meaning

the ratio of genotype frequencies among surviving adults

marginal fitness

allele fitness, the average fitness of genotypes containing a given allele

any allele will increase in frequency only if its marginal fitness is _____ (less than, greater than, equal to) the population average fitness

greater than
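
A minimal Python sketch of marginal fitness, assuming one locus with two alleles (A1 at frequency p), random mating, and constant genotype fitnesses w11, w12, w22. The function names and the fitness values are illustrative.

```python
def marginal_fitnesses(p, w11, w12, w22):
    """Return (w1, w2, w_bar): marginal fitness of A1, of A2, and the mean fitness."""
    q = 1.0 - p
    w1 = p * w11 + q * w12  # A1 is paired with A1 (prob. p) or A2 (prob. q)
    w2 = p * w12 + q * w22  # A2 is paired with A1 (prob. p) or A2 (prob. q)
    w_bar = p * w1 + q * w2
    return w1, w2, w_bar


def next_p(p, w11, w12, w22):
    """Frequency of A1 after one generation of selection: p' = p * w1 / w_bar."""
    w1, _, w_bar = marginal_fitnesses(p, w11, w12, w22)
    return p * w1 / w_bar


# Selection against the A2A2 homozygote (made-up fitnesses).
p = 0.5
w1, w2, w_bar = marginal_fitnesses(p, w11=1.0, w12=1.0, w22=0.5)
print(w1, w2, w_bar, next_p(p, 1.0, 1.0, 0.5))
```

Because p' = p * w1 / w_bar, A1 goes up exactly when w1 is greater than the population average fitness, which is the statement on this card.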

3 characteristics of natural selection

1. how natural selection acts depends on allele frequencies
2. natural selection is not necessarily "survival of the fittest"
3. natural selection is opportunistic

T/F: there is population structure in humans

true, the population structure is mostly correlated with geography

the human genome

23 chromosomes
~3 billion base pairs for haploid genome
~21000 genes
θw and π for humans are around 0.001 (1 SNP every 1000 bp)
on avg, humans differ in about 3-4 million SNPs
humans from 2 different pops may differ at ~300,000 more SNPs

race

a social categorization of a quantitative phenotype

on average, humans differ in about ____ million SNPs

3-4

humans from 2 different pops may differ at ~_______

300,000 more SNPs

Angelica Dass "humanae" project

-people from different ethnic backgrounds can sometimes have the same pantone color

____ (few/many) genes affect skin color and not all groups carry the same variants

many

there is _____ (more/less) variation within than between human pops

more

when does natural selection equilibrium occur?

when Δp = 0 and allele frequencies stop changing
can happen when...
p = 0
q = 0
heterozygote fitness is highest (w12 > w22 & w12 > w11)
heterozygote fitness is lowest (w12 < w22 & w12 < w11)

internal equilibrium

natural selection equilibrium in which some polymorphism is maintained in the population
*occurs when heterozygote has the highest or lowest fitness

local equilibrium

equilibrium that is approached only from a certain region of parameter space

global equilibrium

equilibrium approached regardless of initial conditions

directional selection

also positive selection, occurs when natural selection favors one of the extreme variations of a trait

what kind of equilibrium occurs with directional selection

stable global equilibrium

overdominance

aka heterozygote superiority, an inheritance pattern in which a heterozygote is more vigorous than either of the corresponding homozygotes

what type of equilibrium occurs with overdominance?

global and stable

balanced polymorphism

situation in which selection maintains two or more phenotypes for a specific gene in a population
greater difference b/t w12 and w22 leads to higher equilibrium frequency of A1
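
A Python sketch of the overdominance equilibrium, assuming random mating and constant fitnesses with w12 highest. Writing s = w12 - w11 and t = w12 - w22, the standard result is that A1 settles at t / (s + t). The function names and fitness values are illustrative.

```python
def overdominance_equilibrium(w11, w12, w22):
    """Equilibrium frequency of A1 when the heterozygote is fittest."""
    s = w12 - w11  # disadvantage of A1A1 relative to the heterozygote
    t = w12 - w22  # disadvantage of A2A2 relative to the heterozygote
    if s <= 0 or t <= 0:
        raise ValueError("requires w12 > w11 and w12 > w22")
    return t / (s + t)


def iterate_selection(p, w11, w12, w22, generations):
    """Apply the one-locus selection recursion for a number of generations."""
    for _ in range(generations):
        q = 1.0 - p
        w_bar = p * p * w11 + 2.0 * p * q * w12 + q * q * w22
        p = p * (p * w11 + q * w12) / w_bar
    return p


p_eq = overdominance_equilibrium(0.9, 1.0, 0.8)
print(p_eq)
# Starting low or high, the recursion heads to the same point (global, stable).
print(iterate_selection(0.05, 0.9, 1.0, 0.8, 1000))
print(iterate_selection(0.95, 0.9, 1.0, 0.8, 1000))
```

Lowering w22 further (bigger gap between w12 and w22) raises t and therefore raises the equilibrium frequency of A1.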

example of balanced polymorphism in sickle cell

-A and S are a balanced polymorphism
-AS is only advantageous where malaria is common

Warfarin resistance in rats

-Warfarin prevents blood clotting by preventing restoration of vitamin K
-mutation in vitamin K epoxide reductase gene >> rats less sensitive, but increases vitamin K requirements

underdominance

aka heterozygote inferiority, selection in which the heterozygote has lower fitness than that of either homozygote

heterozygote inferiority in African butterfly

-the orange and blue homozygotes each resemble a local toxic species
-the heterozygote resembles nothing in particular and is attractive to predators
-underdominant loci are usually observed in crosses b/t varieties. within one pop they are rapidly fixed

outcrossing species

-tend to contain a lot of variation in form of rare, harmful, recessive alleles
-alleles that cause inbreeding depression

why doesn't natural selection usually eliminate rare, harmful recessive alleles from outcrossing species?

-the rare alleles are present mostly in heterozygotes, where recessive alleles are not exposed to selection

s

selection coefficient against homozygous recessive

h

degree of dominance of the A2 allele
h = 0 means recessive
h = 1 means dominant
h = 0.5 means semi-dominant

h = 0 means ______ (recessive/dominant/semi-dominant)

recessive A2 allele

h = 1 means ______ (recessive/dominant/semi-dominant)

dominant A2 allele

h = 0.5 means ______ (recessive/dominant/semi-dominant)

semi-dominance in A2 allele

hs

strength of selection against heterozygotes

when h = 0, mutation-selection balance/equilibrium depends on ______ and _______

mutation rate and strength of selection against homozygotes (s)

when h > 0, mutation-selection balance/equilibrium depends on _______ and ________

mutation rate and strength of selection against heterozygotes (hs)
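
A Python sketch of the two mutation-selection balance cases, using the standard approximations q = sqrt(u / s) for a fully recessive allele and q = u / (hs) when heterozygotes are also selected against. The symbol u for the mutation rate, the function name, and the example values are assumptions made here.

```python
from math import sqrt


def mutation_selection_equilibrium(u, s, h=0.0):
    """Approximate equilibrium frequency of a deleterious A2 allele.

    u: mutation rate toward A2, s: selection against A2A2, h: dominance of A2.
    """
    if h == 0:
        return sqrt(u / s)  # recessive: only homozygotes are selected against
    return u / (h * s)      # h > 0: selection on heterozygotes (hs) dominates


# Illustrative values: a recessive lethal vs. a partly dominant lethal.
print(mutation_selection_equilibrium(u=1e-6, s=1.0, h=0.0))
print(mutation_selection_equilibrium(u=1e-6, s=1.0, h=0.05))
```

Even a small h lowers the equilibrium frequency a lot, because rare alleles sit mostly in heterozygotes.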

for a large population, the probability of fixation of a new favorable mutation depends primarily on the ______ rather than population size

strength of selection

for a small population, the probability of fixation depends primarily on the _______

population size

if 4Ns <<< 1, ________ is the most important aspect acting on a new mutation

genetic drift (pop size)

if 4Ns >>> 1, ______ is the most important process acting on new mutations

natural selection
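
A Python sketch of why 4Ns is the deciding quantity, using Kimura's diffusion approximation for the fixation probability of a single new mutation. Assumptions made here and not stated in the notes: additive selection with heterozygote advantage s, effective size equal to census size N, and a starting frequency of 1/(2N).

```python
from math import expm1


def fixation_probability(n, s):
    """P(fix) = (1 - e^(-2s)) / (1 - e^(-4Ns)) for one new copy in 2N."""
    if s == 0:
        return 1.0 / (2.0 * n)  # neutral limit: initial frequency
    # expm1 keeps precision when s is tiny; both terms are negative, ratio > 0.
    # A strongly deleterious s with large N may overflow in this simple form.
    return expm1(-2.0 * s) / expm1(-4.0 * n * s)


# 4Ns >> 1: selection dominates, P is close to 2s regardless of N.
print(fixation_probability(1_000, 0.01), fixation_probability(100_000, 0.01))
# 4Ns << 1: drift dominates, P is close to 1/(2N).
print(fixation_probability(100, 1e-9), 1.0 / 200)
```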

some models for selection

sexual selection
frequency dependent selection
spatially heterogeneous environments
temporally heterogeneous environments
epistasis
kin selection
fecundity selection
meiotic drive

neutral theory

theory that suggests that most polymorphism at the molecular level is selectively neutral
most mutations do have an effect on fitness, but...
-deleterious ones are rapidly eliminated
-advantageous ones should be rapidly fixed
-most remaining mutations are neutral

polymorphism

most often used for genetic variation within populations or species

divergence

genetic differences among (between) species

substitution

a change of one amino acid or nucleotide by another

replacement

a change in amino acid

T/F: the number of differences b/t sequences is always the same as the number of substitutions/replacements

false, may or may not be

K

proportion of replacements that have occurred

Dt

proportion of amino acid differences at time t

the difference in magnitude between K and D becomes _____ (smaller/larger) with time

larger

why does the difference in size b/t K and D become larger with time?

-multiple replacements in the same site

assumptions for derivations of the relationship between D and K

-a constant substitution rate
-all amino acid replacements occur with equal likelihood (not always actually true)

is the assumption that all amino acid replacements occur with equal likelihood true? why or why not?

-no, some amino acid changes are more likely to occur than others
-a purine going to a purine or pyrimidine to pyrimidine is more likely than a purine going to a pyrimidine (or vice versa)

Jukes-Cantor model

simplest model for looking at rates of nucleotide substitution
-constant rate
-each nucleotide as likely to mutate to any other

Kimura 2-parameter model

model for looking at the rates of nucleotide substitution
-assumes there is a different rate of substitution for transitions (purine to purine or pyrimidine to pyrimidine) vs transversions (purine to pyrimidine and vice versa)
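
A Python sketch of both corrections, turning an observed proportion of differences into an estimated number of substitutions per site (K). The distance formulas are the standard Jukes-Cantor and Kimura 2-parameter ones; the helper names and the toy sequences are made up.

```python
from math import log

PURINES = {"A", "G"}


def transition_transversion_props(seq1, seq2):
    """Return (P, Q): proportions of transition and transversion differences."""
    ts = tv = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            ts += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1  # purine<->pyrimidine
    return ts / len(seq1), tv / len(seq1)


def jukes_cantor(d):
    """K from the total proportion of differing sites d (one rate for all changes)."""
    return -0.75 * log(1.0 - 4.0 * d / 3.0)


def kimura_2p(p, q):
    """K from transition (p) and transversion (q) proportions (two rates)."""
    return -0.5 * log(1.0 - 2.0 * p - q) - 0.25 * log(1.0 - 2.0 * q)


P, Q = transition_transversion_props("AAAACCCCGG", "GAAATCCCGT")
print(P, Q, jukes_cantor(P + Q), kimura_2p(P, Q))
```

Both estimates come out larger than the raw proportion of differences, since multiple hits at the same site hide some of the change.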

AA and nucleotide substitution models allow us to...

-use observed differences to know how much evolutionary change (subs/replacements) has occurred
-use this info to figure out mutation rate for a given protein/gene
-figure out divergence time between two lineages

the molecular clock

model that uses DNA comparisons to estimate the length of time that two species have been evolving independently
-avg rate of molecular evolution for any given gene is sometimes uniform for long periods of time
-the rate depends on the neutral mutation rate, whic

the rate of the molecular clock depends on...

the neutral mutation rate
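
A Python sketch of the clock arithmetic, assuming a strict clock in which two lineages each accumulate substitutions at the same rate r per site per year, so that K = 2rT. The function names and the numbers are illustrative, not data from the notes.

```python
def divergence_time(k, rate_per_lineage):
    """T = K / (2r): years since two lineages split, given K and a known rate."""
    return k / (2.0 * rate_per_lineage)


def substitution_rate(k, years_since_split):
    """r = K / (2T): rate per site per year per lineage, given a dated split."""
    return k / (2.0 * years_since_split)


# Calibrate on a pair with a known split, then date another pair (made-up values).
r = substitution_rate(k=0.02, years_since_split=10_000_000)
print(r)
print(divergence_time(k=0.05, rate_per_lineage=r))
```

The factor of 2 is there because substitutions accumulate on both branches after the split.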

what are issues with the neutral theory in regards to molecular data?

the molecular clock is overdispersed
-variance is larger than the mean
under neutrality H = θ/(1 + θ)
-but the relationship b/t H and N is poorly predicted by the neutral theory especially for a large N
there is a generation-time effect

over-dispersion of clock and the neutral theory

-the variance of clocks for various loci is larger than their mean
-in a perfect clock, the variance should equal the mean
-this is a property of the Poisson distribution

heterozygosity and population size and the neutral theory

-under neutrality, H = θ/(1 + θ)
-in the neutral theory, the relationship b/t H and N is poorly predicted, especially for a large N
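
A Python sketch of the neutral prediction, assuming θ = 4Nμ for a diploid population (μ is the neutral mutation rate per locus per generation; this definition and the values below are supplied here, not by the notes).

```python
def neutral_heterozygosity(n, mu):
    """Expected equilibrium heterozygosity H = theta / (1 + theta), theta = 4*N*mu."""
    theta = 4.0 * n * mu
    return theta / (1.0 + theta)


# H climbs toward 1 as N grows (illustrative mutation rate).
for n in (1e3, 1e4, 1e6, 1e8):
    print(f"N={n:.0e} H={neutral_heterozygosity(n, 1e-6):.4f}")
```

The predicted H runs up close to 1 for very large N, which is the part that real data do not match well.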

the generation-time effect and the neutral theory

the neutral theory predicts a constant substitution rate when time is measured in generations, but the experimental data shows that the molecular clock rate is ~constant in years

Tomoko Ohta

-examined proportions of synonymous and non synonymous substitutions in DNA separately
-worked on neutral theory of evolution and developed the nearly neutral theory

nearly-neutral theory

hypothesis that treats most mildly deleterious/advantageous substitutions (small s) as effectively neutral. focuses on new mutations that are acted on by the combination of relatively weak natural selection and genetic drift.
-cases where |s| is about equal to or less than 1/(2N)

overdispersed clock and nearly neutral theory

presence of slightly deleterious mutations can increase clock variance
-in small pops, these mutations behave as neutral and clocklike
-in large pops, they are selected against, departing from clocklike behavior
difference in behavior in different species

H and N relationship and nearly neutral theory

under neutrality, H = θ/(1 + θ) should approach 1 when N is large
-but, slightly deleterious mutations will be removed more efficiently in large pops
-reduces overall H in large pops

generation time effect and the nearly neutral theory

synonymous substitutions are effectively neutral, and behave as predicted by the neutral model
-clock on generation time scale, dependent only on neutral mutation rate > generation time effect seen
-species with shorter generation times have faster clock

synonymous substitution

a silent substitution, a change of one nucleotide in a sequence to another that does not lead to an amino acid change

nonsynonymous substitution

a change in a gene from one nucleotide to another that leads to an amino acid change

synonymous substitutions ____ (are/are not) effectively neutral and behave as predicted by the neutral model

are

species with shorter generation times have _____ (slower/faster) clock in years

faster

organisms with _____ (shorter/longer) generation time tend to be big and have smaller N

longer

organisms with ______ (shorter/longer) generation time tend to be small and have larger N

shorter

what are some ways we can test for selection?

comparing Ka to Ks
-usually b/t species
-usually Ks > Ka is expected (purifying selection)
signature of selective sweeps
-relies on polymorphism within species/pops
Fst outliers
-compares 2 pops for local adaptation
frequency of variants within population

purifying selection

selection against amino acid changes and deleterious changes

positive selection

natural selection that increases the frequency of a favorable allele, favors amino acid changes

Ka

nonsynonymous substitution rate

Ks

synonymous substitution rate

what type of selection occurs when Ks (synonymous) = Ka (non synonymous)

strict neutrality, no selection

what type of selection occurs when Ks (synonymous) > Ka (non synonymous)

purifying selection (against AA changes), most genes have this

what type of selection occurs when Ks (synonymous) < Ka (non synonymous)

positive selection (favors AA changes)
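
The three Ka vs Ks cases can be sketched in Python as a simple ratio check. The tolerance used to call a ratio "equal to 1" is an arbitrary assumption for illustration; real analyses use a statistical test rather than a fixed cutoff.

```python
def classify_ka_ks(ka, ks, tol=0.05):
    """Return (Ka/Ks, label) for nonsynonymous rate ka and synonymous rate ks.

    tol is a hypothetical cutoff for treating the ratio as equal to 1.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:
        return ratio, "neutral"
    if ratio < 1.0:
        return ratio, "purifying selection"
    return ratio, "positive selection"


# Made-up rates for three genes.
for ka, ks in [(0.02, 0.20), (0.10, 0.10), (0.30, 0.12)]:
    print(ka, ks, classify_ka_ks(ka, ks))
```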

lysozyme

enzyme that cleaves eubacterial cell walls
found in saliva, serum, tears, mammalian milk
defense against bacterial invasion
found in GI tract of foregut fermenters, used to free nutrients from bacterial cell walls

compared to non fermenters, foregut fermenters have a ____ (high/low) Ka/Ks

high

dN/dS comparisons in chimps and humans

733 genes had dN/dS > 1
-genes involved in olfaction, immune defenses, tumor suppression, apoptosis, X-linked genes, spermatogenesis
brain genes had dN/dS < 1
-very conserved among chimps and humans

genetic hitchhiking

the process by which an allele is able to "ride along" with a nearby favorable allele/selected mutation to which it is physically linked and thus increase in frequency, occurs during a selective sweep
**the linked alleles can be neutral or even deleterious

how can you tell genetic hitchhiking has occurred?

-loss of variation at selected locus
-loss of variation at linked loci
-a large genomic area has lost variation

selective sweep

recent, rapid spread of a favorable mutation

how can genetic variation be introduced after/during a sweep?

mutation or recombination, but it takes time

with selective sweeps, genetic variation _____ (increases/decreases) with distance from the selected locus

increases

amylase

enzyme that breaks down starch

amylases in humans

-amylases are expressed in saliva and pancreas by genes of the AMY family
-AMY genes occur in a 200 kb region on chromosome 1, but humans vary in number of AMY genes
-many more AMY genes in humans than other primates
-also there are many more AMY genes in

selective sweep in humans for AMY locus

-there is low variation in area around AMY locus across all humans
-this is likely due to a selective sweep in humans after a split from the Neanderthals
-selection was before agriculture, but could be tied to when we started cooking/grinding/processing food

Fst for measuring selection

-compares polymorphism across pops
-Fst can vary across loci in the genome b/c all genes in the genome will experience gene flow the same way (reduces Fst) while natural selection just affects the genes and linked loci that underlie local adaptation
-popu

mouse coat color Fst example

-the Fst for the gene responsible for mouse coat color was high b/c there was selection for different colors in different pops
-the mt gene had no differentiation, and therefore it had no relationship with mouse coat color

using Fst to determine adaptive loci in Tibetans

Tibetans live at 3500-4000m and have the ability to use smaller quantities of oxygen more efficiently
procedure to find the loci responsible...
-compare Tibetans and Han Chinese b/c they split ~3000-5000 years ago
-scan entire genome of 40 people from ea

EPAS1 gene

a gene that regulates aerobic and anaerobic metabolism and allows hemoglobin to carry more oxygen
**found in 87% of Tibetans but 9% of Han Chinese studied

using frequency of variants within populations

use nucleotide diversity (pi) and nucleotide polymorphism (theta W) to test for neutrality via Tajima's D test
D ~ 0: no selection/neutrality (drift is acting, not selection)
D > 0: balancing selection, two variants are being favored at a locus
D < 0: positive/directional selection

π

nucleotide diversity, tells us how different a pair of sequences are (on average)
*affected by polymorphism (SNP) frequency (SNPs at low freq give a smaller value)

θw

nucleotide polymorphism, tells us how polymorphic a sequence is
*indifferent to allele frequencies

Tajima's D test

-can be used to tell us the selection that is going on using π and θw
-when D ~ 0: neutrality
-when D < 0: directional selection, pop growth (many rare SNPs)
-when D > 0: balancing selection, admixture (many intermediate-frequency SNPs)
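
A Python sketch of the test on a toy alignment, computing the pairwise-difference estimate (π), Watterson's estimate (θw), and D with Tajima's standard constants. Both estimates are per locus here (not divided by sequence length), and the sequences are invented to show the two extremes.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Return (pi, theta_w, D) for a list of equal-length aligned sequences."""
    n = len(seqs)
    # S: number of segregating (polymorphic) sites.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if n < 4 or s == 0:
        raise ValueError("need at least 4 sequences and 1 segregating site")

    # pi: average number of differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    theta_w = s / a1  # depends only on S, not on allele frequencies

    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    d = (pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))
    return pi, theta_w, d


rare_snps = ["AAAA", "TAAA", "ATAA", "AATA"]   # every SNP is a singleton
intermediate_snps = ["AAA", "AAA", "TTT", "TTT"]  # every SNP is at 50%
print(tajimas_d(rare_snps))
print(tajimas_d(intermediate_snps))
```

Both toy samples have the same S and so the same θw; only π changes, which is why rare SNPs push D negative and intermediate-frequency SNPs push it positive.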

what type of selection is going on when Tajima's D ~ 0?

neutrality
-a mix of rare and common SNPs occur in the pop
-older alleles tend to be at higher frequency, younger at lower frequency
-drift is acting!

what type of selection is going on when Tajima's D < 0?

directional selection
-many low frequency SNPs are in the population (usually young)
-usually b/c an allele is favored, and only recently has variation started to accumulate again (positive selection)
*could be due to population growth sometimes too

what type of selection is going on when Tajima's D > 0?

balancing selection
-many intermediate-frequency alleles in population
-usually means that two types of variants are being favored at locus
*could be due to admixture sometimes too

tb1 branching gene in maize

reduced branching in maize due to increased apical dominance
-main stem inhibits growth of lateral meristems, resulting in more seeds in fewer branches
-caused by gene tb1, which is a transcription factor. its D value is -2.14, which means this is under directional/positive selection

tb1 gene

a transcription factor (controls expression of others) that causes the reduced branching in maize
the D value is -2.41, indicating that this gene is under directional/positive selection