Hardy-Weinberg Equilibrium (HWE) describes the expected genotype frequencies in an idealized, non-evolving population. For a locus with two alleles A (frequency \(p\)) and a (frequency \(q = 1-p\)), the expected genotype frequencies are:
\[P(AA) = p^2, \quad P(Aa) = 2pq, \quad P(aa) = q^2\]
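For example, a quick numeric check (using an arbitrary allele frequency of p = 0.7, not from the data): the three expected frequencies are easy to compute and always sum to 1.

p <- 0.7
q <- 1 - p
c(AA = p^2, Aa = 2*p*q, aa = q^2)   # 0.49, 0.42, 0.09
p^2 + 2*p*q + q^2                   # always 1, since (p + q)^2 = 1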
We’ll test whether the PGM (phosphoglucomutase) locus in *Daphnia obtusa* conforms to HWE, using data from Spitze (1993). The PGM locus has two alleles: M (“medium”) and S (“slow”).
First, let’s see how genotype frequencies change as a function of allele frequency \(p\):
curve(x^2, lwd = 2, col = "darkgreen", xlab = "p (frequency of A)",
ylab = "Genotype frequency", main = "Hardy-Weinberg Expectations")
curve((1-x)^2, lwd = 2, col = "darkred", add = TRUE)
curve(2*x*(1-x), lwd = 2, col = "darkblue", add = TRUE)
legend("top", legend = c("AA", "Aa", "aa"),
col = c("darkgreen", "darkblue", "darkred"), lwd = 2, horiz = TRUE)
The key insight: heterozygotes (blue) are maximized at \(p = 0.5\), while homozygotes dominate at extreme allele frequencies.
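One way to verify that claim (a minimal sketch using base R’s optimize(); the calculus version is that \(\frac{d}{dp}\, 2p(1-p) = 2 - 4p\), which is zero at \(p = 0.5\)):

# Numerically locate the maximum of the heterozygote curve 2p(1-p) on [0, 1]
optimize(function(p) 2 * p * (1 - p), interval = c(0, 1), maximum = TRUE)
# the maximum sits at p = 0.5, where the heterozygote frequency is 0.5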
Under HWE assumptions, offspring genotypes arise from random union of gametes. We can simulate this by:
p <- 0.8 # frequency of A allele
N <- 100 # population size
# Each row is an individual with 2 alleles
Males <- matrix(sample(c("A","a"), size = N*2,
replace = TRUE, prob = c(p, 1-p)), nrow = N)
Females <- matrix(sample(c("A","a"), size = N*2,
replace = TRUE, prob = c(p, 1-p)), nrow = N)
# Each parent contributes one allele (randomly from their two)
NextGen <- cbind(apply(Males, 1, sample, 1),
apply(Females, 1, sample, 1))
# Sort alleles so "aA" and "Aa" both become "aA"
genotypes <- c("aa", "aA", "AA")
f.genotype <- apply(NextGen, 1, sort) |>
apply(2, paste, collapse = "") |>
factor(levels = genotypes) |>
table() |> prop.table()
f.genotype
##
## aa aA AA
## 0.02 0.28 0.70
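For comparison (straightforward arithmetic, not part of the original output), the exact Hardy-Weinberg expectations at p = 0.8 are below; the simulated frequencies bounce around these values because the population is finite.

c(aa = (1-p)^2, aA = 2*p*(1-p), AA = p^2)  # 0.04, 0.32, 0.64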
We wrap this into a reusable function:
simFrequencies <- function(p, n = 100){
Males <- matrix(sample(c("A","a"), size = n*2,
replace = TRUE, prob = c(p, 1-p)), nrow = n)
Females <- matrix(sample(c("A","a"), size = n*2,
replace = TRUE, prob = c(p, 1-p)), nrow = n)
NextGen <- cbind(apply(Males, 1, sample, 1),
apply(Females, 1, sample, 1))
genotypes <- c("aa", "aA", "AA")
f.genotype <- apply(NextGen, 1, sort) |>
apply(2, paste, collapse = "") |>
factor(levels = genotypes) |>
table() |> prop.table()
return(f.genotype)
}
Test it at the same frequency, to see sampling variation:
simFrequencies(0.2, 100)
##
## aa aA AA
## 0.60 0.35 0.05
simFrequencies(0.2, 100)
##
## aa aA AA
## 0.69 0.28 0.03
simFrequencies(0.2, 100)
##
## aa aA AA
## 0.56 0.34 0.10
And at a different frequency:
simFrequencies(0.8, 100)
##
## aa aA AA
## 0.01 0.30 0.69
simFrequencies(0.8, 100)
##
## aa aA AA
## 0.02 0.40 0.58
simFrequencies(0.8, 100)
##
## aa aA AA
## 0.04 0.30 0.66
Note: It is not important (for now) to understand how the function works, only that you give it an allele frequency and a sample size and it returns genotype frequencies.
Even when HWE holds exactly, finite samples show variation around expected values. Let’s simulate across many allele frequencies and overlay the results on the theoretical curves:
cols <- c(AA = "darkgreen", aA = "darkblue", aa = "darkred")
curve(x^2, lwd = 2, col = "darkgreen", xlab = "p",
ylab = "Genotype frequency", main = "HWE with Sampling Variation (N=100)")
curve((1-x)^2, lwd = 2, col = "darkred", add = TRUE)
curve(2*x*(1-x), lwd = 2, col = "darkblue", add = TRUE)
for(p in seq(0, 1, 0.01)){
f.obs <- simFrequencies(p, n = 100)
points(rep(p, length(f.obs)), f.obs, col = cols[names(f.obs)], pch = 19, cex = 0.5)
}
The scatter around the curves shows sampling variance — this is what we expect even when HWE is true.
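One way to convince yourself that this scatter is pure sampling variance (a rough sketch; exact numbers differ each run, and the replicate count of 200 is arbitrary) is to hold the allele frequency fixed and compare the spread of the heterozygote frequency at two sample sizes:

# Spread of the simulated heterozygote frequency at p = 0.5
het100  <- replicate(200, simFrequencies(0.5, n = 100)["aA"])
het1000 <- replicate(200, simFrequencies(0.5, n = 1000)["aA"])
sd(het100)   # larger spread with 100 individuals
sd(het1000)  # noticeably smaller with 1000 individuals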
Now we apply this to real data. From Spitze (1993), the PGM locus genotype counts in *Daphnia obtusa* were:
| Genotype | Count |
|---|---|
| MM | 57 |
| MS | 53 |
| SS | 18 |
c_aa <- 18 # SS
c_aA <- 53 # MS
c_AA <- 57 # MM
N <- c_aa + c_aA + c_AA
freq.obs <- c(aa = c_aa, aA = c_aA, AA = c_AA) / N
# Calculate allele frequencies from the data
p <- (c_AA + c_aA/2) / N # frequency of M allele
q <- (c_aa + c_aA/2) / N # frequency of S allele
Note, of course, that these sum to 1:
p
## [1] 0.6523438
q
## [1] 0.3476562
p+q
## [1] 1
To test whether a population conforms to the Hardy-Weinberg null model, we need to compare observations to a null prediction; this comparison is at the heart of all statistical tests. We gather the observed and expected counts as follows:
counts.obs <- c(aa = c_aa, aA = c_aA, AA = c_AA)
counts.exp <- N * c(aa = q^2, aA = 2*p*q, AA = p^2)
rbind(Observed = counts.obs, Expected = round(counts.exp, 1))
## aa aA AA
## Observed 18.0 53.0 57.0
## Expected 15.5 58.1 54.5
The classical approach uses a chi-squared goodness-of-fit test. The test statistic quantifies how far observed counts deviate from expected counts, scaled by the expected values:
\[\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}\]
We can compute this directly from our data:
chi_sq_test <- sum((counts.obs - counts.exp)^2 / counts.exp)
chi_sq_test
## [1] 0.9717097
The scaling by \(E_i\) in the denominator matters: a deviation of 5 individuals is more surprising when you expected 10 than when you expected 100. The squaring ensures positive and negative deviations don’t cancel out.
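A tiny numeric illustration of that point (plain arithmetic, not part of the Daphnia data):

(15 - 10)^2 / 10     # expected 10, observed 15: contributes 2.5
(105 - 100)^2 / 100  # expected 100, observed 105: contributes only 0.25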
Under the null hypothesis, and in the limit of very large samples, this statistic approximately follows a \(\chi^2\) distribution; that is, if the statistic is very large, such a deviation is very unlikely under the null. The degrees of freedom equal the number of categories minus 1, minus the number of parameters estimated from the data. Strictly, that gives 3 genotypes − 1 − 1 (we estimated \(p\) from the data) = 1 df; below we use 2 df, which is what R’s chisq.test() assumes when handed fixed probabilities, and in this case the conclusion is the same either way. We can see how “unlikely” this statistic is by drawing a chi-squared distribution:
curve(dchisq(x,2), xlim = c(0,4))
abline(v = chi_sq_test, col = 2, lwd = 2)
It is not very extreme. A measure of that lack of extremeness is the p-value, the area under that curve to the right of the test statistic:
1 - pchisq(chi_sq_test, 2)
## [1] 0.6151711
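As an aside (a quick check tied to the degrees-of-freedom point above), the strictly correct 1-df version is computed the same way; it gives a smaller p-value, but one that is still far from significant:

1 - pchisq(chi_sq_test, df = 1)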
All of this, of course, is done more or less immediately with an R function:
chisq.test(counts.obs, p = c(q^2, 2*p*q, p^2))
##
## Chi-squared test for given probabilities
##
## data: counts.obs
## X-squared = 0.97171, df = 2, p-value = 0.6152
So - clearly - a highly non-significant deviation.
This is a straightforward, simple, robust, and very common approach. However, it relies on some assumptions, on somewhat convoluted reasoning, and on the blank stare you often get when you present someone with the very concept of a test statistic. In a modern age where computation is trivial, we might try a more intuitive approach.
Since we can simulate this process directly, using a population of the correct size and the correct allele frequencies, we can simply ask: is the observed proportion of heterozygotes in excess or in deficit of the Hardy-Weinberg expectations?
We generate a null distribution by simulation. This step will take a (noticeable) few seconds:
nsims <- 1000
genotype_sims <- matrix(NA, nrow = nsims, ncol = 3)
colnames(genotype_sims) <- c("aa", "aA", "AA")
for(i in 1:nsims)
genotype_sims[i,] <- simFrequencies(p, N)
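As a quick sanity check (a sketch; your exact numbers will vary from run to run), the column means of the simulated matrix should sit close to the Hardy-Weinberg expectations computed from p and q:

colMeans(genotype_sims)            # average simulated genotype frequencies
c(aa = q^2, aA = 2*p*q, AA = p^2)  # theoretical HW expectations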
Note what we computed here:
head(genotype_sims)
## aa aA AA
## [1,] 0.1250000 0.4609375 0.4140625
## [2,] 0.1328125 0.4843750 0.3828125
## [3,] 0.0937500 0.4296875 0.4765625
## [4,] 0.1640625 0.4296875 0.4062500
## [5,] 0.1250000 0.3906250 0.4843750
## [6,] 0.1093750 0.4531250 0.4375000
A bunch (1000) of sets of simulated genotype proportions. We can plot those null distributions against our observations as follows.
par(mfrow = c(1, 3))
hist(genotype_sims[,"aa"], main = "aa (SS) null",
xlab = "Frequency", border = "gray")
abline(v = freq.obs["aa"], lwd = 2, col = "red")
hist(genotype_sims[,"aA"], main = "aA (MS) null",
xlab = "Frequency", border = "gray")
abline(v = freq.obs["aA"], lwd = 2, col = "red")
hist(genotype_sims[,"AA"], main = "AA (MM) null",
xlab = "Frequency", border = "gray80")
abline(v = freq.obs["AA"], lwd = 2, col = "red")
Red lines show the observed values. Since they are not particularly “extreme”, the observed data look like a perfectly reasonable draw from Hardy-Weinberg proportions.
For the randomization test we ask directly: how often do simulated heterozygote frequencies fall below the observed value? If that happens only rarely, the observed heterozygote deficit would be surprising under HWE.
# Two-tailed p-value for heterozygotes
(p_value_hetero <- 2 * sum(genotype_sims[,"aA"] < freq.obs["aA"]) / nsims)
## [1] 0.324
TRICKY EXERCISE: think a bit about why this works. Why the `sum`? Why the `/ nsims`? Why the `<`? Why the `* 2`? This is correct, but it only works because we can see that the observed frequency is slightly lower than the null mean. How would you compute a p-value if you did not know whether it was greater or smaller than that mean?
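If you want to check your reasoning against one possibility, a common convention (a sketch, and not the only defensible answer) is to compute both one-sided tails and double the smaller one:

# Direction-agnostic two-tailed p-value: double the smaller tail
p_lower <- mean(genotype_sims[,"aA"] <= freq.obs["aA"])
p_upper <- mean(genotype_sims[,"aA"] >= freq.obs["aA"])
2 * min(p_lower, p_upper)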
Equivalently, we can test for homozygote excess (the mirror image of heterozygote deficit):
homo_null <- genotype_sims[,"aa"] + genotype_sims[,"AA"]
homo_obs <- freq.obs["aa"] + freq.obs["AA"]
par(mfrow = c(1, 1))
hist(homo_null, main = "Total Homozygotes Null Distribution",
xlab = "Frequency (aa + AA)", border = "darkgrey")
abline(v = homo_obs, lwd = 2, col = "red")
(p_value_homo <- sum(homo_null > homo_obs) / nsims * 2)
## [1] 0.324
These two p-values should be equivalent — heterozygote deficit is homozygote excess.
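We can confirm that equivalence directly (a one-line check, relying on the fact that the three proportions sum to 1 in every simulated population):

# The homozygote total is exactly 1 minus the heterozygote frequency, so
# "homozygotes above the observed total" and "heterozygotes below the
# observed frequency" pick out the same simulated populations.
all.equal(homo_null, 1 - genotype_sims[,"aA"])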
Both approaches test the same hypothesis. The randomization test makes fewer assumptions about the sampling distribution and directly uses simulation to generate the null. The chi-squared test assumes the test statistic follows a \(\chi^2\) distribution (valid for large samples).
For this dataset, the observed genotype frequencies are consistent with Hardy-Weinberg expectations — we find no significant evidence of heterozygote deficit or homozygote excess at the PGM locus.
For these assignments, we do NOT want to see code! Just provide the answers, including figures, in a document.
The same study (Spitze 1993) also scored the PGI locus, which has two alleles, S+ and S-, with these genotype counts:

| Genotype | Count |
|---|---|
| S+S+ | 42 |
| S+S- | 48 |
| S-S- | 38 |
What are the allele frequencies, and the expected Hardy-Weinberg proportions of the genotypes?
What are the expected counts of the genotypes?
Perform a chi-squared test of consistency with the expected proportions. Visualize that test along the lines of the code above.
Perform a randomization test of heterozygote excess or deficit. Visualize that test along the lines of the code above.
Discuss (briefly).
Extra credit, pure algebra: Under HWE, at what values of p does the number of heterozygotes exceed the number of each of the two homozygote classes? At what values of p does the number of heterozygotes exceed the TOTAL number of homozygotes?