Time to coalescence
Consider a single gene locus sampled from two haploid individuals in a population. The ancestry of this sample is traced backwards in time to the point where these two lineages coalesce in their most recent common ancestor (MRCA). Coalescent theory seeks to estimate the expectation of this time period and its variance.
The probability that two lineages coalesce in the immediately preceding generation is the probability that they share a parental DNA sequence. In a population with a constant effective population size with 2Ne copies of each locus, there are 2Ne "potential parents" in the previous generation. Under a random mating model, the probability that two alleles originate from the same parental copy is thus 1/(2Ne) and, correspondingly, the probability that they do not coalesce is 1 − 1/(2Ne).
At each successive preceding generation, the probability of coalescence is geometrically distributed—that is, it is the probability of noncoalescence at the t − 1 preceding generations multiplied by the probability of coalescence at the generation of interest:
For sufficiently large values of Ne, this distribution is well approximated by the continuously defined exponential distribution
This is mathematically convenient, as the standard exponential distribution has both the expected value and the standard deviation equal to 2Ne. Therefore, although the expected time to coalescence is 2Ne, actual coalescence times have a wide range of variation. Note that coalescent time is the number of preceding generations where the coalescence took place and not calendar time, though an estimation of the latter can be made multiplying 2Ne with the average time between generations. The above calculations apply equally to a diploid population of effective size Ne (in other words, for a non-recombining segment of DNA, each chromosome can be treated as equivalent to an independent haploid individual; in the absence of inbreeding, sister chromosomes in a single individual are no more closely related than two chromosomes randomly sampled from the population). Some effectively haploid DNA elements, such as mitochondrial DNA, however, are only carried by one sex, and therefore have one quarter the effective size of the equivalent diploid population (Ne/2)
Coalescent theory can also be used to model the amount of variation in DNA sequences expected from genetic drift and mutation. This value is termed the mean heterozygosity, represented as . Mean heterozygosity is calculated as the probability of a mutation occurring at a given generation divided by the probability of any "event" at that generation (either a mutation or a coalescence). The probability that the event is a mutation is the probability of a mutation in either of the two lineages: . Thus the mean heterozygosity is equal to
For , the vast majority of allele pairs have at least one difference in nucleotide sequence.