DNA Mutation simulations

Iain Kennedy Copyright 2009

The spreadsheet below is a simple and crude illustration of how markers mutate, using two columns of data labelled 'A-DYS392' and 'B-DYS392'. By modifying the mutation rate in the marked cell and pressing F9 (Calculate), you can see the random effects of time on the marker values.

What do the two columns represent? They could be

(a) the mutated DYS392 values for two lines of descent that start with two brothers at generation 1, or

(b) two markers for the same individual.

The mutation rate PER MARKER PER GENERATION is 1/500 or 0.002. But we use multiple markers. And I'm lazy...

If you set the rate to 0.002, you can simulate the 40-generation progression of DYS392 for two individuals; the cell marked 'Distance' below row 40 shows the absolute genetic distance, comparing just this one marker. It is also a useful reminder that by the present (i.e. generation 40) BOTH lines may have diverged, so neither may still have the ancestral value. Or one or both may still have it.
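For anyone who would rather not open the spreadsheet, the same idea can be sketched in a few lines of Python. This is only a rough sketch and not the macro itself: it assumes a simple stepwise model in which a mutation adds or removes one repeat with equal chance, and the starting value of 13 is an arbitrary choice for illustration. The function names are my own invention.

```python
import random


def simulate_marker(start, rate, generations, rng):
    """Return the marker value at each generation, beginning with generation 1."""
    values = [start]
    for _ in range(generations - 1):
        value = values[-1]
        if rng.random() < rate:
            # Assumed stepwise model: a mutation adds or removes one repeat.
            value += rng.choice((-1, 1))
        values.append(value)
    return values


def simulate_pair(start=13, rate=0.002, generations=40, seed=None):
    """Simulate two columns (A and B) and the absolute distance at the last generation."""
    rng = random.Random(seed)
    a = simulate_marker(start, rate, generations, rng)
    b = simulate_marker(start, rate, generations, rng)
    distance = abs(a[-1] - b[-1])
    return a, b, distance


if __name__ == "__main__":
    # Each run without a seed is the equivalent of pressing F9 again.
    a, b, distance = simulate_pair()
    print("A-DYS392:", a[-1], "B-DYS392:", b[-1], "Distance:", distance)
```

At a rate of 0.002 most runs will show no change at all in either column, which is rather the point. Pass a larger rate to see more movement.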

What about the effect of all the other markers? Each may be mutating. So the more you have, the more likely at least one of them will have changed. You can simulate this using the same spreadsheet by changing the mutation rate from the single per marker value of 0.002 to one of the higher values shown that reflect the number of markers, but illustrated using just the two columns. Pretend, if you like, that they are all subject to mutation but we have only illustrated the two columns that actually got a mutation and that all the others stayed still.
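To put rough numbers on "the more you have, the more likely at least one of them will have changed", here is a small Python sketch. It assumes every marker mutates independently at the same rate of 0.002, which is a simplification, and the panel sizes in the loop are purely illustrative rather than the values shown in the spreadsheet. The function names are hypothetical.

```python
def combined_rate(per_marker_rate, markers):
    """Chance that at least one of `markers` independent markers mutates in one generation."""
    return 1 - (1 - per_marker_rate) ** markers


def chance_of_any_mutation(per_marker_rate, markers, generations):
    """Chance of at least one mutation event on one line over `generations` transmissions."""
    return 1 - (1 - per_marker_rate) ** (markers * generations)


if __name__ == "__main__":
    # Illustrative panel sizes only, not any particular company's test.
    for markers in (1, 10, 20, 40):
        per_generation = combined_rate(0.002, markers)
        over_40 = chance_of_any_mutation(0.002, markers, 40)
        print(markers, "markers:", round(per_generation, 4), "per generation,",
              round(over_40, 3), "over 40 generations")
```

For small rates the combined figure is close to the number of markers times 0.002, which is why multiplying the rate up is a reasonable shortcut. Note that this counts mutation events, so it slightly overstates the chance of a visible difference, since a marker can mutate and later mutate back.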

Put another way, using the testing company rates illustrates the improved sensitivity obtained by increasing the number of markers. And since DNAH use the most, DNAH has the best chance of splitting two branches of your tree.

Why did I pick 40 generations? A bit of number coincidence really; 40 generations is approximately

(a) the sensitivity of a low resolution test (Oxford Ancestors): a genetic distance of 1 with one of these equates to 40 generations to the common ancestor, statistically speaking

(b) how far back the practice of adopting surnames began, so before that, who cares

(c) the time since Fergus, Lord of Galloway, who I believe is the oldest recorded ancestor of the Kennedys

Is the single marker rate actually 0.002, averaged over long timescales? I wish we knew. There is a lot of debate about this and alternative figures within a range of 3 are discussed. Not all testing companies adopt the same rate in their literature. This is one reason for being very careful about using Time to Most Recent Common Ancestor calculations to map your family tree.

For further discussion and analysis of mutation rates for family historians, see Charles Kerchner's excellent page 'DNA Mutation Rates - An Overview and Discussion'.

Whilst Charles' coverage of STR mutations is superb, his comments about the usefulness of SNPs may be fast getting out of date. The pace of developments in the last few months, mainly driven by David Faux's new company Ethnoancestry, is breathtaking, and who knows where it will have led in another 12 months.

Download the spreadsheet here - it uses a macro to do the simulation so you will have to 'Enable macros' if you have this warning turned on (if you don't, turn it on!). Always employ virus checking when downloading files from the Internet.

The spreadsheet is a bit of fun; others do this sort of thing a lot better than I do. Check out Dean McGee's utility page if this interests you.