It’s been widely reported that investigators got a break in the East Area Rapist/Golden State Killer case when they uploaded a DNA profile to a genealogy database, GEDmatch, and identified relatives of the suspect, Joseph DeAngelo. Did they get lucky, or did they have a good chance of finding him? UC Davis population biologists Graham Coop and M. D. “Doc” Edge have written a nice explainer of the science behind this search.
GEDmatch contains almost a million profiles. That is fewer than the private databases run by companies such as 23andMe, but still enough to give a good chance of picking up third or fourth cousins of a given person, especially if the family is of European ancestry and has been in the U.S. for a few generations (the group best represented in these genealogical databases).
U.S. law enforcement uses CODIS (the Combined DNA Index System) to match DNA from crime scenes. There are some important differences between CODIS and the genealogical databases. First, although CODIS contains 13 million DNA profiles, they are mostly of people who have already been convicted of a crime. Second, the information in CODIS is based on 13 to 20 areas of DNA called “microsatellite” markers. These are very powerful for making an exact match and are sometimes known as “DNA fingerprints.” But they are less useful for finding connections with relatives beyond first cousins.
Genealogical databases such as GEDmatch, however, are based on a genome-wide set of single nucleotide polymorphisms, or SNPs: changes in a single letter of DNA. That makes them much better at identifying more distant relatives.
There are two opposing factors at work here, Coop and Edge write. We all have two parents, four grandparents, eight great-grandparents and a correspondingly expanding number of first, second and third cousins, many of whom we may never have met, especially at the third cousin or higher level. That means there is a good chance that a given individual has at least some relatives in a database of a million profiles or so.
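The growth in the number of relatives can be sketched with a toy calculation. The Python below is only an illustration under simplifying assumptions: every couple is assumed to have the same number of children, ancestors are assumed never to overlap, and database members are treated as a random sample of the population. The function names, the family size and the population figure are hypothetical and are not taken from Coop and Edge's analysis.

```python
def ancestors_at_generation(g: int) -> int:
    """Genealogical ancestors g generations back (1 = parents, 2 = grandparents),
    assuming no ancestor appears twice in the family tree."""
    return 2 ** g


def expected_cousins(degree: int, children_per_couple: float) -> float:
    """Expected number of full cousins of a given degree in a toy model
    where every couple has the same number of children (an assumption)."""
    # Cousins of degree d descend from the 2**d ancestral couples
    # that sit d + 1 generations back.
    couples = 2 ** degree
    # Each couple has (children - 1) children who are not our own ancestor,
    # and each of those has children**degree descendants in our generation.
    return couples * (children_per_couple - 1) * children_per_couple ** degree


def expected_cousins_in_database(degree: int, children_per_couple: float,
                                 db_size: int, population: int) -> float:
    """Expected number of cousins of a given degree who appear in a database,
    if database members were a random sample of the population (an assumption)."""
    return expected_cousins(degree, children_per_couple) * db_size / population


if __name__ == "__main__":
    # Illustrative inputs only: 2.5 children per couple, a database of one
    # million profiles, and a hypothetical population of 200 million.
    for degree in range(1, 5):
        total = expected_cousins(degree, 2.5)
        in_db = expected_cousins_in_database(degree, 2.5, 1_000_000, 200_000_000)
        print(f"degree {degree}: {total:.0f} cousins, {in_db:.2f} expected in database")
```

Because the cousin count is multiplied by twice the family size with each additional degree, even a database covering a small share of the population can be expected to hold some distant cousins in this toy model.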
On the other hand, the amount of genetic material we share with our cousins decreases with every generation, until it becomes quite possible to be related by genealogy yet to have inherited no DNA segments in common from the shared ancestors. Coop and Edge calculate that these two factors balance out at about the level of third to fourth cousins.
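The decline in shared DNA can be illustrated with the standard expected value for full cousins: first cousins share on average 1/8 of their genome by descent, and each further degree of cousinship divides that by four. The short Python sketch below computes only this average; the function name is hypothetical, and it does not model the random variation around the average that lets some distant cousins share nothing at all.

```python
def expected_shared_fraction(degree: int) -> float:
    """Average fraction of the genome that two full cousins of the given
    degree inherit from their shared ancestors: (1/2) ** (2 * degree + 1)."""
    if degree < 1:
        raise ValueError("degree must be 1 (first cousins) or higher")
    return 0.5 ** (2 * degree + 1)


if __name__ == "__main__":
    for degree in range(1, 6):
        fraction = expected_shared_fraction(degree)
        print(f"degree {degree} cousins: {fraction:.4%} of the genome on average")
```

Each step out in the family tree cuts the average share to a quarter of its previous value, which is why a genome-wide set of markers is needed to detect relatives beyond first cousins.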
To learn more, read Graham Coop's blog.
This story was originally published on the UC Davis Egghead Blog.