MITRE's Tech Futures Podcast

Securing Our Genomes

June 09, 2022 MITRE Season 1 Episode 2

With fast, cheap and universally accessible full genome sequencing on the horizon, the race is on to secure our genomes—but is it possible that our insatiable appetite for information might hinder our ability to protect our most intimate datasets?

By Brinley Macnamara

Guests: Dr. Alex Perez, Frank Vasquez and Dr. Alice Isenberg

Dr. Alex Perez (00:00):

I know this is kind of a cliche, but I was listening to a podcast, a true crime podcast, and they were discussing the Golden State Killer case. And for those who aren't familiar with that case, the Golden State Killer was a serial killer active in California, around the late seventies, early eighties. And this was a cold case. They never found a suspect for the case, but they still had a repository of information, including a preserved DNA sample from one of the crime scenes. So as technology matured in 2018, police were able to take that DNA profile, upload it to a third party database named GEDmatch. And they were then able to ... GEDmatch provides a hereditary matching service. So they run different algorithms that find relatives based on the profile. So you can think of people finding long lost relatives or learning more about their ancestry. So through doing this, they were able to find actual descendants or relatives of the crime scene DNA and then using traditional, I'll call it detective work, they were able to get in touch with those people, interview them and sort of narrow their scope. And then eventually identify a suspect who was recently convicted, I believe in mid to late last year of the crime.

Brinley Macnamara (host) (01:50):

Hello and welcome to MITRE's Tech Futures Podcast. I'm your host, Brinley Macnamara. At MITRE we offer a unique vantage point and objective insights that we share in the public interest. And in this podcast series, we showcase emerging technologies that will affect the government and our nation in the future.

Brinley Macnamara (host) (02:08):

Today, we're talking about the story of how a novel technique law enforcement used to hunt down a serial killer inspired Dr. Alex Perez to explore the ways this emerging technology could be exploited to compromise the genomic data of tens of millions of Americans. But before we begin, I want to say a huge thank you to Dr. Kris Rosfjord, the Tech Futures Innovation Area Leader in MITRE's Research and Development program. This episode would not have happened without her support. Now, without further ado, I bring you MITRE's Tech Futures podcast, episode number two.

Brinley Macnamara (host) (02:55):

The technique that police used to hunt down the Golden State Killer is formally known as genealogical triangulation. I know, it's a mouthful of a term, and it's just one technique in an emerging field with an even bigger mouthful of a name, forensic genetic genealogy, a discipline that combines the most advanced methods for genetic analysis with the vast repositories of DNA data that can be found in private sector DNA databases that, as of today, contain the DNA profiles of millions of Americans. And what makes the private sector database that police used to crack the Golden State Killer case, known as GEDmatch, especially powerful is that it's completely open to the general public and is therefore searchable by virtually anyone. And because of this inherent openness combined with the sheer number of private citizens who have chosen to upload their raw DNA profiles to GEDmatch, it's become the de facto database for forensic genetic genealogy, a gold mine of leads that has enabled law enforcement to solve hundreds of cold cases that were considered to be unsolvable only just a few years ago.

Brinley Macnamara (host) (04:04):

So how is it that millions of private citizens have been able to get their DNA sequenced and subsequently uploaded to publicly accessible databases like GEDmatch? The answer lies in the rise of direct to consumer genetic testing services, which have played a key role in the recent explosion in the number of people getting their DNA sequenced and sharing their DNA profiles to third party databases like GEDmatch for the purpose of finding out more about their family's history, or perhaps even a long lost relative. But what could be so dangerous about uploading our DNA profiles to public databases? And if the police use our DNA profiles to track down a serial killer, aren't we doing the public a huge favor? Well, the risk of having our DNA profiles on the public internet isn't so much that police are using them to lock up serial killers. Rather, it lies in the hypothetical of what could happen if a malicious actor used these same techniques to, well, do hacker things.

Brinley Macnamara (host) (05:03):

For example, in 2019, researchers at the University of Washington demonstrated that it was possible to falsify relatives by uploading fake DNA profiles to GEDmatch. Moreover, just as police used GEDmatch to hunt down the Golden State Killer, there's little to prevent a malicious actor from using the same tradecraft to unmask sensitive DNA profiles. And if you weren't already at the edge of your seat, here's another plot twist. Unlike your typical type of sensitive data, like a password or your social security number, uploading your DNA data to the public internet doesn't just mean your DNA data was shared. It means the DNA of your parents was shared, your siblings, your cousins, grandparents, aunts, uncles, great aunts, great uncles, second cousins, third cousins. In other words, the exposure of your genomic data could impact dozens, if not hundreds of other people. I asked Frank Vasquez, a biomedical engineer and Dr. Perez's co-principal investigator, for his thoughts on this particularly concerning risk that comes with the sharing of our genetic data.

Frank Vasquez (06:03):

Yeah, I'm glad you brought that up actually, because that's one of the risks that I don't think people consider when they're doing this themselves or when family members are doing it. They could be potentially putting other people that they know or are related to at risk of these types of attacks. The really scary part about it, to me at least, is that it doesn't even necessarily have to be that close of a relationship like a sibling or a parent. If I had two of your fourth cousins, I could probably come up with you after a series of queries.

Brinley Macnamara (host) (06:43):

But the risks of sharing our DNA profiles don't stop there. Here's Dr. Perez.

Dr. Alex Perez (06:48):

So I just wanted to add that one of the things that's unique about genomic data particularly is the long term sensitivity. So when you think about other types of personally identifiable information or sensitive information, you can think of like something like a pin number or a password. If those are obtained by a malicious actor, well, you can change them, right? They're changeable. Things like my date of birth. I can't change that, but it's only relevant as long as I'm alive, right? Your genomic information is relatively immutable. It doesn't change much over your lifespan. So if someone were to get tested at age 30, let's call it. So you're talking another 40 to 50 years where that data is uniquely identifiable to a person.

Brinley Macnamara (host) (07:46):

What I find especially fascinating about Alex and Frank's work is that they're not only interested in evaluating the threats to our genomic data, but also the most cutting edge countermeasures that can be used to secure these precious data sets.

Dr. Alex Perez (07:58):

The main thing to remember here is that a lot of approaches with encryption are based on an assumption of computational hardness, right? It's just that it will take forever for someone to brute force decrypt this, or it'll take so much computational power that we can say it's basically impossible and count it as a win. But given the long term sensitivity of the data and given some of the advances that we as a civilization, I guess, are on the precipice of like quantum computing and things like that, those assumptions start to become more tenuous, especially with the genomic data.

Brinley Macnamara (host) (08:51):

I asked Dr. Perez which defensive technique he finds the most promising. And to be honest, I was surprised to hear his answer was a form of encryption that I had never heard of before.

Dr. Alex Perez (09:01):

What honey encryption aims to do is when a wrong password is guessed in a brute force attack, it will provide a plausible looking genomic profile back. So when an adversary guesses that wrong password, they get something that they can't really easily determine is wrong. So they have no clue, even if they guessed the right password in theory, they wouldn't know because they would have no way of filtering out.

Brinley Macnamara (host) (09:31):

If you're well versed in cybersecurity fundamentals, then the idea of using a decoy to divert an attacker will probably ring a bell. In network security, there is an analogous concept of a honeypot, which at the most basic level is a server outfitted with some seemingly valuable data meant to incentivize attacker engagement with a monitoring system, thereby throwing them off a high value target. Honey encryption leverages the same underlying idea: provide the attacker with some plausible looking data to stop an attack, or at least, cast doubt on the data the attacker was able to unmask.
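
To make the idea concrete, here is a toy sketch in Python. It is not the scheme Dr. Perez's team studied, and it rests on a loud simplifying assumption: every position in the profile is treated as an independent, uniformly random base. Real honey encryption needs a statistical model of realistic genomes so that decoys are plausible to a geneticist, not just well-formed. The function names and the choice of PBKDF2 for key derivation are illustrative only.

```python
import hashlib

BASES = "ACGT"  # toy message space: one base per SNP position (assumption)

def keystream(password: str, salt: bytes, n: int) -> bytes:
    # Stretch the password into n pseudo-random bytes using the standard library's PBKDF2.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 10_000, dklen=n)

def encrypt(profile: str, password: str, salt: bytes) -> bytes:
    # Map each base to 0..3 and shift it by the keystream, modulo 4.
    ks = keystream(password, salt, len(profile))
    return bytes((BASES.index(b) + k) % 4 for b, k in zip(profile, ks))

def decrypt(ciphertext: bytes, password: str, salt: bytes) -> str:
    # Any password yields a well-formed string of bases; nothing signals a wrong guess.
    ks = keystream(password, salt, len(ciphertext))
    return "".join(BASES[(c - k) % 4] for c, k in zip(ciphertext, ks))

if __name__ == "__main__":
    salt = b"demo-salt"  # fixed here for repeatability; use a random salt in practice
    secret = encrypt("GATTACAGATTACA", "correct horse", salt)
    print(decrypt(secret, "correct horse", salt))  # the original profile
    print(decrypt(secret, "wrong guess", salt))    # a different, equally well-formed profile
```

The design point is that the ciphertext carries no checksum or padding that would reveal whether a guess was right, so a brute force attacker ends up with a pile of candidate profiles and no way to filter them.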

Brinley Macnamara (host) (10:23):

In order to make genetic testing scalable, profitable, and valuable to the customer, direct to consumer genetic testing companies have had to develop smarter ways of analyzing DNA, which brings me to another mouthful of a term that has been key to making the once far-fetched aspiration of cheap and accessible genetic testing a reality: single nucleotide polymorphisms, also known as SNPs.

Dr. Alice Isenberg (10:47):

SNPs aren't new. It's just that they are newly being applied to forensic science applications.

Brinley Macnamara (host) (10:54):

That's Dr. Alice Isenberg talking. She's a Chief Engineer at MITRE. She came to us from the FBI laboratory where, among other things, she oversaw the team that managed and developed the FBI's DNA database. With her PhD in analytical chemistry, commanding presence and thousands of crime scene investigations under her belt, Dr. Isenberg is honestly more remarkable than the fictional FBI agents I see on TV. She also happens to be really good at explaining concepts like single nucleotide polymorphisms to people who, well, don't have their PhDs in analytical chemistry.

Dr. Alice Isenberg (11:28):

So as it sounds, a SNP is a substitution of a single nucleotide or base at a specific position in the DNA genome. For example, at a specific base position in one part of the genome, a C nucleotide or base may appear in most individuals, but in a minority of individuals, that position may be occupied by an A nucleotide or base.
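
As a minimal sketch of that definition, the Python below compares a sample against a reference sequence of the same length and reports each position where a single base differs. The sequences and the function name `find_snps` are made up for illustration; real SNP calling works on aligned sequencing or array data and has to handle errors, insertions and deletions, none of which is modeled here.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for every single-base difference."""
    if len(reference) != len(sample):
        raise ValueError("this toy comparison assumes pre-aligned, equal-length sequences")
    return [
        (i, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]

if __name__ == "__main__":
    reference = "GATTCCA"  # most individuals carry a C at position 4 (0-based)
    sample = "GATTACA"     # this individual carries an A there instead
    print(find_snps(reference, sample))  # [(4, 'C', 'A')]
```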

Brinley Macnamara (host) (11:55):

I asked Alice why analyzing SNPs can be more useful than analyzing short tandem repeats, often called STRs, to find the relatives of a particular unidentified DNA sample. While direct to consumer genetic testing companies rely primarily on SNPs for the analysis of their customers' DNA, the FBI relies primarily on short tandem repeats for analyzing suspect DNA. This boils down to looking at variations in how many times a particular set of nucleotides repeats, rather than variations in mutations of specific nucleotides.
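
For contrast with SNPs, here is a small Python sketch of what counting a short tandem repeat amounts to: finding how many times a motif repeats back to back. The motif, the sequences and the function name are hypothetical; forensic STR typing measures specific, standardized locations in the genome with laboratory methods rather than string matching.

```python
def longest_repeat_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif found in sequence."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    for start in range(len(sequence)):
        run = 0
        position = start
        # Count consecutive copies of the motif beginning at this start position.
        while sequence.startswith(motif, position):
            run += 1
            position += len(motif)
        best = max(best, run)
    return best

if __name__ == "__main__":
    person_a = "TTGATAGATAGATAGATACC"  # GATA repeated four times
    person_b = "TTGATAGATACC"          # GATA repeated twice
    print(longest_repeat_run(person_a, "GATA"))  # 4
    print(longest_repeat_run(person_b, "GATA"))  # 2
```

Two people can carry identical bases at this spot and still differ in the repeat count, which is the kind of variation an STR profile records.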

Dr. Alice Isenberg (12:28):

They can be helpful for confirming kinship because A, the sheer number of places in the DNA is much greater and they're distributed much more thoroughly throughout the genome. So that helps them look at more areas of DNA that could have been inherited to the extent that using SNPs you can predict fairly well, is this person a first cousin, second, third, fourth cousin. It can go really far out because of the amount of DNA it's looking at and the depths of that.
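
A rough way to see why distant cousins are hard to detect without many markers: the expected fraction of autosomal DNA shared by full cousins halves with every generation on each side. The sketch below uses the textbook expectation of (1/2)^(2k+1) for k-th cousins. These are averages only, an assumption of this illustration rather than anything stated in the episode; actual sharing varies, and distant cousins sometimes share no detectable DNA at all.

```python
def expected_shared_fraction(cousin_degree: int) -> float:
    """Expected fraction of autosomal DNA shared by full k-th cousins: (1/2) ** (2k + 1)."""
    if cousin_degree < 1:
        raise ValueError("cousin_degree must be 1 (first cousins) or higher")
    return 0.5 ** (2 * cousin_degree + 1)

if __name__ == "__main__":
    for degree in range(1, 5):
        print(degree, f"{expected_shared_fraction(degree):.4%}")
    # 1 12.5000%
    # 2 3.1250%
    # 3 0.7812%
    # 4 0.1953%
```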

Brinley Macnamara (host) (13:05):

In light of all this, I asked Alice if she saw any future possibilities for the expansion of SNP based analysis techniques at the FBI Laboratory.

Dr. Alice Isenberg (13:13):

The issue at hand is that there are currently on the order of 20 million profiles in CODIS...

Brinley Macnamara (host) (13:20):

CODIS, by the way, is the name of the FBI's DNA database.

Dr. Alice Isenberg (13:24):

...that have been collected across 30 years of operation. 14 million plus of those are offenders, 4 million plus are arrested individuals, and then about a little over a million are crime scene samples as of December 2020. So if you want to add a capability to do SNPs to CODIS, you then have to reanalyze all of those samples and you would do that anytime a new technology comes along. So there are techniques for translating SNP profiles into STR profiles, but it is an imperfect translation because there's missing data. So ideally, we would be able to sequence entire genomes efficiently and cheaply so that that sequence can be converted into STR profiles if there's sufficient high quality DNA present to do that. So that would be the ultimate goal for CODIS is to be able to have full sequences and then map those to whatever type of DNA you have in your database.

Brinley Macnamara (host) (14:25):

As DNA sequencing technology cheapens at a breathtaking pace, I can't help but think about the implications of Alice's projection about the future of DNA forensics, wherein fully sequenced genomes will be analyzed on a routine basis. It shouldn't take too much stretching of the imagination to see that SNP based genetic testing has raised the bar for the protection of genomic data quite significantly, not only because of the sheer amount of information contained within a SNP profile, but also because SNP profiles include fully sequenced coding DNA. In other words, SNP profiles contain the genes that encode for everything from our fundamental bodily functions to the traits that make each of us unique.

Brinley Macnamara (host) (15:05):

Now that we are on the precipice of this human experiment, it will be imperative that MITRE, with our trusted third party status, work with our government sponsors to secure our most intimate data sets, i.e. our genomes, whose lifespans will last for generations, far beyond the humans whom each of these data sets encode.

Brinley Macnamara (host) (15:29):

The show is written by me. It was produced and edited by Dr. Kris Rosfjord, Dr. Heath Farris, and myself, with editing assistance from Beverly Wood and Bradley Hague. Our guests were Dr. Alex Perez, Frank Vasquez, and Dr. Alice Isenberg. The music in this episode was brought to you by Anthony Earls, Ooyy, Craig Weaver, and Truvio.

Brinley Macnamara (host) (15:50):

We'd like to give a special thanks to Dr. Kris Rosfjord, the Technology Futures Innovation Area Leader for all her support. Copyright 2022 MITRE PRS # 22-0495, February 8th, 2022.

Brinley Macnamara (host) (16:09):

MITRE: solving problems for a safer world.