Some issues with direct-to-consumer personal genomics

October 13, 2012

There are a few issues in the personal genomics brouhaha that I've been meaning to write down, so, time for a brain dump. Context: The cost of sequencing DNA is dropping fast. Companies like 23andme have been offering direct-to-consumer SNP genotyping at affordable prices for quite a while now, and they even recently started offering exome sequencing for around 1000 dollars.

(Explaining the lingo: a SNP is a single nucleotide polymorphism - a single base in the genome where at least 1% of the population under study have a different "letter" than the rest. The exome is the set of all exons in the genome; very roughly speaking, it's the subset of the genome that codes for proteins and functional RNA. That doesn't mean the rest of the genome is junk, but the exome is a handy, sufficiently interesting subset that can already be sequenced at relatively low cost.)
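To make the 1% rule from that definition concrete, here is a minimal Python sketch. It assumes the input is simply a list of the letters observed at one genome position across a sample of the population; the function names and the `threshold` parameter are made up for illustration and are not taken from any genotyping tool.

```python
from collections import Counter


def non_major_fraction(alleles):
    """Fraction of observed letters that differ from the most common one."""
    counts = Counter(alleles)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    major_count = counts.most_common(1)[0][1]
    return (total - major_count) / total


def is_snp(alleles, threshold=0.01):
    """Apply the rough 1% rule described above (threshold is an assumption)."""
    return non_major_fraction(alleles) >= threshold


# Hypothetical sample: 97 observations of "A" and 3 of "G" at one position.
sample = ["A"] * 97 + ["G"] * 3
print(non_major_fraction(sample), is_snp(sample))
```

Real genotyping pipelines work on diploid genotype calls and deal with missing data and sequencing errors; this only illustrates the frequency threshold.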

And now, to the issues I mentioned. For me, the two major ones are "Analysis and interpretation" and "Information content and privacy".

Analysis and interpretation

One thing that irks me is this: at this point in time, with our current understanding of all things biological, personal genomics is little more than a geeky (although admittedly shiny) toy. Marketing it as anything more is just plain dishonest. Yes, there are some things genotyping and genome sequencing can tell us, but for a healthy individual, these are few and far between, to say the least. Do you smell snake oil?

Apart from this problem of our still limited understanding, extracting information from sequencing data still requires a lot of work, with a little bit of voodoo on top of it. And I mean, a lot of work - expensive work, that needs to be carried out by skilled analysts.

On top of that, of course, a person's genomic information is only a small part of the picture. Methylation patterns, somatic mutations in a subset of an organism's cells, gene expression levels in response to stimuli or depending on the cell and tissue type, translation rates, protein-protein interactions, involvement of the immune system, or the gut microbiome are just examples of other parts that play a role in determining an organism's biological state. To borrow a metaphor from another field: If you're trying to reverse engineer a heterogeneous distributed computer system, and you start out by dumping the firmware of one component - that would be the equivalent of exome sequencing. It's a part of the picture, for sure, but it's only a small first step towards understanding the entire system. Just something to keep in mind.

Information content and privacy

Another issue, and one that has kept me away from services like 23andme: information content and privacy. I find it unnecessary to rehash the obvious privacy implications here; I just want to draw attention to two things:

Number one, 23andme in particular is an American company and therefore under American jurisdiction. I'm sure you all remember some of the cases in the recent past where law enforcement agencies have approached Web 2.0 services with requests to hand over the personal data of individuals under investigation (off the top of my head, I recall cases involving Twitter, but I'm sure there are more). The SNP genotyping data that 23andme keeps in their databases is more than sufficient to match it against a biological sample and possibly get an ID this way.

(Additional note: DNA forensics for law enforcement currently does not use SNPs for identification, but STRs (short tandem repeats). The reason for this is, um, legacy. The databases that have been built up over the last decades are based on STRs, and moving to a different marker system - one based on SNPs, for example - would mean a lot of cost and effort. However, that doesn't mean a law enforcement agency cannot also get their labs to do SNP genotyping on a sample if this seems useful.)

Number two, which is the more important one in my decision to stay away from 23andme: My genome also contains a load of information about my relatives. That means, if I decide to submit my genomic information for storage on the servers of a company, without being able to reliably tell how long this data is going to stay there and where it will eventually end up or what it will be used for in 50 years' time, then I also make this decision for my parents, my grandparents, my siblings, and their children - even the ones that aren't born yet. And I have come to the conclusion that I just don't have the right to make this decision for them.
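To put rough numbers on "a load of information about my relatives", here is a small Python sketch of the textbook coefficient of relationship: the expected fraction of the genome two relatives share by descent. It assumes no inbreeding, and the path lists below are my own hand-written encoding of each pedigree (one entry per connecting path, counting the parent-child steps along it), not the output of any genetics library.

```python
def relatedness(path_lengths):
    """Sum (1/2)^L over all pedigree paths linking two people (no inbreeding)."""
    return sum(0.5 ** length for length in path_lengths)


# Hypothetical encoding: parent-child steps per connecting path.
RELATIVES = {
    "parent": [1],
    "grandparent": [2],
    "full sibling": [2, 2],      # one path via each shared parent
    "niece or nephew": [3, 3],   # via each of my parents, then my sibling
}


def expected_sharing():
    """Expected shared genome fraction for each relative listed above."""
    return {name: relatedness(paths) for name, paths in RELATIVES.items()}


for name, fraction in expected_sharing().items():
    print(f"{name}: {fraction:.0%}")
```

These are expectations only; apart from the parent-child case, the actual shared fraction varies from one pair of relatives to the next.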

Comments

Unknown, October 13, 2012 at 9:31 PM
I agree with most parts of the analysis, if not with the conclusion of not getting genotyped in the first place. But: I think you are a bit too negative about what DTC genotyping can tell us.

Agreed on the prediction of disease risks: these are far more often uninformative than the other way around. But: the information on whether you are a carrier for a certain disease is a binary classifier and thus pretty nice if you plan to have children.

On the data of your relatives: I think these are only really useful for your next of kin, your parents and children. Even for siblings it gets harder without at least a second data set.

Unknown, October 13, 2012 at 9:33 PM
Meh, the comments here are nearly impossible to navigate on mobile Safari, but I'll try to go on:

Without a second data set for reference in your relatives you pretty quickly end up with the same predictive value you already have for the disease risks now, as the chance of your relatives sharing a set of multiple SNPs drops pretty quickly with the distance of your relation and the number of SNPs needed to show a trait.