Identifying Personal Genomes by Surname Inference

This article, published in "Science", bears out the biblical claim that nothing will remain hidden. Here is the situation: the Human Genome Project and other efforts to map the genetic template of Homo sapiens were built by sequencing the DNA of anonymous individuals. In other words, nobody discloses their own genetic makeup (except those criminals who have well and truly earned a place in a police database, genetic profile included). That is not what happens in this paper, though: the research team cross-references data, a little from here and a little from there, and, using freely available information from the Internet, manages to identify some of the donors. This is no small matter. With so-called metadata, such as age and geographic region of origin, plus the genetic sequences, the scientific information gets a first name and a surname attached. In short, an intimate, family-sized Big Brother. If they did it, anyone can.

Author Affiliations

¹Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA.
²Harvard–Massachusetts Institute of Technology (MIT) Division of Health Sciences and Technology, MIT, Cambridge, MA 02139, USA.
³Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
⁴Department of Molecular Biology and Diabetes Unit, Massachusetts General Hospital, Boston, MA 02114, USA.
⁵Center for Medical Ethics and Health Policy, Baylor College of Medicine, Houston, TX 77030, USA.
⁶Department of Statistics and Operations Research, Tel Aviv University, Tel Aviv 69978, Israel.
⁷School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel.
⁸Department of Molecular Microbiology and Biotechnology, Tel-Aviv University, Tel Aviv 69978, Israel.
⁹The International Computer Science Institute, Berkeley, CA 94704, USA.

*To whom correspondence should be addressed. E-mail: yaniv@wi.mit.edu

ABSTRACT

Sharing sequencing data sets without identifiers has become a common practice in genomics. Here, we report that surnames can be recovered from personal genomes by profiling short tandem repeats on the Y chromosome (Y-STRs) and querying recreational genetic genealogy databases. We show that a combination of a surname with other types of metadata, such as age and state, can be used to triangulate the identity of the target. A key feature of this technique is that it entirely relies on free, publicly accessible Internet resources. We quantitatively analyze the probability of identification for U.S. males. We further demonstrate the feasibility of this technique by tracing back with high probability the identities of multiple participants in public sequencing projects.
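The two steps the abstract describes (match a Y-STR profile to a surname, then narrow the candidates with age and state) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the repeat counts, the surnames, the people, the `max_mismatches` tolerance, and the use of birth year as a stand-in for age are all invented, and the matching rule is a simple mismatch count rather than the method used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One entry of a hypothetical genealogy database."""
    surname: str
    haplotype: dict[str, int]  # Y-STR marker name -> repeat count


@dataclass(frozen=True)
class Person:
    """One entry of a hypothetical public-records listing."""
    name: str
    surname: str
    birth_year: int
    state: str


def mismatches(query, reference):
    """Count markers typed in both profiles whose repeat counts differ."""
    shared = query.keys() & reference.keys()
    if not shared:
        return None  # nothing to compare
    return sum(1 for marker in shared if query[marker] != reference[marker])


def infer_surnames(query, database, max_mismatches=1):
    """Return surnames whose profile is within the tolerance, closest first."""
    hits = []
    for record in database:
        distance = mismatches(query, record.haplotype)
        if distance is not None and distance <= max_mismatches:
            hits.append((distance, record.surname))
    return [surname for _, surname in sorted(hits)]


def triangulate(surnames, people, birth_year, state):
    """Keep only people matching a candidate surname, birth year, and state."""
    return [
        p for p in people
        if p.surname in surnames and p.birth_year == birth_year and p.state == state
    ]


# Invented toy data; none of these values come from the paper.
DATABASE = [
    Record("Alder", {"DYS19": 14, "DYS390": 24, "DYS391": 10}),
    Record("Birch", {"DYS19": 15, "DYS390": 22, "DYS391": 10}),
]
PEOPLE = [
    Person("A. Alder", "Alder", 1950, "UT"),
    Person("B. Alder", "Alder", 1972, "CA"),
    Person("C. Birch", "Birch", 1950, "UT"),
]

query = {"DYS19": 14, "DYS390": 24, "DYS391": 11}
candidates = infer_surnames(query, DATABASE)
matches = triangulate(candidates, PEOPLE, birth_year=1950, state="UT")
```

In this toy run the surname step alone still leaves two people; it is the metadata filter that reduces the candidates to one, which is the point the abstract makes about combining a surname with age and state.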

codón desastre

Friday, January 18, 2013

An Open Bar of DNA
