Friday, March 7, 2014

Open Data

"Open Access" to published scientific reports -- especially reports produced using government funds -- has been a topic of heated debate for a number of years. More recently, I've come across an equally heated debate on Twitter about "Open Data". Open Data, like its sibling Open Access, holds that researchers should make the data they collect, clean, and organize freely and publicly available. Proposals vary on how this should look. Some would have datasets published along with manuscripts. Some would have them posted on an institutional website. Others would have datasets "accessible", but only through a gateway: interested parties would submit an analysis request to the original researchers, who would run the analysis and send the results back to the requester.

I can see both sides of the issue. On the one hand, I think publicly funded research should absolutely be made public and freely accessible -- at least, at some point in its life. On the other hand, I think the original researchers should retain credit for the work they did collecting, cleaning, and organizing the data.

There are deeper issues too. There is the potential for a negative impact on the career paths of researchers whose data "goes public", since their capacity to publish from their data shrinks as the number of people with access to it grows. That is not to say that researchers should be allowed to sit on their data in perpetuity either, but it seems to me that a balance ought to be struck.

Also, there is a particularly sticky issue with human genetic datasets. Tremendous public resources have been, and will continue to be, poured into genotyping, sequencing, etc. to ferret out the "host" causes of human disease. Shouldn't this data be accessible to as many people as possible, so we can accelerate the process of identifying how genes affect human health? Well, maybe. Then again, is it really ethical to make genotyping and sequencing data fully accessible to anyone on the internet when we know that it is now possible to identify individuals from such data?


