9 March 2014

The open data controversy

As of March 3rd, the open-access publisher Public Library of Science (PLoS) requires authors publishing in one of its journals to make publicly available all data relevant to the conclusions drawn in their papers.

After a wave of protests (just check #PLoSFail), which PLoS delicately called "an extraordinary outpouring of discussions" or "[a] flurry of interest", the publisher changed its position; see the updated original blog post and the more recent explanation. The requirement was downsized from

"...any and all of the digital materials that are collected and analyzed in the pursuit of scientific advances"

to

"...if you are providing graphs, it would indeed be helpful to provide the spreadsheet from which you generated the graph. If you think some other form of the data would be useful to other researchers who might want to understand, replicate or build on your work, please do include it. Conversely, if it is usual in publications in this field to provide only the summary information, then that remains sufficient now."

In short, they attempted a revolution before quickly returning to the status quo. Still, it's a good thing they insist on making available the raw data for graphs: squinting at ten superposed curves in log-log scale is hard on the eyes...

As for their more ambitious goal of opening all the relevant data, it is much too early, mainly because scientists see their (painfully acquired) data as valuable property, to be converted into publications. Giving it away for free to colleagues (and competitors) is not an economically viable model. A very lucid presentation of this point of view was given by Terry McGlynn in a (long) blog post.

I am sure a default policy of open access to all scientific data (with reasonable exceptions) would be a Very Good Thing, but we need to work out a way of formally crediting the initial authors. The only solution I see is co-authorship, but that would pose some serious problems:
  • Merely collecting the data is not sufficient for authorship. PLOS ONE, for instance, requires active involvement in all stages of the work.
  • The original authors might disagree with the conclusions of the second team, or even compete with them by writing their own, separate analysis.
  • The number of authors on scientific papers is already quite large; do we need to increase it? This point might be solved, or at least mitigated, by a detailed description of each author's role.
I'm looking forward to the development of this story, in particular to the response of the more established journals.

