Open Letter

From data sharing to data publishing

[version 1; peer review: 1 approved, 2 approved with reservations]
PUBLISHED 24 Jan 2018

Abstract

Data sharing, i.e. depositing data in repositories accessible to the research community, is not spreading across the life sciences as rapidly as hoped or expected. I consider the sociological and cultural context of research, argue, with a focus on neuroscience data, that the community should move from data sharing to data publishing, and outline practical steps that can be taken to realize this goal.

Keywords

Data sharing, data publishing, FAIR principles

Some research practices evolve rapidly. In the past few years, the number of preprints posted on bioRxiv has more than doubled every year, from 797 in 2014 to 1,601 in 2015, 4,295 in 2016 and already 10,819 in 2017. This is transformative and is likely to redefine the publishing world in the years to come. However, an article on a preprint server is not considered “published” until its content has been reviewed by community experts for correctness (and sometimes, unfortunately, for “importance”).
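The “more than doubled every year” claim can be checked with a few lines of arithmetic. The Python sketch below computes the year-over-year ratios from the bioRxiv counts quoted above. The function name `growth_ratios` is purely illustrative. The 2017 figure may not cover the full year, so its ratio is best read as a lower bound.

```python
# bioRxiv preprint counts per year, as quoted in the text above
# (the 2017 value may be a partial-year count)
counts = {2014: 797, 2015: 1601, 2016: 4295, 2017: 10819}


def growth_ratios(yearly_counts):
    """Return {year: count[year] / count[previous year]} for consecutive years."""
    years = sorted(yearly_counts)
    return {
        later: yearly_counts[later] / yearly_counts[earlier]
        for earlier, later in zip(years, years[1:])
    }


ratios = growth_ratios(counts)
for year, ratio in ratios.items():
    print(f"{year}: x{ratio:.2f}")
```

Each ratio is above 2 (roughly 2.01, 2.68 and 2.52), which is consistent with the claim in the text.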

Data sharing has also become more widespread. In brain imaging, for example, initiatives such as the Human Connectome Project, the UK Biobank, INDI, ABIDE, OpenfMRI and many others have made very large datasets available to the community (Poldrack & Gorgolewski, 2014; Poline et al., 2012). The number of publications using these datasets is growing fast and raises interesting questions about the re-analysis of the same datasets (Poldrack & Poline, 2015). The benefits of data sharing are numerous, but first and foremost, accessible data increase the chances of reproducibility and replicability. Funding agencies such as the Wellcome Trust increasingly mandate the release of data (see for instance the 2015 report from the United Kingdom Academy of Medical Sciences). Many researchers also recognize individually that they should release their data, since these are research products acquired under their stewardship for the progress of science or medicine, not their “property”. Given the numerous compelling studies on the lack of statistical power (Button et al., 2013; Poldrack et al., 2017) and its possible role in the reproducibility crisis in the life sciences, there is a very strong scientific incentive to make data accessible to the research community.

Nevertheless, data sharing does not seem to be taking over biomedical or neuroscience research at a pace similar to the growth of preprint servers. There are clear reasons for this. A key one is that data are often thought of as an asset in a competitive environment, which disincentivizes sharing. An article is always written to communicate research results. Releasing data to the scientific community, by contrast, requires effort beyond current practice to document the data appropriately, as well as sustainable local or remote infrastructure capable of handling potentially large amounts of data. Data may also be sensitive, in which case additional ethical and legal requirements must be considered and implemented. Sharing data with everything needed to make them usable, in other words making data FAIR (Findable, Accessible, Interoperable, Reusable; Wilkinson et al., 2016), is therefore thought to be “too complicated” or “too costly”. While this certainly requires effort, the key issue seems to be motivation (or the lack thereof). When a new research technique appears promising, laboratories eagerly invest material and human resources to adopt it. This may take months or even years and can require substantial funding, new hires and/or months of staff training. Extensive data sharing would likely radically increase the efficiency and speed of science. Yet it is not (yet) considered worth heavy investment, except in a few laboratories or institutions, such as the Montreal Neurological Institute with its Open Science Initiative (Owens, 2016).

It is time for data publishing to supersede data sharing. Researchers are happy to invest time and resources to publish their work and to gain recognition from their peers through these publications. Publishing data articles is therefore a way to increase the number of available, well-documented and citable datasets for both fundamental and clinical research. A data article is a full description of a dataset intended for its future use in research, and should contain all the information needed to make the dataset useful to a research community. Data articles are standard articles and therefore participate in the current publication infrastructure, which tracks impact, increases visibility (through indexing in bibliographic databases) and is used (or misused) for research assessment. Some studies even suggest that data articles may receive more citations than conventional articles (Leitner et al., 2016).

Data publishing at least partly solves the motivation problem. It also elevates data to first-class research objects, because they are reviewed by the research community for their usability and usefulness. It brings peer review to data accessibility, technical documentation, provenance, ethical and legal aspects, quality measurements, etc. Data acquisition and quality control require time, effort and years of expertise. They are fundamental to any scientific result (other than simulation or theory), and therefore deserve the recognition associated with a publication. Data papers are also citable, which extends the FAIR principles to FORCE (FAIR, Open, Research-Object based, Citable Ecosystem; Data Citation Synthesis Group, 2014).

Some practical steps to further data publishing

What do we need to do as a community to recognize data acquisition, documentation and curation as critical activities, and to make their outputs publishable research objects in peer-reviewed venues?

- Researchers can already engage in training on the tools and standards required for efficient and adequate management and reuse of datasets (see for instance the NIH-funded ReproNim project and its online training module on FAIR data). These tools may vary with the specificities of the data themselves. Training could, for instance, target the use of a database system where such infrastructure exists. It could also cover more lightweight solutions such as DataLad, a project that adds a metadata layer on top of the git-annex distributed data versioning system. At a minimum, training should cover appropriate metadata for describing data, the ethical and legal constraints on data access and reuse, suitable licenses and data usage agreements, and the rationale for publishing data papers.

- Universities and institutions can themselves expand their training offerings in this domain. While some online resources exist, formal courses are needed on the technical, legal, ethical and sustainability aspects of data management, provenance documentation, citation, and the FAIR principles and their possible implementations in specific domains. All of these will eventually become part of the life scientist’s curriculum. This dovetails with the evolving mission of university libraries and schools of information. They are becoming the new stewards of sustainable repositories and long-term digital archiving, and likely, in the future, of scholarly e-communication.

- Funding bodies have a simple but critical role to play. They need to ensure that their funds are used with maximum efficiency, and should therefore mandate data release whenever possible. The Wellcome Trust and NIMH, amongst others, have already taken steps in this direction for scientific, ethical, societal and economic reasons.

- Publishers and editors can also take practical steps. They can establish “data articles” as a key article type and require that data availability be the norm rather than the exception (PLOS, F1000Research, Royal Society journals and Scientific Data are examples of journals with data sharing requirements, e.g. http://journals.plos.org/plosone/s/data-availability; see also Allison et al., 2016). They can also enforce data standards where these exist.

- Last but not least, international organizations and scientific societies can establish and develop standards for repositories as well as for metadata. Some journals already vet repositories as “acceptable” based on the amount of metadata available and their long-term sustainability. However, we still often lack recognized criteria for what should be considered a well-documented, durably accessible dataset. The International Neuroinformatics Coordinating Facility (INCF) will certainly play a key role in establishing standards and best practices in neuroscience, and should become a certification body. INCF has already successfully launched standards such as BIDS (Gorgolewski et al., 2016).

A growing number of journals now accept neuroscience-focused data articles (e.g. Scientific Data, GigaScience, F1000Research, eNeuro, eLife, MNI Open Research, Wellcome Open Research). Such articles, however, account for only a small proportion of the literature and of acquired datasets. While data papers are still a novelty, they should increasingly be recognized for what they are: first-class research objects, findable, citable and reusable building blocks of science. This transformative change in practice and culture needs to involve the entire research community: funding agencies, publishers, editors and researchers. In the future, machine-readable metadata and machine-findable datasets are likely to be used together to automatically update, refine, validate or invalidate, or generalize results, profoundly changing the practice of science. Software and analysis scripts may also become publishable research objects (Eglen et al., 2017), leading to fully reproducible and reusable publications. Let’s not share data: let’s publish them.

Disclaimer

The views expressed in this article are those of the author(s). Publication in MNI Open Research does not imply endorsement by the MNI.

Comments on this article (1)

Version 1 (published 24 Jan 2018)
  • Reviewer Response, Chris Gorgolewski, 10 May 2018
    I wholeheartedly support the idea of promoting data sharing via data papers (data publishing). We have proposed a similar idea in the following manuscript https://www.frontiersin.org/articles/10.3389/fnins.2013.00009/full

    Interventions focused on academic publishing seem to ...
How to cite this article
Poline JB. From data sharing to data publishing [version 1; peer review: 1 approved, 2 approved with reservations] MNI Open Res 2018, 2:1 (https://doi.org/10.12688/mniopenres.12772.1)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to Reviewer Statuses
Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper’s academic merit.
Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.
Version 1 (published 24 Jan 2018)
Reviewer Report 18 Dec 2018
John Borghi, Stanford University, Stanford, CA, USA 
Ana E. Van Gulick, University Libraries, Carnegie Mellon University, Pittsburgh, PA, USA 
Approved with Reservations
Summary: This letter addresses an important issue in the evolving scholarly communications landscape and in the emerging practices of open science, the uptake of data sharing and data publishing. The focus is on life sciences and neuroscience and the letter ...
HOW TO CITE THIS REPORT
Borghi J and Van Gulick AE. Reviewer Report For: From data sharing to data publishing [version 1; peer review: 1 approved, 2 approved with reservations]. MNI Open Res 2018, 2:1 (https://doi.org/10.21956/mniopenres.13832.r26142)
  • Author Response, 31 Jan 2019
    Jean-Baptiste Poline, McGill University, Montréal, Canada
    Hi, thanks for the feedback, I have put the responses inline preceded by >>>


    Summary: This letter addresses an important issue in the evolving scholarly communications landscape and in the emerging ...
Reviewer Report 13 Mar 2018
John Chodacki, California Digital Library, University of California Curation Center, Oakland, CA, USA 
Approved
Well thought out framing of a key challenge in communicating research.  

I would like to see more discussion of the basics: what is meant by "data" and why we want researchers to "publish".

...
HOW TO CITE THIS REPORT
Chodacki J. Reviewer Report For: From data sharing to data publishing [version 1; peer review: 1 approved, 2 approved with reservations]. MNI Open Res 2018, 2:1 (https://doi.org/10.21956/mniopenres.13832.r26093)
Reviewer Report 06 Feb 2018
Nikola Stikov, NeuroPoly Lab, Institute of Biomedical Engineering, Polytechnique Montreal, Montreal, QC, Canada;  Montreal Heart Institute, University of Montreal, Montreal, QC, Canada 
Approved with Reservations
This letter is very timely, convincing and informative. Scientific communication is undergoing a sea change, and data publishing will be at the center of the storm. JB Poline does an excellent job of drawing the public's attention to the burning ...
HOW TO CITE THIS REPORT
Stikov N. Reviewer Report For: From data sharing to data publishing [version 1; peer review: 1 approved, 2 approved with reservations]. MNI Open Res 2018, 2:1 (https://doi.org/10.21956/mniopenres.13832.r26068)