From data sharing to data publishing

¹ Montreal Neurological Institute and Hospital, McGill University, Montréal, QC, H3A 2B4, Canada
² Henry H. Wheeler, Jr. Brain Imaging Center, Helen Wills Neuroscience Institute, University of California, Berkeley, CA, 94720, USA

Jean-Baptiste Poline
Roles: Conceptualization, Writing – Review & Editing

Abstract

Data sharing, i.e. depositing data in repositories accessible to the research community, is not spreading across the life science research community as rapidly as hoped or expected. I consider the sociological and cultural context of research and, with a focus on neuroscience data, lay out why the community should instead move to data publishing, and outline practical steps that can be taken to realize this goal.

Keywords

Data sharing, data publishing, FAIR principles

Corresponding author: Jean-Baptiste Poline

Competing interests: No competing interests were disclosed.

Grant information: This work was partially funded by McGill MNI startup funds, NIH-NIBIB P41 EB019936 (ReproNim) NIH-NIMH R01 MH083320 (CANDIShare) and NIH 5U24 DA039832 (NIF).

Copyright: © 2018 Poline JB. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Poline JB. From data sharing to data publishing [version 1; peer review: 1 approved, 2 approved with reservations]. MNI Open Res 2018, 2:1 (https://doi.org/10.12688/mniopenres.12772.1) First published: 24 Jan 2018, 2:1 (https://doi.org/10.12688/mniopenres.12772.1) Latest published: 31 Jan 2019, 2:1 (https://doi.org/10.12688/mniopenres.12772.2)

Some research practices evolve rapidly. In the past few years, the number of preprints in BioRxiv has more than doubled every year, from 797 articles in 2014 to 1,601 in 2015, 4,295 in 2016, and already 10,819 posted in 2017. This is transformative, and is likely to redefine the publishing world in years to come - but an article on a preprint archive system is not considered “published” until the content has been reviewed by community experts for correctness (and sometimes, unfortunately, for “importance”).

Data sharing has also become more widespread. Taking as an example the field of brain imaging, initiatives such as the Human Connectome Project, the UK Biobank, INDI, ABIDE, OpenfMRI, and many others have made very large datasets available to the community (Poldrack & Gorgolewski, 2014; Poline et al., 2012). The number of publications using these datasets is growing fast and poses some interesting questions on the re-analysis of the same datasets (Poldrack & Poline, 2015). The benefits of data sharing are numerous, but first and foremost accessible data increases the chance for reproducibility and replicability. The release of data is increasingly mandated by funding agencies, such as the Wellcome Trust (see for instance the 2015 report from the United Kingdom Academy of Medical Sciences), but many researchers also individually recognize that they should be releasing data, since these are research products acquired under their stewardship for the progress of science or medicine, and not their “property”. Given the numerous compelling studies on the lack of statistical power (Button et al., 2013; Poldrack et al., 2017) and its possible role in the reproducibility crisis in life sciences, there is a very strong scientific incentive to make data accessible to the research community.

Nevertheless, data sharing does not seem to be taking over the world of biomedical or neuroscience research at a pace similar to the growth of preprint archiving systems. There are clear reasons for this. A key one is that data is often thought of as an asset in a competitive environment, which disincentivizes sharing. While an article is always written to communicate research results, releasing data to the scientific community necessitates efforts beyond current practices for the data to be documented appropriately, and requires sustainable local or remote infrastructures capable of dealing with possibly large amounts of data. Data may also be sensitive, so additional ethical and legal aspects need to be considered and addressed. Data sharing with all the necessary environment - in other words making data FAIR (Findable, Accessible, Interoperable, Reusable; Wilkinson et al., 2016) - is therefore thought to be “too complicated” or “too costly”. While it is certainly true that this would require effort, it seems that the key issue is motivation (or lack thereof). When a new research technique appears promising, laboratories will eagerly invest in material or human resources to adopt it. This may take months or even years and can necessitate large financial resources, new recruitments, and/or months of staff training. While extensive data sharing would likely radically change the efficiency and speed of science, this is not (yet) thought to be worth investing heavily in, except in a few laboratories or institutions, such as the Montreal Neurological Institute with its Open Science Initiative (Owens, 2016).

It is time that data publishing supersedes data sharing. Since researchers are happy to invest time and resources to publish their work, and gain recognition from their peers through these publications, publishing data articles is a solution to increase the number of available, well-documented and citable datasets, for both fundamental and clinical research. A data article is a full description of a dataset for its future use in research, and should contain all the information necessary to make the dataset useful for a research community. Data articles are standard articles and therefore participate in the current publication infrastructure that tracks impact and increases visibility (indexing in bibliographical databases) and is used – or misused – for research assessment. Some research even shows that data articles may have higher citation counts compared to conventional articles (Leitner et al., 2016).

In addition to addressing, at least partly, the motivation issue, data publishing elevates data to a first-class research object because it is reviewed for its usability and usefulness by the research community. It brings the peer review process to data accessibility, technical documentation, provenance, ethical and legal aspects, quality measurements, etc. Data acquisition and quality checks require time, effort and years of expertise, are fundamental to any scientific result (other than simulation or theory), and therefore deserve the recognition associated with a publication. Data papers are citable, transforming the FAIR principles into FORCE (FAIR, Open, Research-Object based, Citable Ecosystem; Data Citation Synthesis Group, 2014).

Some practical steps to further data publishing.

What do we need to do as a community to reconsider data acquisition, documentation and curation as critical activities and make these publishable research objects in peer reviewed venues?

- Researchers can today engage in training on the tools and standards required for efficient and adequate management and reuse of datasets (see for instance the ReproNim NIH-funded project and its online training module on FAIR data), and these tools may vary depending on the specificities of the data themselves. Training could for instance target the use of a database system when these infrastructures exist, or the use of more lightweight solutions, such as DataLad, a project that adds a layer of metadata on the git-annex distributed data versioning system. Training should at least cover the appropriate metadata for data description, the ethical and legal constraints linked to data accessibility and reuse, legitimate license and data usage agreements, and information on the rationales for data paper publishing.

- Universities and institutions themselves can step up their training offerings in this domain. While some online resources exist, formal courses are needed on the technical, legal, ethical, and sustainability aspects of data management, provenance documentation, citation, and the FAIR principles and their possible implementations in specific domains. All of these will eventually be part of the life scientist’s curriculum. This dovetails with the evolving mission of university schools of information and libraries, as they become the new stewards of sustainable repositories and long-term digital archiving – and likely, in the future, of scholarly e-communication.

- Funding bodies have a role to play that is both simple and critical. They need to ensure that their funds are being used with maximum efficiency, and therefore mandate data release when possible. Already the Wellcome Trust and NIMH, amongst others, have taken steps in this direction for scientific, ethical, societal, and economic reasons.

- Publishers and editors can also implement practical steps: establish “data articles” as a key article type, require that data availability be the norm, not the exception (PLOS, F1000Research, Royal Society journals and Scientific Data are examples of journals with data sharing requirements, e.g. http://journals.plos.org/plosone/s/data-availability; see also Allison et al., 2016), and enforce data standards when they exist.

- Last but not least, international organizations and scientific societies can establish and develop standards for repositories as well as for metadata. Already, some journals are vetting “acceptable” repositories based on the amount of available metadata and their long-term sustainability, but we still often lack recognized criteria for what should be considered a well-documented and long-term accessible dataset. The International Neuroinformatics Coordinating Facility (INCF) will certainly play a key role in establishing standards and best practices in neuroscience and should become a certification body. In the past, INCF has successfully launched standards such as BIDS (Gorgolewski et al., 2016).
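As one illustration of what checking "appropriate metadata for data description" against a data standard could look like in practice, here is a minimal sketch that inspects a dataset descriptor before submission. It assumes a BIDS-style `dataset_description.json`, in which `Name` and `BIDSVersion` are required fields. The choice of additional fields expected for a data article (`License`, `Authors`, `DatasetDOI`, which are optional BIDS fields), the function name `check_descriptor`, and the example values are assumptions made for this sketch, not a community standard or a journal requirement.

```python
import json

# Fields that the BIDS specification requires in dataset_description.json.
REQUIRED_FIELDS = ("Name", "BIDSVersion")

# Hypothetical extra fields a data article venue might ask for; this selection
# is an assumption for illustration, not an established requirement.
DATA_ARTICLE_FIELDS = ("License", "Authors", "DatasetDOI")


def check_descriptor(descriptor: dict) -> dict:
    """Return the missing fields, grouped by how serious the omission is."""

    def missing(fields):
        # A field counts as missing if it is absent or empty ("" or []).
        return [name for name in fields if not descriptor.get(name)]

    return {
        "required": missing(REQUIRED_FIELDS),
        "recommended_for_publication": missing(DATA_ARTICLE_FIELDS),
    }


if __name__ == "__main__":
    # Example descriptor with made-up values.
    example = json.loads("""
    {
      "Name": "Example resting-state dataset",
      "BIDSVersion": "1.0.2",
      "Authors": ["A. Researcher"],
      "License": ""
    }
    """)
    print(json.dumps(check_descriptor(example), indent=2))
```

A check of this kind could be run by authors before submission or by reviewers of a data article; a real workflow would more likely rely on a community-maintained validator than on a hand-written field list.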

Today there is an increase in the number of journals accepting neuroscience-focused data articles (e.g. Scientific Data, GigaScience, F1000Research, eNeuro, eLife, MNI Open Research, Wellcome Open Research), but these articles account for only a small proportion of the literature and of the acquired datasets. While data papers are still a novelty, they should be increasingly recognized for what they are: first-class research objects, and findable, citable and re-usable building blocks of science. This transformative change of practice – and culture – needs to involve the entire research community: funding agencies, publishers, editors, and researchers. In the future, computationally readable metadata are likely to be used to automatically update, refine, in/validate or generalize results with machine-findable datasets, profoundly changing the practice of science. Additionally, software and analysis scripts may also become publishable research objects (Eglen et al., 2017), leading to a full-fledged reproducible and re-usable publication. Let’s not share data: let’s publish them.

Disclaimer

The views expressed in this article are those of the author(s). Publication in MNI Open Research does not imply endorsement by the MNI.

Competing interests

No competing interests were disclosed.

Grant information

This work was partially funded by McGill MNI startup funds, National Institutes of Health (NIH)-NIBIB P41 EB019936 (ReproNim) NIH-NIMH R01 MH083320 (CANDIShare) and NIH 5U24 DA039832 (NIF).

Acknowledgments

The author is grateful to many colleagues and friends from the MNI and the Montreal area, as well as to Thomas Ingraham for his comments and suggestions.


References

Allison DB, Brown AW, George BJ, et al.: Reproducibility: A tragedy of errors. Nature. 2016; 530(7588): 27–9.
Button KS, Ioannidis JP, Mokrysz C, et al.: Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci. 2013; 14(5): 365–76.
Data Citation Synthesis Group: Joint Declaration of Data Citation Principles. Martone M. (ed.) San Diego CA: FORCE11; 2014.
Eglen SJ, Marwick B, Halchenko YO, et al.: Toward standard practices for sharing computer code and programs in neuroscience. Nat Neurosci. 2017; 20(6): 770–73.
Gorgolewski KJ, Auer T, Calhoun VD, et al.: The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Sci Data. 2016; 3: 160044.
Leitner F, Bielza C, Hill SL, et al.: Data Publications Correlate with Citation Impact. Front Neurosci. 2016; 10: 419.
Owens B: DATA SHARING. Montreal institute going ‘open’ to accelerate science. Science. 2016; 351(6271): 329.
Poldrack RA, Baker CI, Durnez J, et al.: Scanning the horizon: towards transparent and reproducible neuroimaging research. Nat Rev Neurosci. 2017; 18(2): 115–26.
Poldrack RA, Gorgolewski KJ: Making big data open: data sharing in neuroimaging. Nat Neurosci. 2014; 17(11): 1510–17.
Poldrack RA, Poline JB: The publication and reproducibility challenges of shared data. Trends Cogn Sci. 2015; 19(2): 59–61.
Poline JB, Breeze JL, Ghosh S, et al.: Data sharing in neuroimaging research. Front Neuroinform. 2012; 6: 9.
Wilkinson MD, Dumontier M, Aalbersberg IJ, et al.: The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016; 3: 160018.

Comments on this article (1)


Reviewer Response 10 May 2018

Chris Gorgolewski


I wholeheartedly support the idea of promoting data sharing via data papers (data publishing). We have proposed a similar idea in the following manuscript https://www.frontiersin.org/articles/10.3389/fnins.2013.00009/full.


Interventions focused on academic publishing seem to work. From the perspective of a public data repository (OpenNeuro.org) we are seeing many data submissions driven by a) existence of journals publishing data papers (for example Scientific Data) b) data sharing requirements (for example PLoS). It's far from mainstream, but things are changing in the right direction.
Competing Interests: I published with the main author and we are members of the same working group.




Open Peer Review


Reviewer Report 18 Dec 2018

John Borghi, Stanford University, Stanford, CA, USA

Ana E. Van Gulick, University Libraries, Carnegie Mellon University, Pittsburgh, PA, USA

Approved with Reservations

https://doi.org/10.21956/mniopenres.13832.r26142

Summary: This letter addresses an important issue in the evolving scholarly communications landscape and in the emerging practices of open science, the uptake of data sharing and data publishing. The focus is on life sciences and neuroscience and the letter advocates that data publishing in journals that mimic the peer-review and dissemination process of articles will motivate the adoption of public data sharing. The letter includes some next steps to encourage data publishing which include training in tools and standards for data management and metadata, university support, funder mandates, and the growth of the “data article” in publishing platforms.

Overview: The letter is timely and relevant as scholarly publishing shifts dramatically with the emergence of open access and the opportunity for researchers to easily share products of their research including data and code that go beyond a static paper. Overall, there is some lack of clarity in the focus of the letter: whether it is intended to address data publishing at large or if it is intended mainly for neuroscience (or neuroimaging) research, some terms such as data sharing and publishing lack clear definitions, the challenges of data sharing beyond motivation and training are not addressed in detail, and next steps are somewhat vague regarding implementation. While the letter presents a reasonable overview and a significant recommendation for open science, it could be improved with more detail and focus.

Rationale: The author encourages researchers to make their data available so that it can be assessed and reused by others. Specifically, he recommends that efforts to facilitate “data publishing” supersede those related to “data sharing” in order to motivate researchers and lend credibility, accessibility, and more rigorous review to data dissemination.

Unfortunately, because neither “data publishing” nor “data sharing” is explicitly defined in the text, the difference between them is not clear and it is left ambiguous how such a framing would substantially change researcher behavior. To strengthen the argument being made in this letter, we recommend that the author clearly define both practices in the first section.

Differing views and opinions:
The letter addresses some of the challenges of data sharing but does not clearly address how data publishing would solve these challenges.

For example, the challenge of publishing both a paper on the scientific finding a dataset supports as well as a data paper on the dataset is not clear - is it expected both would be published around the same time? If this work is cited or reused, which paper should be cited? Similarly, if data is published then reused or extended, how should it be cited and assessed?

Language:
Throughout the paper there are sentences and paragraphs that are confusing and ambiguous. We would recommend the author revise these for clarity.

One example at top of the “Some Practical Steps...” section: “What do we need to do as a community to reconsider data acquisition, documentation and curation as critical activities and make these publishable research objects in peer reviewed venues?”

This sentence appears to be advocating for improving data management practices throughout the course of a research project. However, as it is currently written, it is ambiguous how “acquisition, documentation, and curation” relate to “publishable research objects”. We recommend that the author make this connection explicit.

Supporting arguments:
The parallel to preprints is confusing. Though the growth of preprints may be transformative, the activities involved in writing and posting a preprint are very similar to those involved in writing and submitting a scientific paper. Facilitating the sharing or publishing of datasets, as outlined throughout this article, presents some entirely unique challenges to researchers, institutions, funders, and publishers.

Throughout the paper, examples are drawn primarily from neuroscience (neuroimaging). If the paper is meant to address data-related practices more generally, we would recommend a broader set of examples. Otherwise, the focus on neuroscience (neuroimaging) should be made more explicit.

Few examples are offered for how to implement this training at universities or how to address training beyond neuroimaging. While the cited online resources are useful it might be helpful to address how this could be built into research methods training and integrated within research workflows. Similarly, how can this shift be supported by universities and libraries?

In the section detailing next steps for publishers and editors, the difference between implementation of “data articles” as an article type and the enforcement of “data policies” such as data availability statements and the requirement that data adhere to disciplinary standards is confusing.

The discussion of standards for repositories versus datasets is confusing. Again the example of INCF and BIDS is very specific to neuroimaging without elaboration as to why this is a good model.

Is the rationale for the Open Letter provided in sufficient detail? Partly

Does the article adequately reference differing views and opinions? Partly

Are all factual statements correct, and are statements and arguments made adequately supported by citations? Yes

Is the Open Letter written in accessible language? Yes

Where applicable, are recommendations and next steps explained clearly for others to follow? Partly

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Cognitive neuroscience, neuroimaging (MRI), research data management, data curation.

We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.


Author Response 31 Jan 2019

Jean-Baptiste Poline, McGill University, Montréal, Canada


Hi, thanks for the feedback, I have put the responses inline preceded by >>>


Summary: This letter addresses an important issue in the evolving scholarly communications landscape and in the emerging practices of open science, the uptake of data sharing and data publishing. The focus is on life sciences and neuroscience and the letter advocates that data publishing in journals that mimic the peer-review and dissemination process of articles will motivate the adoption of public data sharing. The letter includes some next steps to encourage data publishing which include training in tools and standards for data management and metadata, university support, funder mandates, and the growth of the “data article” in publishing platforms.

Overview: The letter is timely and relevant as scholarly publishing shifts dramatically with the emergence of open access and the opportunity for researchers to easily share products of their research including data and code that go beyond a static paper.

* Overall, there is some lack of clarity in the focus of the letter: whether it is intended to address data publishing at large or if it is intended mainly for neuroscience (or neuroimaging) research, some terms such as data sharing and publishing lack clear definitions, the challenges of data sharing beyond motivation and training are not addressed in detail, and next steps are somewhat vague regarding implementation. While the letter presents a reasonable overview and a significant recommendation for open science, it could be improved with more detail and focus.

>>> Thanks for the feedback. Because it is published in "MNIopenresearch" the audience is in the neuroscience (and specifically neuroimaging) research fields. I made this clearer in the text.

Rationale: The author encourages researchers to make their data available so that it can be assessed and reused by others. Specifically, he recommends that efforts to facilitate “data publishing” supersede those related to “data sharing” in order to motivate researchers and lend credibility, accessibility, and more rigorous review to data dissemination.

Unfortunately, because neither “data publishing” nor “data sharing” is explicitly defined in the text, the difference between them is not clear and it is left ambiguous how such a framing would substantially change researcher behavior. To strengthen the argument being made in this letter, we recommend that the author clearly define both practices in the first section.

>>> Thanks for the note, I have now defined what I meant by data publishing and data sharing in the text.

Differing views and opinions:
The letter addresses some of the challenges of data sharing but does not clearly address how data publishing would solve these challenges.

For example, the challenge of publishing both a paper on the scientific finding a dataset supports as well as a data paper on the dataset is not clear - is it expected both would be published around the same time? If this work is cited or reused, which paper should be cited? Similarly, if data is published then reused or extended, how should it be cited and assessed?

>>> That's an important question but not the one I address. The timing can be independent (before, at the same time, after) but my personal opinion is that it should come first.

Language:
Throughout the paper there are sentences and paragraphs that are confusing and ambiguous. We would recommend the author revise these for clarity.

One example at top of the “Some Practical Steps...” section: “What do we need to do as a community to reconsider data acquisition, documentation and curation as critical activities and make these publishable research objects in peer reviewed venues?”

This sentence appears to be advocating for improving data management practices throughout the course of a research project. However, as it is currently written, it is ambiguous how “acquisition, documentation, and curation” relate to “publishable research objects”. We recommend that the author make this connection explicit.

>>> The idea here is that a dataset that has not reached a certain level of quality in terms of documentation and curation as defined by the community should not be "publishable". I have clarified this in the text. Let me know if you still find the text unclear.

Supporting arguments:
The parallel to preprints is confusing. Though the growth of preprints may be transformative, the activities involved in writing and posting a preprint are very similar to those involved in writing and submitting a scientific paper. Facilitating the sharing or publishing of datasets, as outlined throughout this article, presents some entirely unique challenges to researchers, institutions, funders, and publishers.

>>> Yes, but the parallel is only with respect to the peer review process: one can put any article on a pre-print archive system, but only a subset of these will be accepted as peer reviewed. Indeed, there are specific challenges for releasing or publishing data, but this is a different matter that I do not address here, as it would distract from the point I would like to make. I have clarified this in the text.

Throughout the paper, examples are drawn primarily from neuroscience (neuroimaging). If the paper is meant to address data-related practices more generally, we would recommend a broader set of examples. Otherwise, the focus on neuroscience (neuroimaging) should be made more explicit.

>>> I have made the focus on neuroscience explicit, note that the venue is the MNI (Montreal _Neurological_ Institute) open research.

Few examples are offered for how to implement this training at universities or how to address training beyond neuroimaging. While the cited online resources are useful it might be helpful to address how this could be built into research methods training and integrated within research workflows. Similarly, how can this shift be supported by universities and libraries?

>>> I have added some text on this. How to implement specific training in universities and how to go beyond neuroimaging depends very much on the institution. I now mention a couple of possibilities, but please consider that this is a *letter*, not a full-fledged article, and therefore does not expand on these aspects.

In the section detailing next steps for publishers and editors, the difference between implementation of “data articles” as an article type and the enforcement of “data policies” such as data availability statements and the requirement that data adhere to disciplinary standards is confusing.

>>> I have clarified this section, please see if it is easier to read.

The discussion of standards for repositories versus datasets is confusing. Again the example of INCF and BIDS is very specific to neuroimaging without elaboration as to why this is a good model.

>>> This section simply points to the need for standards, both on the side of repositories and of datasets, because both are needed for data articles. I have clarified this. The reason that BIDS seems a good model is because it has allowed an increased efficiency for re-use of many datasets. I have added some clarification on this.
Competing Interests: No competing interests were disclosed.

Reviewer Report 13 Mar 2018

John Chodacki, California Digital Library, University of California Curation Center, Oakland, CA, USA

Approved

https://doi.org/10.21956/mniopenres.13832.r26093

Reviewer Report 06 Feb 2018

Nikola Stikov, NeuroPoly Lab, Institute of Biomedical Engineering, Polytechnique Montreal, Montreal, QC, Canada; Montreal Heart Institute, University of Montreal, Montreal, QC, Canada

Approved with Reservations

https://doi.org/10.21956/mniopenres.13832.r26068

This letter is very timely, convincing and informative. Scientific communication is undergoing a sea change, and data publishing will be at the center of the storm. JB Poline does an excellent job of drawing the public's attention to the burning issues of data publishing, such as the slow pace of adoption, the need for peer review of data, and the ways of incentivizing researchers to be more transparent with their data. The FAIR principles are explained, and the author highlights several data-sharing initiatives that adhere to these principles.

A letter of this length cannot possibly address all the issues concerning data publishing, nor dwell on the subtleties surrounding them. However, there are four points that I would like to see addressed or expanded in the letter:

1. The ethical and legal constraints of data sharing are explicitly mentioned, but the letter could do a better job of informing the community how (and why) to be proactive about making data available. For example, it is mentioned that one obstacle to data sharing is its sensitive content, and I would argue that this is the biggest issue we are currently facing. Therefore, it would be good to provide a balanced perspective of the pros and cons of data sharing, emphasizing the need for thorough anonymization, and informing readers about consent forms that enable data sharing:

https://open-brain-consent.readthedocs.io/en/latest/

2. The issue of authorship in papers that re-use other researchers' data is also not addressed. For example, certain consortia require that they be listed as co-authors on any papers that use their data, and this is a controversial practice. When discussing incentivization, the issue of consistently crediting data curators is almost as important as the issue of data publishing.

3. The author emphasizes bringing the peer-review process to data publishing, but I would take things one step further and argue that the data itself should drive the peer-review process of articles. Most journals require reviewers to evaluate static documents, but making it easy to look up the data, and possibly to re-run certain analyses (using Jupyter notebooks, Binder, and similar technologies), will go a long way toward making research more transparent and reproducible.

4. Finally, technological advances in data visualization and interactivity (Plot.ly, Distill, etc.) will truly add value to the data publishing process. This, combined with the power of social media to create communities and build consensus, is the next wave of data curation, and I believe these trends deserve mention in the letter.

The letter could also benefit from a thorough proofread by a native speaker to address minor syntax and language transfer issues. That being said, I commend JB Poline for his efforts to transform data publishing, and I look forward to approving a revised version of this letter.

Is the rationale for the Open Letter provided in sufficient detail?
Yes

Does the article adequately reference differing views and opinions?
Partly

Are all factual statements correct, and are statements and arguments made adequately supported by citations?
Yes

Is the Open Letter written in accessible language?
Partly

Where applicable, are recommendations and next steps explained clearly for others to follow?
Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Quantitative magnetic resonance, myelin imaging, microstructural modeling

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Comments on this article (1)

VERSION 1 PUBLISHED 24 Jan 2018

Discussion is closed on this version, please comment on the latest version above.

Reviewer Response 10 May 2018

Chris Gorgolewski

I wholeheartedly support the idea of promoting data sharing via data papers (data publishing). We have proposed a similar idea in the following manuscript https://www.frontiersin.org/articles/10.3389/fnins.2013.00009/full.

Interventions focused on academic publishing seem to work. From the perspective of a public data repository (OpenNeuro.org), we are seeing many data submissions driven by a) the existence of journals publishing data papers (for example Scientific Data) and b) data sharing requirements (for example PLoS). It's far from mainstream, but things are changing in the right direction.
Competing Interests: I published with the main author and we are members of the same working group.

Open Peer Review

Reviewer Status

Reviewer Reports

Invited Reviewers: 1, 2, 3
Version 2 (revision), 31 Jan 2019: report from reviewer 3
Version 1, 24 Jan 2018: reports from reviewers 1, 2 and 3

Nikola Stikov, Polytechnique Montreal, Montreal, Canada; University of Montreal, Montreal, Canada
John Chodacki, University of California Curation Center, Oakland, USA
John Borghi, Stanford University, Stanford, USA

Ana E. Van Gulick, Carnegie Mellon University, Pittsburgh, USA

Approved

The revisions implemented in version 2.0 have added clarity and detail to this letter. It is a timely and relevant review of data sharing and data publishing practices and needs especially within the field of neuroscience.

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Cognitive neuroscience, neuroimaging (MRI), research data management, data curation.

We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Approved With Reservations

Is the rationale for the Open Letter provided in sufficient detail?
Partly

Does the article adequately reference differing views and opinions?
Partly

Are all factual statements correct, and are statements and arguments made adequately supported by citations?
Yes

Is the Open Letter written in accessible language?
Yes

Where applicable, are recommendations and next steps explained clearly for others to follow?
Partly

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Cognitive neuroscience, neuroimaging (MRI), research data management, data curation.

Author Response

31 Jan 2019

Jean-Baptiste Poline, McGill University, Montréal, Canada

Hi, thanks for the feedback, I have put the responses inline preceded by >>>

Summary: This letter addresses an important issue in the evolving scholarly communications landscape and in the emerging practices of open science: the uptake of data sharing and data publishing. The focus is on life sciences and neuroscience, and the letter advocates that data publishing in journals that mimic the peer-review and dissemination process of articles will motivate the adoption of public data sharing. The letter includes some next steps to encourage data publishing, which include training in tools and standards for data management and metadata, university support, funder mandates, and the growth of the “data article” in publishing platforms.

Overview: The letter is timely and relevant as scholarly publishing shifts dramatically with the emergence of open access and the opportunity for researchers to easily share products of their research, including data and code, that go beyond a static paper.

* Overall, the focus of the letter is somewhat unclear: it is not evident whether it is intended to address data publishing at large or mainly neuroscience (or neuroimaging) research. In addition, some terms such as data sharing and data publishing lack clear definitions, the challenges of data sharing beyond motivation and training are not addressed in detail, and the next steps are somewhat vague regarding implementation. While the letter presents a reasonable overview and a significant recommendation for open science, it could be improved with more detail and focus.

>>> Thanks for the feedback. Because it is published in "MNIopenresearch" the audience is in the neuroscience (and specifically neuroimaging) research fields. I made this clearer in the text.

Rationale: The author encourages researchers to make their data available so that it can be assessed and reused by others. Specifically, he recommends that efforts to facilitate “data publishing” supersede those related to “data sharing” in order to motivate researchers and lend credibility, accessibility, and more rigorous review to data dissemination.

Unfortunately, because neither “data publishing” nor “data sharing” is explicitly defined in the text, the difference between them is not clear, and it is left ambiguous how such a framing would substantially change researcher behavior. To strengthen the argument being made in this letter, we recommend that the author clearly define both practices in the first section.

>>> Thanks for the note, I have now defined what I meant by data publishing and data sharing in the text.

Differing views and opinions:
The letter addresses some of the challenges of data sharing but does not clearly address how data publishing would solve these challenges.

For example, the challenge of publishing both a paper on the scientific finding a dataset supports as well as a data paper on the dataset is not clear - is it expected both would be published around the same time? If this work is cited or reused, which paper should be cited? Similarly, if data is published then reused or extended, how should it be cited and assessed?

>>> That's an important question, but not the one I address. The timing can be independent (before, at the same time, or after), but my personal opinion is that the data paper should come first.

Language:
Throughout the paper there are sentences and paragraphs that are confusing and ambiguous. We would recommend the author revise these for clarity.

One example at top of the “Some Practical Steps...” section: “What do we need to do as a community to reconsider data acquisition, documentation and curation as critical activities and make these publishable research objects in peer reviewed venues?”

This sentence appears to be advocating for improving data management practices throughout the course of a research project. However, as it is currently written, it is ambiguous how “acquisition, documentation, and curation” relate to “publishable research objects”. We recommend that the author make this connection explicit.

>>> The idea here is that a dataset that has not reached a certain level of quality in terms of documentation and curation as defined by the community should not be "publishable". I have clarified this in the text. Let me know if you still find the text unclear.

Supporting arguments:
The parallel to preprints is confusing. Though the growth of preprints may be transformative, the activities involved in writing and posting a preprint are very similar to those involved in writing and submitting a scientific paper. Facilitating the sharing or publishing of datasets, as outlined throughout this article, presents some entirely unique challenges to researchers, institutions, funders, and publishers.

>>> Yes, but the parallel is only with respect to the peer review process: one can put any article on a preprint archive system, but only a subset of these will be accepted as peer reviewed. Indeed, there are specific challenges in releasing or publishing data, but this is a different matter that I do not address here, as it would distract from the point I would like to make. I have clarified this in the text.

Throughout the paper, examples are drawn primarily from neuroscience (neuroimaging). If the paper is meant to address data-related practices more generally, we would recommend a broader set of examples. Otherwise, the focus on neuroscience (neuroimaging) should be made more explicit.

>>> I have made the focus on neuroscience explicit; note that the venue is the MNI (Montreal _Neurological_ Institute) open research.

Few examples are offered for how to implement this training at universities or how to address training beyond neuroimaging. While the cited online resources are useful it might be helpful to address how this could be built into research methods training and integrated within research workflows. Similarly, how can this shift be supported by universities and libraries?

>>> I have added some text on this. How to implement specific training in universities and how to go beyond neuroimaging depends heavily on the institution. I now mention a couple of possibilities, but please consider that this is a *letter*, not a full-fledged article, and therefore does not expand on these aspects.

In the section detailing next steps for publishers and editors, the difference between implementation of “data articles” as an article type and the enforcement of “data policies” such as data availability statements and the requirement that data adhere to disciplinary standards is confusing.

>>> I have clarified this section; please see if it is easier to read.

The discussion of standards for repositories versus datasets is confusing. Again the example of INCF and BIDS is very specific to neuroimaging without elaboration as to why this is a good model.

>>> This section simply points to the need for standards, both on the side of repositories and of datasets, because both are needed for data articles. I have clarified this. BIDS seems a good model because it has made the re-use of many datasets more efficient. I have added some clarification on this.

Competing Interests

No competing interests were disclosed.

Approved

Well thought out framing of a key challenge in communicating research.

I would like to see more discussion of the basics: what is meant by "data" and why we want researchers to "publish".

I would expand on the role of language/terminology. As the author states, the phrase "data publishing" will help reframe the issues for researchers. However, one reason our current situation exists is because researchers do not understand the basics of what scholarly communications professionals mean when we use the words "data" or the phrase "the underlying data".

There is a lack of very basic understanding of how data publishing (or sharing) should be applied to their research. Additionally, many researchers stop at the "why?" question. Since research is a journey, why would we want to capture single snapshots? Why are you asking me to share/publish?

Is the rationale for the Open Letter provided in sufficient detail?

Yes
Does the article adequately reference differing views and opinions?

Yes
Are all factual statements correct, and are statements and arguments made adequately supported by citations?

Yes
Is the Open Letter written in accessible language?

Yes
Where applicable, are recommendations and next steps explained clearly for others to follow?

Yes

Competing Interests

No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report

06 Feb 2018 | for Version 1

Nikola Stikov, NeuroPoly Lab, Institute of Biomedical Engineering, Polytechnique Montreal, Montreal, QC, Canada; Montreal Heart Institute, University of Montreal, Montreal, QC, Canada

Approved With Reservations

Is the rationale for the Open Letter provided in sufficient detail?

Yes
Does the article adequately reference differing views and opinions?

Partly
Are all factual statements correct, and are statements and arguments made adequately supported by citations?

Yes
Is the Open Letter written in accessible language?

Partly
Where applicable, are recommendations and next steps explained clearly for others to follow?

Yes

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Quantitative magnetic resonance, myelin imaging, microstructural modeling

Alongside their report, reviewers assign a status to the article:

Approved - the paper is scientifically sound in its current form and only minor, if any, improvements are suggested

Approved with reservations - a number of small changes, or sometimes more significant revisions, are required to address specific details and improve the paper's academic merit

Not approved - fundamental flaws in the paper seriously undermine the findings and conclusions

[1] Allison DB, Brown AW, George BJ, et al.: Reproducibility: A tragedy of errors. Nature. 2016; 530(7588): 27–9.

[2] Button KS, Ioannidis JP, Mokrysz C, et al.: Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci. 2013; 14(5): 365–76.

[3] Data Citation Synthesis Group: Joint Declaration of Data Citation Principles. Martone M. (ed.) San Diego CA: FORCE11; 2014.

[4] Eglen SJ, Marwick B, Halchenko YO, et al.: Toward standard practices for sharing computer code and programs in neuroscience. Nat Neurosci. 2017; 20(6): 770–73.

[5] Gorgolewski KJ, Auer T, Calhoun VD, et al.: The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Sci Data. 2016; 3: 160044.

[6] Leitner F, Bielza C, Hill SL, et al.: Data Publications Correlate with Citation Impact. Front Neurosci. 2016; 10: 419.

[7] Owens B: DATA SHARING. Montreal institute going ‘open’ to accelerate science. Science. 2016; 351(6271): 329.

[8] Poldrack RA, Baker CI, Durnez J, et al.: Scanning the horizon: towards transparent and reproducible neuroimaging research. Nat Rev Neurosci. 2017; 18(2): 115–26.

[9] Poldrack RA, Gorgolewski KJ: Making big data open: data sharing in neuroimaging. Nat Neurosci. 2014; 17(11): 1510–17.

[10] Poldrack RA, Poline JB: The publication and reproducibility challenges of shared data. Trends Cogn Sci. 2015; 19(2): 59–61.

[11] Poline JB, Breeze JL, Ghosh S, et al.: Data sharing in neuroimaging research. Front Neuroinform. 2012; 6: 9.

[12] Wilkinson MD, Dumontier M, Aalbersberg IJ, et al.: The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016; 3: 160018.
