The American Journal of Botany thus requires all authors to archive the data, code, and any other information integral to the published research but not contained within the paper itself. This policy also applies to custom software described in the paper. Whenever possible the scripts and other artefacts used to generate the analyses presented in the paper should also be publicly archived. Data must be archived by the time of publication.
Paper 1
The authors provided the character matrix and character states for all the features they measured and the taxa evaluated. These data sets were included directly in the paper as appendices. They described the methods but did not include a script or more detailed protocol.
Paper 2
The authors provide a link to GenBank for all the sequences. However, this link is no longer available. They described the methods, but did not include a script or more detailed protocol.
Paper 3
The authors provided a link to the University of Stirling repository, where the data collected during both experiments are available. The data are presented as .csv files. An R script with information about the analyses and figure generation is also provided in the repository. However, the script lacks descriptive commentary on the steps followed.
Paper 4
The raw data for these analyses were not provided in the paper. However, a Dryad repository was created to store the results from the pAUC analysis. No analysis scripts were provided.
Paper 5
A matrix containing anatomical data for 147 species (out of the 275 mentioned in the paper) was made available on GitHub. The phylogeny was not provided. Scripts for nearly all of the analyses, adapted from another paper, were also published on GitHub.
Data and code are important products of scientific enterprise, and they must be preserved and remain accessible in future decades.
For manuscripts that depend on new or existing data, and/or on code written by the authors, Ecology Letters requires that this material be supplied and accessible to editors and reviewers at the time of submission, and permanently archived in an accessible repository before publication.
Ecology Letters requires that the raw data (or subset of existing data) used to generate the results in the paper are archived in public repositories such as: Dryad, Figshare, Hal, Zenodo, NERC Environmental Data Service, OSF, US federal agency repositories, Environmental Data Initiative (EDI), DataONE, or a similar repository which assigns permanent unique DOIs.
Computer code used to produce the figures and conduct analyses or simulations must also be archived in a public repository (e.g., Zenodo, Figshare). Code should not be uploaded with your submission as a supporting document. All code must be annotated so readers can understand what each segment or function does.
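As a sketch of the kind of annotation this policy asks for, a hypothetical analysis script might look like the following. The file name, function, and data are invented for illustration; the point is that each segment carries a comment explaining what it does.

```python
# summarize_biomass.py -- hypothetical example of an annotated analysis script
import statistics

def summarize_biomass(biomass_g):
    """Return the mean and sample standard deviation of biomass measurements (grams)."""
    # Mean biomass across all sampled plots.
    mean = statistics.mean(biomass_g)
    # Sample (n-1) standard deviation; requires at least two measurements.
    sd = statistics.stdev(biomass_g)
    return mean, sd

# 1. Load the raw measurements (here, hard-coded example values).
plots = [12.1, 9.8, 14.3, 11.0]
# 2. Compute the summary statistics reported in the paper.
mean, sd = summarize_biomass(plots)
print(f"mean = {mean:.2f} g, sd = {sd:.2f} g")
```

A reader (or Data Editor) can follow the steps without guessing, which is the standard the policy sets.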
Note that Ecology Letters also includes a board of roughly a dozen Data Editors.
Paper 1
Full data set available on Figshare; analysis code also on Figshare.
Paper 2
Full data set and code available on Zenodo.
Paper 3
Full data set on Zenodo. Code for the microbiome DNA sequencing (BLAST) and possibly the network construction is present; the rest appears to be missing. This is hard to confirm, since there are a large number of files and no clear documentation.
Paper 4
Full dataset and code available on Zenodo.
Paper 5
Full dataset and code available on Zenodo.
Open data means sharing data, and is made more meaningful by the term "FAIR" (i.e., data that are findable, accessible, interoperable, and reusable). Being more open with data means you can analyze other researchers' findings and reuse them to inform new findings, and the research landscape becomes more efficient and accountable.
You will benefit from making your research data openly available too. More transparent data helps reviewers see how researchers went from data to analysis and provide nuanced feedback based on their own understanding of the data. When researchers make their data open, they show the world they are working transparently and reproducibly, building a strong reputation that will help them throughout their professional lives and encouraging reuse and citation.
All research- and synthesis-based articles must include a Data Availability Statement, whether or not the data used in the article is shared.
Paper 1
Data and code available on Figshare
Paper 2
Data and code available on Dryad
Paper 3
Data available at https://srtm.csi.cgiar.org/srtmdata/ and code available at https://github.com/DipeshDFRS/Snow_leopard. The R code appears to contain the analyses.
Paper 4
Data and code available in a GitHub repository
Paper 5
Raw data on Figshare; analyses in supplement.
Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future. The British Ecological Society thus requires, as a condition for publication, that all data supporting the results in papers published in its journals are archived in an appropriate public archive offering open access and guaranteed preservation. For theoretical papers the underlying model code must be archived. […] The archived data must allow each result in the published paper to be recreated and the analyses reported in the paper to be replicated in full to support the conclusions made. Authors are welcome to archive more than this, but not less.
Paper 1
Data archived in EDI (Environmental Data Initiative) data portal; no analysis code provided
Paper 2
Data archived in Dryad; no analysis code provided
Paper 3
Data archived in Dryad; no analysis code provided
Paper 4
Plant data in Northeastern University Data Portal; Microbial sequence data in NIH/NCBI Data Portal; no analysis code provided
Paper 5
Data archived in Dryad; no analysis code provided
State of the field 10 years ago:
Does anyone want to apply their methodology to the last 10 years?
“This article describes four foundational principles – Findability, Accessibility, Interoperability, and Reusability – that serve to guide data producers and publishers as they navigate around these obstacles, thereby helping to maximize the added-value gained by contemporary, formal scholarly digital publishing.
Importantly, it is our intent that the principles apply not only to ‘data’ in the conventional sense, but also to the algorithms, tools, and workflows that led to that data.”
To be Findable:
F1. (meta)data are assigned a globally unique and persistent identifier
F2. data are described with rich metadata (defined by R1 below)
F3. metadata clearly and explicitly include the identifier of the data it describes
F4. (meta)data are registered or indexed in a searchable resource
To be Accessible:
A1. (meta)data are retrievable by their identifier using a standardized communications protocol
A1.1 the protocol is open, free, and universally implementable
A1.2 the protocol allows for an authentication and authorization procedure, where necessary
A2. metadata are accessible, even when the data are no longer available
To be Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles
I3. (meta)data include qualified references to other (meta)data
To be Reusable:
R1. meta(data) are richly described with a plurality of accurate and relevant attributes
R1.1. (meta)data are released with a clear and accessible data usage license
R1.2. (meta)data are associated with detailed provenance
R1.3. (meta)data meet domain-relevant community standards
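Principles F1 and A1 above fit together in practice: a globally unique, persistent identifier such as a DOI is retrievable through the open, standardized HTTPS resolver at doi.org. A minimal sketch of that mapping (the DOI string below is a made-up example, not a real deposit):

```python
# F1: a DOI is a globally unique, persistent identifier.
# A1: it is retrievable via a standardized, open protocol (HTTPS via doi.org).
def doi_to_url(doi: str) -> str:
    """Turn a bare DOI string into the HTTPS URL that resolves it."""
    return f"https://doi.org/{doi.strip()}"

# Hypothetical example DOI:
print(doi_to_url("10.1234/example.5678"))
```

This is why repositories that mint DOIs (Dryad, Figshare, Zenodo, etc.) satisfy the Findable and Accessible principles with no extra effort from the author.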
“These high-level FAIR Guiding Principles precede implementation choices, and do not suggest any specific technology, standard, or implementation-solution; moreover, the Principles are not, themselves, a standard or a specification.”
From our Activity 2 submissions, we saw that ecologists and evolutionary biologists frequently upload their data (including code) in a variety of places – as a supplement/appendix to the paper, Dryad, Figshare, Zenodo, institutional repositories, GitHub, etc.
DOI (Digital Object Identifier)
Within ecology and evolution, Dryad, Figshare, and Zenodo are commonly used archival repositories (for non-sequence data).
For certain types of data (e.g. long sequence reads, individual barcode sequences, protein sequences or 3D builds of proteins), there are established databases that you should become familiar with.
If you use any of these, you’re doing great.
If you are worried about which to use, look to your journal to understand the norms in your field.
Don’t use GitHub (or GitLab) as an “archive” – repositories can be renamed, rewritten, or deleted, and carry no DOI. If your code lives on GitHub, deposit a release in an archival repository such as Zenodo (which offers a GitHub integration) to obtain one.
Use .csv files instead of .xls files for spreadsheets.
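The .csv recommendation is easy to follow from any scripting language, because plain-text CSV is readable by every tool, unlike a binary .xls workbook. A small sketch with Python's standard csv module (the column names and values are invented for illustration):

```python
import csv
import io

# A small data table: one dict per row, keys are column names.
rows = [
    {"species": "Quercus robur", "height_m": 21.4},
    {"species": "Fagus sylvatica", "height_m": 18.9},
]

# Write the table as plain-text CSV (here to an in-memory buffer;
# in practice you would open a real .csv file).
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["species", "height_m"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

The same result is one line in R (`write.csv`) or a “Save As CSV” in any spreadsheet program; the point is that the archived copy should be the plain-text version.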