FAQ: Data retention, deposit and availability

1. Does the requirement apply to me?

The Data retention, deposit and availability requirement (hereafter the requirement) applies to researchers and trainees (students and postdoctoral researchers) who receive funding from the Canadian Institutes of Health Research (CIHR), the Natural Sciences and Engineering Research Council of Canada (NSERC), or the Social Sciences and Humanities Research Council of Canada (SSHRC) ("the Agencies") in the form of a grant, chair, award or fellowship. The funding recipient is responsible for ensuring compliance with the requirement. All Canadian researchers and trainees, regardless of funding support, are encouraged to abide by the requirement.

2. Do I have to comply with the requirement if I am leading or participating in research that has multiple funding sources? What if I am co-author of an article or my institution has its own data policy?

If an Agency grant, chair, award or fellowship provides funding, in whole or in part, for a research project, the recipient of the funding must comply with the requirement. This applies when the research is also supported by one or more non-Agency source(s) of funding, and regardless of whether the funding recipient is the main author or a co-author on a resulting preprint or journal article.

3. What are the differences between data retention, deposit and availability?

Data retention refers to the storage and management of research data after completion of the active phase of a research project. It does not necessarily involve making the data discoverable or accessible to others.

Data deposit refers to the selection and transfer of research data into a data repository. Data deposit is a form of data retention that additionally supports discovery, appropriate access and potential reuse.

Data availability refers to the ability of humans and machines to access research data through well-defined mechanisms, with clear information about the conditions under which access can be obtained. Data being "available" does not necessarily mean "open to everyone immediately", but rather that there is a clear, documented pathway to obtain the data, which can be via immediate download or a formal application process. Making data available is about removing unnecessary barriers to the data while respecting legitimate restrictions.

4. What does "as open as possible, as closed as necessary" mean?

"As open as possible, as closed as necessary" is a guiding principle to help researchers navigate decisions around making research data available. It emphasizes that research data should be shared openly whenever possible, but that there are legitimate reasons for why some data cannot be open. "As open as possible" means defaulting to openness and maximizing what can be shared even when full datasets cannot be released publicly. It also means applying the least restrictive licence and access conditions that circumstances allow, and time-limiting restrictions through embargoes rather than permanent closure when appropriate. "As closed as necessary" means imposing well-justified restrictions on who, how, when and for what purposes others can access some data. Such reasons may include honouring participant consent, respecting ethical principles, and protecting patentable discoveries.

5. Why should I retain, deposit and make my data available?

Retaining and depositing data for open discovery, appropriate access and potential reuse comes with many benefits to researchers. First and foremost is the ability of researchers to easily access, understand and reuse their own data years after having completed a research project. By making the data available to others, researchers can also build new collaborations and expand the impact of their research. Depositing data, and assigning them a persistent identifier, helps to ensure that researchers are properly credited when others use their data. These practical benefits also serve fundamental principles of research integrity and efficiency. Data are critically important research outputs that support research transparency, reproducibility, transferability and productivity. The Tri-Agency Framework: Responsible Conduct of Research emphasizes that research integrity hinges on keeping complete and accurate records of data, methodologies and findings in a manner that allows verification and replication by others – practices that data retention, deposit and availability directly support. By engaging in these practices, Canadian researchers promote responsible research and research excellence.

6. What data do I have to retain, deposit and make available?

The requirement to retain data applies to all Agency-supported research.

The requirements to deposit and make data available apply to the data supporting findings in peer-reviewed journal articles and preprints arising from Agency-supported research. Only one deposit is required for the datasets of all authors; e.g., by the corresponding author on the article or any other designated person.

Research outputs other than peer-reviewed journal articles and preprints (e.g. books, monographs, non-peer-reviewed journal articles) are outside the scope of the data deposit and availability requirements. However, researchers and trainees are encouraged to deposit and make available data associated with these research outputs when appropriate.

Research output                | Data retention | Data deposit | Data availability
Peer-reviewed journal article  | required       | required     | required
Other types of publications    | required       | recommended  | recommended
Unpublished research           | required       | recommended  | recommended

When publishing findings from Agency-funded research in peer-reviewed journal articles and preprints, authors must deposit and make available the minimal dataset required to support and replicate these findings. This minimal dataset does not need to include the entire raw dataset collected during the research project, only the portion relevant to the specific study's findings. Where useful, authors should deposit and make available processed data and/or the code used to generate processed data, in addition to raw data.

7. How can I retain my data?

Data retention can be achieved through the use of national storage platforms (e.g. Nextcloud), institutionally approved long-term storage services (e.g. institutional infrastructure, servers or cloud services), or locally managed infrastructure (e.g. local servers, hard drives) that align with national, institutional, and disciplinary standards. Data repositories can also support long-term retention of data, though oftentimes they require that data be made available to others, whether publicly or with access restrictions. Retention practices should ensure that data remain securely stored and maintained for the duration of the retention period (10 years).

8. How can I deposit my data?

To deposit data, select a trusted, discipline-specific, generalist, or institutional data repository. As a first step, consult guidance on characteristics of trusted repositories in the Canadian context (note: this guidance is being produced by the Data Repository Expert Group of the Digital Research Alliance of Canada). Consider using the global registry re3data.org to find an appropriate data repository; many listed repositories are free to use, and the registry includes Canadian-hosted options. Whenever possible, prioritize Canadian repositories (Canadian-owned servers physically located in Canada) that provide persistent identifiers (DOIs) and allow you to apply clear licensing (e.g. Creative Commons licences) and attribution, so that others can responsibly reuse and cite your data.

Canadian national research data repositories, along with some discipline-specific and generalist repositories, are available to Canadian researchers for depositing research data and making them available. The national repositories include the Digital Research Alliance of Canada's Federated Research Data Repository and Borealis, the Canadian Dataverse Repository.

Prepare data for deposit keeping the FAIR principles in mind: ideally, include non-proprietary file formats (e.g. TXT, CSV, JSON) that facilitate reuse, and original/proprietary file formats (e.g. RDATA, MAT, SAV, SAS) that facilitate reproducibility. Provide complete metadata (title, author, description, dates, methodology, etc.) when prompted, and include data documentation (README, data dictionary, codebook, etc.) so that others can understand your data. Always review and follow legal, ethical, and institutional policies, including privacy obligations and Indigenous data governance principles (e.g. CARE, OCAP®), before depositing and publishing data.

9. How can I make my data available?

Data can be made available in several ways, depending on the considerations and requirements that apply to the dataset(s). When there are no ethical, legal, cultural or contractual limitations on making data available, data must be shared publicly. When such limitations exist, data access may need to be restricted to defined groups or for permitted uses. For some sensitive data, access should not be provided under any circumstance.

Data repositories typically offer built-in mechanisms to make data available to others, and most data repositories facilitate public access to deposited data. Some repositories also support controlled access features, which allow metadata about the dataset to remain visible while restricted files are released only after a request is reviewed and approved.

For data that are subject to access controls, whether in a repository or retained on other secure infrastructure, documentation is needed that describes how data access requests are received, the conditions under which access may be granted, and how secure access will be provided. Designating responsibility to a defined group, such as a data access committee, can help ensure that requests are reviewed and processed consistently, and that access can be maintained over time.

10. Do all data need to be made publicly available?

No. Research data that directly support the findings in peer-reviewed journal articles and preprints arising from Agency-supported research must be made available according to the principle "as open as possible, as closed as necessary". This means that non-sensitive data must be made publicly available without restrictions, whereas sensitive data must be safeguarded via controlled access management. In some instances, researchers can apply an embargo on the public release of deposited datasets; see Question 19 (What should I do if I want to reuse my data to produce another research output?).

11. What information do I have to include in my data availability statement?

Data availability statements in peer-reviewed journal articles and preprints must include the location where data are deposited and a link to these data via a persistent identifier (e.g. a DOI). If all or parts of the data cannot be deposited or made publicly available, authors must specify why deposit is not possible, why access is restricted, and describe how access to the data can be provided, including:

  • a brief description of the data;
  • eligibility criteria for access;
  • conditions for use if access is granted;
  • requirements and steps necessary to make a request.

If access cannot be provided to the data, authors must indicate why this is the case (e.g. consent for data sharing not obtained, abiding by contractual obligations, protecting Indigenous data sovereignty). Additional guidance on data availability statements is available here.
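The elements listed above can be checked mechanically before a statement is submitted. The following is a minimal Python sketch, not an official template: the function and class names, the wording of the generated sentences, the repository name and the DOI are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class RestrictedAccess:
    """Details to provide when data cannot be made publicly available."""
    reason: str         # why access is restricted
    description: str    # brief description of the data
    eligibility: str    # eligibility criteria for access
    conditions: str     # conditions for use if access is granted
    request_steps: str  # requirements and steps necessary to make a request


def availability_statement(repository, identifier, restricted=None):
    """Compose a statement; raise ValueError if a required element is empty."""
    if not repository or not identifier:
        raise ValueError("repository and persistent identifier are required")
    text = (f"The data supporting the findings of this article are deposited "
            f"in {repository} ({identifier}).")
    if restricted is None:
        return text
    # Every restricted-access field must be filled in (illustrative rule).
    missing = [name for name, value in vars(restricted).items()
               if not value.strip()]
    if missing:
        raise ValueError("missing restricted-access details: " + ", ".join(missing))
    return (f"{text} Access is restricted because {restricted.reason}. "
            f"The dataset comprises {restricted.description}. "
            f"Eligibility: {restricted.eligibility}. "
            f"Conditions of use: {restricted.conditions}. "
            f"To request access: {restricted.request_steps}.")


# Placeholder repository name and DOI, for illustration only.
print(availability_statement("Example Repository",
                             "https://doi.org/10.0000/placeholder"))
```

Journals and repositories may prescribe their own wording; the sketch only illustrates that each required element is present.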

12. What should I do if my data are sensitive?

All human participant data must be managed in compliance with the terms of participant consent, the principles of TCPS, and any requirements or guidance issued by research ethics boards. Provincial and federal laws regarding personal information must be followed. Other forms of data may also be sensitive, such as ecological data regarding species at risk, data relating to patents, or licensed data. When in doubt, researchers should seek guidance from RDM experts at their institution's libraries and/or their institution's ethics and legal offices.

Sensitive data on humans, here defined as "data that cannot be shared without potentially breaking the law, or violating the trust of or risking harm to an individual, entity, or community" (SDEG, 2025), need to be safeguarded. Any data that could directly or indirectly lead to information being linked to individuals who have not explicitly given permission for their personal information to be disclosed should be regarded as sensitive. It may be possible to share parts of a sensitive dataset (e.g. certain variables that directly support research findings) without putting participants at risk. Potentially sensitive data should undergo a risk assessment process to determine their level of sensitivity before being made available and should only be made publicly available if the risk is negligible. If it is not possible to ensure that risk is negligible, data can be shared by request only using controlled access management.

Sensitive data should be deidentified and/or anonymized, as appropriate, before being made publicly available. Deidentified data are data that have had identifiers removed. Anonymized data are data where all potential linkages between the data and the identifiers are irrecoverable, or any identifiable information has been permanently destroyed. Anonymous data – where direct identifiers were not collected (e.g. online survey data) – may still require deidentification to deal with potential indirect identifiers. For further guidance on sensitive data, risk assessment, and data deidentification, refer to the Sensitive Data Toolkit by the Sensitive Data Expert Group.
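As an illustration of the difference between removing direct identifiers and dealing with indirect ones, here is a minimal Python sketch. It is a toy example under stated assumptions, not a method prescribed by the requirement or by the Sensitive Data Toolkit: the column names and records are invented, and the group-size check (the "k" of k-anonymity) is only one of several possible risk indicators.

```python
from collections import Counter

DIRECT_IDENTIFIERS = {"name", "email"}          # hypothetical column names
QUASI_IDENTIFIERS = ("birth_decade", "region")  # hypothetical indirect identifiers


def deidentify(record):
    """Drop direct identifiers and coarsen birth year to a decade."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    year = cleaned.pop("birth_year")
    cleaned["birth_decade"] = (year // 10) * 10
    return cleaned


def smallest_group(records, keys=QUASI_IDENTIFIERS):
    """Size of the rarest combination of indirect identifiers (k in k-anonymity)."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return min(counts.values())


# Invented records, for illustration only.
participants = [
    {"name": "A", "email": "a@example.org", "birth_year": 1984, "region": "East", "score": 12},
    {"name": "B", "email": "b@example.org", "birth_year": 1987, "region": "East", "score": 15},
    {"name": "C", "email": "c@example.org", "birth_year": 1991, "region": "West", "score": 9},
]

released = [deidentify(p) for p in participants]
k = smallest_group(released)
print(k)  # 1: one participant is unique on decade and region
```

In this toy dataset the result is 1, meaning that one record remains unique on its indirect identifiers even after names and emails are removed. Under the guidance above, such data would need further generalization or controlled access rather than public release.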

In some cases, at the direction of research ethics boards, particularly sensitive data may be exempt from the retention requirement (e.g. video data that are difficult to deidentify).

13. What should I do if my data are qualitative?

Qualitative data are still data, and every effort should be made to follow the spirit of the data retention, deposit and availability requirement, particularly with regard to the principle "as open as possible, as closed as necessary". If qualitative research data directly support research findings in peer-reviewed journal articles or preprints arising from Agency funding, they must at minimum be retained when no suitable repository exists in which to deposit them. Data retention does not necessarily involve making the data openly discoverable or accessible to others (see Question 3, What are the differences between data retention, deposit and availability?).

Non-sensitive qualitative data must be retained, deposited and made publicly available. Sensitive qualitative data must be retained, deposited when suitable repositories exist, and made available via access control. If research participants have not consented to having their identity made public, data must be deidentified and/or anonymized. For guidance on making sensitive qualitative data available, refer to Question 12 (What should I do if my data are sensitive?).

14. What should I do if I have "big data"?

The requirement also applies to large datasets and "big data" (extremely large and complex datasets that are difficult or impossible to manage and process with traditional software and tools). If possible, researchers should deposit their data in a repository that accepts large datasets, such as the Federated Research Data Repository. If the data are too large to transfer and deposit, consult with RDM experts at your institution and with service providers (e.g. the national computing host sites supported by the Digital Research Alliance of Canada). They can help you retain your data and make your metadata and data documentation publicly available where possible.

15. What should I do if my data pertain to Indigenous research?

The Tri-Agency RDM Policy clearly states that, in the case of Indigenous research, Indigenous Peoples, communities and their governance structures or relevant community representatives must determine if and how data are collected, used, retained, deposited, preserved, safeguarded and made available. These considerations should be addressed at the onset of a research project, via a data management plan that is co-created with concerned Indigenous rights-holders and interested parties.

16. What is data documentation and how does it differ from structured metadata?

Data documentation provides important information about a dataset so it can be understood and reused without contacting its creator. Data documentation typically includes information such as the original purpose of the data, collection methods, instruments and software used, file structure, variable descriptions, and data processing steps. A data dictionary and/or codebook is often included as part of the documentation to define variables, codes (e.g. abbreviations), and measurement units. Data documentation is usually presented in one or multiple text files (e.g. README) accompanying the data.
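A data dictionary can be started from the data file itself. The sketch below, in Python using only the standard library, lists each variable in a CSV file with an example value and a count of missing entries, leaving the description and unit for the author to fill in. The function name, the output fields, the missing-data codes and the sample data are assumptions made for illustration.

```python
import csv
import io


def data_dictionary_skeleton(csv_text, missing_codes=("", "NA")):
    """Return one entry per variable: name, example value, missing count.

    The description and unit fields are left blank for the author to complete.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    skeleton = []
    for name in reader.fieldnames or []:
        values = [row[name] for row in rows]
        present = [v for v in values if v not in missing_codes]
        skeleton.append({
            "variable": name,
            "description": "",
            "unit": "",
            "example": present[0] if present else "",
            "n_missing": len(values) - len(present),
        })
    return skeleton


# Invented sample data, for illustration only.
sample = "site_id,temp_c,depth_m\nS1,4.2,10\nS2,NA,25\n"
for entry in data_dictionary_skeleton(sample):
    print(entry)
```

The generated entries are only a starting point: the definitions, codes and measurement units still have to be written by someone who knows the data.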

Structured metadata, in contrast, is information embedded within or associated with the dataset and is usually required by data repositories to enable discovery, interoperability and machine readability. Authors are asked to enter metadata in specific fields when depositing data in a repository. Well-structured metadata follows one of several standardized formats and supports the FAIR principles by ensuring that datasets can be indexed, discovered and integrated across systems.

In short, data documentation explains the context and content of the data for human understanding, whereas metadata provides concise, standardized descriptors that facilitate data use and interoperability, and improve long-term usability.

17. What information do I have to include in the README file associated with my data?

A README file should provide sufficient context for readers to understand and reuse a dataset without contacting the data creator(s).

At a minimum, the README file must include:

  • an explanation of variables, units of measurement, codes or abbreviations (e.g. to indicate missing data);
  • any statistical transformations and anonymization processes applied;
  • a link to any relevant research output(s) (e.g. a preprint or journal article), as applicable.

For qualitative data, the README file must also contain important details such as interview protocols, transcription conventions, and other contextual information that is essential to aid interpretation.

Authors are encouraged to go beyond these minimums by including the dataset title, author name(s), contact information, collection dates and licensing terms. Authors are also encouraged to describe the original purpose of the data, research context, data structure (e.g. tabular data, text, shape files), and file formats; explain the data collection methods, instruments used, software, and data processing steps; document any file naming conventions, ontologies, and controlled vocabularies used; list required software and versions; note known issues, missing data, and limitations; and provide citation instructions.

Note: when data cannot be deposited and no metadata are created, the encouraged elements above become required, as the README is the sole source of this information for anyone accessing the dataset.
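The minimum elements above can be enforced with a small script when a README is assembled. This is a minimal Python sketch under stated assumptions: the section keys, headings and layout are hypothetical, and the check simply requires some text for each minimum element (an author with no related output yet could write "None yet").

```python
# Hypothetical keys for the three minimum elements listed above.
REQUIRED = ("variables", "transformations", "related_outputs")

HEADINGS = {
    "variables": "VARIABLES, UNITS AND CODES",
    "transformations": "TRANSFORMATIONS AND ANONYMIZATION",
    "related_outputs": "RELATED RESEARCH OUTPUTS",
}


def build_readme(title, sections):
    """Render a plain-text README; raise ValueError if a minimum section is absent."""
    missing = [key for key in REQUIRED if not sections.get(key, "").strip()]
    if missing:
        raise ValueError("README is missing: " + ", ".join(missing))
    lines = [title, "=" * len(title), ""]
    for key, text in sections.items():
        # Encouraged sections (e.g. licence) get a heading derived from their key.
        heading = HEADINGS.get(key, key.replace("_", " ").upper())
        lines += [heading, text.strip(), ""]
    return "\n".join(lines)


# Invented content, for illustration only.
print(build_readme("Lake survey", {
    "variables": "temp_c: water temperature (degrees Celsius); NA = missing",
    "transformations": "None applied",
    "related_outputs": "None yet",
    "licence": "CC BY 4.0",
}))
```

Extending `REQUIRED` with the encouraged elements would cover the case described in the note above, where the README is the sole source of information about the dataset.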

18. Do I retain intellectual property over data that have been deposited and/or made available?

In most cases, intellectual property rights over research data are retained by the researcher or the institution where the research was conducted. Researchers are generally considered the creators and stewards of their data, and in Canada, data produced as part of employment or under a grant are usually subject to institutional ownership policies. When data are deposited and/or made available, ownership is not automatically transferred; however, the terms of use and access are governed by institutional policies, funder requirements, or agreements with data repositories.

When data are made available in a repository, permissions for reuse are typically granted through a licence selected by the depositor (e.g. Creative Commons licences). This specifies how the data can be accessed, reused and cited, but does not remove intellectual property rights unless explicitly stated. Authors should review institutional policies and funding requirements before depositing data to ensure the chosen licence complies with their responsibilities and obligations.

19. What should I do if I want to reuse my data to produce another research output (e.g. publish another article)?

Authors wanting to retain priority of access to their data can request an embargo on data availability from the journal where the article is published and/or the data repository where the data are deposited. Many data repositories offer a one-year, no-questions-asked embargo. The Agencies allow embargoes on data availability for up to three years; longer embargoes can be requested by using the contact information at the bottom of this page (note: this information appears at the bottom of the RDM Policy FAQ on science.gc.ca, where these FAQ will be posted). Examples of valid reasons for requesting an embargo include wanting to reuse the data to publish other research outputs and maintaining priority of access to data that students will be using to complete their thesis. An embargo on data availability is not an embargo on deposit; the deposit requirement is maintained.

20. Are there costs associated with retaining, depositing and making data available?

Research data management requires time and can generate costs, especially for projects involving large datasets. However, time invested in proper RDM planning and execution at the onset of a research project typically leads to efficiencies later in the project. Many solutions are available at no cost to Canadian researchers for the purpose of retaining, depositing and making their data available. Refer to Questions 5, 7, 8 and 9 for additional information.

Research data management costs are eligible costs of research when associated specifically with a project funded by the Agencies. They are also an eligible use of Research Support Funds when associated more broadly with institutional supports for RDM.

21. What supports are available to help me retain, deposit and make my data available?

Institutional support

  • Refer to your institution's RDM strategy for supports that your institution may have in place.
  • Librarians and data specialists in your institution's library can help you plan for the active storage, documentation, curation and retention of your data. They can also guide you in selecting an appropriate repository for depositing data and advise on best practices for making your data available, including metadata creation.
  • Research offices can provide guidance on funder requirements, including on data management plans and data custodianship.
  • For research involving human participant data, your institution's human research ethics board can advise on TCPS requirements for consent, privacy, and confidentiality.

National research data repositories

  • Borealis is a national, multi-disciplinary repository supported by over 80 post-secondary institutions across Canada for researchers to deposit, preserve and share their datasets. Many academic libraries provide hands-on support for research data curation and deposit into Borealis.
  • The Federated Research Data Repository (FRDR) is a national repository for depositing and sharing large datasets, which is hosted by the Digital Research Alliance of Canada. FRDR is designed to support researchers who need to deposit large or complex datasets that may exceed the storage or technical capabilities of institutional data repositories.

22. What should I do if I am (re)using someone else's data?

Reusing someone else's data, also known as "secondary use of data", involves analyzing data originally collected by others for a different purpose. It requires making any derived data (the subset or transformed dataset resulting from your analysis) available, ideally with any associated code. Specifically, you must deposit and make available any derived data and associated data documentation when transforming, combining, or otherwise altering pre-existing research data. If derived data cannot be deposited and made available because of a reuse licence or other restrictions, deposit and share the code and steps necessary to generate the derived data. The creator(s) of the data should be credited in the data documentation and/or metadata according to the licence applied.

23. What are the Agencies doing to value data as an important research contribution?

As signatories to the San Francisco Declaration on Research Assessment (DORA), the Agencies are committed to ensuring that a wide range of research outputs are considered and valued as part of the research assessment process (see CIHR, NSERC, and SSHRC guidance for merit review). Guidelines for reviewing the tri-agency CV also reflect this commitment to a more inclusive, diverse and holistic approach to excellence in research evaluation. Information on the implementation of DORA recommendations is available for each Agency (CIHR, NSERC, SSHRC).

24. How will the Agencies verify that I have complied with the Policy?

The Agencies will actively monitor compliance with the requirement and contact the individuals and/or institutions concerned in the event of a suspected breach of the requirement. If a breach of the requirement is identified and remains unaddressed, the Agencies may take steps outlined in the Tri-Agency Framework: Responsible Conduct of Research or other actions deemed necessary to address the breach. Failure to comply with the requirement may result in corrective actions that could negatively impact an individual's eligibility to apply for or receive Agency funding. If a third party identifies an alleged breach of the requirement that cannot be resolved by contacting the data creator(s), that party can initiate a complaint through the central point of contact responsible for RCR at the relevant institution. The process for addressing an alleged failure to comply with Agency policies throughout the life cycle of a research project is outlined here.

25. How does the requirement relate to the Tri-Agency Open Access Policy on Publications?

The objective of the Tri-Agency Open Access Policy on Publications is to ensure that all Agency-funded, peer-reviewed research articles are immediately and freely available online to the research community, readers in the public, private and not-for-profit sectors, and the general public. The data retention, deposit and availability requirement of the Tri-Agency RDM Policy complements the Open Access Policy on Publications by promoting responsible stewardship of the data associated with these publications. Both policies support the effective and responsible conduct of research and increase the ability to store, find and reuse research outputs.

The responsibilities of recipients of CIHR funding regarding publication-related Research Data outlined in the Tri-Agency Open Access Policy on Publications (2015) remain in effect on all grants awarded prior to XX MONTH YEAR (date to be determined).

26. How should researchers consider and incorporate research security into their RDM planning?

See Question r in current FAQ for the data deposit requirement on science.gc.ca.