
Research Data Management: Storing and Preserving Data

“Open data can underpin innovation, for example when researchers with fresh perspectives use data in unexpected ways or when companies use data to help them develop new products.”

Data storage and preservation at the University of Hull

If you will be handling personal information, read the guidance on Managing Sensitive Data before making a decision about where to store your files.

Written for University of Hull students, the Digital Student guide to making the most of digital technology covers Online Safety and Security, including storing your data securely and protecting yourself from scams and phishing attacks.

Support for OneDrive from the University of Hull's ICT Services.

OneDrive is a networked file storage system offering 3TB of storage space per person. It's part of the Microsoft Office 365 suite of services available to all University of Hull staff and students, which can be accessed from any device, in any location with internet access.

Files are encrypted in transit and located in EU datacentres for compliance with Data Protection regulations. OneDrive is regularly backed up, and recovery of lost files can be initiated.

Files located in OneDrive can be shared with specific people within or outside the University (via their email address), anyone at the University of Hull, or anyone with the link (in which case the link expires after 30 days). You can choose whether people you share with are permitted to edit the files, or simply view them.

Caution: OneDrive is not suitable for sharing files with a large number of project collaborators, or preserving files after a project is completed, as file ownership is tied to a specific account. If you are leaving the University, you should make arrangements to move files to a new location before your account is closed.

What happens to my OneDrive when I leave the University?

Support for users of Teams, from University of Hull ICT Services.

Microsoft Teams is a cloud-based communication and collaboration platform, which integrates with other MS Office products and supports working together on files. If you are undertaking a project with multiple collaborators over a significant period of time, a Team may be more suitable than OneDrive, as all Team members can upload files, and view/edit/download files added by other members.

Caution: Teams is not a solution for preserving files after completion of a project, because the ownership of the Team is tied to a live University of Hull account.

University of Hull staff can request a new Team from ICT. Students should ask their academic supervisor to request a Team on their behalf.

ICT provide centralised network storage for all University members, hosted and managed within the University data centre. Your 'home drive' (normally G:\) quota is 2GB. Files are backed up regularly, and the backup is retained for 6 months.

Access to the University network is controlled by user authentication, but files are not encrypted, so this is not an appropriate storage platform for personal or commercially sensitive data.

The Horizon client enables students to use their own device to remotely connect to the PCs we have available on campus in teaching labs and open access areas, to access University network software and filestore.

All staff and students are advised to use OneDrive as their primary filestore to benefit from vastly increased storage capacity, access from any networked device, and compliance with EU Data Protection regulations.

Sharepoint site for Worktribe: access, guides and FAQs

If your research project is logged in Worktribe, any datasets which have potential value after the completion of the project should be recorded as project outputs. Each record must include the title of the dataset, date of collection and location of the files. The files themselves can be added to the record; however, Worktribe's suitability for preserving large file sizes and proprietary formats is untested.

Datasets deposited in Worktribe can be made discoverable through Repository@Hull if appropriate.  The Repository will provide a citable url and reusable metadata, but it does not assign a DOI to the dataset.

Information about Box cloud storage for University of Hull staff and students (from ICT)

Box is currently in the process of being decommissioned and the University supports the use of Microsoft OneDrive and Teams to facilitate cloud storage and collaborative working.

Box files can be shared with anyone who has an email account, internally or externally. You can store up to 1TB free of charge, with a file upload limit of 15GB. Files are backed up daily in a different location (adhering to the EU Safe Harbour Framework); deleted files can be recovered from trash for up to 30 days.

Caution: Box is not suitable for preserving data beyond the lifetime of a project, as file ownership is tied to a live Hull ICT account, and file URLs are not easily citable.

FAQs from the University's Support Portal:

Hydra Digital Repository

Hydra was the University of Hull's original repository for datasets and other research outputs, before the launch of Worktribe in 2017. It is still in use for research theses and exam papers, but no other new deposits are accepted.

If you have a stake in datasets which remain in Hydra, contact repository@hull.ac.uk to discuss next steps.

Data Safe Haven

The University of Hull's Data Safe Haven is a secure research environment with appropriate technical and information governance controls for the storage and processing of sensitive research data, managed by the Hull Health Trials Unit.

The DSH is available to researchers from their own machines, wherever they are. Data resides within a secured data centre and is accessed via a secured connection, meaning that all the activity occurs within the environment and not on the end user's machine.

The DSH gives researchers an environment to store and analyse sensitive data without it ever leaving the security of the data centre. It is available to support activity across all areas of the University, for researchers with or without a budget.

A new research storage solution for the University is under consideration for 2022-23, with the goal of providing a single system to support the entire data lifecycle from generation to publication.

Anonymization:

The EU-funded OpenAIRE consortium has released Amnesia, a free web-based tool for removing identifying information from data

Encryption:

If you are working with sensitive data, you may need to encrypt that data, particularly if you are moving the data between systems, or sharing with others.

Further advice about backups, external hard drives and data encryption can be found at Get Safe Online, a UK government-funded initiative for businesses and the general public.

You may be familiar with the cloud-based services Dropbox and Google Drive for sharing files. Be mindful that these services may store files in jurisdictions outside the European Economic Area, which is a breach of UK Data Protection legislation for personal data. As these services are not supported by the University, standards for file recovery and preservation may be subject to change without notice.


Any USB flash drives left behind in PCs or sockets on campus will be destroyed, so don't rely on a USB stick for storing or transporting files you can't afford to lose.


Don't share files as email attachments when you can use OneDrive instead, to benefit from file encryption and daily backup.

See the University's File Storage Policy (April 2021) for a detailed overview of the provision of local disk drives, network storage, cloud storage options, and guidance on the use of removable media (including USB sticks and external hard drives).

If your project will involve the introduction of IT systems or technology into the University, you should submit an ICT Project Request as early as possible in the planning process, so that questions of security, data protection and technical compatibility can be assessed before implementation.

Choosing a file format

The file formats best suited for long-term sustainability and accessibility:
• Are frequently used
• Have open specifications
• Are independent of specific software, developers or vendors.

It may be appropriate to deposit copies of your data in more than 1 format to maximise both utility and preservation (for instance, .xls and .csv).
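As a minimal sketch of keeping an open-format copy alongside a proprietary one, the Python below writes tabular data to CSV using only the standard library. The column names and values are hypothetical, and reading an existing .xls file would need a third-party library that is not shown here.

```python
import csv
import io


def rows_to_csv(header, rows):
    """Serialise tabular data to CSV text, an open and software-independent format."""
    buffer = io.StringIO()
    # Use a plain "\n" line ending so the output is the same on every platform.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical example data, standing in for the contents of a spreadsheet.
header = ["site", "date", "temperature_c"]
rows = [
    ["A", "2021-04-01", 7.5],
    ["B", "2021-04-01", 8.1],
]

csv_text = rows_to_csv(header, rows)
print(csv_text)
```

The resulting text can be saved with a .csv extension and deposited next to the original spreadsheet, so that the data stays readable even if the spreadsheet software is no longer available.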

Many established data repositories maintain directories of  preferred (ideal) and accepted (good enough) file formats for each type of data: tabular, audio, video, text, geographical information and others:

UK Data Service Recommended File Formats

DANS File Formats (from the Dutch national data repository, one of the largest in the world)

Archaeology Data Service Guidelines for Depositors

The global Open Preservation Foundation has carried out an international comparison of recommended file formats.

Digital repositories which offer a mediated deposit service may permit certain less sustainable formats for file transfer, then migrate the data to an open format on your behalf:  for instance, The National Archives (UK).

File management: best practice

Recommended: a series of five short blog posts by Alistair Downie at the University of Cambridge, illustrating common pitfalls and solutions when managing files across the lifetime of a project and beyond. Published December 2019.

Bite-sized RDM #1 - #5

External data storage and preservation services

A selection of established external data storage platforms and repositories for scholarly research. The University of Hull has not formally endorsed these services.

If your data relates to a project in Worktribe, you should also create a Worktribe output record for the data, to signpost the external location. It is not necessary to upload the files to Worktribe.

Dryad Home

A not-for-profit service supported by the University of California, which is free to use for data storage, but charges a base rate of $120 for publication (rising in line with file size). 

A number of publishers sponsor Dryad so that their authors can deposit and publish data associated with a journal article free-of-charge.  As part of the peer review process,  reviewers will be able to access deposited data privately before the paper is accepted.

Data deposited in Dryad is assigned a mandatory CC-0 licence, i.e. data can be re-used for any purpose, without attribution.

Environmental Data Service

The EDS consists of a network of five disciplinary centres (specializing in marine, freshwater, terrestrial, atmospheric and polar research), which preserve and disseminate data from environmental scientists in the UK and around the world.  Deposit in the EDS is mandatory for NERC-funded researchers (see the Policies tab); data generated from other funded projects in relevant disciplines will also be considered.

Figshare Features and FAQs

A Figshare account is currently free-of-charge; the service is owned by Digital Science (founders of Altmetric.com), and endorsed by a number of large funders and commercial publishers.

Upload any file format up to 5GB, for private retention or sharing.  Receive a DOI in order to cite your data within your publications.  Replacing the file or editing the metadata (e.g. title, authors, linked url) automatically triggers a new version, for accountability.

Figshare users are encouraged to choose a CC-0 licence for their data, or CC-BY for content which has undergone further processing such as figures and filesets. Figshare's Copyright and Licence Policy.

Altmetric data (indicators of social media attention) are collected for all Figshare deposits. Dataset records with the necessary metadata are crawled by Google Scholar.

A GitHub repository enables you to store and collaborate on project code. When you're ready to make your work public, GitHub's choosealicense.com helps you decide what licence is appropriate for your open source software.

Now owned by Elsevier, Mendeley Data acts as a searchable index to data repositories as well as a preservation platform. Register for a free account with your University of Hull ID in order to create dataset records and deposit files, up to a maximum size of 10GB per dataset. You can choose to keep records in 'draft' form (private), share with collaborators, set an embargo or publish with a Creative Commons licence and a DOI.

Mendeley Data is stored in the EU, and archived in perpetuity with the Dutch Data Archiving and Network Services (DANS).

OSF is an open-source collaboration platform for researchers, which supports study design, literature reviews and data collection as well as file storage (both private and public). It is maintained by the US-based Center for Open Science, sponsored by federal agencies, private foundations, and commercial entities. You can choose to store files on a server in Germany to meet EU data protection standards.

OSF is free-to-use: private projects are capped at 5GB, and public projects at 50GB. It connects to a number of other platforms for file storage (e.g. Box, DropBox, Google Drive, GitHub) and research discovery (e.g. ORCID, Google Scholar, CrossRef), so that you can use it as a project management hub without necessarily storing all associated files in the same location.

See also OSF FAQs.

Depositing Data with the UK Data Service

Funded by the ESRC to "meet the data needs of researchers, students and teachers from all sectors", the UK Data Service is both a portal to national and international socioeconomic macrodata, and a data repository for researchers in quantitative and qualitative social sciences and humanities.

Register for a free account to deposit your data in ReShare: you can choose an open licence, or 'safeguard' your data for registered users only, to be used for research and teaching purposes. Deposit in ReShare is mandatory for ESRC-funded projects.

Zenodo is EU-funded; files and metadata are stored at CERN. Upload any file format up to 50GB, and receive a DOI. Zenodo supports DOI versioning: new deposits can be linked, and you can direct people to a specific version or the general DOI.

The depositor is permitted to choose the appropriate level of licence for their data (Creative Commons or another open licensing scheme).  Your funder or publisher may specify a licence.

Register a Zenodo Community in order to share working documents, data and publications with other researchers in your field.

See the Zenodo FAQs for more information and tips.

Data journals are becoming more commonplace in the academic publishing landscape. Authors focus specifically on the dataset rather than the research outcomes, and provide details such as the context of the data collection, the choice of software environment and data processing decisions including file formats. These articles do not analyse or examine the data; rather, they provide a mechanism for describing the data and its accessibility, and enable users to credit the dataset creators through citation and peer review.  (Adapted from Research Data Management & Sharing, University of Strathclyde, 2020).

  • Data in Brief (Elsevier): launched in 2015, this title has now published over 5000 papers.  Editors explain the benefits of publishing your data in two videos, and the publishers provide a template for submission and guidance for authors.  The current charge for publishing in this title is $600 (March 2020).

See also the University of Edinburgh's list of Data Journals (2016), including several discipline-specific titles.

A number of publishers who are committed to open research incorporate storage options for data associated with papers, e.g. arXiv and F1000.

Re3data.org (Registry of Research Data Repositories) provides a searchable directory of validated data repositories for all disciplines and formats.

FAIRsharing Collections and Recommendations: browse by discipline, funder or publisher platform to identify their recommended data repositories and metadata standards. Compiled by a team based at the University of Oxford.

For guidance on choosing an open licence for your data, see the recommended reading for the question Who Owns My Data?

Metadata standards

It's important that any dataset you preserve for potential use by other researchers has enough descriptive information (metadata) for humans and search engines to understand what the files consist of.  Research areas which have a strong tradition of sharing data may have published discipline-specific metadata standards to facilitate the discovery and re-use of files.  

  • DataCite Schema:  developed by a UK, European and US working group formed from research councils and libraries, "a list of core metadata properties chosen for an accurate and consistent identification of a resource for citation and retrieval purposes, along with recommended use instructions". Member organisations who adopt DataCite metadata standards are able to issue DOIs (permanent digital IDs) for deposited files.

  • The UK's Digital Curation Centre maintains a disciplinary directory of metadata standards.

  • FAIRSharing: Standards: a searchable directory of controlled vocabularies, identifier schemas and reporting guidelines for every academic discipline, maintained by the Data Readiness Team at the University of Oxford.  
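To illustrate what a minimal descriptive record might look like, the Python sketch below checks a dataset record against the six properties that version 4 of the DataCite Metadata Schema lists as mandatory (Identifier, Creator, Title, Publisher, PublicationYear, ResourceType). The dictionary keys, the example values and the function name are illustrative assumptions, not an official DataCite API; consult the current schema documentation before relying on this list.

```python
# Mandatory DataCite properties (schema version 4), written here as
# illustrative dictionary keys rather than official field names.
REQUIRED_PROPERTIES = (
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publicationYear",
    "resourceType",
)


def missing_properties(record):
    """Return the mandatory properties that are absent or empty in a record."""
    return [name for name in REQUIRED_PROPERTIES if not record.get(name)]


# Hypothetical dataset record; the DOI and names are placeholders.
record = {
    "identifier": "10.1234/example-doi",
    "creators": ["Researcher, A."],
    "titles": ["Example survey dataset"],
    "publisher": "",
    "publicationYear": 2021,
    "resourceType": "Dataset",
}

print(missing_properties(record))
```

A check like this can be run before deposit to catch records that a repository would be unable to describe or cite fully; in the example, the empty publisher field is reported as missing.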

Archiving websites

If you have created a project website outside the University's web domain, this should be preserved like any other research output after the completion of the project. If you have been relying on project funding to pay hosting fees, you should consider archiving your website in one of the dedicated online services.

A blog post from the University of Kent Research Support Team (June 2021) offers a detailed overview of your options.