
Research Data Management: Storing and Preserving Data

Data storage and preservation at the University of Hull

See the University's File Storage Policy (April 2021) for a detailed overview of the provision of local disk drives, network storage, cloud storage options (OneDrive, Box) and guidance on the use of removable media (including USB sticks and external hard drives).

 

Box

 

Information about Box cloud storage for University of Hull staff and students (from ICT)

Box files can be shared with anyone who has an email account, internally or externally.  Max. upload 1TB per user per month. Files are backed up daily in a different facility; deleted files can be recovered from trash for up to 30 days.

Box is not suitable for preserving data beyond the lifetime of a project, as file ownership is tied to a live Hull ICT account, and file URLs are not easily citable.

FAQs from the University's Support Portal:

SharePoint site for Worktribe: access, guides and FAQs

All datasets created as part of a University of Hull research project must be recorded in Worktribe: the title, date of collection and location of files. The files themselves do not have to be uploaded.

Researchers who choose to deposit datasets in Worktribe can make the files discoverable through Repository@Hull if appropriate.  This method is untested with large file sizes.  The Repository will provide a citable url and reusable metadata, but it does not assign a DOI to the dataset.

Hydra Digital Repository

Hydra provides a public-facing platform for the University's digital research outputs, including datasets relating to research in a wide variety of disciplines.  Depositors can customise the metadata (resource descriptions, licence terms, etc.), and will receive a stable URL for citation, although not a DOI.  For data that is not intended for publication, access can be limited to members of the University, or set to private.

Contact repository@hull.ac.uk to discuss whether Hydra is a suitable repository for your research data.

Data Safe Haven

The University of Hull's Data Safe Haven is a secure research environment with appropriate technical and information governance controls for the storage and processing of sensitive research data, managed by the Hull Health Trials Unit.

The DSH is available to researchers from their own machines, wherever they are. Data resides within a secured data centre and is accessed via a secured connection, so all activity occurs within the environment and not on the end user's machine.

The DSH gives researchers an environment to store and analyse sensitive data without it ever leaving the security of the data centre. It is available to support activity across all areas of the University, for researchers with or without a budget.

A new research storage solution for the University is under consideration for 2021, with the goal of providing a single system to support the entire data lifecycle from generation to publication.

Backing up:

University of Hull filestores are backed up regularly, and earlier versions of files can be recovered. 

Anonymization:

The EU-funded OpenAIRE consortium has released Amnesia, a free web-based tool for removing identifying information from data.

Encryption:

If you are working with sensitive data, you may need to encrypt that data, particularly if you are moving the data between systems, or sharing with others.

Further advice about backups, external hard drives and data encryption can be found at Get Safe Online, a UK government-funded initiative for businesses and the general public.
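When copying files to an external drive or between systems, one common way to confirm that the copy is intact is to compare checksums of the original and the copy. The following is a minimal Python sketch, assuming SHA-256 as the checksum; the function names `sha256_of` and `copies_match` are hypothetical and are not part of any University guidance.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original, backup):
    """Return True if both files exist and have identical SHA-256 digests."""
    original, backup = Path(original), Path(backup)
    if not (original.is_file() and backup.is_file()):
        return False
    return sha256_of(original) == sha256_of(backup)
```

A matching checksum only shows that two copies are identical at the time of the check; it does not replace a regular backup routine.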


Best avoided:

Any USB flash drives left behind in PCs or sockets on campus will be destroyed, so don't rely on a USB stick for storing or transporting files you can't afford to lose.

Don't share files by email when you can use Box instead, to benefit from file encryption and daily backup.

Resources for University of Hull students:

The Digital Student guide to making the most of digital technology covers Online Safety and Security, including storing your data securely and protecting yourself from scams and phishing attacks.

Horizon allows students to use their own device to remotely connect to the PCs we have available on campus in teaching labs and open access areas, to access software and storage.

File management: best practice

Recommended: a series of five short blog posts by Alistair Downie at the University of Cambridge, illustrating common pitfalls and solutions when managing files across the lifetime of a project and beyond.  Published December 2019.

Bite-sized RDM #1 - #5

Choosing a file format

The file formats best suited for long-term sustainability and accessibility:
• Are frequently used
• Have open specifications
• Are independent of specific software, developers or vendors.

Dutch national data archiving service DANS and the UK's Archaeology Data Service both maintain indexes of preferred (ideal) and accepted (good enough) file formats for each type of data: audio, video, text, geographical information and many more.

DANS File Formats

Archaeology Data Service Guidelines for Depositors

External data storage and preservation services

A selection of established external data storage platforms and repositories for scholarly research. The University of Hull has not formally endorsed these services.

https://datadryad.org/stash

A not-for-profit service supported by the University of California, which is free to use for data storage, but charges a base rate of $120 for publication (rising in line with file size). 

A number of publishers sponsor Dryad so that their authors can deposit and publish data associated with a journal article free-of-charge.  As part of the peer review process,  reviewers will be able to access deposited data privately before the paper is accepted.

Data deposited in Dryad is assigned a mandatory CC-0 licence, i.e. data can be re-used for any purpose, without attribution.

https://nerc.ukri.org/research/sites/environmental-data-service-eds/

The EDS consists of a network of five disciplinary centres (specializing in marine, freshwater, terrestrial, atmospheric and polar research), which preserve and disseminate data from environmental scientists in the UK and around the world.  Deposit in the EDS is mandatory for NERC-funded researchers (see the Policies tab); data generated from other funded projects in relevant disciplines will also be considered.

https://figshare.com/features

FAQs: https://knowledge.figshare.com/ 

A Figshare account is currently free-of-charge; the service is owned by Digital Science (founders of Altmetric.com), and endorsed by a number of large funders and commercial publishers.

Upload any file format up to 5GB, for private retention or sharing.  Receive a DOI in order to cite your data within your publications.  Replacing the file or editing the metadata (e.g. title, authors, linked url) automatically triggers a new version, for accountability.

Figshare users are encouraged to choose a CC-0 licence for their data, or CC-BY for content which has undergone further processing such as figures and filesets. Further information: https://knowledge.figshare.com/articles/item/copyright-and-license-policy

Altmetric data (indicators of social media attention) are collected for all Figshare deposits. Dataset records with the necessary metadata are crawled by Google Scholar: https://knowledge.figshare.com/articles/item/is-figshare-content-indexed-by-google-scholar

https://data.mendeley.com/

Now owned by Elsevier, Mendeley Data acts as a searchable index to data repositories as well as a preservation platform. Register for a free account with your University of Hull ID in order to create dataset records and deposit files, up to a maximum size of 10GB per dataset. You can choose to keep records in 'draft' form (private), share with collaborators, set an embargo or publish with a Creative Commons licence and a DOI.

Mendeley Data is stored in the EU, and archived in perpetuity with the Dutch Data Archiving and Network Services (DANS).

https://osf.io/

OSF is an open-source collaboration platform for researchers, which supports study design, literature reviews and data collection as well as file storage (both private and public). It is maintained by the US-based Center for Open Science, sponsored by federal agencies, private foundations, and commercial entities. You can choose to store files on a server in Germany to meet EU data protection standards.

OSF is free to use: private projects are capped at 5GB, and public projects at 50GB. It connects to a number of other platforms for file storage (e.g. Box, Dropbox, Google Drive, GitHub) and research discovery (e.g. ORCID, Google Scholar, Crossref), so that you can use it as a project management hub without necessarily storing all associated files in the same location.

See also OSF FAQs.

https://www.ukdataservice.ac.uk/deposit-data.aspx

Funded by the ESRC to "meet the data needs of researchers, students and teachers from all sectors", the UK Data Service preserves large scale social surveys and government data series for the long term.  The ReShare repository is provided for datasets arising from smaller scale projects,  especially ESRC-funded, although researchers from all disciplines are welcome to self-deposit via ReShare.

UKDS does not accept data which cannot be shared; you can set access restrictions so that only registered service users can access your content if you wish.

https://help.zenodo.org/features/

Zenodo is EU-funded; files and metadata are stored at CERN. Upload any file format up to 50GB, and receive a DOI. Zenodo supports DOI versioning: new deposits can be linked, and you can direct people to a specific version or the general DOI.

The depositor is permitted to choose the appropriate level of licence for their data (Creative Commons or another open licensing scheme).  Your funder or publisher may specify a licence.

Register a Zenodo Community in order to share working documents, data and publications with other researchers in your field.

See the Zenodo FAQs for more information and tips.
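The size limits quoted for the platforms in this section can be compared against a dataset before choosing where to deposit. The sketch below is illustrative only: the figures are copied from the descriptions in this guide and may be out of date, the names `SIZE_LIMITS_GB` and `platforms_accepting` are hypothetical, and 1GB is assumed to mean 10^9 bytes.

```python
GB = 10**9  # assumption: decimal gigabytes

# Limits as quoted in this guide; check each service before relying on them.
SIZE_LIMITS_GB = {
    "Figshare": 5,
    "Mendeley Data (per dataset)": 10,
    "OSF (private project)": 5,
    "OSF (public project)": 50,
    "Zenodo": 50,
}


def platforms_accepting(size_bytes):
    """Return, in alphabetical order, platforms whose quoted limit covers size_bytes."""
    return sorted(
        name
        for name, limit_gb in SIZE_LIMITS_GB.items()
        if size_bytes <= limit_gb * GB
    )


# Example: which of the listed platforms could take an 8GB dataset?
print(platforms_accepting(8 * GB))
```

Size is only one criterion; licence terms, DOI provision and funder requirements described in this section will often matter more.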

Data journals are becoming more commonplace in the academic publishing landscape. Authors focus specifically on the dataset rather than the research outcomes, and provide details such as the context of the data collection, the choice of software environment and data processing decisions including file formats. Data papers do not analyse or examine the data; rather, they provide a mechanism for describing the data and its accessibility, and enable users to credit the dataset creators through citation and peer review.  (Adapted from Research Data Management & Sharing, University of Strathclyde, 2020).

  • Data in Brief (Elsevier): launched in 2015, this title has now published over 5000 papers.  Editors explain the benefits of publishing your data in two videos, and the publishers provide a template for submission and guidance for authors.  The current charge for publishing in this title is $600 (March 2020).

See also the University of Edinburgh's list of Data Journals (2016), including several discipline-specific titles.

A number of publishers who are committed to open research incorporate storage options for data associated with papers, e.g. arXiv and F1000.

Re3data.org (Registry of Research Data Repositories) provides a searchable directory of validated data repositories for all disciplines and formats.

FAIRsharing Collections and Recommendations: browse by discipline, funder or publisher platform to identify their recommended data repositories and metadata standards. Compiled by a team based at the University of Oxford.

Metadata standards

It's important that any dataset you preserve for potential use by other researchers has enough descriptive information (metadata) for humans and search engines to understand what the files consist of.  Research areas which have a strong tradition of sharing data may have published discipline-specific metadata standards to facilitate the discovery and re-use of files.  

  • DataCite Schema: developed by a UK, European and US working group drawn from research councils and libraries, the schema is "a list of core metadata properties chosen for an accurate and consistent identification of a resource for citation and retrieval purposes, along with recommended use instructions". Member organisations who adopt DataCite metadata standards are able to issue DOIs (permanent digital IDs) for deposited files.
  • The UK's Digital Curation Centre maintains a disciplinary directory of metadata standards.
  • DDI (Data Documentation Initiative): "an international standard for describing the data produced by surveys and other observational methods in the social, behavioral, economic, and health sciences...  Documenting data with DDI facilitates understanding, interpretation, and use -- by people, software systems, and computer networks".
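As an illustration of what a minimal descriptive record can look like, the sketch below checks a dataset description against the six mandatory properties of the DataCite Metadata Schema (version 4): Identifier, Creator, Title, Publisher, PublicationYear and ResourceType. The dictionary layout, the key names, the function name and the sample values are all hypothetical; a real deposit would follow the repository's own form or the official schema.

```python
# Keys are a hypothetical, simplified rendering of the six mandatory
# DataCite properties; they are not the official element names.
MANDATORY_PROPERTIES = (
    "identifier",
    "creators",
    "title",
    "publisher",
    "publication_year",
    "resource_type",
)


def missing_properties(record):
    """Return the mandatory properties that are absent or empty in record."""
    return [name for name in MANDATORY_PROPERTIES if not record.get(name)]


example_record = {
    "identifier": "10.1234/example-doi",  # placeholder, not a real DOI
    "creators": ["Researcher, A."],
    "title": "Example survey dataset",
    "publisher": "Example Repository",
    "publication_year": 2021,
    "resource_type": "Dataset",
}

print(missing_properties(example_record))
```

A check like this only confirms that the core fields are present; discipline-specific standards such as DDI ask for considerably more detail.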