7- Interoperability of data formats enables data-driven research

This is one of thirteen recommendations for data stewardship formulated by the Netherlands eScience Center.

The eScience Center writes

Storing heterogeneous data in specialist or inaccessible formats, with insufficient metadata and in autonomous databases, contradicts good data-stewardship practice. Data and its location must be readily identifiable, searchable and accessible. However, it is recognised that existing and future data formats and standards (where they exist) vary across disciplines, and that universal solutions are unlikely to be developed.

What DTL recommends for the Data Stewardship plan

Answer the following questions:

  • Are you dealing with structured or unstructured data? What do you need to do to make data computer-readable?
  • Are you using standard ontologies for your (meta-)data? Are these ontologies open?
  • What format are you using to store your data? Is it a standard in your field, or a vendor-specific format?
  • Do you expect to produce data that will not fit the standard format? How will you store that data: will you communicate with the standards committee or make your own extensions?
  • Many data formats define both a minimal and a recommended level of metadata. How complete will your data set be?
  • Under what license will your data be released?
  • How many "stars" will your data get on the scale described at http://5stardata.info?
  • What other resources will your data be linked to?
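
The question about minimal and recommended metadata levels can be answered with a simple completeness check. The following is a minimal sketch, not a prescribed method: the field names in `MINIMAL_FIELDS` and `RECOMMENDED_FIELDS` and the function `metadata_completeness` are hypothetical and should be replaced by whatever the standard in your field actually requires.

```python
# Hypothetical field lists; a real standard defines its own.
MINIMAL_FIELDS = {"title", "creator", "license"}
RECOMMENDED_FIELDS = {"description", "format", "ontology"}


def metadata_completeness(record):
    """Return the fraction of minimal and recommended fields that are filled in."""
    # A field counts as filled only if it is present and not empty.
    filled = {key for key, value in record.items() if value not in (None, "")}
    return {
        "minimal": len(filled & MINIMAL_FIELDS) / len(MINIMAL_FIELDS),
        "recommended": len(filled & RECOMMENDED_FIELDS) / len(RECOMMENDED_FIELDS),
    }
```

A data set that scores 1.0 on "minimal" but low on "recommended" is valid but may be hard for others to reuse, which is worth stating explicitly in the plan.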

Experience from DTL

  • F6: Files should be archived in a low-tech format -> store raw and result data in text formats (fq, vcf)
  • F7: How does this data format work? -> add explanations of the data columns to the ## header lines (partial)
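
The two points above can be combined in one sketch: results are written as plain tab-separated text, and each column is explained in a "##" line at the top of the file. The `##name: description` layout used here is an assumption for illustration, modelled loosely on the "##" header lines mentioned above; it is not the exact syntax of any particular format, and the function names are hypothetical.

```python
import io


def write_table(handle, columns, rows):
    """Write rows as tab-separated text, documenting each column in a '##' line."""
    # One '##' line per column: name and a human-readable explanation.
    for name, description in columns:
        handle.write(f"##{name}: {description}\n")
    # A single '#' line carries the column names themselves.
    handle.write("#" + "\t".join(name for name, _ in columns) + "\n")
    for row in rows:
        handle.write("\t".join(str(value) for value in row) + "\n")


def read_column_docs(handle):
    """Collect the '##name: description' lines from the top of a file."""
    docs = {}
    for line in handle:
        if not line.startswith("##"):
            break  # the documentation block ends at the first other line
        name, _, description = line[2:].rstrip("\n").partition(": ")
        docs[name] = description
    return docs


# Usage: write to an in-memory buffer and read the explanations back.
buffer = io.StringIO()
write_table(
    buffer,
    [("chrom", "Chromosome name"), ("pos", "1-based position")],
    [("chr1", 100)],
)
buffer.seek(0)
column_docs = read_column_docs(buffer)
```

Because the explanations travel inside the file itself, someone who finds the archive later can interpret the columns without access to separate documentation.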

Sector specific

Specific per technology