7- Interoperability of data formats enables data-driven research
This is one of thirteen recommendations for data stewardship formulated by the Netherlands eScience Center.
The eScience Center writes:
- Storing heterogeneous data, in specialist or inaccessible formats, with insufficient metadata and in autonomous databases is in contradiction to good data-stewardship practices. Data and its location must be readily identifiable, searchable and accessible. However it is recognised that existing and future data-formats and standards (where they exist) vary across disciplines and universal solutions are unlikely to be developed.
What DTL recommends for the Data Stewardship plan
Answer the following questions:
- Are you dealing with structured or unstructured data? What do you need to do to make data computer-readable?
- Are you using standard ontologies for your (meta-)data? Are these ontologies open?
- What format are you using to store your data? Is it a standard in your field, or a vendor-specific format?
- Do you expect to produce data that will not fit the standard format? How will you store that data: will you communicate with the standards committee or make your own extensions?
- Many data formats define a minimal and a recommended level of meta-data. How complete will your data set be?
- Under what license will your data be released?
- How many "stars" will your data get according to http://5stardata.info?
- What other resources will your data be linked to?
Experience from DTL
- F6: Files should be archived in a low-tech format -> raw and result data are kept in text format (fq, vcf)
- F7: How does this data format work? -> add explanations of the data columns to the ## header (partial)
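As an illustration of the F7 point, the sketch below shows one way column explanations could be added to the ## header of a VCF file. It assumes the columns to document are INFO fields and uses the standard `##INFO=<ID=...,Number=...,Type=...,Description="...">` meta-line form. The function name `describe_info_fields`, the dictionary keys, and the example records are hypothetical and chosen only for this example. It is not the procedure DTL actually used.

```python
def describe_info_fields(vcf_lines, fields):
    """Return a copy of vcf_lines with ##INFO meta-lines added to the header.

    vcf_lines: list of lines (without trailing newlines) of a VCF file.
    fields:    list of dicts with keys "id", "number", "type", "description"
               (hypothetical structure used only in this sketch).
    """
    # IDs that already have an ##INFO line are skipped, not duplicated.
    already_described = {
        line.split("ID=", 1)[1].split(",", 1)[0]
        for line in vcf_lines
        if line.startswith("##INFO=<ID=")
    }
    new_meta = [
        '##INFO=<ID={id},Number={number},Type={type},'
        'Description="{description}">'.format(**f)
        for f in fields
        if f["id"] not in already_described
    ]

    out = []
    inserted = False
    for line in vcf_lines:
        # Meta-lines belong directly before the #CHROM column header line.
        if line.startswith("#CHROM") and not inserted:
            out.extend(new_meta)
            inserted = True
        out.append(line)
    if not inserted:
        raise ValueError("no #CHROM column header line found")
    return out


# Made-up example records, for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t12345\t.\tA\tG\t50\tPASS\tDP=14",
]
annotated = describe_info_fields(
    example,
    [{"id": "DP", "number": "1", "type": "Integer",
      "description": "Total read depth at this position"}],
)
print("\n".join(annotated))
```

Because the explanations travel inside the text file itself, a later reader can interpret the columns without access to the original pipeline or documentation.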