Historic interest: Which parts of biological research can DISC serve?
The contribution of the various elements of DISC's research support expertise is shown in the following figure (to be updated now that the Netherlands eScience Center and SURF/SARA/BigGRID must be considered part of DISC):
DISC aims to support at least four of the seven stages identified in data-intensive 'e-science' projects:
- All investigations start with a scientific question, but in the e-science era these are not necessarily well-circumscribed hypotheses that need to be validated or falsified in experiments.
- Increasingly, they are 'open discovery' questions that require generating large and complex data sets with a variety of techniques and combining the newly acquired data with existing scientific knowledge. Such projects bring very specific technical and information technology challenges that frequently go beyond the in-house expertise of individual research groups or even consortia. Experimental groups can involve DISC experts in the design phase of such complex experiments to discuss issues such as data capture, metadata and sample sizes. The planning of the required compute, storage and network infrastructure can also be done much more efficiently together with DISC experts.
- During data acquisition, DISC experts should be on standby in case data-related challenges arise. Most of the work in this phase will be done by the subject experts, potentially in collaboration with DTL.
- The actual capture and storage of raw data involves many complex challenges, such as linking measurements made with different techniques on the same subjects (e.g. genomics and proteomics) and applying data capture standards for metadata. Deciding whether to store raw data or preprocessed data requires specific expertise, as does reducing data volumes without losing core information.
- Preprocessing and downstream processing turn the data into a format suitable for analysis and integration, with the final aim of deriving meaningful information from the raw data. This, too, is an increasingly professional process with many pitfalls. During this phase DISC experts should help ensure good data stewardship and, most importantly, prepare the data for the next crucial step.
- Actual data analysis, especially after integration with other crucial legacy and/or de novo data into models that may explain the 'biology behind the data'.
- The goal is to gain novel biological insights. As indicated by the colour scheme in the figure, the hardware and connectivity infrastructure itself will be provided within DISC by the partners that coordinate this aspect and supply the HPC and networking infrastructure in the Netherlands.
Although stages (1) and (7) are depicted in the figure as lying purely within the realm of the experimental scientists (green), we hope that more and more scientists will structurally involve DISC experts throughout the entire life cycle of their research. An open discussion point is whether DISC should also embrace and develop expertise in the 'valorisation' (societal application) of findings. This may sound far-fetched and out of scope, but CTMM, for instance, has pointed out that clinical trials will again face serious data challenges, which may have to be anticipated in much earlier phases. At this point we believe that the individual 'life cycle owners' should decide whether the data-related aspects of such downstream implementation issues are part and parcel of 'their' DISC sector. This may well be relevant for Translational Medicine, but not for, e.g., Precision Breeding.