In nationwide collaborative research projects and initiatives, data pipelines should be interoperable across the different clusters within Switzerland. Several technologies are available to help share pipelines and run them in different HPC environments: workflow managers such as Snakemake can be combined with containerisation technologies such as Docker and Singularity.
The approach of “code moving to data”, rather than moving data to computing infrastructures, is becoming increasingly necessary, partly because of projects that handle sensitive data and partly because of ever-growing data sets. The aim of this project was to assess whether containerisation could address the need to share and run pipelines reproducibly on different HPC clusters across Switzerland; it therefore involved a number of project partners. A secondary goal was to build a community around these container technologies in order to facilitate the development, deployment and running of containers.
The project resulted in three tangible outcomes: a tested candidate technology stack for running the same workflow on different HPC clusters in Switzerland, a set of guidelines for pipeline interoperability using the Docker and Singularity container technologies, and a validation procedure for testing different technology stacks in the context of interoperable workflows. The project also led to the creation of a highly interactive community of container developers, deployers and maintainers across the Swiss research landscape, which should facilitate the future exchange of pipelines for collaborative research projects.
Swiss Personalized Health Network and the Swiss
Technology stack to run the same workflow in different Swiss HPC clusters
Validation procedure for testing technology stacks
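
As an illustration only, one way such a validation could be sketched is to run the same workflow on each cluster and compare checksums of the resulting output files. This assumes that reproducibility is judged by byte-identical outputs, which the text above does not state; the function names (`sha256_of`, `checksum_outputs`, `compare_runs`) and the cluster labels are hypothetical and are not part of the project's actual procedure.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_outputs(output_dir: Path) -> dict[str, str]:
    """Map each output file (path relative to output_dir) to its checksum."""
    return {
        p.relative_to(output_dir).as_posix(): sha256_of(p)
        for p in sorted(output_dir.rglob("*"))
        if p.is_file()
    }


def compare_runs(runs: dict[str, dict[str, str]]) -> list[str]:
    """Return output files that differ between clusters or are missing on some.

    `runs` maps a (hypothetical) cluster label to the result of
    checksum_outputs() for the workflow run on that cluster.
    """
    all_files = set().union(*runs.values())
    mismatched = []
    for name in sorted(all_files):
        # A missing file contributes None, so it also counts as a mismatch.
        digests = {checksums.get(name) for checksums in runs.values()}
        if len(digests) > 1:
            mismatched.append(name)
    return mismatched
```

An empty list from `compare_runs` would indicate that, under this assumption, every cluster produced the same set of files with identical contents. Workflows whose outputs embed timestamps or depend on non-deterministic steps would need a looser comparison than raw checksums.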