Targeted sequencing is often used to investigate a small subset of important genes, allowing in-depth coverage of the genomic regions of interest. For particular diseases, such as specific types of cancer, these genes can aid prognosis and diagnosis and better inform the course of treatment. In addition, this approach yields a more manageable dataset than whole-genome sequencing.
In the first stage of this project, a computational pipeline was developed for the Dermatology Clinic at the University Hospital Zurich to analyse a subset of 190 genes involved in melanoma. A number of existing Bash scripts, previously used to analyse over 100 samples, were converted to a pipeline in the workflow manager Snakemake, and additional analysis modules were added to provide a more thorough analysis.
Converting the scripts to a Snakemake pipeline simplifies execution, makes the analysis more robust to the failure of individual steps, and eases maintenance and modification by structuring the steps and separating the parameters into an external configuration file. Furthermore, to improve portability, the pipeline was containerised and tested before being deployed on the dermatology server, and training was given to the local users at the hospital.
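The two properties described here, parameters kept apart from the step definitions and robustness to the failure of individual steps, can be illustrated with a minimal Python sketch. This is not Snakemake's implementation and not the melanoma pipeline itself: the step names (`align`, `call_variants`), the file naming scheme, the sample names and the `run` helper are all hypothetical, and files are represented by a set of names rather than touched on disk.

```python
import json

# Hypothetical configuration. In a real pipeline this text would live in an
# external file, so that parameters can change without editing the steps.
CONFIG_TEXT = '{"samples": ["S1", "S2"]}'


def load_config(text):
    """Parse the configuration text into a dictionary."""
    return json.loads(text)


# Each step declares which file it needs and which file it produces.
# The step names and file patterns are illustrative assumptions.
STEPS = [
    {"name": "align", "input": "{sample}.fastq", "output": "{sample}.bam"},
    {"name": "call_variants", "input": "{sample}.bam", "output": "{sample}.vcf"},
]


def run(config, existing, failing=()):
    """Run every step whose output is missing and whose input is available.

    `existing` is the set of file names already present; `failing` lists
    (step, sample) pairs that should be simulated as failing.
    Returns (files, ran, failed).
    """
    files = set(existing)
    ran, failed = [], []
    for sample in config["samples"]:
        for step in STEPS:
            src = step["input"].format(sample=sample)
            dst = step["output"].format(sample=sample)
            if dst in files:
                continue  # output already present, nothing to redo
            if src not in files:
                continue  # upstream step did not produce its output
            if (step["name"], sample) in failing:
                failed.append((step["name"], sample))
                continue  # a failure affects only this sample's later steps
            files.add(dst)
            ran.append((step["name"], sample))
    return files, ran, failed
```

In this sketch, a failed alignment for one sample leaves the results for the other samples intact, and a second call to `run` with the surviving files executes only the jobs whose outputs are still missing. Snakemake offers comparable behaviour through its rule definitions and configuration files, with far more functionality than shown here.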
The second stage of the project built on this work by designing and developing a version of the pipeline for testing on different computing clusters across Switzerland. To this end, an additional analysis module was implemented and an anonymous test dataset was prepared. Extensive in-house testing was carried out before the pipeline was shared with the community, and the documentation was extended to reflect the expanded analysis.
Melanoma analysis pipeline implemented in the Snakemake workflow manager