Pawsey Friday: Supercomputing Showcase
2:00pm - 3:30pm
Explore the breadth of benefits supercomputing can offer to you and your project. You will hear from researchers who are using Pawsey supercomputers, and working with our expert staff to enhance their code and workflows.
This showcase highlights the diversity of thought and research that supercomputing supports. You will leave the session with a clearer sense of whether supercomputers and their applications are right for you.
Traditionally, our Pawsey Fridays were followed by a sundowner session. In these trying times, we invite you to bring your own drink and snacks to enjoy during the presentations. A Q&A session will be held at the end of the discussion.
About the presenters
Mahsa Mousavi Mousaviderazmahalleh
Mahsa completed her PhD in applied bioinformatics at UWA. She is currently a post-doctoral researcher in the TrEnD lab at Curtin University, where she works on an NHMRC-funded project to determine the causative agents of tick-borne disease in Australia using metabarcoding techniques. As part of her role, she also develops bioinformatic pipelines for the analysis of environmental DNA data, one of the lab's major research focuses.
A workflow for analysis of environmental DNA (eDNA) data
Environmental DNA coupled with metabarcoding is revolutionising the way biodiversity can be monitored across a variety of environments. However, the large number of tools deployed in downstream bioinformatic analyses often makes a workflow difficult to configure and maintain, and consequently limits research reproducibility.
Here, we describe a fully automated pipeline that employs a number of state-of-the-art applications to process eDNA data from raw sequences through to the generation of zero-radius operational taxonomic units (ZOTUs) and their abundance tables. The pipeline is based on Nextflow and Singularity, which enable a scalable, portable and reproducible workflow using software containers on a local computer, in the cloud and on high-performance computing (HPC) clusters.
Paul is a Research Fellow at Curtin University, where he researches and teaches. His main research interests include: explosive transient events and radio variability, space situational awareness with the Murchison Widefield Array, and software that supports and enables astronomy research. By developing new tools to automate data collection and processing, source finding, and data analysis, Paul’s goal is to better understand the information that we get from past, current, and future instruments so that we can more reliably detect the rare and interesting objects that lead to new knowledge.
Porting workflows to an HPC environment
Paul’s uptake project focused on porting a workflow that had been developed on a desktop system to one that could work in an HPC environment. The initial workflow (named Robbie) was designed to process multiple images of radio astronomy data, identify radio sources, and extract light curves for each. Robbie v1.0 was orchestrated using Make, which quickly became cumbersome to use, to debug, and to deploy on other systems.
After an intensive training session at Pawsey, Paul’s group was able to use Nextflow and Singularity containers to create v2.0 of Robbie, which can now be deployed on Pawsey systems. Additional support from Pawsey meant that the scope of their research projects could be massively expanded, as they are now able to process orders of magnitude more data than before.
Dr. Monica Kehoe is a Plant Virologist and Molecular Plant Pathologist working for the Western Australian Department of Primary Industries and Regional Development (WA DPIRD) in the diagnostic and laboratory services section. Her current work focuses mainly on the development, validation and use of molecular methods for plant disease diagnostics across a broad range of broadacre and horticultural crops.
Incorporating supercomputing resources into diagnostic workflows
Monica’s ultimate goal is to incorporate supercomputing resources into the diagnostic workflow. Towards this aim, the 2018 uptake project successfully ported a workflow to Pawsey that reduced the time to a complete virus genome from weeks to just hours. The first project to benefit from this improved workflow was a Wine Australia-funded project looking at the diversity of Grapevine leafroll virus 1 and Grapevine leafroll virus 3 in Western Australia, and then using the resulting information to create new field-ready diagnostic tools for industry. This pipeline was built specifically for Illumina sequencing data. Monica is now working on porting the analysis of Nanopore sequencing data for diagnostics to Pawsey as well, to help improve the turnaround time for these analyses.