Genomics of West Australian Flora

Pawsey resources have supported high-level bioinformatic analyses across a wide range of conservation-focused projects within Biodiversity & Conservation Science, DBCA. Research published in 2021 that used Pawsey supercomputers included projects which:

• Identified the putative source populations for multiple introductions into Australia of the invasive weed Passiflora foetida, which will assist in the development of biological control agents (Hopley et al., 2021).
• Provided a reassessment of the taxonomy of the Rufous Fieldwren species complex based on morphological and genetic variation across its range, helping to clarify the conservation listing for island subspecies (Burbidge et al., 2021).
• Isolated sex-linked genetic markers for the Ghost Bat to assist in molecular sexing of non-invasive samples, further refining the method for genetic monitoring of this vulnerable species (Ottewell et al., 2021).
• Investigated the impacts of data quality filtering thresholds on genotype-environment association analyses, a commonplace analytical method for identifying molecular markers under natural selection (Ahrens et al., 2021).

A Master’s thesis, “Conservation genetics and population modelling to secure wild populations of djoongari (Pseudomys fieldi)”, was completed by Rebecca Quah (UWA). Many other projects are currently in the data analysis phase, variously investigating the phylogeography, conservation genetics and diversity of Western Australian flora and fauna.

Principal investigator

Kym Ottewell kym.ottewell@dbca.wa.gov.au

Area of science

Conservation

Systems used

Magnus, Zeus and Nimbus

Applications used

Stacks, SOAPdenovo, BEAST, BWA, SAMtools, mothur, R, Python, FastQC, Trimmomatic, fastStructure, Migrate, PEMA, YAMP, MEGAHIT, BayPass, VCFtools, Singularity, Docker, Nextflow, ipyrad, IQ-TREE
Partner Institution: Department of Biodiversity, Conservation and Attractions | Project Code: pawsey0220

The Challenge

Genomic sequencing data for non-model organisms is becoming increasingly accessible, and as sequencing technology advances, genomic datasets are growing larger and their computational requirements more intensive. As is typical of bioinformatic workflows, our challenge is working with large datasets containing many ‘observations’ (DNA sequences) from large numbers of individuals (typically one hundred to several hundred). Many of our analytical techniques require raw and intermediate data to be held in memory while DNA sequences are sorted, organized and compared within and between individuals, which demands substantial memory and computational resources. Desktop computers either lack sufficient memory for many tasks or are slow, taking weeks to months to complete individual analyses that are only a small part of an analytical workflow.
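To give a feel for why such datasets outgrow a desktop machine, the following back-of-envelope sketch estimates the memory needed to hold raw reads in memory. It is an illustration only: the function name, the one-byte-per-base encoding, the overhead factor and every number in the usage note are assumptions made for this example, not figures from the project.

```python
def estimate_memory_gib(n_individuals, reads_per_individual, read_length,
                        bytes_per_base=1, overhead=2.0):
    """Rough memory estimate (GiB) for holding all reads in memory at once.

    All parameters are hypothetical inputs chosen by the caller:
    - bytes_per_base: assumed storage cost of one base (1 byte here)
    - overhead: assumed multiplier for intermediate copies and indexes
    """
    raw_bytes = n_individuals * reads_per_individual * read_length * bytes_per_base
    return raw_bytes * overhead / 1024 ** 3


if __name__ == "__main__":
    # Hypothetical dataset: 200 individuals, 2 million reads each, 150 bp reads.
    print(f"{estimate_memory_gib(200, 2_000_000, 150):.1f} GiB")
```

Under these assumed inputs the estimate comes to a little over 100 GiB, well beyond the memory of a typical desktop, which is the kind of gap that high-memory nodes are meant to close.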

The Solution

Computational resources that offer large memory, multiple processors and enable multi-threading are required for many of our analyses.

The Outcome

Access to the Magnus and Zeus supercomputers has provided us with the computational resources to complete bioinformatic analyses efficiently. Zeus, in particular, has supplied access to high-memory compute nodes that enable us to complete computationally intensive genome assembly steps within days rather than weeks or months. Magnus and Zeus both offer multi-threading, which has also increased the throughput of our analyses because computational steps can be arrayed across multiple processors and nodes. The recent switch to using containers on the supercomputers has also given us the ability to customize and stabilize the software environment for a number of our analytical steps, and has helped us produce reproducible workflows. Our research capacity has grown enormously with the use of Pawsey resources, as indicated by the number and diversity of research projects currently being undertaken in our organisation.
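The idea of arraying independent per-sample steps across workers can be sketched in a few lines of Python. This is a minimal illustration using a thread pool from the standard library, not the project's actual workflow: the function names, the toy GC-content calculation and the sample data are all hypothetical. In real pipelines the heavy computation is done by external tools, and on a cluster the same pattern would more likely be expressed through the scheduler or a workflow manager.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict


def gc_fraction(sequence: str) -> float:
    """Return the fraction of G and C bases in one sequence (toy per-sample task)."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def summarise_samples(samples: Dict[str, str], workers: int = 4) -> Dict[str, float]:
    """Run gc_fraction on every sample concurrently, keyed by sample name."""
    names = list(samples)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the inputs
        results = list(pool.map(gc_fraction, (samples[name] for name in names)))
    return dict(zip(names, results))


if __name__ == "__main__":
    # Hypothetical sample names and sequences, for illustration only.
    demo = {"sample_a": "GGCC", "sample_b": "ATAT", "sample_c": "ATGC"}
    print(summarise_samples(demo))
```

Because each sample is processed independently, the number of workers can be raised to match the processors available without changing the per-sample function.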

List of Publications

1. Hopley, T., Webber, B. L., Raghu, S., Morin, L., & Byrne, M. (2021). Revealing the introduction history and phylogenetic relationships of Passiflora foetida sensu lato in Australia. Frontiers in Plant Science, 1453.
2. Burbidge, A. H., Dolman, G., Ottewell, K., Johnstone, R., & Burbidge, M. (2021). Genetic and morphological relationships of fieldwrens (Calamanthus): implications for conservation status and management. Emu-Austral Ornithology.
3. Ottewell, K., Thavornkanlapachai, R., McArthur, S., Spencer, P. B., Tedeschi, J., Durrant, B., … & Byrne, M. (2020). Development and optimisation of molecular assays for microsatellite genotyping and molecular sexing of non-invasive samples of the ghost bat, Macroderma gigas. Molecular Biology Reports, 47(7), 5635-5641.
4. Ahrens, C. W., Jordan, R., Bragg, J., Harrison, P. A., Hopley, T., Bothwell, H., … & Rymer, P. D. (2021). Regarding the F‐word: The effects of data filtering on inferred genotype‐environment associations. Molecular Ecology Resources, 21(5), 1460-1474.

Figure 1. Ghost bat, Macroderma gigas, from Featherdale, NT. Non-invasive genetic monitoring approaches are being developed for this vulnerable species. Photo credit: Nicola Hanrahan

Figure 2. Bettongia lesueur at Matuwa-Kurrara Kurrara Indigenous Protected Area, WA. This species persists in three natural island populations off the coast of WA and has been reintroduced to the mainland across a series of fenced feral predator-free reserves. Genomic data is being used to assess the genetic diversity of the natural and reintroduced populations to assist with future conservation management. Photo credit: Judy Dunlop

Figure 3. Geleznowia amabilis K.A.Sheph. & A.D.Crawford near Kalbarri, WA. This species was newly described in 2020 and is one of several members of a species complex. Genomic data is being used to clarify taxonomic boundaries in the complex, with analyses run on Zeus. Photo credit: Benjamin Anderson