Pawsey Internship Alumni: Celebrating Achievement!
Since 2005, more than 200 students have spent more than 2000 weeks researching and contributing to computational algorithms, data science, cloud computing, and more – through the Pawsey Summer Internship Programmes.
The story does not stop at the 2000 weeks…
… These Alumni keep on giving through:
- Activities “close to home”, e.g., participating as Pawsey Intern Mentors and Intern poster judges, and
- Activities in the wider community, through their contributions to science, industry, government, education and more.
Pawsey Intern Alumni are recognised and celebrated here. Click on a profile below to view short summaries, or search using the filters.
If you’re one of our Pawsey Alumni and you would like to have your profile included, contact email@example.com. Let’s celebrate you!
The project investigated the use of GPU acceleration in molecular dynamics (MD) simulations of complex biomolecular systems. This was done using the AMBER suite of molecular simulation programs, in which a fast GPU MD simulation engine, pmemd.cuda, has been developed such that the entirety of the MD calculation is performed on the GPU while the CPU core only drives the simulation.
Polyhydroxyalkanoates (PHAs) are a family of microbially made polyesters that are meant to degrade quickly in the environment, but this degradation relies on microbially secreted PHA depolymerases, whose taxonomic and environmental distributions have not been well defined. As a result, the impact of increased PHA production and disposal on global environments is unknown. This Intern Project searched global metagenome databases to analyze the distribution of PHA depolymerase genes in microbial communities from diverse aquatic, terrestrial and waste management systems.
In 2020, a paper entitled ‘Latent Space Phenotyping: Automatic Image-Based Phenotyping for Treatment Studies’ was published in Plant Phenomics (https://spj.sciencemag.org/journals/plantphenomics/2020/5801869/). The paper outlines a novel alternative to traditional image analysis methods for phenotyping without the need for complex and bespoke image analysis pipelines. The source code has been made available (https://github.com/p2irc/lsplab). The project was developed in Python using TensorFlow, and leverages NVIDIA GPUs (CUDA/cuDNN). This Intern Project looked at modernising that project on supporting infrastructure (a supercomputing environment or a cloud environment).
Semantic change detection (SCD) is an important problem for many industries. For example, rail network operators need to identify early warning signs of deteriorating supporting structures to avoid derailments. Where existing detection problems aim to recognise and locate known objects, SCD aims to recognise and locate characteristics of objects that deviate from what is expected. SCD is a challenging machine learning task. This Intern Project set up baselines for advanced research in SCD by analysing and then comparing several deep learning-based change detection approaches recently proposed in the literature, such as those based on Generative Adversarial Networks (GANs), Deep Convolutional Autoencoders (CAEs), and Long Short-Term Memory (LSTM) models.
Batten disease is a group of genetically inherited, neurodegenerative diseases, most of which start in early childhood. Batten is always fatal, usually in the late teens or twenties, and there is no treatment to reverse or halt disease progression. The overall aim of this Intern Project was to use molecular dynamics (MD) simulations, combined with wet-lab experiments, to improve understanding of recently discovered lead molecules to treat Batten. In addition, the simulations will assist in developing technology to screen for more lead molecules.
Soccer is a popular sport around the world. Effective soccer analytics could improve team performance regardless of resources by maximizing the potential of existing players, analysing opponents’ game strategies, and highlighting players’ features that are usually undervalued in predicting winning. In this Intern Project, we explored machine learning (ML) approaches to design feature extraction and prediction models for context-aware and adaptive Soccer Analytics. We explored three types of context-aware machine learning approaches (Bayesian Networks, Decision Trees, and Deep Neural Networks) and one advanced adaptive machine learning approach, the Forest Deep Neural Network (fDNN).
Understanding electron transfer in ion-atom collisions is essential for a variety of applications, ranging from astrophysical processes, such as solar wind and nuclear fusion, to modern cancer treatment techniques like hadron therapy. The goal of this Intern Project was to perform accurate calculations of differential cross sections for electron capture in high-energy (MeV regime) proton-helium collisions using a semiclassical wave-packet convergent close-coupling (WP-CCC) method recently developed in our group [Alladustov et al 2019 Phys. Rev. A 99 052706].
The Intern Project explored applications of deep generative models in geoscientific model development. Applying deep learning algorithms to parameter estimation problems has received active interest from both academia and industry in recent years. Modern deep neural networks allow for fast reconstruction of various subsurface properties with a sufficient degree of accuracy. At the same time, realistic models require large datasets for training, which are not always possible to obtain from real data. Using deep generative models can significantly improve performance. The project used the latest developments in deep learning algorithms and worked with real data.
Genomic prediction has been a staple of plant and animal breeding, but ‘classic’ approaches cannot suitably model large datasets or complex additional datasets, such as time-series weather data. ‘Modern’ approaches such as the construction of Artificial Neural Networks can be highly effective for prediction of complex data. The Intern Project built ensembled Artificial Neural Networks that took already existing genomic and weather data in two streams to calculate phenotypic predictions. The input data was divided into training, testing and hold-out sets, where the neural network was built upon training and testing data and ideally could accurately predict phenotypes from the hold-out set. Once built, the Neural Networks were further optimised, taking advantage of the GPU backend for optimal hyper-parameter selection, to improve the phenotypic predictions in crop breeding.
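The three-way division of the input data described above can be sketched as follows. This is a minimal illustration only, not the project's code: the function name `split_dataset`, the 70/15/15 proportions, and the fixed random seed are all assumptions made for the example.

```python
import random


def split_dataset(samples, train_frac=0.7, test_frac=0.15, seed=0):
    """Shuffle samples and split them into training, testing and hold-out lists.

    The fractions are illustrative assumptions; whatever remains after the
    training and testing portions becomes the hold-out set.
    """
    if train_frac <= 0 or test_frac <= 0 or train_frac + test_frac >= 1:
        raise ValueError("fractions must be positive and sum to less than 1")
    shuffled = list(samples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    hold_out = shuffled[n_train + n_test:]
    return train, test, hold_out


# Hypothetical sample identifiers standing in for genomic/weather records.
samples = list(range(100))
train, test, hold_out = split_dataset(samples)
print(len(train), len(test), len(hold_out))
```

The hold-out set is never seen while the network is built or tuned, so its predictions give an honest estimate of how the model would perform on new data.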
The transfer of ions across liquid-liquid interfaces is key to many technological applications, such as heavy metal extraction and sensing. Although there is a phenomenological understanding of how this process occurs, there is no detailed molecular picture of the ion transfer process in the presence of electric fields. The group has recently developed a method to correctly simulate the effect of an external electric field in heterogeneous systems, which was tested on the interface between two immiscible liquids, water and 1,2-Dichloroethane (DCE) (the most commonly used system for sensing applications). This Intern Project continued this work by studying the properties of the water/DCE interface in the presence of electrolytes on both sides of the interface, focusing on determining how the structure of the interface changes with the concentration of the electrolytes and on computing the transfer potential of various ions from one liquid phase to the other.
This Intern Project focused on computational and theoretical physics based on high-performance computing, specifically implementing a new parallelization framework for the Monte Carlo simulation computer code and performing relevant modeling calculations. In the Project, Reese studied charged particle transport in a hydrogen-helium plasma, which is particularly relevant to fusion research. The present version of the code was implemented on one node with no parallelization. Monte Carlo simulation code is very computationally intensive and appropriate parallelization needed to be implemented. In addition, approaches to visualization of the obtained results were investigated and implemented.
We now know a quantum computer can solve an enormously large set of linear equations, can simulate a wide range of Hamiltonians representing chemical and biological systems, can perform various linear transformations including Fourier transforms, and can efficiently evaluate inner products and distances in super high dimensional vector space, the last of which is particularly useful in machine learning. In this Intern Project, we explored potential applications in combinatorial optimization problems, which are notoriously difficult to solve in general, even approximately. The group recently developed a promising quantum algorithm, taking advantage of intrinsic quantum correlations and quantum parallelism, to deal with combinatorial optimization problems that scale up exponentially. The Intern Project helped to validate this algorithm through large-scale high-performance simulation of an actual quantum computer.
Thai took the sleep dataset that is being collected globally by students and, with the support of the Pawsey Visualisation Team, developed interactive web-based visualisations to enable high school students to explore and understand the dataset. The Intern Project is being undertaken with the goal of further growing the dataset and expanding the initial portal to include STEM educational materials and “voices” of scientists and experts.
Impact craters across the surface of planetary bodies are of great importance for understanding the formation and evolution of celestial bodies. Secondary craters result from the debris ejected by a primary impact, which forms long chains of smaller craters on the surrounding ground. The team developed a Crater Detection Algorithm trained on Mars, detecting 94 million impact craters larger than 25 m in diameter. The team is retraining the algorithm on the Moon, and will then turn its attention to Mercury. Mercury exhibits the most unusual secondary crater population in the Solar System. The analysis of secondary craters smaller than 1 km in diameter has never been performed because they are too numerous to be counted by hand. The goal of this Intern Project is to perform this analysis on Mercury using MESSENGER/MDIS-NAC imagery (1.1 m/px), creating a training dataset from this imagery to retrain the current model. The resulting automatic impact crater catalog will be used by the BepiColombo mission to help target areas of interest.
In recent years, public and government concern in Australia about the potential for tick-borne diseases in people has increased considerably. Uncertainty about Australian Lyme disease-like illness requires evidence-based science to identify the microorganisms responsible and provide conclusive data about the speed of infection after tick attachment. This Intern Project identified appropriate bioinformatic pipelines to assign taxonomy to samples from multiple sources (tick, vertebrate host, microbe), contributing ultimately to improved diagnostic tests, treatment protocols, and control of tick-borne diseases.
Rayleigh and Raman scattering: This research project in computational and theoretical physics used Pawsey’s high-performance computing, implementing a new parallelization framework for the photon collision computer code and performing relevant modeling calculations. The problem of photon-atom scattering was addressed using a fully quantum approach based on the evaluation of the Kramers-Heisenberg-Waller (KHW) matrix elements. Appropriate parallelization was implemented to improve computational performance.
The goal of the Intern Project was to develop a containerised workflow solution for DNA Zoo genome alignments to the human genome using the LASTZ sequence alignment program. The project planned to take advantage of the HPC and Nimbus Research Cloud architecture at Pawsey to test the primary alignment processing stages, using the DNA Zoo genome assemblies of diverse mammal species aligned to human. This work is foundational to doing any comparative work, and to a key desideratum: mapping conservation in the human genome with single-base-pair resolution.
The goal of the Intern Project was to port the EDIP interatomic potential developed by the Curtin Carbon Group to GPU-enabled systems. This project continued the development of HPC capability in the Curtin Carbon Group. A number of years ago the group ported the EDIP interatomic potential to LAMMPS as part of a Pawsey Internship Project. The routines proved extremely valuable, underpinning a successful ARC Discovery Project and establishing the group as international leaders in this field. By expanding our capability into GPUs, the group planned to continue to push the boundaries of what is possible with molecular dynamics simulation.
3D geophysical inversion is a core method for resolving the subsurface in a wide range of applications, and is used in particular for minerals and petroleum exploration. The goal of this Intern Project was to derive new workflows to build a “one step” process to generate 3D geophysical inversion models from native-format data, as is collected in airborne surveys. This data is inherently anisotropic as it is collected along long lines, densely sampled (e.g. 10 m), but with a much greater separation between lines (e.g. 400 m). Most inversion procedures require a number of pre-processing steps, which are sub-optimal (e.g., time consuming, extensive manual input, numerous assumptions).
Modern approaches are taking advantage of HPC infrastructures that permit much more comprehensive and precise models to be implemented. The ability to rapidly and rigorously build 3D models is burgeoning as “live-data” environments and on-demand services become more common.
Edric worked on the simulation of quantum statistical algorithms. The key to this project was the calculation of extremely large matrix exponentials using algorithms parallelised by MPI. These codes simulated the quantum statistical algorithms that were proposed by the quantum research group at UWA.
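The role of the matrix exponential in such simulations can be illustrated with a small serial sketch. It is not the MPI-parallelised code used in the project: it assumes a tiny 2x2 Hamiltonian (the Pauli-X matrix), uses a truncated Taylor series chosen purely for readability, and the function names `mat_vec` and `expm_apply` are hypothetical.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def expm_apply(hamiltonian, state, t, terms=40):
    """Approximate exp(-i * H * t) applied to state with a truncated Taylor series.

    Only matrix-vector products are used, so the full matrix exponential
    is never formed explicitly.
    """
    result = list(state)
    term = list(state)
    for k in range(1, terms):
        # Each term is built from the previous one: term_k = (-i t H / k) term_{k-1}.
        term = [(-1j * t / k) * x for x in mat_vec(hamiltonian, term)]
        result = [r + x for r, x in zip(result, term)]
    return result


# Assumed toy system: Pauli-X Hamiltonian acting on the state |0>.
H = [[0, 1], [1, 0]]
psi0 = [1 + 0j, 0j]
psi = expm_apply(H, psi0, t=math.pi / 2)

# Unitary evolution should preserve the norm of the state.
norm = math.sqrt(sum(abs(x) ** 2 for x in psi))
print(psi, norm)
```

For this toy case the evolution maps |0> to -i|1>. At the scale described above, the matrix-vector products are the expensive step, which is typically where the work is distributed across processes.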
The ability of proteins to fold spontaneously into their native structure or functional state is essential for biological function. Failure to fold into the native shape may lead to misfolding and aggregation of proteins into insoluble aggregates, known as amyloid fibrils. These fibrous deposits have been linked to debilitating and age-related diseases, such as Alzheimer’s, Parkinson’s, type-II diabetes and others. The Intern Project studied the role of mutations in the structure, dynamics and aggregation propensity of the lipid-oriented protein apolipoprotein A-I (apoA-I). The accumulation of this protein as amyloid fibrils has been associated with atherosclerotic plaques. The work was done in collaboration with the experimental research group led by Dr Michael Griffin from the Bio21 Institute and University of Melbourne.
The Intern Project aimed at calibrating a multiphysics geomechanical simulator against experimental data using a Deep Learning (DL) approach. A distinctive feature of the simulator used was its novel constitutive model controlling the mechanical behaviour from state variables like temperature and pore pressure. Since those properties were not directly measured, the calibration could only be obtained through an inversion process. Traditional approaches to inverse problems are largely based on deterministic gradient-based methods, which are limited by non-linearity and non-uniqueness of large-scale problems in high-dimensional parameter spaces. The non-linear physical couplings involved in multiphysics problems make this process extremely challenging, even for expert users, and therefore particularly suitable for Artificial Intelligence (AI) methods.
This Intern Project evaluated and compared state-of-the-art semantic segmentation methods (DeconvNet, UNet, SegNet, PSPNet, FastSCNN, DeepLabV3) for critical infrastructure monitoring. Semantic segmentation is usually the first task in any scene analysis application, providing useful information about the different foreground and background objects in the scene. The project compared the methods on benchmark segmentation datasets, and then applied them to specific applications where scenes containing critical infrastructure needed to be analysed. The project recommended the most suitable methods based on the overall speed and accuracy.
Repeat expansions of short tandem repeats (STRs) are responsible for over twenty-five human neurological disorders, including Huntington disease, spinocerebellar ataxias and intellectual disabilities (e.g. Fragile-X). Many disorders showing anticipation go undiagnosed as we do not know all the possible repeat expansions. Next-generation sequencing (NGS) may be used to detect novel repeat expansions but requires computationally intensive algorithms. The goal of this Intern Project was to scan for novel repeat expansions genome-wide in hundreds of NGS samples by creating analysis pipelines using a workflow manager to help analyse NGS samples for evidence of repeat expansions and by incorporating the running of several packages for repeat detection, including HipSTR, STRetch, ExpansionHunter and exSTRa.
The aim of this Intern Project was to automate the detection of new impact craters on the surface of Mars by using a Crater Detection Algorithm. The data-processing pipeline involved training the algorithm on images containing already known new impact craters, and then applying it to all high-resolution imagery datasets currently available, with preferential focus on dust-free regions.
Tarun worked on the characterisation and comparative analyses of immune genes in marsupials. The goal of the project was to develop a containerised workflow solution to map the 800 already characterised genes vital to the immune response in the human genome onto the 18 marsupial genomes now available. Among these genes are the highly divergent immune genes, such as cytokines, natural killer cell receptors, and antimicrobials. The work revealed the level of complexity of the marsupial immunome as compared to the human.