Improving crop performance by seeing plants in a new light

Project Leader: George Sainsbury, University of Adelaide

The Australian Plant Phenomics Facility (APPF) allows academic and commercial plant scientists from Australia and around the world to measure phenomic information (physical and biochemical traits) on thousands of plants a day over the course of their life cycle. Combined with controlled growing conditions, this information is paired with plant genetic information to identify the most promising traits for growing crops that tolerate heat, drought, salinity or limited nutrients in our changing climate. But as plant trials get larger and phenotype measurement systems get more powerful, data management and storage requirements also grow.

 
Around 30 Nimbus cores across five virtual machines
 
GB of Storage
 
15–20 TB of archival storage per year

Partner Institution: University of Adelaide, The Australian Plant Phenomics Facility
System: Nimbus
Areas of science: Plant Biology

The Challenge

The APPF is a national research infrastructure platform funded by NCRIS that provides large-scale plant growth research facilities to address complex problems in plant and agricultural sciences.  A major focus is developing improved crops and more sustainable agricultural practices in the face of declining arable land and the challenges of climate change.

Dr Bettina Berger, Scientific Director of the APPF node at the University of Adelaide, explains: “We explore how the genetic makeup of a plant determines its appearance, function and performance.  At the APPF, our high-throughput, automated phenotyping systems use imaging technologies to phenotype over 2,000 plants a day moving on conveyors in our ‘smart’ greenhouses with watering, nutrient and temperature control.”

Visible light and fluorescence imaging have previously provided measures of plant height and width, canopy density and leaf colour, as well as evidence of ageing, nutrient limitations or disease. Hyperspectral imaging, which collects images across a much wider range of discrete electromagnetic frequency bands, now complements these methods. Together with cameras automated on tractors or conveyors to increase measurement speed, it has greatly expanded the volume of phenotypic data available and enabled repeat measurement of plant properties over several stages of plant growth.

“Our experiments can now record colour images of thousands of plants and hyperspectral images and X-ray CT scans of hundreds of plants over the course of an experiment,” says Dr Berger.  “As a national research facility allowing researchers to run their plant trials on this scale, we’re now having to partner with another national research facility like Pawsey to support our researchers in managing, accessing and analysing these data sets.”

The Solution

Mr George Sainsbury, a data architect and software engineer at the APPF, notes that because plants grow at a steady rate, and because images are being collected at a steady rate, the ongoing real-time analytical and computational requirements of an experiment are still manageable.  “We don’t need supercomputing to analyse our image sets – yet.  But as a time-series of images accumulates, data storage and management becomes challenging.”

The APPF now uses Pawsey’s Nimbus cloud service for server infrastructure and the data portal for data storage and sharing.  Mr Sainsbury explains: “As soon as we acquire an image it is now sent straight to a data store at Pawsey.  We access it from there for analysis and the results are then added to another database at Pawsey.  The whole process is automated.”
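
The flow Mr Sainsbury describes (acquire an image, send it to a data store, analyse it from there, write the results to a database) can be illustrated with a minimal Python sketch. It is not the APPF's actual software: the function names, table layout and use of a local directory and a single SQLite file are all assumptions made for illustration, and the real pipeline keeps results in a separate database at Pawsey.

```python
import hashlib
import shutil
import sqlite3
from pathlib import Path


def open_catalogue(path=":memory:"):
    """Open a (hypothetical) catalogue with one table for images and one for results."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS images "
        "(name TEXT PRIMARY KEY, sha256 TEXT, analysed INTEGER)"
    )
    db.execute(
        "CREATE TABLE IF NOT EXISTS results (image TEXT, trait TEXT, value REAL)"
    )
    return db


def ingest_image(image_path, store_dir, db):
    """Copy a newly acquired image into the data store and register it."""
    image_path = Path(image_path)
    store_dir = Path(store_dir)
    store_dir.mkdir(parents=True, exist_ok=True)
    stored = store_dir / image_path.name
    shutil.copy2(image_path, stored)
    # A checksum lets later steps confirm the stored copy is intact.
    checksum = hashlib.sha256(stored.read_bytes()).hexdigest()
    db.execute(
        "INSERT INTO images (name, sha256, analysed) VALUES (?, ?, 0)",
        (stored.name, checksum),
    )
    db.commit()
    return stored


def record_result(db, image_name, trait, value):
    """Store one analysis result and mark the image as analysed."""
    db.execute(
        "INSERT INTO results (image, trait, value) VALUES (?, ?, ?)",
        (image_name, trait, value),
    )
    db.execute("UPDATE images SET analysed = 1 WHERE name = ?", (image_name,))
    db.commit()
```

In this sketch, a scheduler or file watcher would call `ingest_image` for each new capture, and an analysis job would call `record_result` once it has processed the stored copy.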

Nimbus serves as the active data store for running experiments; as experiments finish, the images and databases are also archived at Pawsey.

“We’re using around 30 cores across five virtual machines on Nimbus to operate our various server software and analytical platforms, and our archival storage is growing at around 15–20 TB each year.”

Outcome

“We’ve got to the point that our plant researchers don’t have to worry about how all their experimental data is stored and analysed,” says Dr Berger.  “We support plant research across the country and internationally, so researchers outside of Adelaide can now follow their experiments remotely and use data for decision making while the experiment is still running.”

“We’re using machine learning and other advanced computational techniques to interpret the hyperspectral images as they’re generated, and then applying a range of smoothing methods to extract growth traits over time that the researchers can then use in their genetic analyses.”
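
As a rough illustration of extracting a growth trait from a smoothed time series, the Python sketch below applies a centred moving average to a daily size measurement and then computes the relative growth rate between consecutive days. The article does not say which smoothing methods or traits the APPF uses, so the moving average, the function names and the sample values are all assumptions chosen for simplicity.

```python
import math


def moving_average(values, window=3):
    """Centred moving average; the window shrinks at the ends of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def relative_growth_rate(days, sizes):
    """Relative growth rate between consecutive measurements: change in ln(size) per day."""
    return [
        (math.log(sizes[i + 1]) - math.log(sizes[i])) / (days[i + 1] - days[i])
        for i in range(len(sizes) - 1)
    ]


# Hypothetical daily shoot-area measurements for one plant (arbitrary units).
days = [0, 1, 2, 3, 4, 5]
area = [100.0, 118.0, 131.0, 160.0, 172.0, 205.0]

smoothed_area = moving_average(area, window=3)
rgr = relative_growth_rate(days, smoothed_area)
```

Smoothing before differentiating matters because day-to-day imaging noise is amplified when rates are computed from raw values. In practice a spline or a fitted growth curve might replace the moving average.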

Mr Sainsbury points out that the APPF’s use of Pawsey facilities is only likely to increase. “We’re still learning how to extract the most plant growth- and plant health-relevant hyperspectral, X-ray, and other remote-sensing data at scale. As the datasets become more layered and complex, the number of plants in trials increases and the resolution of the images increases, we’ll reach a point where we may need to also incorporate Pawsey supercomputing into our automated data pipeline to analyse hundreds of gigabytes of imagery at a time.”

Dr Berger sums it up: “Between Pawsey and the APPF, we can now bring massive amounts of genomic and phenomic information together, to better understand what specific genes of a plant contribute to crop success in the face of various environmental stresses.  This supports plant breeding and crop improvement efforts worldwide.”
