Australian Square Kilometre Array Pathfinder (ASKAP)

The Australian Square Kilometre Array Pathfinder (ASKAP) is the latest CSIRO Astronomy and Space Science (CASS) national facility radio telescope, located at the Murchison Radio-astronomy Observatory (MRO) in outback Western Australia. It is designed to perform rapid surveys using a combination of Phased Array Feed (PAF) technology to achieve a large field of view, a large number of small antennas to provide excellent spatial frequency sampling, and antenna mounts with three axes of rotation to keep the orientation of the reflecting structure fixed on the sky.

ASKAP is designed to process a wide bandwidth, and this, combined with the large number of antennas, means it produces what is currently considered a very large amount of raw visibility data. The incoming data rate is 2.5 GB per second, which results in 108 TB of data for a 12-hour observation. These data need to be processed at the Pawsey Supercomputing Centre within the time it takes to observe them in order to maximise use of the telescope.

The telescope and its accompanying data processing system were not simply built and turned on. A lengthy commissioning process brought the telescope and computing system into operation in stages. ASKAP is now in a state where full-scale data can be collected and processed, and a set of Pilot Surveys is being planned to further verify the system.
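As a quick back-of-the-envelope check of these figures, the sketch below reproduces the quoted volume from the quoted ingest rate (the constant names are ours, purely illustrative):

```python
# Sanity check of the quoted data volumes: ingest rate times observation length.
INGEST_RATE_GB_S = 2.5      # incoming visibility data rate quoted above
OBSERVATION_HOURS = 12      # a typical full-length observation

volume_gb = INGEST_RATE_GB_S * OBSERVATION_HOURS * 3600
print(f"{volume_gb / 1000:.0f} TB per {OBSERVATION_HOURS} h observation")  # -> 108 TB
```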

Principal investigator

Juan Carlos Guzman, eric.bastholm@csiro.au

Area of science

Astronomy, Radio Astronomy

Systems used

Magnus, Zeus, Galaxy, Athena, Data Stores, Visualisation and others

Applications used

gcc, java, python; ASKAP mostly builds its own libraries, deployed via the modules system as ASKAPsoft
Partner Institution: CSIRO | Project Code: ASKAP

The Challenge

During the commissioning of ASKAP the 36 dish antennas were integrated into an interferometer array in several stages. At each stage the sensitivity of the instrument increased, along with the data rates. One of the main challenges is determining how to configure and use the data reduction software, ASKAPsoft, so that it processes the data efficiently in terms of compute power, memory and disk space. There are many ways to configure the software, and optimising it for a given science case and field of view remains challenging and forms the basis of ongoing research.
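The flavour of this tuning problem can be shown with a toy sketch: spreading the spectral channels over more workers shortens each worker's job but changes the per-node memory picture. The per-channel memory cost and node memory below are illustrative placeholders, not actual ASKAPsoft or Galaxy figures:

```python
# Toy illustration of the configuration trade-off. All constants are
# placeholders, not real ASKAPsoft settings or Galaxy node specifications.
TOTAL_CHANNELS = 15552      # full ASKAP spectral resolution
MEM_PER_CHANNEL_GB = 0.008  # assumed working set per channel (placeholder)
NODE_MEM_GB = 64            # assumed memory per compute node (placeholder)

for workers in (324, 648, 1296):
    channels_each = TOTAL_CHANNELS // workers
    mem_each_gb = channels_each * MEM_PER_CHANNEL_GB
    per_node = int(NODE_MEM_GB // mem_each_gb)
    print(f"{workers:4d} workers: {channels_each:2d} channels each, "
          f"~{mem_each_gb:.2f} GB each, {per_node} workers fit per node")
```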

The Science Data Processing (SDP) software development team had the task of testing readiness to process full-scale, full-resolution data using the ASKAPsoft data reduction pipeline. Experience has shown that real telescope data (rather than simulations) must be used to fully exercise all features of the software, and with our current observations we can perform a good test at close to full scale. Initial results fell short of the required processing rate, and in some cases processing could not be completed due to memory and time constraints. The immediate challenge was to analyse the performance bottlenecks and get the processing to complete in the best possible time.

The Solution

For the tests we used a 28-antenna observation of the field surrounding the Galactic Centre at 864 MHz, with 15552 spectral channels across 288 MHz of bandwidth. The presence of bright, extended emission makes this a challenging field to image, so it tests many aspects of the software's performance. The SDP team reserved all nodes on the Galaxy supercomputer to run exclusive tests in single-user mode, in the same way our operations team will work next year. We used the existing science data pipeline to run a demanding full-spectral-resolution imaging task designed to produce large image cubes.
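For scale, the spectral setup above implies a channel width of roughly 18.5 kHz; this is a simple derivation from the quoted figures, nothing more:

```python
# Channel width implied by the test observation's spectral configuration.
BANDWIDTH_MHZ = 288
N_CHANNELS = 15552

channel_khz = BANDWIDTH_MHZ * 1000 / N_CHANNELS
print(f"{channel_khz:.2f} kHz per channel")  # ~18.52 kHz
```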

The test highlighted several issues similar to those experienced by the ASKAP science teams during early science operations, as well as a few new ones. In general, the pipeline was much more stable in the controlled environment of a global reservation than in a more traditional multi-user environment. The test also confirmed that the pipeline carried substantial overhead from serial tasks: the original processing plan expected many of these to be done on the fly, and the tasks written to perform them offline for early science are not well optimised.

Performance improvements were made by increasing parallelism, both within the application running on a node and by redistributing the processing load across more nodes. This produced significant gains in processing times for certain elements of the data pipeline, in some cases a 10x improvement, and removed some 30 hours from the total processing duration.
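Why the serial tasks noted above cap such gains is captured by Amdahl's law; the sketch below uses hypothetical serial fractions to show how a 10x speed-up in the parallel stages translates end to end:

```python
# Amdahl's law: overall speed-up when only the parallel portion gets faster.
def overall_speedup(serial_fraction: float, parallel_gain: float) -> float:
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / parallel_gain)

# Hypothetical serial fractions; the more serial overhead remains,
# the less of a 10x stage-level gain survives end to end.
for serial in (0.05, 0.20, 0.40):
    print(f"serial {serial:.0%}: overall {overall_speedup(serial, 10):.1f}x")
```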

The SDP team also experimented with making the largest possible images in terms of number of pixels (a function of the field extent and pixel size). It was found that we do not have enough memory to make images as large as we would like. This was improved by allocating a dedicated node to the coordinating task, thereby providing more memory where it is most needed.
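To see why memory becomes the constraint, consider the footprint of a single-precision image cube at full spectral resolution. The pixel counts below are illustrative, not the actual ASKAP image dimensions:

```python
# Rough size of a 32-bit floating point image cube: npix * npix * nchan * 4 bytes.
def cube_tb(npix: int, nchan: int = 15552, bytes_per_pixel: int = 4) -> float:
    return npix * npix * nchan * bytes_per_pixel / 1e12

for npix in (2048, 4096, 8192):
    print(f"{npix} x {npix} x 15552 channels: {cube_tb(npix):.2f} TB")
```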

The Outcome

Since fully integrating all of the antennas, we have for the first time demonstrated stable correlator data capture into Pawsey at a significantly increased rate, in excess of 1 GB per second. Observing was performed consistently with 28 antennas, a significant increase on the previous 18, which highlighted the importance of large, high-performance disk access for data ingest and processing. During this time the decision was made to stop archiving the raw visibility data, due to the size of the data sets, and to introduce strict purging of data once processed. This will be normal for future operations.

Initially, data processing performance was found wanting, but the optimisations made to the software mean we can now process the increased incoming data, albeit with some constraints. We also had to amend our processing workflows to be operations-based rather than user-based, so that fewer users perform processing tasks, reducing the strain on disk and compute resources.

The work done on telescope commissioning and optimising the data reduction software has made it possible to look forward to initiating Pilot Surveys and enhancing “operations mode”. The Pawsey Supercomputing Centre, considered an integral extension of the telescope, is an important part of that.

List of Publications

A list of papers can be found here:
https://www.atnf.csiro.au/projects/askap/askap-publications.html

(galactic-centre.png) Commissioning observations of the Galactic Centre, a complex field used for benchmarking ASKAPsoft processing: 6 hours integration, 288 MHz bandwidth. This composite image has spectral indices in colour (with blue indicating negative and red indicating positive values) superimposed over the total intensity emission. The colour traces different physical processes, with blue showing largely synchrotron-emitting regions and yellow indicating thermal emission from HII regions. This is a preliminary image and needs careful understanding of the effect of noise on spectral indices, especially at the edges of the bulge where the S/N is relatively low. [credit: Wasim Raja]
(galactic-centre-gs.png) A region from the image of the Galactic centre field by ASKAP. [credit: Wasim Raja]