In 2018 the Australian Government awarded $70 million to upgrade Pawsey’s supercomputing infrastructure, on top of the $80 million granted in 2009 to establish a petascale supercomputing facility.
As a major part of the national HPC infrastructure, the Pawsey upgrade ensures Australia continues to enable computationally intensive research.
The Pawsey Capital Refresh is a complex upgrade that will be delivered in stages. Some ancillary systems, including storage and network infrastructure, were procured ahead of the main system.
Delivery of the main supercomputer is expected to occur in two phases.
The first phase will provide researchers with a system that is at least equivalent in capacity to what they are currently using, with the latest generation of processors and increased memory per node.
During this phase, researchers with an active allocation on Magnus will transition to the new system.
Phase one is due to be commissioned in mid-2021.
The second phase is expected to be in production by mid-2022. It will provide a major expansion in capacity and state-of-the-art technology.
You can see all the parts that make up the Capital Refresh Project in the infographics displayed at the bottom of this page, or you can download them here.
On 24 February 2021, the Pawsey Supercomputing Centre announced the name of the new supercomputer – Setonix. It will be delivered by Hewlett Packard Enterprise (HPE) as part of the biggest upgrade to the Pawsey computing infrastructure since the centre opened in 2009. The new supercomputer will deliver up to 50 petaFLOPS, or 30 times more compute power than its predecessor systems Magnus and Galaxy, to help power future high-impact Australian research projects.
Setonix is being built using the HPE Cray EX architecture, featuring significantly increased compute power and more emphasis on accelerators with future-generation AMD EPYC™ CPUs and AMD Instinct™ GPUs, and including expanded data storage capabilities with the Cray ClusterStor E1000 system.
In exciting news, the Setonix Test and Development System (TDS) was delivered to Pawsey in May. It has been installed in the whitespace and will soon be available to Pawsey to test the final configuration of Setonix.
Setonix Phase 1 is expected to be delivered to Pawsey in July, with handover to Pawsey scheduled for early September 2021. Phase 1 will provide a 45 percent increase in raw compute power in one-fifth of the size of the Magnus and Galaxy systems.
Setonix Phase 2 (the full system) will be commissioned in the second quarter of 2022.
Setonix will be at least ten times more energy efficient than its predecessors Magnus and Galaxy, while providing a 30-fold increase in raw compute power. The supercomputers are cooled by a groundwater cooling system specially developed by CSIRO for the supercomputing centre, which is offset by a 118 kW solar photovoltaic system.
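The two figures quoted above can be combined into a rough bound on power draw. The sketch below is illustrative arithmetic only: it assumes "energy efficient" means compute delivered per unit of power, and the function name is hypothetical rather than anything published by Pawsey.

```python
# Figures taken from the text above; both are stated targets, not measurements.
COMPUTE_FACTOR = 30.0      # "30-fold increase in raw compute power"
EFFICIENCY_FACTOR = 10.0   # "at least ten times more energy efficient"


def relative_power_draw(compute_factor: float, efficiency_factor: float) -> float:
    """Return power draw relative to the predecessor systems.

    Assumes efficiency = compute per unit of power, so
    relative power = relative compute / relative efficiency.
    """
    if efficiency_factor <= 0:
        raise ValueError("efficiency_factor must be positive")
    return compute_factor / efficiency_factor


# Because the efficiency gain is a lower bound, this result is an upper bound.
upper_bound = relative_power_draw(COMPUTE_FACTOR, EFFICIENCY_FACTOR)
print(f"Power draw is at most {upper_bound:.1f}x that of the predecessors")
```

Under that assumption, a 30-fold increase in compute at ten times the efficiency implies no more than about three times the power draw of Magnus and Galaxy combined.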
To support the move from the existing Pawsey supercomputers (Magnus and Galaxy) to Setonix, Pawsey is working on a number of initiatives designed to support users as they migrate to the new technology. The first of these initiatives is the Pawsey Centre for Extreme Scale Readiness (PaCER) program. PaCER supports researchers' code optimisation and application and workflow readiness by running Grand Challenge problems on the next-generation supercomputer at a previously unavailable scale. To solve these problems, Pawsey is co-funding Australian postdoctoral or PhD positions, embedded within a subset of successful PaCER projects. These positions will work to solve computational problems in collaboration with researchers.
PaCER projects were showcased in July at four separate events organised by scientific field. The presentations were welcomed with interest by the research community, and the event videos will be available in July on Pawsey’s YouTube channel.
Pawsey is well advanced on the development of a new suite of documentation to support Pawsey customers through the migration to Setonix and the ongoing use of Setonix. The next phase of this work will be the development of training packages for our customers.
The upgraded Long Term Storage has been named Acacia.
The offline storage will replace the storage management software with an open system, providing an expandable platform to build on and leverage the investment in object storage. It will use Pawsey’s current investment in tapes by re-using the existing tape libraries, and will utilise a new 5 PB cache to take full advantage of the new 100 GbE network infrastructure.
The new online storage will provide over 60 PB of object storage for long-term archiving of researcher data. The system will be divided into two zones: one designed for data that needs faster access, and the other designed for energy-efficient long-term storage.
The new storage is planned to go live in Q3 2021.
Pawsey have commenced planning the migration to Acacia and will work with all users to provide a smooth transition to the new system.
Procurement of the upgraded network is progressing well. Pawsey is moving from a monolithic single core router to a spine-leaf architecture with a 400 Gbps backbone and 100 Gbps links to host endpoints. This will allow the Nimbus cluster, which was recently upgraded to 100 Gbps internally, to connect at that speed to the Pawsey network. The network has been designed to be easily expandable to support the Ceph-based object storage platform being purchased as part of the Long-Term Storage procurement, as well as integration with the new Pawsey supercomputer (PSS). This will allow all network endpoints (i.e. login nodes, visualisation servers, data mover nodes, etc.) to realise a ten-fold increase in bandwidth by moving from 10 Gbps to 100 Gbps Ethernet.
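To give a feel for what the move from 10 Gbps to 100 Gbps means for an endpoint, the sketch below computes ideal transfer times at each link speed. It is a back-of-the-envelope illustration: the 1 TB dataset size and the function name are hypothetical, and the calculation ignores protocol overhead, storage throughput and contention, so real transfers will be slower.

```python
def ideal_transfer_seconds(size_bytes: float, link_gbps: float) -> float:
    """Ideal time to move size_bytes over a link rated at link_gbps.

    Uses decimal units (1 Gbps = 1e9 bits per second) and assumes the
    link is the only bottleneck.
    """
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9)


DATASET_BYTES = 1e12  # hypothetical 1 TB dataset, chosen only for illustration

old = ideal_transfer_seconds(DATASET_BYTES, 10)    # 10 Gbps endpoint
new = ideal_transfer_seconds(DATASET_BYTES, 100)   # 100 Gbps endpoint

print(f"10 Gbps:  {old:.0f} s")
print(f"100 Gbps: {new:.0f} s")
print(f"Speed-up: {old / new:.0f}x")
```

For this assumed dataset, the ideal transfer time drops from 800 seconds to 80 seconds, which is the ten-fold improvement described above.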
The ASKAP ingest nodes are one of the most critical components of the pipeline between the ASKAP telescopes and the data store which houses the final data products. They receive the data from the correlators located at the Murchison Radio Observatory and write them to disk ready for post processing on the Galaxy supercomputer.
As part of the capital refresh, the sixteen ASKAP ingest nodes have been replaced with nodes with the latest AMD processors designed for I/O. They have twice as much data bandwidth as the previous generation and more memory channels, ensuring that they can keep up with the torrents of data that are produced by the telescopes. Along with three dedicated nodes for providing ancillary services, they have dedicated storage in the form of the ClusterStor E1000. Approximately half a petabyte of NVMe storage has been dedicated to the ingest process, capable of speeds in excess of 150 GB/s.
Migration to the new nodes was successfully performed in December 2020.
The new MWA Compute Cluster is named “Garrawarla”, meaning “spider” in the language of the Wajarri people, on whose land the Murchison Radio Observatory is located. The new 78-node cluster provides a dedicated system for astronomers to process in excess of 30 PB of MWA telescope data using Pawsey infrastructure. The new cluster provides users with enhanced GPU capabilities to power AI, computational work, machine learning workflows and data analytics.
Please refer to the Garrawarla documentation for detailed instructions on how to access the Garrawarla cluster, system details, how to compile and run jobs, and other relevant information. Several packages are available as system modules. Some modules with python/2.7.17 support can be accessed from the /pawsey/mwa/software/mwa_sles12sp4/modulefiles directory.
If you have any questions or issues accessing Garrawarla, please contact the Pawsey Helpdesk via the Pawsey Service Desk.
The upgrade to the Nimbus high-throughput computing (HTC) infrastructure is complete. The new infrastructure provides improved computational flexibility, accessibility and speed. The upgrade allows researchers to process and analyse even larger amounts of data through additional object storage and the Kubernetes container orchestrator, building on Pawsey’s existing container technology for its HPC systems.
New users can apply for an allocation on Nimbus via apply.pawsey.org.au.
Finally, to support all the above upgrades to the Pawsey Supercomputing Centre, the Pawsey building needs to be upgraded to provide power and cooling to the new infrastructure. This work commenced in April 2021 and will be delivered in phases to support the Setonix (Phase 1 and 2) and Long Term Storage upgrades. Cutting in the new power and cooling will require partial shutdowns at Pawsey, but these will be aligned, as much as possible, with the regular Pawsey maintenance days to minimise the impact on Pawsey customers.
To read the stories published about the project milestones, refer to the list below:
- Powering the next generation of Australian research with HPE 20/10/2020
- PACER – upscaling Australian researchers in the new era of supercomputing 25/07/2020
- New Pawsey Nimbus Cloud infrastructure available for Australian researchers 10/03/2020
- HPE to deliver a dedicated system for astronomy needs 28/02/2020
- Pawsey Capital Refresh Boosts Cloud Infrastructure 21/11/2019
- Tender released for Australia’s new research supercomputer 14/11/2019
- Three times more storage and performance for SKA pathfinders 11/11/2019
- Pawsey Capital Refresh – Reference Groups Established 5/04/2019
- New funding to accelerate science and innovation 28/04/2018
Pawsey is committed to engaging with its diverse stakeholders and keeping them updated on the procurement. Some of the channels the Centre has established to achieve this are the Pawsey user forums, the Capital Refresh Update for potential vendors, Pawsey newsletters and, more recently, our podcasts.
You can listen to the Capital Refresh Podcast from the list below:
Episodes in the podcast:
- Episode 7: PACER, accelerating researchers in the new era of supercomputing
- Episode 6: Spiders for the sky
- Episode 5: Nimbus – HTC Cloud Service Upgrade & Training
- Episode 4: The year that was, the year ahead
- Episode 3: HTC Cloud Procurement
- Episode 2: Capital Refresh Status Update
- Episode 1: What is the Pawsey Capital Refresh?
Below is an infographic showing the current status of the Project (last updated on 20/10/2020). It can also be downloaded here.
Pawsey Capital Refresh Status
Tape library expansion
Additional tape storage has been procured to expand the existing tape libraries from 50 to 63 Petabytes in each library.
Long term storage
Long Term Storage will be composed of online and offline systems. Awarded to Dell, the new online storage will provide 60 PB of object storage for long-term archiving of researcher data. The offline storage, provided by Xenon, will replace the storage management software with an open system, making available an expandable platform to leverage Pawsey’s investment in object storage while reusing Pawsey’s tape libraries. New services will optimise data upload and download times.
Pawsey partnered with Dell EMC to expand its cloud system with 5x more memory and 25x more storage to form a cutting-edge flexible compute system. This expansion provides better service to emerging research areas and communities that benefit more from high-throughput compute.
Astronomy high-speed storage: 3x more storage and performance. The existing Astro filesystem was expanded to service the MWA community. Powered by HPE, it has been upgraded to 2.7 PB of usable space and is capable of reading/writing at 30 GB/s. The new buffer filesystem, a dedicated resource for ASKAP researchers, provides 3.7 PB of usable space and is capable of reading/writing at 40 GB/s. It is manufactured by Dell.
High-speed storage filesystems: Designed to deal with thousands of users accessing them at the same time. The Pawsey high-speed filesystems will be procured as part of the main supercomputer system to increase speed and storage capability for general-purpose science.
Garrawarla, the 546 teraFLOPS MWA cluster, is a resource tuned to MWA’s needs, powered by HPE. Procured ahead of the main supercomputer, this cluster allows ASKAP to use the full CPU partition of Galaxy.
Pawsey is moving to a Cisco spine-leaf architecture with a 400 Gbps backbone and 100 Gbps links to host endpoints. The network has been designed to be easily expandable to support the object storage platform being purchased as part of the Long-Term Storage procurement, as well as integration with Pawsey’s new supercomputer.
The remote visualisation capability has been procured as part of the main supercomputer. When the new capabilities become available, researchers will be able to visualise their science in real time, while the data is being processed.
This new capability will allow researchers to steer their visualisation while the data is processing and fine-tune it to the desired outcome.
PSS will be built using the HPE Cray EX supercomputer architecture. It will deliver 30x more compute power than its predecessors and will be at least 10x more power efficient.
It will be delivered in two phases. Phase 1, available by Q3 2021, will provide researchers with a 45 percent increase in compute power in one-fifth of the size of Magnus and Galaxy. Phase 2 will become available in Q2 2022, providing up to 50 petaFLOPS of raw compute power.
Pawsey Data Workflow