Setonix is the name of Pawsey’s new supercomputer, being delivered by Hewlett Packard Enterprise (HPE) as part of the biggest upgrade to the Pawsey computing infrastructure since the centre opened in 2009. Setonix will deliver up to 50 petaFLOPS, or 30 times more compute power than its predecessor systems Magnus and Galaxy, to help power future high-impact Australian research projects.
Setonix is being built on the HPE Cray EX architecture. It features significantly increased compute power and a greater emphasis on accelerators, with future-generation AMD EPYC™ CPUs and AMD Instinct™ GPUs, as well as expanded data storage capabilities provided by the Cray ClusterStor E1000 system.
Setonix Phase 1 is forecast to be available for researchers at the end of May 2022 and will provide a 45 percent increase in raw compute power in one-fifth of the physical footprint of the Magnus and Galaxy systems.
Setonix Phase 2 (the full system) will be delivered at the end of 2022. Setonix will be at least ten times more energy efficient than its predecessors Magnus and Galaxy, while providing a 30-fold increase in raw compute power.
The upgraded compute capability is supported by a number of other upgrades at the Pawsey Centre designed to provide researchers with an improved user experience:
The ASKAP ingest nodes are among the most critical components of the pipeline between the ASKAP telescopes and the data store that houses the final data products. They receive the data from the correlators located at the Murchison Radio Observatory and write them to disk, ready for post-processing on the Galaxy supercomputer.
As part of the capital refresh, the sixteen ASKAP ingest nodes have been replaced with new nodes based on the latest AMD processors designed for I/O. These have twice the data bandwidth of the previous generation and more memory channels, ensuring that they can keep up with the torrents of data produced by the telescopes. Together with three dedicated nodes that provide ancillary services, they are backed by dedicated storage in the form of the ClusterStor E1000: approximately half a petabyte of NVMe storage has been dedicated to the ingest process, capable of speeds in excess of 150 GB/s.
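To put the ingest storage figures in perspective, the short Python sketch below estimates how long it would take to fill approximately half a petabyte of NVMe storage at 150 GB/s. It assumes decimal units (1 PB = 10^6 GB) and a constant write rate of exactly 150 GB/s; because the quoted speed is a lower bound ("in excess of"), the result is an upper-bound estimate. The function name `fill_time_seconds` is purely illustrative.

```python
# Back-of-envelope sketch (illustrative only): time to fill the ~0.5 PB NVMe
# ingest tier at a sustained 150 GB/s. Assumes decimal (SI) units and a
# constant write rate; real throughput will vary.
PB_IN_GB = 1_000_000  # 1 PB = 10^6 GB in decimal units


def fill_time_seconds(capacity_pb: float, rate_gb_per_s: float) -> float:
    """Return the seconds needed to write capacity_pb petabytes at rate_gb_per_s."""
    if rate_gb_per_s <= 0:
        raise ValueError("rate must be positive")
    return capacity_pb * PB_IN_GB / rate_gb_per_s


if __name__ == "__main__":
    seconds = fill_time_seconds(0.5, 150.0)
    print(f"{seconds:.0f} s (~{seconds / 60:.0f} min)")
```

Under these assumptions the buffer would take a little under an hour (about 3,333 s) to fill, which illustrates the scale of sustained throughput involved.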
The new MWA Compute Cluster is named “Garrawarla”, meaning “spider” in the Wajarri language, the language of the people on whose land the Murchison Radio Observatory sits. The new 78-node cluster provides a dedicated system for astronomers to process more than 30 PB of MWA telescope data using Pawsey infrastructure. It also provides users with enhanced GPU capabilities to power AI, computational work, machine learning workflows and data analytics.
The upgrade to the Nimbus high-throughput computing (HTC) infrastructure is complete. The new infrastructure provides improved computational flexibility, accessibility and speed. The upgrade allows researchers to process and analyse even larger amounts of data through additional object storage and the Kubernetes container orchestrator, building on Pawsey’s existing container technology for its HPC systems.
Pawsey is moving from a monolithic single core router to a spine-leaf architecture with a 400 Gbps backbone and 100 Gbps links to host endpoints. This will allow all network endpoints (i.e. login nodes, visualisation servers, data mover nodes, etc.) to realise a ten-fold increase in bandwidth by moving from 10 Gbps to 100 Gbps Ethernet.
To support all the above upgrades, the Pawsey building needs to be upgraded to provide power and cooling to the new infrastructure. This work commenced in April 2021 and will be delivered in phases to support Setonix (Phase 1 and 2) and the Long Term Storage upgrades.
It has been a busy and exciting month at Pawsey: acceptance testing for Setonix Phase 1 has been successfully performed, and the Pawsey platforms team is now configuring Setonix for researchers. This is a key milestone on our journey towards migrating researchers to Setonix, which commences at the end of May 2022.
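As a rough illustration of what a ten-fold bandwidth increase means in practice, the Python sketch below compares ideal transfer times for a hypothetical 1 TB dataset over 10 Gbps and 100 Gbps links. It assumes decimal units (1 TB = 10^12 bytes) and full line rate with no protocol or storage overhead, so real transfers will be slower; `transfer_time_seconds` is a hypothetical helper, not a Pawsey tool.

```python
# Back-of-envelope sketch (illustrative only): ideal transfer time for a
# dataset over an Ethernet link. Assumes decimal units and full line rate
# with no protocol, disk or filesystem overhead.


def transfer_time_seconds(size_tb: float, link_gbps: float) -> float:
    """Return the ideal seconds needed to move size_tb terabytes at link_gbps."""
    if link_gbps <= 0:
        raise ValueError("link speed must be positive")
    bits = size_tb * 1e12 * 8          # terabytes -> bits
    return bits / (link_gbps * 1e9)    # bits / (bits per second)


if __name__ == "__main__":
    for gbps in (10, 100):
        t = transfer_time_seconds(1.0, gbps)
        print(f"1 TB over {gbps} Gbps: {t:.0f} s")
```

Under these assumptions, 1 TB takes about 800 s at 10 Gbps and about 80 s at 100 Gbps, the same ten-fold ratio as the link speeds themselves.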
As we work on the configuration of Setonix, we are also finalising documentation and training material in preparation for the upcoming migration of workflows from the previous generation of Pawsey supercomputers.
The Supercomputing Documentation has been updated to provide detailed instructions and examples on how to interact with Setonix, and to remove information about the systems being decommissioned, Magnus and Zeus. The structure and presentation style of the documentation pages have also been standardised to make them more accessible.
The current documentation will be marked as deprecated and will coexist with the refreshed documentation until the migration process is complete.
The team is working on training modules to support our researchers in transitioning to the new Setonix supercomputer.
There are a total of six training modules:
- Module 1: Getting Started with Setonix
- Module 2: Supercomputing Filesystems
- Module 3: Using Modules and Containers
- Module 4: Installing and Maintaining your Software
- Module 5: Submitting and Monitoring your Job
- Module 6: Using Data Throughout the Project Lifecycle
Recordings of the first two training modules are already available on Pawsey’s YouTube channel, and the remaining modules will also be made available as recordings.
There will be live online training sessions during the early stages of the migration. You can check Pawsey’s events page or subscribe as a Pawsey Friend to keep up to date with the latest training and events. We will also announce upcoming sessions to Pawsey researchers in the monthly researcher newsletter.
Weekly interactive Q&A sessions will also be provided as an opportunity to ask questions and seek guidance on migrating computational work to Setonix. Registrations for the training sessions and Q&A sessions will be available in the coming weeks.
As there are a number of changes to the configuration and programming environment of the Setonix supercomputing system, we strongly recommend engaging with the training materials as they become available.
Migration to Setonix for NCMAS and Pawsey Partner projects is forecast to commence at the end of May.
The migration of merit allocation projects will occur over a 12-week period.
Project access will be staggered across the first few weeks to spread the support load for Pawsey staff, prioritising projects with larger allocations to make the most of the available compute time.
Acacia is the new online storage system and will provide over 60 PB of object storage for long-term archiving of researcher data. The system is divided into two zones: one for data that needs faster access, and one designed for energy-efficient long-term storage. Acacia went into production in February 2022, and early adopters have been asked to begin moving data from /Group to Acacia.
All users will need to migrate to Acacia as the existing /Group filesystem will not be available once users migrate to Setonix.
Banksia is the new offline storage system. It replaces the previous storage management software with an open system that provides an expandable platform to build on and to leverage the investment in object storage. It makes use of Pawsey’s existing investment in tape by re-using the current tape libraries, and uses a new 5 PB cache to take full advantage of the new 100 GbE network infrastructure.
PaCER (Pawsey Supercomputing Centre for Extreme scale Readiness) is a collaboration between Pawsey and research groups across Australia. It allows researchers early access to Pawsey’s supercomputing tools and infrastructure, training and exclusive hackathons focused on HPC performance at scale.
Information about PaCER and upcoming PaCER events can be found here.
We expect the rest of 2022 to be busy as we migrate Pawsey researchers to Setonix Phase 1 and then work through the challenges of delivering the full power of Setonix in the midst of worldwide supply chain issues, travel restrictions and the impacts of COVID-19 on our workforce.
Thank you for being patient with us on this journey. We are looking forward to delivering the southern hemisphere’s fastest research supercomputer, accelerating Australian science.
Read the stories published about the project milestones referred to above:
- Pawsey provides the first look at Setonix, wrapped in stars 21/09/2021
- Pawsey to deploy 130PB of multi-tier storage 16/08/2021
- Pawsey unveils its super-fast tribute to the quokka 24/02/2021
- Powering the next generation of Australian research with HPE 20/10/2020
- PACER – upscaling Australian researchers in the new era of supercomputing 25/07/2020
- New Pawsey Nimbus Cloud infrastructure available for Australian researchers 10/03/2020
- HPE to deliver a dedicated system for astronomy needs 28/02/2020
- Pawsey Capital Refresh Boosts Cloud Infrastructure 21/11/2019
- Tender released for Australia’s new research supercomputer 14/11/2019
- Three times more storage and performance for SKA pathfinders 11/11/2019
- Pawsey Capital Refresh – Reference Groups Established 05/04/2019
- New funding to accelerate science and innovation 28/04/2018
Pawsey is committed to engaging with its diverse stakeholders and keeping them updated on the procurement. Some of the channels the Centre has established to achieve this are the Pawsey user forums, the Capital Refresh Update for potential vendors, Pawsey newsletters and, more recently, our podcasts.
You can listen to the Capital Refresh Podcast from the list below:
Below is an infographic showing the current status of the project (last updated on 20/10/2020). It can also be downloaded here.
Pawsey Capital Refresh Status
Tape library expansion
Additional tape storage has been procured to expand each of the existing tape libraries from 50 to 63 petabytes.
Long term storage
Long Term Storage will be composed of online and offline systems. Awarded to Dell, the new online storage will provide 60 PB of object storage for long-term archiving of researcher data. The offline storage, provided by Xenon, will replace the storage management software with an open system, providing an expandable platform that leverages Pawsey’s investment in object storage and reuses Pawsey’s tape libraries. New services will optimise data upload and download times.
Pawsey partnered with Dell EMC to expand its cloud system with 5x more memory and 25x more storage to form a cutting-edge, flexible compute system. This expansion provides better service to emerging research areas and communities that benefit more from high-throughput computing.
Astronomy high-speed storage: 3x more storage and performance. The existing Astro filesystem was expanded to serve the MWA community. Powered by HPE, it has been upgraded to 2.7 PB of usable space and is capable of reading/writing at 30 GB/s. The new buffer filesystem, a dedicated resource for ASKAP researchers, provides 3.7 PB of usable space and is capable of reading/writing at 40 GB/s. It is manufactured by Dell.
High-speed storage filesystems: designed to handle thousands of users accessing them at the same time. The Pawsey high-speed filesystems will be procured as part of the main supercomputer system to increase speed and storage capacity for general-purpose science.
Garrawarla, the 546 teraFLOPS MWA cluster, is a resource tuned to MWA’s needs, powered by HPE. Procured ahead of the main supercomputer, this cluster allows ASKAP to use the full CPU partition of Galaxy.
Pawsey is moving to a Cisco spine-leaf architecture with a 400 Gbps backbone and 100 Gbps links to host endpoints. The network has been designed to be easily expandable to support the object storage platform being purchased as part of the Long Term Storage procurement, as well as integration with Pawsey’s new supercomputer.
The remote visualisation capability has been procured as part of the main supercomputer. When the new capabilities become available, researchers will be able to visualise their data in real time while it is being processed.
This new capability will allow researchers to steer their visualisation while the data is being processed and fine-tune it towards the desired outcome.
PSS, the new main supercomputer, will be built using the HPE Cray EX supercomputer architecture. It will deliver 30x more compute power than its predecessors and will be at least 10x more power efficient.
It will be delivered in two phases. Phase 1, available by Q3 2021, will provide researchers with a 45 percent increase in compute power in one-fifth of the size compared with Magnus and Galaxy. Phase 2 will become available in Q2 2022, providing up to 50 petaFLOPS of raw compute power.
Pawsey Data Workflow