Building drug databases for COVID-19 testing

Supercomputers in Australia join forces to find a cure for COVID-19

Researchers using Pawsey supercomputing facilities are supporting the global effort to identify existing drugs that could be repurposed to treat COVID-19.

Professor Alan Mark and his team at the University of Queensland have over the last 10 years created the Automated Topology Builder (ATB), a globally-recognised molecular modelling tool that lets researchers turn a molecular structure into an accurate 3D representation with high-quality atomic interaction parameters required for computational drug design.

Mark’s team is now working with Pawsey and the NCI to ensure that all pharmaceutically active compounds that have already passed Phase II clinical trials for human safety are incorporated into the ATB, to support research into whether existing drugs can be repurposed to treat COVID-19.

“There are around 7,300 compounds that have passed Phase II clinical trials; we’ve added around 4,000 of these to the database in the last month,” says Mark.

The ATB entries for these compounds can then be used to predict which of them have the potential to interact with the SARS-CoV-2 spike protein or other viral proteins. The detailed molecular geometry and atomic interaction parameters provided by the ATB can also be used to understand how a viral protein dynamically adapts to the presence of a given compound.

The ATB repository is one of the largest pre-calculated molecular structure and parameter databases in the world. It already contains over 420,000 compounds, which are freely available and are accessed up to 1,000 times daily by researchers worldwide. Its creation has required a significant computational effort over 10 years, most recently through the Pawsey Supercomputing Centre in Perth and the National Computational Infrastructure (NCI) in Canberra.

The computational cost of processing a compound at a given level of theory grows rapidly with the number of atoms. Previously, only compounds with fewer than 50 atoms could be processed at the highest level of theory. With extra supercomputing allocations made available for COVID-19-related research, Mark’s group is now able to process all compounds of interest at the highest level.

The new Gadi supercomputer at NCI is being used to routinely process the smaller compounds, whereas Pawsey’s Nimbus cloud service is being used to process larger compounds, for which it is hard to predict how long the calculations may take. Optimising the geometry of one 148-atom compound, for example, occupied a Nimbus high-performance node (16 cores in parallel) for five days and nine hours.
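To put the 148-atom example in terms of compute consumed, the wall-clock time can be converted to core-hours. The short sketch below does only that arithmetic; the function name is illustrative, and it assumes all 16 cores were occupied for the full duration of the job.

```python
from datetime import timedelta


def core_hours(cores: int, wall_time: timedelta) -> float:
    """Core-hours used, assuming every core is busy for the whole wall time."""
    return cores * wall_time.total_seconds() / 3600


# Figures quoted in the article: 16 cores for five days and nine hours.
wall_time = timedelta(days=5, hours=9)   # 129 hours of wall-clock time
print(core_hours(16, wall_time))         # 2064.0
```

That single geometry optimisation therefore corresponds to roughly 2,064 core-hours.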

“Nimbus is very flexible and we can run about 20 compounds asynchronously across multiple nodes without blocking the pipeline.  New molecules are automatically queued so as soon as one computational job is finished, the next one can start,” says Mark.  “It’s running 24 hours a day.”
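The queueing behaviour Mark describes resembles a standard worker-pool pattern. The sketch below is not the ATB pipeline: it is a minimal illustration using Python's standard library, in which `optimise_geometry` is a hypothetical stand-in for a long-running calculation and the limit of 20 concurrent jobs is taken from the "about 20 compounds" in the quote.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_CONCURRENT = 20  # "about 20 compounds" at once, per the quote above


def optimise_geometry(compound_id: str) -> str:
    """Hypothetical placeholder for one long-running calculation."""
    time.sleep(0.01)  # stands in for hours or days of compute
    return compound_id


def run_pipeline(compound_ids, max_concurrent=MAX_CONCURRENT):
    """Process every compound, keeping at most max_concurrent jobs running."""
    finished = []
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # All jobs are queued up front; a worker picks up the next queued
        # job as soon as its current one finishes.
        futures = [pool.submit(optimise_geometry, cid) for cid in compound_ids]
        for future in as_completed(futures):
            finished.append(future.result())
    return finished


if __name__ == "__main__":
    done = run_pipeline([f"compound-{i}" for i in range(50)])
    print(len(done), "compounds processed")
```

Because results are collected in completion order, one slow calculation does not hold up the compounds queued behind it.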

Although all 7,300 Phase II therapeutic compounds will be available within the ATB within a month, many of these compounds exist in multiple forms within the human body. By the end of the year-long project, Mark’s team anticipates having all of the biologically relevant forms of all of the Phase II compounds (between 50,000 and 100,000 structures) both in the database and bundled for distribution to COVID-19 drug researchers.