In a medical context, staging refers to identifying cancer types and stages from relevant data sources: pathology tests, free text, and tabular data. Currently, manual clinical coding is needed to ascertain cancer type(s) and stage(s).

Principal investigator

Shiv Meka

Area of science


Systems used

Nimbus and Topaz

Applications used

Cancer staging prediction, Data streaming
Partner Institution: WA Health | Project Code: AL-1763

The Challenge

Coding the different stages of cancer is time-consuming, as it requires a clinical coder with subject-matter expertise in oncology to manually read elaborate free-text reports. Is it possible to automate clinical coding in the context of cancer staging?

The Solution

The project validated various algorithms (rule-based and deep-learning methods) for identifying cancer stages and scoring the tumor-node-metastasis (TNM) classification of cancer site(s). On a retrospective cohort from 2010 to 2019, ML models achieved ~94% accuracy, while simple regex rule-based methods reached around 95% accuracy and consumed a fraction of the inference time. This is a rather unique case in which simple regex-based methods outperform ML in both accuracy and computational complexity.
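To illustrate what a regex rule-based approach to TNM scoring can look like, here is a minimal sketch in Python. It is not the project's actual rule set: the pattern, the `extract_tnm` function name, and the sample report text are all hypothetical, and the sketch assumes that reports write the categories in the conventional uppercase form (for example "pT2 N1 M0"), optionally with lowercase prefixes such as "p", "c", or "y".

```python
import re
from typing import Dict, Optional

# Hypothetical pattern for illustration only; real reports would likely need
# a broader rule set (case variants, suffixes such as "(sn)", multiple sites).
TNM_PATTERN = re.compile(
    r"""
    (?<![A-Za-z0-9])                      # not glued to a preceding word or number
    [cpyra]{0,2}T(?P<T>is|[0-4X][a-d]?)   # tumor category, e.g. T2, T1c, Tis, TX
    [\s,]*
    [cpyra]{0,2}N(?P<N>[0-3X][a-c]?)      # node category, e.g. N0, N1a, NX
    (?:[\s,]*[cpyra]{0,2}M(?P<M>[01X][a-c]?))?  # metastasis category (optional)
    (?![A-Za-z0-9])                       # not glued to a following word or number
    """,
    re.VERBOSE,
)


def extract_tnm(report_text: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the first T, N, M categories found in the text, or None."""
    match = TNM_PATTERN.search(report_text)
    if match is None:
        return None
    # M is None when the report gives only T and N.
    return {"T": match.group("T"), "N": match.group("N"), "M": match.group("M")}


if __name__ == "__main__":
    # Made-up sample text, not taken from any real report.
    sample = "Invasive ductal carcinoma, left breast. Pathological stage: pT2 N1 M0."
    print(extract_tnm(sample))
```

A single compiled pattern like this scans a report in one pass without loading a model, which is consistent with the much lower inference time reported for the rule-based methods.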

The Outcome

Nimbus was used as:
i) a front end for coders to upload de-identified pathology reports
ii) a resource for running simultaneous model inferences
iii) a platform for running Airflow scripts that automate training on Topaz

Figure 1. Project’s front end hosted on Nimbus