Analysis of cancer genomes and transcriptomes

Our work continuously generates large amounts of sequencing data, in the form of RNA-seq, exome-seq and whole-genome sequencing (WGS), from samples of various cancer types. These are mainly samples from melanoma patients, but also include metastatic tumours of unknown primary cancer type and cell line sequencing experiments, among others. We have also recently acquired a large public RNA-seq dataset, a cohort of about 500 metastatic tumours (approximately 4 TB of raw data), which we plan to analyse in the near future. In addition, we have access to a large cohort (> 10,000 samples) from The Cancer Genome Atlas. Some of these samples are already stored on the UWA IRDS storage platform, which we plan to access and analyse via Nimbus. Our work is likely to use the requested resources continuously over a longer period (> 1 year), as more samples are acquired for various cancer-related projects and as previous samples are occasionally reanalysed when new hypotheses are posed.

Principal investigator

Joakim Karlsson

Area of science


Systems used


Applications used

STAR, HTseq, BWA, GATK, ASCAT, Control-FREEC, Nextflow, and many others
Partner Institution: University of Western Australia | Project Code: A000637

The Challenge

Through our contacts with the clinic, we routinely obtain tumour samples from melanoma and other cancer patients. We also have access to large amounts of public cancer RNA and DNA sequencing data. Broadly, our goals are to characterise these samples thoroughly using computational methods and to evaluate different hypotheses using statistical methods and data mining.

The Solution

DNA and RNA sequencing data from tumour samples will be processed and characterised by comprehensive genomics pipelines. This involves alignment of sequencing reads to a reference genome, quantification of gene read counts (in the case of RNA), mutation calling, DNA copy number analysis, screening for structural genomic variation and fusion genes, and searching for evidence of viruses. We are also building a pipeline to predict the tumour type of clinical samples from gene expression data, which will be used for difficult metastatic cases of so-called tumours of unknown primary.
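As an illustration of the idea behind predicting tumour type from gene expression, the following is a minimal sketch that assigns a sample to the tumour type whose average expression profile it correlates with best. The method (nearest centroid with Pearson correlation on log-transformed counts), the function names and the toy read counts are all assumptions made for this example; the text above does not state which approach the actual pipeline uses.

```python
import math
from statistics import mean


def log_transform(counts):
    """Apply log2(count + 1) to dampen the influence of highly expressed genes."""
    return [math.log2(c + 1) for c in counts]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def build_centroids(reference):
    """Average the log-transformed profiles of each tumour type.

    reference maps a tumour type to a list of count vectors,
    all using the same gene order.
    """
    centroids = {}
    for tumour_type, samples in reference.items():
        logged = [log_transform(s) for s in samples]
        centroids[tumour_type] = [mean(values) for values in zip(*logged)]
    return centroids


def predict_tumour_type(counts, centroids):
    """Return the best-correlated tumour type and the score for every type."""
    profile = log_transform(counts)
    scores = {t: pearson(profile, c) for t, c in centroids.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy reference data: hypothetical read counts for four genes per sample.
reference = {
    "melanoma": [[900, 20, 15, 300], [1100, 35, 10, 280]],
    "lung": [[10, 800, 40, 310], [25, 950, 30, 290]],
}
unknown_sample = [1000, 30, 12, 295]

predicted, scores = predict_tumour_type(unknown_sample, build_centroids(reference))
print(predicted, scores)
```

In practice a reference cohort such as a large public dataset would supply the labelled profiles, and a real classifier would need normalisation across batches and proper validation, neither of which this sketch attempts.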



The Outcome

The resources provided to us on the Nimbus platform have so far been well suited to the use cases described above. These analyses benefit from a large number of CPU threads and a large amount of RAM, so that many samples can be run at the same time and/or many sequencing reads can be processed in parallel. In addition, genomics data of this type requires a very large amount of storage space, of which we have been generously awarded 20 TB. It is also useful to be able to administer one's own instance and install all the required tools freely. Installing and evaluating new tools and pipelines is a common chore in genome bioinformatics, and the admin rights provided in a virtual instance like this are invaluable in that respect.

Since this Nimbus instance was granted in November of last year, pipelines for routine processing of DNA and RNA samples have been set up and about 20 melanoma-related patient samples have been analysed. Analysis results for one patient with an unknown cancer type have also been provided to assist the treating physicians and the pathologist in their assessment of the case. A further 500 public cancer samples have been transferred to the instance and will be analysed once some remaining tweaks to the RNA pipeline are complete. So far the experience has been good and the support excellent, although storage needs may increase in the future.
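The trade-off between CPU threads, RAM and the number of samples that can run at once can be sketched as follows. This is only an illustration: the instance size, the per-sample requirements and the `process_sample` placeholder are hypothetical values chosen for the example, not figures from this project. Workflow managers such as Nextflow, which is listed among the applications used, typically handle this kind of scheduling themselves.

```python
from concurrent.futures import ThreadPoolExecutor


def max_concurrent_samples(total_threads, total_ram_gb,
                           threads_per_sample, ram_gb_per_sample):
    """Number of samples that fit at once, limited by the scarcer resource."""
    by_cpu = total_threads // threads_per_sample
    by_ram = int(total_ram_gb // ram_gb_per_sample)
    # Always allow at least one sample so the work can proceed serially.
    return max(1, min(by_cpu, by_ram))


def process_sample(sample_id):
    """Placeholder for launching the real per-sample pipeline (hypothetical)."""
    return f"{sample_id}: done"


def run_all(sample_ids, workers):
    """Process samples with at most `workers` running concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the input order of the samples.
        return list(pool.map(process_sample, sample_ids))


# Hypothetical instance: 16 threads and 64 GB RAM; each sample is assumed
# to need 4 threads and 32 GB RAM, so RAM is the limiting resource here.
workers = max_concurrent_samples(16, 64, 4, 32)
results = run_all(["sample_A", "sample_B", "sample_C"], workers)
print(workers, results)
```

With these assumed numbers the CPU would allow four samples at a time but the RAM only two, which is why memory-hungry steps often determine how many samples can be processed in parallel.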