Bembicium-vittatum
Gastropods are a class of invertebrates characterized by a large muscular foot and, in most species, a shell that encloses their organs. The class includes land snails and sea snails, and it is highly underrepresented in genomic resources, meaning we know little about the number, structure and function of genes in these species. My study species, Bembicium vittatum, is an intertidal snail that is endemic to Australia. B. vittatum has been well studied in terms of taxonomy and population genetic structure, but we know little about the composition of its genome. Producing a genome assembly for B. vittatum therefore provides an opportunity to increase our knowledge of gastropod genomics and of animal genomics in general, a field of growing importance.
Area of science
Genomics
Systems used
Nimbus
Applications used
FastQC, KAT, Trimmomatic, w2rap
The Challenge
There is currently no reference genome for B. vittatum, nor is there genomic information for many closely related species. To fill this knowledge gap, we will produce a de novo genome assembly for B. vittatum. Genome assembly involves sequencing millions of short fragments of DNA and putting them back together to represent the original layout of the chromosomes of a particular species. The process is termed de novo genome assembly when there is no prior knowledge of the layout or composition of the genome. Such genomic information will allow us to better understand the development of this species and its capacity to adapt to vastly different environments. We may also gain some insight into an unexpected hybridisation event between B. vittatum and a closely related sister species, which resulted from the translocation of B. vittatum to a site where it had not previously occurred.
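To make the idea of "putting short fragments back together" concrete, here is a minimal Python sketch of a greedy overlap merge on three made-up reads. It is a toy illustration only: the reads, the `min_len` threshold and the function names are assumptions for this example, and production assemblers rely on far more sophisticated graph-based methods that cope with sequencing errors and repeats.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap.

    Stops when no pair overlaps by at least `min_len` bases and returns
    the remaining sequences (the toy equivalent of contigs).
    """
    seqs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(seqs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing left to merge
        merged = seqs[best_i] + seqs[best_j][best_len:]
        seqs = [s for k, s in enumerate(seqs) if k not in (best_i, best_j)]
        seqs.append(merged)
    return seqs


# Hypothetical reads sampled from the made-up sequence ATGCGTACGTTAGC.
TOY_READS = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]

if __name__ == "__main__":
    print(greedy_assemble(TOY_READS))
```

With real data the same principle applies to millions of reads, which is why the choice of algorithm and the available memory matter so much.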
The Solution
By producing a de novo genome assembly we will gain knowledge of the location, arrangement and variation of genes in B. vittatum. DNA has been extracted from B. vittatum and sequenced to produce millions of short reads. We also used Hi-C sequencing, where whole nuclei were extracted to provide information about the proximity of sequences to each other in the genome. The quality of the short reads was checked and a draft assembly was then created using a contigging tool. Hi-C sequences will be aligned to this draft genome to correct any misassemblies and to re-order the draft genome if necessary.
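As an illustration of what checking read quality can involve, the sketch below computes the mean Phred quality score of each read in a small FASTQ snippet and flags reads that fall below a threshold. It assumes the common Phred+33 encoding and strict four-line FASTQ records; the example reads, the threshold of 30 and the function names are hypothetical, and dedicated quality-control tools report many more metrics than this.

```python
from typing import Iterator


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of one quality string (Phred+33 assumed by default)."""
    if not quality:
        raise ValueError("empty quality string")
    return sum(ord(c) - offset for c in quality) / len(quality)


def parse_fastq(text: str) -> Iterator[tuple[str, str, str]]:
    """Yield (name, sequence, quality) from strict four-line FASTQ records."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header[1:], seq, qual


# Made-up example records: 'I' encodes Phred 40, '#' encodes Phred 2.
EXAMPLE_FASTQ = """\
@read1
ACGT
+
IIII
@read2
ACGT
+
II##
"""

if __name__ == "__main__":
    for name, _seq, qual in parse_fastq(EXAMPLE_FASTQ):
        score = mean_phred(qual)
        flag = "LOW" if score < 30 else "ok"  # threshold is an assumption
        print(f"{name}\t{score:.1f}\t{flag}")
```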
The Outcome
The raw sequence data required to produce a de novo genome assembly is very large, because each stretch of the genome must be sequenced many times over to produce an accurate representation of the genome. The amount of raw sequence data also depends on the size of the species' genome, and B. vittatum has a relatively large genome. As such, access to cloud computing is essential for analysing the genomic data of B. vittatum. In addition, assembling short DNA sequences in the correct order to represent the original chromosomes depends on complex algorithms that require a large amount of compute power and working memory. The high-throughput infrastructure on Nimbus has been invaluable to this project so far and will continue to be over the next few months.
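The relationship between genome size, sequencing depth and data volume can be sketched with some simple arithmetic: total bases sequenced is roughly genome size multiplied by the target coverage. The numbers below (a 1 Gb genome, 50x coverage, 150 bp reads) are placeholder assumptions chosen only for illustration and are not measurements for B. vittatum.

```python
def required_bases(genome_size_bp: int, coverage: int) -> int:
    """Total bases to sequence so each position is covered `coverage` times on average."""
    return genome_size_bp * coverage


def reads_needed(genome_size_bp: int, coverage: int, read_length_bp: int) -> int:
    """Number of reads of a given length needed to reach the target coverage."""
    total = required_bases(genome_size_bp, coverage)
    return -(-total // read_length_bp)  # integer ceiling division


if __name__ == "__main__":
    # Hypothetical values, not figures from this project.
    genome_size = 1_000_000_000  # 1 Gb genome (assumption)
    coverage = 50                # 50x average depth (assumption)
    read_length = 150            # 150 bp short reads (assumption)

    print("bases to sequence:", required_bases(genome_size, coverage))
    print("reads needed:", reads_needed(genome_size, coverage, read_length))
```

Doubling either the genome size or the target coverage doubles the raw data, which is why large genomes quickly outgrow a desktop machine.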