I am an undergraduate biology student working on a microbiome analysis project, and I have little background in computer science (only an introductory Python course). I am learning QIIME 2 with the goal of performing alpha and beta diversity and phylogenetic analyses on microbiome data that my professor sequenced from wastewater. She used Ion Torrent sequencing and has given me UBAM files, which make no sense to me, but I know that the sample sequences have barcodes attached. How can I import them into QIIME 2?
Hi @osaama.shehzad! QIIME 2 doesn’t directly support importing UBAM files; I created an issue to track this feature. I don’t have an ETA for when this will be available, but we’ll follow up here when it makes it into a QIIME 2 release!
In the meantime, you could try finding an external program to convert UBAM files to FASTQ files. Once you have FASTQ files, you have some options depending on the type of sequencing data. Can you please provide the following info so that I can point you in the right direction?
Is this single- or paired-end sequence data? If it’s paired-end, have the reads been joined already or do you have separate forward and reverse reads?
Have the sequences been demultiplexed already? In other words, do you have a single FASTQ file (or two FASTQ files, if it’s unjoined paired-end data) per sample? If the data aren’t demultiplexed, I can help you with that step, which is necessary before denoising or OTU clustering.
Do the sequences have barcodes, primers, or adapters attached to them, or have those been removed already? You’ll want to remove all sequencing artifacts before denoising (e.g. with DADA2 or Deblur) or performing OTU clustering.
Hi @jairideout! Thank you so much for your comprehensive response and generous offer to help! I shared your response with my advisor, and here is what we know about the UBAM files the Ion Torrent run produced.
It’s paired-end sequence data that has not been demultiplexed. The forward and reverse reads are both in the UBAM file, and they have not been joined. The sequences still have barcodes and adapters attached.
How do you suggest I should proceed?
Thanks for the details @osaama.shehzad! Here’s what I’d recommend trying:
Convert the UBAM files to FASTQ files using an external program. You should end up with two FASTQ files: one file for the forward reads and one file for the reverse reads.
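To make the conversion step less of a black box, here is a minimal Python sketch of what it involves. It assumes the UBAM has already been decoded to plain SAM text (for example with samtools view; samtools also has a fastq subcommand that does the whole job, so in practice a dedicated tool is the better choice). It uses the standard SAM FLAG bits to send first-of-pair reads to one file and second-of-pair reads to the other. The function name sam_to_paired_fastq and the toy records are made up for illustration.

```python
import io
from typing import Iterable, TextIO, Tuple

# SAM FLAG bits (from the SAM format specification)
FLAG_REVERSE = 0x10  # SEQ is stored reverse-complemented
FLAG_READ1 = 0x40    # first read of the pair (forward)
FLAG_READ2 = 0x80    # second read of the pair (reverse)

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string; anything outside ACGT becomes N."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq.upper()))


def sam_to_paired_fastq(sam_lines: Iterable[str], fwd_out: TextIO, rev_out: TextIO) -> Tuple[int, int]:
    """Write READ1 records to fwd_out and READ2 records to rev_out as FASTQ.

    Assumes the uBAM has already been decoded to SAM text. Records that are
    neither READ1 nor READ2 (e.g. unpaired reads) are skipped.
    Returns (number of forward reads, number of reverse reads).
    """
    n_fwd = n_rev = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines (@HD, @RG, ...)
        fields = line.rstrip("\n").split("\t")
        qname, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if qual == "*":
            raise ValueError(f"{qname} has no base qualities; cannot write FASTQ")
        if flag & FLAG_REVERSE:
            # Restore the read to its sequenced orientation.
            seq, qual = reverse_complement(seq), qual[::-1]
        record = f"@{qname}\n{seq}\n+\n{qual}\n"
        if flag & FLAG_READ1:
            fwd_out.write(record)
            n_fwd += 1
        elif flag & FLAG_READ2:
            rev_out.write(record)
            n_rev += 1
    return n_fwd, n_rev


# Tiny made-up example: one unmapped read pair (FLAG 77 = READ1, 141 = READ2).
sam_text = (
    "@HD\tVN:1.6\tSO:unsorted\n"
    "read1\t77\t*\t0\t0\t*\t*\t0\t0\tACGTACGT\tIIIIIIII\n"
    "read1\t141\t*\t0\t0\t*\t*\t0\t0\tTTGGCCAA\tHHHHHHHH\n"
)
fwd, rev = io.StringIO(), io.StringIO()
print(sam_to_paired_fastq(sam_text.splitlines(keepends=True), fwd, rev))  # (1, 1)
```

The two StringIO objects stand in for the forward and reverse FASTQ files you would import into QIIME 2.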
Import the multiplexed FASTQ files and use the q2-cutadapt plugin to demultiplex the sequences. You’ll also want to use that plugin to remove any other sequencing artifacts such as primers and adapters (the demultiplexing will remove the barcodes for you). Check out the q2-cutadapt community tutorial for examples of how to import, demultiplex, and trim out sequencing artifacts.
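Conceptually, demultiplexing with barcodes in the sequence means looking at the start of each read, assigning the read pair to the sample whose barcode matches, and cutting the barcode off. The Python sketch below shows that idea under simplifying assumptions: the barcode sits at the 5′ end of the forward read and must match exactly, whereas cutadapt's matching can tolerate some sequencing errors. The sample IDs, barcodes, and demux_by_inline_barcode are hypothetical; this is not how the plugin is implemented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Read = Tuple[str, str, str]  # (read id, sequence, quality string)
Pair = Tuple[Read, Read]     # (forward read, reverse read)


def demux_by_inline_barcode(pairs: Iterable[Pair], barcodes: Dict[str, str]) -> Dict[str, List[Pair]]:
    """Group read pairs by sample using an exact barcode match at the 5' end
    of the forward read, and strip the barcode from that read.

    barcodes maps a (hypothetical) sample id to its barcode sequence. Assumes
    no barcode is a prefix of another; the first match wins. Pairs with no
    matching barcode are collected under "unassigned".
    """
    per_sample: Dict[str, List[Pair]] = defaultdict(list)
    for fwd, rev in pairs:
        read_id, seq, qual = fwd
        for sample, barcode in barcodes.items():
            if seq.startswith(barcode):
                k = len(barcode)
                # Remove the barcode bases and their quality scores together.
                per_sample[sample].append(((read_id, seq[k:], qual[k:]), rev))
                break
        else:  # no barcode matched
            per_sample["unassigned"].append((fwd, rev))
    return dict(per_sample)


barcodes = {"sampleA": "ACGT", "sampleB": "TTAG"}  # made-up barcodes
pairs = [
    (("r1", "ACGTGGCC", "IIIIIIII"), ("r1", "CCAA", "IIII")),
    (("r2", "TTAGCCGG", "HHHHHHHH"), ("r2", "GGTT", "HHHH")),
    (("r3", "GGGGAAAA", "IIIIIIII"), ("r3", "AACC", "IIII")),
]
for sample, reads in demux_by_inline_barcode(pairs, barcodes).items():
    print(sample, [fwd[1] for fwd, _ in reads])
# sampleA ['GGCC']
# sampleB ['CCGG']
# unassigned ['GGGGAAAA']
```

After this step you effectively have one set of forward and reverse reads per sample, which is what "demultiplexed" means above.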
Note that the tutorial uses a toy data set with single-end reads. Since you have paired-end data, you’ll want to use --type MultiplexedPairedEndBarcodeInSequence in the qiime tools import command, and qiime cutadapt demux-paired and qiime cutadapt trim-paired instead of the single-end versions of those commands used in the tutorial.
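For the trimming step, trim-paired works on both reads of each pair. A rough Python illustration of removing 5′-anchored primers is below, assuming the forward primer starts R1 and the reverse primer starts R2 once barcodes are gone. It allows a small number of substitutions and understands degenerate IUPAC bases, but unlike cutadapt it does not handle insertions or deletions, and dropping pairs where a primer is missing is a choice made only for this sketch. The primer sequences and function names are made-up placeholders, not your professor's primers.

```python
from typing import Optional, Tuple

Read = Tuple[str, str, str]  # (read id, sequence, quality string)

# IUPAC nucleotide codes -> the bases each one allows (primers are often degenerate)
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_mismatches(prefix: str, primer: str) -> int:
    """Count positions where the read base is not allowed by the primer's IUPAC code."""
    return sum(base not in IUPAC.get(code, code) for base, code in zip(prefix, primer))


def trim_5prime_primer(read: Read, primer: str, max_mismatches: int = 1) -> Optional[Read]:
    """Strip a primer anchored at the 5' end of a read, or return None if absent.

    Only substitutions are tolerated (no insertions/deletions), which is a
    simplification of cutadapt's error-tolerant alignment.
    """
    read_id, seq, qual = read
    k = len(primer)
    if len(seq) < k or primer_mismatches(seq[:k], primer) > max_mismatches:
        return None
    return read_id, seq[k:], qual[k:]


def trim_pair(fwd: Read, rev: Read, fwd_primer: str, rev_primer: str,
              max_mismatches: int = 1) -> Optional[Tuple[Read, Read]]:
    """Trim the forward primer from R1 and the reverse primer from R2.

    In this sketch the pair is dropped (None) if either primer is missing.
    """
    f = trim_5prime_primer(fwd, fwd_primer, max_mismatches)
    r = trim_5prime_primer(rev, rev_primer, max_mismatches)
    return (f, r) if f is not None and r is not None else None


# Made-up reads and shortened primers; GTGYCAGC contains the degenerate base Y (C or T).
pair = trim_pair(("r1", "GTGCCAGCAAAA", "I" * 12), ("r1", "GGACTACTTTTT", "H" * 12),
                 fwd_primer="GTGYCAGC", rev_primer="GGACTAC")
print(pair)  # (('r1', 'AAAA', 'IIII'), ('r1', 'TTTTT', 'HHHHH'))
```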
Once you have demultiplexed and trimmed out all sequencing artifacts, you can proceed with denoising your data with DADA2 or Deblur (e.g. see this section of the Moving Pictures tutorial; since you have paired-end data the commands will differ slightly). We don’t have official support for denoising Ion Torrent data yet, but the current recommendation from the DADA2 developer is to use --p-trim-left 15 to trim off the first 15 bases during denoising (see this forum topic for details). With qiime dada2 denoise-paired, the equivalent parameters are --p-trim-left-f and --p-trim-left-r for the forward and reverse reads.
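Trimming the first 15 bases simply means dropping those positions, and their quality scores, from every read before the denoising algorithm sees it. The short Python sketch below does that on a FASTQ stream to illustrate the idea; it is not DADA2's code, the function name trim_left_fastq is hypothetical, and dropping reads with 15 or fewer bases is a choice made only for this example. In QIIME 2 you would set the trimming parameter on the denoise command rather than pre-trimming files yourself.

```python
import io
from typing import Iterator, TextIO


def trim_left_fastq(handle: TextIO, n: int = 15) -> Iterator[str]:
    """Yield FASTQ records with the first n bases and their quality scores removed.

    Assumes plain 4-line FASTQ records. Reads with n or fewer bases are dropped
    here because nothing would be left after trimming.
    """
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, not needed
        qual = handle.readline().rstrip("\n")
        if len(seq) <= n:
            continue
        yield f"{header.rstrip()}\n{seq[n:]}\n+\n{qual[n:]}\n"


# Made-up reads: r1 has 18 bases, r2 only 4 (so it is dropped).
fastq = "@r1\n" + "A" * 15 + "CGT\n+\n" + "I" * 15 + "HHH\n" + "@r2\nACGT\n+\nIIII\n"
for record in trim_left_fastq(io.StringIO(fastq), n=15):
    print(record, end="")
# @r1
# CGT
# +
# HHH
```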
Let us know how it goes!
PS: If you haven’t already, I highly recommend working through the Getting Started guide and the Moving Pictures tutorial before starting on your own data set. Thanks!