Paired, partially demultiplexed Ion Torrent data with barcodes in sequence

Lorinda · September 23, 2019, 11:04pm

Hey QIIME2,

I have partially demultiplexed (and previously paired) Ion Torrent data, in the form of 96 individual .fastq.gz files. However, each of the 96 files contains sequences from 3 different runs, so it needs to be further demultiplexed based on three different reverse barcodes still embedded in the sequences (there are 288 samples total, from 3 different plates). I've searched through the forum and plugin descriptions but can't find a good solution for data in this format. So far I have imported the data using the Manifest single end format, but I am unsure how best to proceed to separate the three runs. Any suggestions?

Thanks!
--Lorinda

Nicholas_Bokulich · September 24, 2019, 4:47pm

Hi @Lorinda,

Yowza what an uncomfortable-sounding problem!

I recommend starting at square one. What do the raw data look like? I am assuming you might have two big fastqs (forward and reverse PE reads) with dual index barcodes??? If so:

1. Import the raw data as MultiplexedPairedEndBarcodeInSequence format (see the q2-cutadapt tutorial in the tutorials section of this forum).
2. Run qiime cutadapt demux-paired. Note that there are separate forward-barcodes and reverse-barcodes options so that you can specify your DI barcodes.
3. Profit
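
As a rough sketch of the first two steps, the Python below only assembles and prints the two command lines; it does not run QIIME 2. Every path and both metadata column names are hypothetical placeholders, and the flag spellings should be checked against `qiime cutadapt demux-paired --help` for your release.

```python
import shlex

# Hypothetical paths and column names; replace them with your own.
RAW_DIR = "raw-reads"  # directory holding forward.fastq.gz and reverse.fastq.gz
MULTIPLEXED = "multiplexed-seqs.qza"
METADATA = "metadata.tsv"  # sample metadata with one column per barcode

# Step 1: import the raw paired-end reads with barcodes still in sequence.
import_cmd = [
    "qiime", "tools", "import",
    "--type", "MultiplexedPairedEndBarcodeInSequence",
    "--input-path", RAW_DIR,
    "--output-path", MULTIPLEXED,
]

# Step 2: demultiplex on forward and reverse barcodes.
demux_cmd = [
    "qiime", "cutadapt", "demux-paired",
    "--i-seqs", MULTIPLEXED,
    "--m-forward-barcodes-file", METADATA,
    "--m-forward-barcodes-column", "forward-barcode",
    "--m-reverse-barcodes-file", METADATA,
    "--m-reverse-barcodes-column", "reverse-barcode",
    "--o-per-sample-sequences", "demultiplexed-seqs.qza",
    "--o-untrimmed-sequences", "untrimmed.qza",
]

# Print the commands so they can be reviewed before running them in a shell.
for cmd in (import_cmd, demux_cmd):
    print(shlex.join(cmd))
```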

But maybe you don't have DI barcodes... let me know if this works for you!

Lorinda · September 24, 2019, 9:48pm

Hey @Nicholas_Bokulich,

Thanks for the reply. Unfortunately the data I have is previously paired -- so no forward and reverse reads to work with, just single paired reads. The forward identifying barcode was removed when paired sequences were partially demultiplexed into the 96 .fastq files. However, each of these 96 files needs to be further demultiplexed by 3 barcodes still contained in the sequences.

I've been through the cutadapt tutorial and haven't had any luck so far. The only solution I can think of is to run the imported data (Manifest single end format) through cutadapt trim, trimming one of the three reverse barcodes and retaining only those sequences that contain it. This would demultiplex 96 of the samples. Repeating this 2 more times with the other 2 reverse barcodes would give me the remaining samples.
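
A minimal sketch of that idea for a single file, written in plain Python as a stand-in for cutadapt rather than a call to it. It assumes the reverse barcode sits at the very 3' end of each joined read, in the orientation given, and it matches exactly (no mismatches allowed). The function names and the output file naming are hypothetical. All three barcodes are handled in one pass instead of three.

```python
import gzip
from pathlib import Path


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from an open FASTQ text handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def split_by_3prime_barcode(fastq_gz, barcodes, out_dir):
    """Write one .fastq.gz per barcode holding the reads that end with it.

    `barcodes` maps a label (for example a plate name) to the barcode as it
    appears at the 3' end of the joined read. The barcode is trimmed from both
    sequence and quality. Reads matching no barcode are dropped.
    Returns a dict of label -> number of reads written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(fastq_gz).name.replace(".fastq.gz", "")
    counts = {label: 0 for label in barcodes}
    outputs = {
        label: gzip.open(out_dir / f"{stem}_{label}.fastq.gz", "wt")
        for label in barcodes
    }
    try:
        with gzip.open(fastq_gz, "rt") as handle:
            for header, seq, plus, qual in read_fastq(handle):
                for label, barcode in barcodes.items():
                    if seq.endswith(barcode):
                        keep = len(seq) - len(barcode)
                        outputs[label].write(
                            f"{header}\n{seq[:keep]}\n{plus}\n{qual[:keep]}\n"
                        )
                        counts[label] += 1
                        break  # a read is assigned to at most one barcode
    finally:
        for out in outputs.values():
            out.close()
    return counts
```

Calling `split_by_3prime_barcode` once per input file, with the three plate barcodes as the mapping, would yield the 288 per-sample files. Real reads often carry sequencing errors in the barcode, so cutadapt's error-tolerant matching is likely to recover more reads than this exact-match sketch.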

I realize this problem may be unique to me, but wanted to put it out there in case anyone has some simple solutions or there is something I am missing. I've worked with lots of 454 and Illumina data, but am less familiar with Ion Torrent data. I'm looking into getting the raw forward and reverse reads, but this is older data so life would be easier if I could work with what I have (at least that's what I thought).

Cheers,

--Lorinda

Nicholas_Bokulich · September 24, 2019, 10:02pm

I understood this much, but I assumed that you had done the read joining and demux. Now I assume this has been done by some third party and you cannot access the raw data...

In that case, I think your idea to use cutadapt trim may be your best bet. It should be straightforward enough to loop over the files. It may be easiest to do this on the fastqs outside of QIIME 2 and then import everything into QIIME 2 using manifest format. Otherwise you will need to run it inside QIIME 2, export everything, and then re-import everything, since QIIME 2 does not yet have a method for merging demux artifacts.
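
For the import step, a manifest listing the fully demultiplexed files can be generated rather than typed by hand. The sketch below assumes the tab-separated single-end manifest layout (a `sample-id` column and an `absolute-filepath` column) and derives each sample ID from the file name, which is a naming assumption and not a QIIME 2 requirement. The function name is hypothetical.

```python
import csv
from pathlib import Path


def write_single_end_manifest(fastq_dir, manifest_path):
    """Write a tab-separated manifest listing every *.fastq.gz in fastq_dir.

    The sample ID is the file name with ".fastq.gz" removed.
    Returns the number of files listed.
    """
    files = sorted(Path(fastq_dir).glob("*.fastq.gz"))
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample-id", "absolute-filepath"])
        for path in files:
            sample_id = path.name[: -len(".fastq.gz")]
            writer.writerow([sample_id, str(path.resolve())])
    return len(files)
```

The resulting file can then be passed to `qiime tools import` with whichever single-end manifest format matches your QIIME 2 release and quality encoding.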

Let's see if anyone else has any advice to offer! I cannot think of any method (inside or out of QIIME 2) that can handle partially demuxed files.

Good luck!
