Paired End Dual Index Demultiplexing

AviTil · March 5, 2024, 6:13am

QIIME 2 does not currently allow demultiplexing of dual-index paired-end reads (4 files: I1, I2, R1, R2). Looking through previous posts, I can see that the workaround is to use QIIME 1's extract_barcodes.py script or cutadapt. After trying and failing to install QIIME 1 with its outdated Python 2 dependencies, I am considering cutadapt, which I had no trouble installing and getting to work. Now I need help making sure I am using cutadapt correctly. The I1 and I2 files are the index reads for the sequences and are in FASTQ format (unzipped from fastq.gz). cutadapt requires 2 barcode files and the 2 read files as input; however, the barcodes are expected in FASTA format. Here is my question: is this FASTA just the I1 and I2 FASTQ files without quality scores (i.e., should I convert FASTQ to FASTA)? Or are these FASTA barcodes a user-generated map between exact barcodes (with no "N" bases) and the sample names?
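For readers with the same question: in cutadapt's demultiplexing mode, the barcode FASTA is the second option, a user-written map rather than a conversion of the I1/I2 FASTQ files. Each record's header is a sample name (cutadapt substitutes it into the {name} placeholder of output file names) and its sequence is that sample's exact barcode, which cutadapt then searches for in the reads themselves. Below is a minimal Python sketch that writes the forward (I1) and reverse (I2) barcode files from a sample table; the SAMPLES dictionary, its barcodes, and the function names are hypothetical placeholders.

```python
"""Write barcode FASTA files of the kind cutadapt's demultiplexing mode reads.
Sketch only: the sample table below is a hypothetical placeholder."""

# Hypothetical sample sheet: sample name -> (I1 barcode, I2 barcode).
SAMPLES = {
    "sample01": ("ACGTACGT", "TTGCAAGC"),
    "sample02": ("GGATCCAA", "CAGTTGCA"),
}


def fasta_records(samples, which):
    """Yield FASTA text for one index position (0 = I1/forward, 1 = I2/reverse)."""
    for name, barcodes in samples.items():
        seq = barcodes[which].upper()
        # Headers become sample names in cutadapt's output, so they must be the
        # sample names; sequences must be exact barcodes (no N or other codes).
        if set(seq) - set("ACGT"):
            raise ValueError(f"{name}: barcode {seq!r} contains non-ACGT bases")
        yield f">{name}\n{seq}\n"


def write_barcode_fasta(path, samples, which):
    """Write all records for one index position to `path`."""
    with open(path, "w") as handle:
        handle.writelines(fasta_records(samples, which))
```

For example, write_barcode_fasta("barcodes_fwd.fasta", SAMPLES, 0) and write_barcode_fasta("barcodes_rev.fasta", SAMPLES, 1) would produce the two files, which cutadapt references with its file: syntax (e.g. -g file:barcodes_fwd.fasta).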

AviTil · March 5, 2024, 7:01am

Okay, I've got this working with QIIME 1.8 now, though only with a narrow range of dependency versions.

For anyone else in the future: cutadapt can only be used if the barcodes are also present at the start of the reads themselves. In my case, the index sequences were not part of the actual sequence reads (they were only in the separate I1/I2 files), which means cutadapt could not be used. Hence, I had to brute-force my way through installing the QIIME 1.8 dependencies manually.
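A quick way to check which situation applies is to count how many reads begin with one of the known barcodes. The Python sketch below is a rough diagnostic, not part of any tool: it assumes standard 4-line FASTQ records, looks only for an exact barcode at the very start of each read (ignoring any spacer or padding bases), and its function names are hypothetical.

```python
"""Rough check of whether reads start with known barcodes, i.e. whether
read-based demultiplexing (as cutadapt does) could work. Sketch only."""

import gzip
from itertools import islice


def read_sequences(path):
    """Yield the sequence line of each 4-line FASTQ record (plain or .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # line 2 of every record is the sequence
                yield line.strip()


def fraction_with_barcode(sequences, barcodes, limit=10_000):
    """Fraction of the first `limit` sequences that begin with any barcode."""
    barcodes = tuple(b.upper() for b in barcodes)
    total = hits = 0
    for seq in islice(sequences, limit):
        total += 1
        if seq.upper().startswith(barcodes):  # startswith accepts a tuple
            hits += 1
    return hits / total if total else 0.0
```

For example, fraction_with_barcode(read_sequences("R1.fastq"), ["ACGTACGT", "GGATCCAA"]) returning a value near zero would suggest the barcodes live only in the I1/I2 files, as in this thread.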

Nicholas_Bokulich · March 5, 2024, 7:07am

Hi @AviTil,
Thanks for sharing, and for managing to install QIIME 1.8! That deserves a trophy; I did not think it was even possible anymore!

That's correct: cutadapt only supports demultiplexing dual-indexed reads when the barcodes are in the reads, and the same is true of q2-cutadapt. There, it is possible with the qiime cutadapt demux-paired action, in case anyone reading this thread is interested.

But what you have, @AviTil, is basically EMP format with dual-indexed reads (an uncommon format, I think). Another possibility would be to concatenate the two barcodes (using a custom script) and then pass the concatenated barcodes to qiime demux emp-paired. It sounds like you have this working now with QIIME 1, but I mention it in case you want a QIIME 2 solution, or as an idea for others reading this topic in the future.
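A minimal Python sketch of the concatenation idea, under stated assumptions: I1 and I2 are uncompressed 4-line FASTQ files listing clusters in the same order (as BCL-to-FASTQ conversion normally writes them), and the output is named barcodes.fastq.gz, the file the EMP paired-end import format expects alongside forward.fastq.gz and reverse.fastq.gz. The barcode column in the sample metadata would then need each sample's I1 and I2 sequences concatenated in the same order. All function names are hypothetical.

```python
"""Concatenate I1 and I2 index reads into one barcode FASTQ record per cluster.
Sketch only: assumes 4-line FASTQ files listing clusters in the same order."""

import gzip
from itertools import zip_longest


def fastq_records(handle):
    """Yield (header, sequence, quality) tuples from a 4-line FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def read_id(header):
    """Cluster ID: the header up to the first whitespace, without the '@'."""
    return header.split()[0].lstrip("@")


def concatenate_indexes(i1_records, i2_records):
    """Yield FASTQ text whose sequence and quality are I1 + I2 for each cluster."""
    for rec1, rec2 in zip_longest(i1_records, i2_records):
        if rec1 is None or rec2 is None:
            raise ValueError("I1 and I2 contain different numbers of records")
        if read_id(rec1[0]) != read_id(rec2[0]):
            raise ValueError(f"out of sync: {rec1[0]} vs {rec2[0]}")
        header, seq1, qual1 = rec1
        _, seq2, qual2 = rec2
        # Keep I1's header so the cluster IDs still match R1/R2.
        yield f"{header}\n{seq1}{seq2}\n+\n{qual1}{qual2}\n"


def write_barcodes(i1_path, i2_path, out_path="barcodes.fastq.gz"):
    """Stream both index files and write the gzipped concatenated barcodes."""
    with open(i1_path) as i1, open(i2_path) as i2, gzip.open(out_path, "wt") as out:
        out.writelines(concatenate_indexes(fastq_records(i1), fastq_records(i2)))
```

The sketch raises an error rather than guessing when the two index files fall out of sync, which keeps a silent mis-pairing from propagating into the demultiplexed data.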

AviTil · March 5, 2024, 7:33am

Hi Nicholas,

Thanks for your reply. I did come across other threads here explicitly discussing manual concatenation of the barcodes, but I was reluctant to do it. To my inexperienced eyes, it was just another source of error. I was concerned about a few things:

Each sequence identifier in the reads also contains a Y/N value reflecting whether it passed a "filter" step during the BCL-to-FASTQ conversion. I had no idea whether this mattered or how it was calculated, and I didn't know how I would define the identifier if I were to concatenate an I1 that passed with an I2 that didn't.
I did not know how to deal with a spot that had an I1, R1, and R2 but lacked an I2 for some reason (insufficient cluster density or a sequencing error).
How should I deal with Ns in the index reads?

I'm pretty much a novice here, and I am trying to tread carefully with the reproducibility of my data in mind, so I wanted to use something I could simply rerun rather than a custom script.

If you could provide some pointers and lay these questions to rest, I'd be grateful, since I would then be comfortable using a custom script.

Nicholas_Bokulich · March 7, 2024, 6:19am

Hi @AviTil,

If one read or index passes and the pair does not, you should discard both.

These should probably also be discarded, as otherwise how can you be sure that you are mapping to the correct barcode?

Good luck!
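The advice above could be applied with a small per-cluster check. On the Y/N question: in Illumina's CASAVA 1.8+ header convention (@<instrument>:...:<y> <read>:<is filtered>:<control number>:<index>), Y means the read was filtered, i.e. it did not pass, and N means it passed. The Python sketch below assumes headers in that format, and its function names are hypothetical. It discards a cluster if either index read was filtered or contains an N; treating a missing index read the same way is an assumption in the spirit of the replies above.

```python
"""Decide whether to keep a cluster: drop it if either index read failed the
filter, contains an N, or is missing. Sketch only; assumes Illumina
CASAVA 1.8+ style headers."""


def failed_filter(header):
    """True if the header's is-filtered field is 'Y' (Illumina: read was filtered)."""
    # Expected shape: '@<instrument>:...:<y> <read>:<is filtered>:<control>:<index>'
    fields = header.split()
    if len(fields) < 2:
        return False  # no comment field, so no filter flag to check
    parts = fields[1].split(":")
    return len(parts) > 1 and parts[1] == "Y"


def keep_cluster(i1, i2):
    """i1/i2 are (header, sequence) pairs, or None if that index read is missing."""
    if i1 is None or i2 is None:
        return False  # incomplete cluster: no dual barcode to assign
    for header, seq in (i1, i2):
        if failed_filter(header) or "N" in seq.upper():
            return False
    return True
```

A discarded cluster should be dropped from R1 and R2 as well, so that all four files stay in sync; the same failed_filter check can be applied to the R1/R2 headers.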