Dear All - I have 4 Illumina MiSeq 300 bp paired-end read sets that I’m trying to demultiplex. The sequences are barcoded with combinatorial dual indexes, and they are in mixed orientation, i.e., some of the sequences in the file that would normally contain only forward reads have a reverse barcode on the 5’ end, and vice versa for the reverse file.
I’ve searched the forum and learned that others who needed to demux files like these first used qiime1 to run the extract_barcodes.py script, to generate proper forward and reverse read files and a barcodes file, and then imported them into qiime2 to demultiplex them.
But that was back in 2018. I’ve been unable to install qiime1, and the error messages indicate that this is because Python 2.7 reached end of life on Jan 1 2020. Even the minimal install fails. If anyone has any advice on either working with this data in qiime2, or using an alternative strategy, I’d be very grateful.
Great question. I know some folks with mixed-orientation barcoded reads use this tool to extract the barcodes and orient the reads prior to importing to QIIME 2 (possibly by doing two passes through this tool? I have not looked): https://github.com/najoshi/sabre
However, you have combinatorial dual-indexed reads, which complicates matters, so I’m not sure sabre will cover your needs, but it’s worth a look.
If any of the qiime1 virtual machines are still around, you may be able to download and use one to access extract_barcodes.py…
This functionality is long overdue in QIIME 2, but since combinatorial dual-index reads (with or without mixed orientation) represent a small fraction of users’ data (of what has been reported on the forum, at least!), it has not been a very high priority… any volunteers??? I will be curious to hear what tools others are using to prep these data, since those could be wrapped in QIIME 2 to make a seamless workflow.
This looks promising, though the fqgrep tool that they use for re-orienting requires the primer to be present in the reads. It often is not; when it is, q2-cutadapt can also work for this. I am not sure about the fate of the primers in @Danyl_McLauchlan’s data.
It’s also unclear whether this pipeline can handle combinatorial dual barcodes… but if you have used this tool to re-orient and extract barcodes from similar data, please share your workflow! We could even consider wrapping this in QIIME 2 to streamline this type of analysis.
Oh, right, my bad. Sorry if that was misleading. I realised that I only experimented with these tools. If the reads are merged, then I think Fastx Toolkit together with fqgrep can still be used.
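To illustrate why primer-based re-orienting only works when the primer is still in the read, here is a minimal Python sketch for merged reads. It is not what fqgrep or Fastx Toolkit do internally; the function names and the toy primers are made up for the example, and it assumes exact primer matches with no degenerate bases.

```python
# Minimal sketch (assumptions: merged reads, exact primer match, no IUPAC
# degenerate bases). Function names and primers are hypothetical.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orient(seq, fwd_primer, rev_primer):
    """Return seq in forward orientation, or None if no primer is found.

    A read that starts with the forward primer is kept as-is; a read that
    starts with the reverse primer is reverse-complemented. If the primers
    were already trimmed, there is nothing to match and None is returned.
    """
    if seq.startswith(fwd_primer):
        return seq
    if seq.startswith(rev_primer):
        return reverse_complement(seq)
    return None
```

A read that returns None here is the case described above: without the primer there is nothing to anchor the orientation on.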
My final (and maybe odd looking) workflow for sorting the reads before demultiplexing in qiime2:
First I used Seed2 (http://www.biomed.cas.cz/mbu/lbwrf/seed/, Windows based) to sort out which reads should be in R1 and which in R2. This is a simple form of demultiplexing based on a pattern within the sequences that one can define at the level of interest; in my case the pattern was barcode+primer, which split R1 into FWD1 and REV1 and R2 into FWD2 and REV2. I then used cat to merge them: FWD1+FWD2=forward.fastq, REV1+REV2=reverse.fastq. Then I used a tool/script called “Sort paired end Illumina seqs” (https://github.com/enormandeau/Scripts/blob/master/fastqCombinePairedEnd.py, implemented in Pipecraft) “to sort paired-end sequences to contain only matching sequences and discard sequences that exist only in one file and not in the other”. Nothing much was lost during these two steps.
There is another tool, LotuS, which has a built-in option for mixed-orientation reads (look for CheckForMixedPairs and CheckForReversedSeqs at http://psbweb05.psb.ugent.be/lotus/documentation.html). This is particularly relevant for those who have their sequences in two files (R1 and R2) where reads are constructs of either barcode+forward_primer+sequence or barcode+reverse_primer+sequence, mixed approx. 50:50 among the R1 and R2 files. My sequences are like this, except that each sample’s amplicons have the same barcode at both ends.
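For anyone curious what the pair-matching step amounts to, here is a rough Python sketch of the idea. It is not the fastqCombinePairedEnd.py code; the function names are made up, and it assumes plain 4-line FASTQ records whose IDs match between files once a trailing /1 or /2 is removed. It also holds the whole reverse file in memory, so it is only meant for illustration.

```python
import io


def read_fastq(handle):
    """Yield (read_id, header, seq, qual) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        # Read ID: header up to the first whitespace, minus a trailing /1 or /2.
        read_id = header[1:].split()[0]
        if read_id.endswith(("/1", "/2")):
            read_id = read_id[:-2]
        yield read_id, header, seq, qual


def sync_pairs(fwd_handle, rev_handle):
    """Keep only reads whose ID occurs in both files, in forward-file order."""
    rev_by_id = {rec[0]: rec for rec in read_fastq(rev_handle)}
    fwd_out, rev_out = [], []
    for rec in read_fastq(fwd_handle):
        mate = rev_by_id.get(rec[0])
        if mate is not None:
            fwd_out.append(rec)
            rev_out.append(mate)
    return fwd_out, rev_out


# Toy usage with in-memory files: r1 and r3 have no mate and are dropped.
fwd = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nGGGG\n+\nIIII\n")
rev = io.StringIO("@r2/2\nCCCC\n+\nIIII\n@r3/2\nTTTT\n+\nIIII\n")
kept_fwd, kept_rev = sync_pairs(fwd, rev)
print([rec[0] for rec in kept_fwd])
```

The output lists have the same length and the same ID order, which is what paired-end importers generally expect.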
It does not look like this pipeline addresses @Danyl_McLauchlan’s issue either, which is that the reads have combinatorial barcodes (i.e., the reads are dual indexed but the index on each end is used multiple times, so the combination of indices is unique for each sample, but the index on either end is not unique).
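To make the combinatorial part concrete, here is a small Python sketch of the lookup that a demultiplexer for such data would need. The barcodes, sample names, and function name are all invented for the example; it assumes fixed-length barcodes at the 5’ end of each read, exact matches, and forward and reverse barcode sets that do not overlap.

```python
# Hypothetical sample sheet: (forward barcode, reverse barcode) -> sample ID.
# Note that ACGT and TTAG are each reused; only the pair is unique.
SAMPLE_BY_PAIR = {
    ("ACGT", "TTAG"): "sample_A",
    ("ACGT", "GGCA"): "sample_B",
    ("CATG", "TTAG"): "sample_C",
}
BARCODE_LEN = 4  # assumed fixed barcode length


def assign_sample(r1_seq, r2_seq):
    """Return (sample_id, swapped) for a read pair, or (None, False).

    swapped is True when the pair is in the flipped orientation, i.e. the
    reverse barcode is on R1 and the forward barcode is on R2.
    This relies on the forward and reverse barcode sets not overlapping.
    """
    bc1 = r1_seq[:BARCODE_LEN]
    bc2 = r2_seq[:BARCODE_LEN]
    if (bc1, bc2) in SAMPLE_BY_PAIR:  # expected orientation
        return SAMPLE_BY_PAIR[(bc1, bc2)], False
    if (bc2, bc1) in SAMPLE_BY_PAIR:  # mixed orientation: barcodes flipped
        return SAMPLE_BY_PAIR[(bc2, bc1)], True
    return None, False
```

Looking up either barcode alone would be ambiguous (ACGT maps to two samples), which is why a tool that demultiplexes on a single barcode per read cannot resolve these data.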
The mixed orientations are an additional challenge, but QIIME 2 can handle mixed orientation reads just fine with some simple changes. The main issue we see with mixed-orientation reads is that one of the taxonomy classifiers (classify-sklearn) assumes that reads are in a single orientation, but the other taxonomy classifiers in q2-feature-classifier can work with mixed orientations.