mixed orientation, dual-indexed, demultiplexed samples

skfagervols · July 11, 2025, 5:09pm

Hi
So I have the dreaded dual indexed, mixed orientation (but demultiplexed) data.
I am having trouble with the fungal sequences.
I was able to deal with the mixed orientation in the 16S rRNA data since I could run the orient-reads command. However, orient-reads does not work well on the fungal ITS data.

So, is there any news on how to import mixed-orientation, dual-indexed data? The mixed orientation will not be great for running dada2 since it inflates the number of ASVs. I tried using cutadapt, but I don't think it works with dual-indexed data.

Any good ideas?

Best, Sonja

SoilRotifer · July 11, 2025, 5:36pm

Oh no @skfagervols!

What did you use as a reference to orient your reads?

Some ITS sequences are very short, so you can get read-through into the opposing primer and other adapter sequences, which may interfere with the orientation detection. I'd suggest running cutadapt to remove the reverse complement of the opposing primer from the 3' end of the reads using the --p-adapter-* flags.

For example, your command might look something like this:

qiime cutadapt trim-paired \
    --i-demultiplexed-sequences its-pe.qza \
    --p-adapter-f <reverse complement of forward primer> <reverse complement of reverse primer> \
    --p-adapter-r <reverse complement of reverse primer> <reverse complement of forward primer> \
    --o-trimmed-sequences <trimmed-output.qza>

I am passing two primer sequences to each flag, as I've done here and here.

Note, what we need to do here is a little different. I think most ITS pipelines, like ITSx, recommend leaving the PCR primer in the sequences for better extraction. So, double check that logic, but I am not sure whether leaving the primer in the sequence will impede orientation detection. If it does, then remove both primers. It is okay to run cutadapt multiple times if needed.

Anyway, since we are dealing with mixed-orientation reads, and presuming we want to keep the 5' primers within the sequences, we have to supply a list of primers we expect to see at the 3' end of a read when there is read-through, and trim only when we observe them. Otherwise the sequence is left alone, hence no --p-discard-untrimmed. You can use this website to return the reverse complement of your primer sequences; it works with the standard IUPAC ambiguity codes.
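
If you would rather compute the reverse complements locally, here is a minimal shell sketch. The function name revcomp is made up for illustration, and the ITS1F / ITS2 primers are only an assumed example (the primer pair was not stated in this thread), so substitute your own.

```shell
# Reverse-complement a primer, including IUPAC ambiguity codes.
# "revcomp" is a hypothetical helper name, not part of QIIME 2 or cutadapt.
revcomp() {
    printf '%s\n' "$1" |
        tr '[:lower:]' '[:upper:]' |
        tr 'ACGTRYKMBDHVSWN' 'TGCAYRMKVHDBSWN' |
        awk '{ out = ""; for (i = length($0); i > 0; i--) out = out substr($0, i, 1); print out }'
}

# Example primers only (assumption): replace with the pair actually used.
FWD_PRIMER="CTTGGTCATTTAGAGGAAGTAA"  # ITS1F
REV_PRIMER="GCTGCGTTCTTCATCGATGC"    # ITS2

FWD_RC=$(revcomp "$FWD_PRIMER")
REV_RC=$(revcomp "$REV_PRIMER")

echo "RC of forward primer: $FWD_RC"
echo "RC of reverse primer: $REV_RC"
```

The two printed values are what would go in place of the reverse complement placeholders in the cutadapt command above. The complement is applied before the reversal, but doing it in the other order gives the same result.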

Then try re-running rescript orient-reads.

Hopefully, this makes sense. Let us know if this works.