Discarded results produced by q2-cutadapt's demux-paired when working with mixed-orientation reads

Summary

We recently discovered a bug in the q2-cutadapt plugin's demux-paired command that, in certain situations, can unnecessarily discard a significant portion of reads when the command is run with the --p-mixed-orientation flag.

Impact

This issue impacts the following versions of q2-cutadapt:

  • 2020.6
  • 2020.8

If you think you may have been impacted, please check your provenance at https://view.qiime2.org for the versions listed above, and verify whether you ran demux-paired with the mixed-orientation flag enabled. We are happy to help with identification; please share a demux summarize visualization with us.

Details

Mixed-orientation reads are produced by some sequencing protocols, where the R1 and R2 multiplexed files don't necessarily correspond to forward and reverse reads (that is to say, the orientation is "mixed" in both files). In general, we haven't seen this strategy very often, but if you use it, let us know: we'd like to learn more about the use case for this functionality!

Internally, q2-cutadapt follows a mixed-orientation demux protocol based on:

https://cutadapt.readthedocs.io/en/stable/guide.html#demultiplexing-paired-end-reads-in-mixed-orientation

Essentially, we demux the mixed-orientation reads in two rounds. In the first round we treat R1 and R2 as forward and reverse reads, respectively, as in a typical paired-end demultiplexing workflow. This produces per-sample demultiplexed R1 and R2 files, as well as new multiplexed files ("unknown") containing all of the reads that could not be demultiplexed. In the second round we demultiplex the unknown reads, this time swapping R1 and R2 so that R2 is treated as the forward reads and R1 as the reverse reads.
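The two rounds described above can be illustrated with a minimal, in-memory Python sketch. This is not q2-cutadapt's implementation: the function names, the `swap` parameter, and the exact-prefix barcode matching are all assumptions made for illustration, and real demultiplexing (via cutadapt) operates on FASTQ files, tolerates mismatches, and trims the barcode.

```python
def demux_round(pairs, barcodes, swap=False):
    """Assign (r1, r2) pairs to samples by the barcode at the start of the
    read treated as "forward" (r1 normally, r2 when swap=True).

    Hypothetical helper: exact prefix matching only, no trimming.
    """
    per_sample = {}
    unknown = []
    for r1, r2 in pairs:
        forward = r2 if swap else r1
        for sample, barcode in barcodes.items():
            if forward.startswith(barcode):
                per_sample.setdefault(sample, []).append((r1, r2))
                break
        else:
            # No barcode matched: keep the pair for the next round.
            unknown.append((r1, r2))
    return per_sample, unknown


def demux_two_rounds(pairs, barcodes):
    """Round 1 treats R1 as forward; round 2 retries the unknowns with R2 as forward."""
    round1, unknown = demux_round(pairs, barcodes)
    round2, still_unknown = demux_round(unknown, barcodes, swap=True)
    return round1, round2, still_unknown
```

In this sketch the per-sample results of the two rounds are returned separately; how they are combined is exactly where the bug described below arose.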

While making unrelated changes to q2-cutadapt, we discovered that in the second demultiplexing round, rather than appending reads to the results of the first round, we were overwriting those results. As a result, any reads that were demultiplexed for a sample in round 1 were discarded if additional reads were matched to that sample during round 2.
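The difference between overwriting and appending can be reproduced with a small, self-contained Python sketch. It is an analogy for the failure mode, not the plugin's actual code: the helper names are hypothetical, and plain strings stand in for FASTQ records.

```python
import os
import tempfile


def write_round(out_dir, per_sample, mode):
    """Write one record per line to <out_dir>/<sample>.fastq using file mode `mode`."""
    for sample, records in per_sample.items():
        path = os.path.join(out_dir, f"{sample}.fastq")
        with open(path, mode) as fh:
            for record in records:
                fh.write(record + "\n")


def read_records(out_dir, sample):
    """Return the records currently stored for a sample."""
    with open(os.path.join(out_dir, f"{sample}.fastq")) as fh:
        return fh.read().splitlines()


def run_both_rounds(round1, round2, round2_mode):
    """Write round 1, then round 2 with the given mode ("w" overwrites, "a" appends)."""
    with tempfile.TemporaryDirectory() as out_dir:
        write_round(out_dir, round1, "w")
        write_round(out_dir, round2, round2_mode)
        samples = sorted(set(round1) | set(round2))
        return {s: read_records(out_dir, s) for s in samples}


round1 = {"s1": ["read_a", "read_b"], "s2": ["read_c"]}
round2 = {"s1": ["read_d"]}

# Overwriting: s1 loses its round 1 reads; s2 is untouched because
# round 2 matched nothing to it.
buggy = run_both_rounds(round1, round2, "w")

# Appending: s1 keeps the reads from both rounds.
fixed = run_both_rounds(round1, round2, "a")
```

Note that in the overwriting case only samples that gain reads in round 2 lose their round 1 reads, which is why the effect depends on the data.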

Resolution

We have implemented a fix for this issue which will be available next week as part of the QIIME 2 2020.11 release.

We apologize for any inconvenience this may have caused, and are happy to discuss further if you have any questions, or need help determining if you have been impacted by this bug.

I have data like this!

I wasn’t involved in generating it - but I think the background was that at the time it was cheaper/easier to ligate on the barcodes and sequencing adaptors after amplifying all samples with the same primers, as compared to incorporating all that into really long primers.

In QIIME 1 we used to just run demultiplexing twice, then reverse complement one set and concatenate the two. I'm just getting started with QIIME 2.

With cutadapt and the mixed-orientation flag, will the reads which are “backwards” (found in the second round) get reverse-complemented and have R1/R2 switched before they are combined with the “forwards” ones, or do you end up with a mix of orientations in the output FASTQ files?

Hi @alison,

You can use this approach and see if it helps:

-Mike

Thanks Mike!

That seems like it will work to get all the primers removed. But half of the sequences would still be in the reverse orientation, right? I'll be using DADA2 to denoise and make ASVs. Would having half of the sequences reversed cause problems, or is DADA2 able to take that possibility into consideration?

I forgot that the issue outlined here appears to have been fixed in qiime2-2020.11. So you should be fine.

However, if you ever run into a case in which you need to re-orient your sequences you can use RESCRIPt's orient-seqs command.

EDIT: I forgot that the orient-seqs command will currently only run on FASTA inputs. We hope to work on a solution for FASTQ.

-Mike

Hi @SoilRotifer

So to clarify, Alison's concern about half the reads being reverse-complemented is no longer an issue? That is, as part of this function, any misoriented reads would be re-oriented?

-Rob

I have the same question as Lamm-a.
@SoilRotifer @Nicholas_Bokulich Does "qiime cutadapt trim-paired" re-orient the sequences, or is that not yet possible in QIIME 2, given that RESCRIPt's orient-seqs command works only on FASTA files?

Currently, this is still the case.

3 off-topic replies have been split into a new topic: RESCRIPt: accepting FASTQ as input to orient-seqs

Please keep replies on-topic in the future.