Incorrect results produced by q2-cutadapt's trim-paired

q2d2 · June 19, 2018, 11:14pm

Summary

We recently discovered a bug in the q2-cutadapt plugin's trim-paired command that, in certain situations, can produce incorrect results. Specifically, the forward and reverse reads are swapped, causing the underlying cutadapt command to apply the forward-read trimming parameters to the reverse reads (and vice versa).

Impact

This issue impacts the following versions of q2-cutadapt:

2017.12
2018.2
2018.4

If you think you may have been impacted, please check your provenance at https://view.qiime2.org for the versions listed above.

Details

This problem occurs when the reverse read sorts alphabetically before the forward read. This can happen when the barcode ID in the filename differs between the forward and reverse reads:

# here the barcode IDs are `S01`, `S00`, and `S02`
sample_a_S01_L001_R1_001.fastq.gz
sample_a_S01_L001_R2_001.fastq.gz
sample_b_S00_L001_R2_001.fastq.gz  <== here R2 comes *before* R1
sample_b_S02_L001_R1_001.fastq.gz

There are a few ways this could occur:

You imported using a manifest format and your rows didn't exactly follow the pattern of forward, reverse, forward, reverse, etc. (the format does not require this ordering).
You imported via a Casava format and your barcode IDs were nucleotides (or any other identifier) in non-alphabetical order. This might happen with certain barcode schemes or protocols.
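
The sorting condition described above can be checked with a short script. The following is only a rough sketch, not the plugin's own code: the helper name `find_swapped_samples` and the filename pattern are assumptions made for illustration, based on the example filenames in this post. It groups files by sample ID and reports any sample whose R2 filename sorts before its R1 filename.

```python
import re

# Assumed Casava-style layout, based on the example filenames above:
#   <sample>_<barcode>_L<lane>_R<1|2>_001.fastq.gz
CASAVA = re.compile(
    r"^(?P<sample>.+)_(?P<barcode>[^_]+)_L(?P<lane>\d+)"
    r"_R(?P<read>[12])_001\.fastq\.gz$"
)


def find_swapped_samples(filenames):
    """Return sample IDs whose R2 filename sorts before its R1 filename.

    Hypothetical helper for illustration; filenames that do not match
    the assumed pattern are skipped.
    """
    by_sample = {}
    for name in filenames:
        match = CASAVA.match(name)
        if match is None:
            continue
        by_sample.setdefault(match["sample"], {})[match["read"]] = name

    swapped = []
    for sample, reads in sorted(by_sample.items()):
        # Plain string comparison mirrors an alphabetical sort.
        if "1" in reads and "2" in reads and reads["2"] < reads["1"]:
            swapped.append(sample)
    return swapped


files = [
    "sample_a_S01_L001_R1_001.fastq.gz",
    "sample_a_S01_L001_R2_001.fastq.gz",
    "sample_b_S00_L001_R2_001.fastq.gz",
    "sample_b_S02_L001_R1_001.fastq.gz",
]
print(find_swapped_samples(files))
```

With the example listing from this post, only `sample_b` is reported. An empty result from this sketch does not by itself prove that a dataset was unaffected; checking provenance for the affected versions remains the more reliable test.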

Resolution

We have implemented a fix for this issue which will be available this week as part of the QIIME 2 2018.6 release.

We apologize for any inconvenience this may have caused, and are happy to discuss further.

Special thanks to @jcmcnch for bringing this issue to our attention, and for providing a well-documented issue report, complete with supporting data. It made troubleshooting this issue very straightforward. Thanks!

emescioglu · July 13, 2018, 1:07am

Hi all,

Thank you very much for noticing and fixing this bug!!!!

I just wanted to let you all know that I have re-run some data using the updated version and there is, in fact, a difference in my results. So if there is anyone out there wondering if it is worth it to re-do everything - Yes, it is worth it.

Esra