Hello! Long-time lurker, first-time poster.
I find myself in the weeds of a mixed-orientation read situation. This forum has been super helpful as I try to figure out what's happening, but after reading through I'm still not settled on a best practice for preparing these reads for use in DADA2.
In particular this post helped me track down the problem and this one seems the closest to the situation I'm in, but the latter was from 2019 and hints at potential fixes in the future...
I have 2x300 bp MiSeq reads using V3V4 primers, which I received already demultiplexed.
Both the R1 and R2 have all the reads in 5'-3' orientation. However, both files have a mix of reads beginning with the forward and reverse primers. So where FWD and REV represent the forward and reverse primers 5'-3', the files look like:
R1        R2
FWD...    REV...
REV...    FWD...
FWD...    REV...
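For anyone wanting to check how mixed their own files are, here is a minimal Python sketch that tallies which primer each read in a pair starts with. The primer sequences below are placeholders I'm assuming for illustration (swap in your own V3V4 primers), the function names are made up, and matching is exact apart from IUPAC degenerate bases, so reads with sequencing errors or an N inside the primer will be counted as "neither".

```python
import re
from collections import Counter

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# ASSUMPTION: placeholder primers, not taken from the post. Use your own.
FWD = "CCTACGGGNGGCWGCAG"
REV = "GACTACHVGGGTATCTAATCC"


def primer_regex(primer):
    """Compile a regex for a (possibly degenerate) primer, 5'-3'."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def classify(seq, fwd_re, rev_re):
    """Label a read by the primer found at its very start (no mismatches)."""
    if fwd_re.match(seq):
        return "FWD"
    if rev_re.match(seq):
        return "REV"
    return "neither"


def tally_orientations(r1_seqs, r2_seqs, fwd=FWD, rev=REV):
    """Count (R1 label, R2 label) combinations across paired reads."""
    fwd_re, rev_re = primer_regex(fwd), primer_regex(rev)
    return Counter(
        (classify(a, fwd_re, rev_re), classify(b, fwd_re, rev_re))
        for a, b in zip(r1_seqs, r2_seqs)
    )
```

Feeding it the sequence lines of the R1 and R2 files (in the same order) gives counts for ("FWD", "REV"), ("REV", "FWD"), and anything that matches neither pattern.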
I know DADA2 in R has an orient.fwd argument (in filterAndTrim) that might help, but I'd like to run as much as possible within Q2 for the sake of provenance and reproducibility and all that good stuff.
So the way I see it I have 3 options:
1. Just trim the FWD and REV primers with 2 rounds of Cutadapt, easy. Feed this to DADA2 and let it deal with the mixed orientations.
2. Trim separately in Cutadapt so that the outputs are split into different files. Now I will have two output files each for R1 and R2; one containing reads in the "expected" orientation, and one containing reads in the "other" orientation. And feed these into DADA2 (as separate runs?)
3. Reorganize the reads by concatenating output files from #2 -- now I have a "fixed" R1 and R2 file in which all reads are in the expected orientations. I feel like this should work, but will it bias the DADA2 error model because the reads aren't "actually" from the original R1 and R2 files?
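To make the second and third options concrete, here is a rough Python sketch of the splitting and reorienting logic, outside of Cutadapt and Q2. Everything in it is an assumption for illustration: the primers are placeholders, `reorient_pairs` is a made-up helper, reads are plain (header, sequence, quality) tuples, and matching allows no mismatches. It only shows the bookkeeping and says nothing about whether the error model is affected.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# ASSUMPTION: placeholder primers, not taken from the post. Use your own.
FWD = "CCTACGGGNGGCWGCAG"
REV = "GACTACHVGGGTATCTAATCC"


def primer_regex(primer):
    """Compile a regex for a (possibly degenerate) primer, 5'-3'."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def reorient_pairs(pairs, fwd=FWD, rev=REV):
    """Sort read pairs into expected, swapped, and unmatched groups.

    Each pair is (r1, r2), where a read is (header, sequence, quality).
    Pairs found as REV/FWD are returned with their mates swapped, so the
    FWD-primed read always comes first in `expected` and `swapped`.
    """
    fwd_re, rev_re = primer_regex(fwd), primer_regex(rev)
    expected, swapped, unmatched = [], [], []
    for r1, r2 in pairs:
        if fwd_re.match(r1[1]) and rev_re.match(r2[1]):
            expected.append((r1, r2))
        elif rev_re.match(r1[1]) and fwd_re.match(r2[1]):
            swapped.append((r2, r1))  # mates exchanged
        else:
            unmatched.append((r1, r2))
    return expected, swapped, unmatched
```

Keeping `expected` and `swapped` as separate outputs corresponds to option 2; writing `expected + swapped` out as a single R1/R2 pair of files corresponds to option 3. The swapped mates keep their original quality strings, so the "fixed" R1 file would contain reads sequenced in both the first and second read of the run.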
So far I've just been joining reads with VSEARCH and running Deblur, but I think DADA2 makes a lot more sense for this data, so I'd be very grateful for any advice on how to make these reads work!
Thank you in advance!
PS if it helps anyone lurking, I've included a diagram of what the reads look like and the pre-processing strategies I'm considering.