Preparing mixed orientation reads for dada2

shreyaramachandran · January 26, 2024, 5:56am

Hello! Long-time lurker, first-time poster.

I find myself in the weeds of a mixed-orientation read situation. This forum has been super helpful as I try to figure out what's happening, but after reading through it I'm still not settled on a best practice for preparing these reads for DADA2.

In particular, this post helped me track down the problem, and this one seems closest to my situation, but the latter is from 2019 and hints at potential future fixes...

I have 2x300 bp MiSeq reads from V3V4 primers, which I received already demultiplexed.
Both R1 and R2 contain reads in 5'-3' orientation. However, both files have a mix of reads beginning with the forward and reverse primers. So, where FWD and REV represent the forward and reverse primers (5'-3'), the files look like:
R1        R2
FWD...    REV...
REV...    FWD...
FWD...    REV...
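Before picking an option, it can help to measure how mixed each file actually is. Below is a minimal POSIX sh/awk sketch that counts which primer each read starts with. It assumes the common Illumina V3-V4 pair 341F (`CCTACGGGNGGCWGCAG`) and 805R (`GACTACHVGGGTATCTAATCC`), since the post doesn't say which V3V4 primers were used. It also requires an exact match at the very start of the read (IUPAC codes expanded, no mismatches allowed), which is stricter than cutadapt. `iupac_to_regex` and `count_orientations` are hypothetical helper names, not part of any tool.

```shell
#!/bin/sh
# Tally which primer each read in a FASTQ file starts with.
# ASSUMPTION: FWD/REV default to the common Illumina V3-V4 primers
# (341F/805R); override them with the primers actually used.
set -eu

FWD=${FWD:-CCTACGGGNGGCWGCAG}
REV=${REV:-GACTACHVGGGTATCTAATCC}

# Expand IUPAC codes into bracket expressions and anchor at the read start,
# e.g. CCTACGGGNGGCWGCAG -> ^CCTACGGG[ACGTN]GGC[AT]GCAG (uppercase input assumed).
iupac_to_regex() {
  printf '%s\n' "$1" | sed -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' \
    -e 's/W/[AT]/g' -e 's/K/[GT]/g' -e 's/M/[AC]/g' -e 's/B/[CGT]/g' \
    -e 's/D/[AGT]/g' -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' \
    -e 's/N/[ACGTN]/g' -e 's/^/^/'
}

# Usage: count_orientations reads.fastq
# Prints how many reads start with FWD, with REV, or with neither primer.
count_orientations() {
  fwd_re=$(iupac_to_regex "$FWD")
  rev_re=$(iupac_to_regex "$REV")
  awk -v f="$fwd_re" -v r="$rev_re" '
    NR % 4 == 2 {                    # line 2 of each 4-line record is the sequence
      s = toupper($0)
      if (s ~ f) cf++; else if (s ~ r) cr++; else cn++
    }
    END { printf "FWD %d\nREV %d\nneither %d\n", cf, cr, cn }
  ' "$1"
}
```

For example, call `count_orientations R1.fastq` and `count_orientations R2.fastq` from a script after these definitions. If the files match the layout above, both should report substantial FWD and REV counts, possibly along with some `neither` reads where the primer region contains sequencing errors.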

I know DADA2 in R has an orient.fwd option that might help, but I'd like to run as much as possible within Q2 for the sake of provenance, reproducibility and all that good stuff.
So the way I see it, I have 3 options:

1. Just trim the FWD and REV primers with 2 rounds of Cutadapt. Easy. Feed this to DADA2 and let it deal with the mixed orientations.
2. Trim separately in Cutadapt so that the outputs are split into different files. Now I will have two output files each for R1 and R2: one containing reads in the "expected" orientation, and one containing reads in the "other" orientation. Then feed these into DADA2 (as separate runs?).
3. Reorganize the reads by concatenating the output files from #2, so that I have a "fixed" R1 and R2 file in which all reads are in the expected orientation. I feel like this should work, but will it bias the DADA2 error model because the reads aren't "actually" from the original R1 and R2 files?
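As a rough illustration of option 2, the split step can be sketched outside cutadapt with plain prefix matching. This is a minimal sketch, not a replacement for cutadapt: it swaps in exact prefix matching (IUPAC codes expanded, no mismatches), only sorts pairs by which primer R1 and R2 start with, and does not trim anything. It assumes the same 341F/805R primers, unwrapped 4-line FASTQ records, and R1/R2 files that list pairs in the same order. `split_by_orientation` and the `expected_*`, `other_*` and `unassigned_*` output names are hypothetical.

```shell
#!/bin/sh
# Sort read pairs into "expected" (R1 starts with FWD, R2 with REV),
# "other" (R1 starts with REV, R2 with FWD) and "unassigned" files.
# Primers are detected, not trimmed. ASSUMPTION: 341F/805R V3-V4 primers.
set -eu

FWD=${FWD:-CCTACGGGNGGCWGCAG}
REV=${REV:-GACTACHVGGGTATCTAATCC}

# Expand IUPAC codes into bracket expressions, anchored at the read start.
iupac_to_regex() {
  printf '%s\n' "$1" | sed -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' \
    -e 's/W/[AT]/g' -e 's/K/[GT]/g' -e 's/M/[AC]/g' -e 's/B/[CGT]/g' \
    -e 's/D/[AGT]/g' -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' \
    -e 's/N/[ACGTN]/g' -e 's/^/^/'
}

# Usage: split_by_orientation R1.fastq R2.fastq OUT_DIR
split_by_orientation() {
  fwd_re=$(iupac_to_regex "$FWD")
  rev_re=$(iupac_to_regex "$REV")
  mkdir -p "$3"
  awk -v f="$fwd_re" -v r="$rev_re" -v r2="$2" -v out="$3" '
    {
      i = (FNR - 1) % 4 + 1              # line position inside the 4-line record
      a[i] = $0                          # R1 line from the main input
      if ((getline b[i] < r2) <= 0) {    # matching R2 line, read in lockstep
        failed = 1; exit 1
      }
      if (i < 4) next                    # wait until both records are complete
      s1 = toupper(a[2]); s2 = toupper(b[2])
      if (s1 ~ f && s2 ~ r)      tag = "expected"
      else if (s1 ~ r && s2 ~ f) tag = "other"
      else                       tag = "unassigned"
      n[tag]++
      print a[1] "\n" a[2] "\n" a[3] "\n" a[4] > (out "/" tag "_R1.fastq")
      print b[1] "\n" b[2] "\n" b[3] "\n" b[4] > (out "/" tag "_R2.fastq")
    }
    END {
      if (failed) { print "R2 ended before R1" > "/dev/stderr"; exit 1 }
      printf "expected %d\nother %d\nunassigned %d\n", n["expected"], n["other"], n["unassigned"]
    }
  ' "$1"
}
```

`split_by_orientation R1.fastq R2.fastq split/` prints the number of pairs in each class. A pair lands in `unassigned_*` when either primer carries even one sequencing error, so cutadapt's error-tolerant matching should recover more pairs than this sketch does.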

So far I've just been joining reads with Vsearch and running Deblur but I think DADA2 makes a lot more sense for this data, so I'd be very grateful for any advice on how to make these reads work!
Thank you in advance!

PS: In case it helps anyone lurking, I've included a diagram of what the reads look like and the pre-processing strategies I'm considering.

colinbrislawn · January 26, 2024, 2:51pm

Hello Shreya,

Welcome to the forums!

This is an awesome post. The diagrams are very helpful.

As you have discovered, there are several ways to address mixed orientation reads, and you have found and discussed almost all of them.

Is it possible your reads are in the 'interleaved fastq' format?

In a format called 'interleaved', forward and reverse reads are not random but placed one after the other. Your diagrams make it look like forward and reverse always alternate, which is exactly what the interleaved format looks like. If so, you can convert it back to normal fastq.
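If the files did turn out to be interleaved, splitting them back into R1/R2 can be done with a few lines of awk. This is a minimal sketch that assumes strictly alternating, unwrapped 4-line records (R1 record, then its R2 record, and so on). `deinterleave` is a hypothetical helper name, not the converter referred to in the post.

```shell
#!/bin/sh
# Split an interleaved FASTQ (R1 record, R2 record, R1, R2, ...) into two files.
# ASSUMPTION: strictly alternating, unwrapped 4-line records.
set -eu

# Usage: deinterleave interleaved.fastq R1.fastq R2.fastq
deinterleave() {
  awk -v r1="$2" -v r2="$3" '
    (NR - 1) % 8 < 4 { print > r1; next }   # lines 1-4 of every 8: the R1 record
    { print > r2 }                          # lines 5-8: the matching R2 record
  ' "$1"
}
```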

If forward and reverse are randomly mixed, then you already know all the options I know of.

shreyaramachandran · January 26, 2024, 5:26pm

Hi Colin, thanks so much for your response!
The diagrams were fun to make but unfortunately oversimplify things. The reads aren't alternating; there seems to be no rhyme or reason to which reads within a file are forward- or reverse-oriented, or to how many reads in each sample are in the expected vs. reverse orientation. The reads in R1 and R2 do line up, though: the first read in the R1 file matches the first read in R2, and so on, regardless of orientation.
In my digging around, it seems the RESCRIPt plugin might help, but so far it only works on FASTA sequences?
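Because options 2 and 3 both rely on R1 and R2 staying in the same order, it may be worth checking that mechanically before reorganizing anything. This is a minimal sketch that assumes unwrapped 4-line FASTQ records and treats the read ID as the first space-delimited token of each header, with any trailing `/1` or `/2` dropped. `check_pairing` is a hypothetical helper name.

```shell
#!/bin/sh
# Check that R1 and R2 list the same read IDs in the same order.
# "@M1:1 1:N:0:1" and "@M1:1 2:N:0:1" count as mates (same first token).
set -eu

# Usage: check_pairing R1.fastq R2.fastq   (exit status 1 if out of sync)
check_pairing() {
  awk -v r2="$2" '
    function id(h,    p) { split(h, p, " "); sub(/\/[12]$/, "", p[1]); return p[1] }
    {
      if ((getline l2 < r2) <= 0) { short = 1; exit }   # R2 ran out early
      if (FNR % 4 != 1) next                            # compare header lines only
      n++
      if (id($0) != id(l2)) { bad++; if (bad == 1) first = n }
    }
    END {
      if (short || (getline l2 < r2) > 0) { print "R1 and R2 differ in length"; exit 1 }
      if (bad) { printf "%d of %d pairs out of sync (first at record %d)\n", bad, n, first; exit 1 }
      printf "all %d pairs in sync\n", n
    }
  ' "$1"
}
```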

colinbrislawn · January 26, 2024, 5:47pm

So not the interleaved format. Got it!

I like options 2 and 3 from your first post. The primers provide strong evidence of the direction of each read, so as long as cutadapt can find the primers, this is a good option.
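Once a split exists, option 3 amounts to two `cat` calls. The sketch below assumes (hypothetically) that the split produced `expected_R1.fastq`, `expected_R2.fastq`, `other_R1.fastq` and `other_R2.fastq` in one directory, with pairs in the same order within each R1/R2 couple. The detail that matters is the crossover: in the "other" pairs R2 is the read that starts with FWD, so it goes into the fixed R1 file. `make_fixed_pairs` is a hypothetical helper name.

```shell
#!/bin/sh
# Option 3 as a sketch: rebuild one "fixed" R1/R2 pair of files from an
# orientation split. HYPOTHETICAL input names (see usage below).
set -eu

# Usage: make_fixed_pairs SPLIT_DIR OUT_R1 OUT_R2
# SPLIT_DIR must hold expected_R1/R2.fastq and other_R1/R2.fastq.
make_fixed_pairs() {
  # In "other" pairs the FWD primer is on R2, so other_R2 is appended to the
  # fixed R1 and other_R1 to the fixed R2. Both cat calls walk the pairs in
  # the same order, so record k of OUT_R1 still mates record k of OUT_R2.
  cat "$1/expected_R1.fastq" "$1/other_R2.fastq" > "$2"
  cat "$1/expected_R2.fastq" "$1/other_R1.fastq" > "$3"
  # Sanity check: both outputs must hold the same number of lines.
  if [ "$(awk 'END { print NR }' "$2")" -ne "$(awk 'END { print NR }' "$3")" ]; then
    echo "line counts differ between $2 and $3" >&2
    return 1
  fi
}
```

The swapped reads keep their original headers, and reads sequenced as R2 (with R2's typical quality profile) end up in the fixed R1 file. That mixing is what the error-model question in option 3 is about.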

SoilRotifer · January 26, 2024, 6:10pm

Hi @shreyaramachandran,

This may help, though it's a bit onerous.

That being said...

We've been discussing ways to implement the vsearch --orient ... command in QIIME 2 for orienting fastq files. You could try running this vsearch command externally on your R1 and R2 reads, or you can use reads that are already merged. vsearch is available within your QIIME 2 environment. It might also be easiest to run this prior to importing into QIIME 2.

tl;dr:

vsearch \
   --orient R1-seqs.fastq \
   --db reference-database.fasta \
   --fastqout R1-seqs-oriented.fastq \
   --notmatched R1-seqs-not-oriented.fastq

More detail:

You can download any of the marker gene reference databases from here and export one as your input to --db.

More details can be found within the vsearch manual.

So you can do the following...

Export SILVA reference sequences (FASTA)
(You can obtain them from the Data resources page linked above.)

qiime tools export \
    --input-path silva-138-99-seqs.qza \
    --output-path silva-138-99-seqs-export/

Export your raw fastqs (R1 & R2, or merged) if you have already imported them.
Otherwise, just use the fastqs you had prior to importing into QIIME 2.

qiime tools export \
    --input-path raw-seqs.qza \
    --output-path raw-seqs-export

Run vsearch to orient your fastqs
Again, you need to run this on the R1 (forward) and R2 (reverse) reads separately. There is a chance that one read of a pair will be oriented and its mate will not, causing paired-read mismatches. Hopefully this will be minimal, though some minor manual edits may be required for one or both files.
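For those paired-read mismatches, one option is to keep only the pairs whose read ID survives in both oriented files. This is a minimal sketch that assumes unwrapped 4-line FASTQ records and treats the read ID as the first space-delimited token of each header (minus any trailing `/1` or `/2`). It holds every R2 record in memory, so very large files need correspondingly more RAM. `repair_pairs` is a hypothetical helper name.

```shell
#!/bin/sh
# Re-pair R1/R2 after orienting each file separately: keep only reads whose
# ID appears in both files, written out in R1's order.
# ASSUMPTIONS: unwrapped 4-line FASTQ records; ID = first header token.
set -eu

# Usage: repair_pairs R1-oriented.fastq R2-oriented.fastq OUT_R1 OUT_R2
repair_pairs() {
  : > "$3"; : > "$4"                   # start with empty outputs even if nothing matches
  awk -v out1="$3" -v out2="$4" '
    function id(h,    p) { split(h, p, " "); sub(/\/[12]$/, "", p[1]); return p[1] }
    # Pass 1, the R2 file (passed first): store each R2 record under its read ID.
    FILENAME == ARGV[1] {
      if (FNR % 4 == 1) { key = id($0); rec[key] = $0 } else rec[key] = rec[key] "\n" $0
      next
    }
    # Pass 2, the R1 file: rebuild each record and keep it only if R2 has that ID.
    {
      if (FNR % 4 == 1) { key = id($0); buf = $0 } else buf = buf "\n" $0
      if (FNR % 4 == 0) {
        total++
        if (key in rec) { print buf > out1; print rec[key] > out2; kept++ }
      }
    }
    END { printf "kept %d of %d R1 reads\n", kept, total }
  ' "$2" "$1"
}
```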

Or, if you plan to use deblur as your denoising approach, you can simply merge your reads with vsearch and then run the vsearch --orient command. From there, you can run deblur on your oriented merged reads. This avoids the task of running vsearch --orient on R1 and R2 separately.

vsearch \
  --orient R1-seqs.fastq \
  --db silva-138-99-seqs-export/dna-sequences.fasta \
  --fastqout oriented.fastq \
  --notmatched not-oriented.fastq
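To see how much ends up in `--notmatched`, it may help to compare the two output files after each run. This is a minimal sketch that assumes unwrapped 4-line FASTQ records in both files. `orient_summary` is a hypothetical helper name. A large not-oriented fraction is worth investigating before importing.

```shell
#!/bin/sh
# Summarise a vsearch --orient run from its two FASTQ outputs.
# ASSUMPTION: unwrapped 4-line FASTQ records in both files.
set -eu

# Usage: orient_summary oriented.fastq not-oriented.fastq
orient_summary() {
  awk '
    FILENAME == ARGV[1] { o++; next }   # lines in the --fastqout file
    { n++ }                             # lines in the --notmatched file
    END {
      o /= 4; n /= 4; t = o + n         # 4 lines per record
      printf "oriented %d, not oriented %d (%.1f%% oriented)\n", o, n, (t ? 100 * o / t : 0)
    }
  ' "$1" "$2"
}
```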

Import oriented fastqs into QIIME 2
Then you can simply import these fastqs as you would normally do.

Note: I've not completely vetted this strategy myself, but I figured this will provide you with a more tenable place to start.

shreyaramachandran · January 26, 2024, 8:44pm

Hi Mike,
Thank you so much!! This is super helpful and I really appreciate the detail. I haven't yet needed to export sequences out of QIIME 2, so I'm happy to have the guidance. It seems like I can try 2 strategies concurrently: reorienting with vsearch, and reorganizing the reads in the files with cutadapt. It will be interesting to see how the outputs differ! And I will keep an eye out for any developments in implementing vsearch --orient in QIIME 2.
Thank you both again!!

system · February 27, 2024, 2:44am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.