QIIME 2 processing compared to QIIME 1

slh277 · February 9, 2018, 2:18am

Hi,
I am trying to compare my results to those of the company that generated the reads. These are the processing/QIIME 1 steps they took:

FastQC
Trimmomatic or Cutadapt to remove low-quality bases (Phred score < 30) and adapters
Check that reads are at least 70% high quality (not sure what counts as "high") and ≥ 50 bp long
Check for primers on each read*
Fastq-join to stitch together the reads with primers*
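
For illustration, the "70% high quality and ≥ 50 bp" check could look something like the sketch below. It assumes that "high" means a Phred score of at least 30 (matching the trimming cutoff above, though the company's actual definition is unknown) and that the FASTQ files use the Phred+33 encoding. The function name and its defaults are hypothetical, not taken from any tool.

```python
def passes_quality_check(quality_string, min_length=50, min_fraction=0.70,
                         high_q=30, offset=33):
    """Return True if a read is long enough and mostly high quality.

    quality_string is the FASTQ quality line for one read.
    ASSUMPTIONS: Phred+33 encoding, and "high quality" means Phred >= 30.
    """
    # Decode each ASCII character into its Phred score.
    scores = [ord(char) - offset for char in quality_string]

    # Reject reads shorter than the minimum length.
    if len(scores) < min_length:
        return False

    # Fraction of bases at or above the "high quality" cutoff.
    high_count = sum(1 for q in scores if q >= high_q)
    return high_count / len(scores) >= min_fraction


# 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
print(passes_quality_check("I" * 60))             # long, all high quality
print(passes_quality_check("I" * 40))             # too short
print(passes_quality_check("I" * 30 + "#" * 30))  # only half high quality
```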

I've run the same data through QIIME 2, but I get substantially fewer sequences per sample. I used DADA2 (forward truncation: 270; reverse truncation: 200). I'm not sure how to reproduce the quality/trimming steps above in QIIME 2. Here are the forward/reverse reads.

*Also, I believe the primers must be cut off before doing DADA2, which also joins the reads. Not sure how to incorporate all of these steps.

And for an example of the paired end read differences, after the above processing/DADA2:
Company    My data
190,449    27,325
211,773    16,469
414,331    26,807
176,799    12,810
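
To put the gap in perspective, here is a quick sketch that computes what fraction of the company's per-sample counts remains in the QIIME 2 output. The numbers are the four pairs listed above; the variable names are only for illustration.

```python
# Per-sample sequence counts from the comparison above.
company_counts = [190_449, 211_773, 414_331, 176_799]
my_counts = [27_325, 16_469, 26_807, 12_810]

# Fraction of the company's count that remains after DADA2, per sample.
retained = [mine / theirs for mine, theirs in zip(my_counts, company_counts)]

for theirs, mine, fraction in zip(company_counts, my_counts, retained):
    print(f"{theirs:>9,} {mine:>8,} {fraction:6.1%}")
```

Every sample keeps well under a fifth of the company's count (roughly 6 to 14 percent), so the loss is large and consistent across samples.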

I'm assuming it has to do with the truncation parameters I chose, but any other tips or suggestions would be most appreciated.

thermokarst · February 12, 2018, 12:51pm

Hi @slh277!

There are a few options here. If the primers you need to remove are on the 5' end (and of a fixed length), you can use the DADA2 trim parameters to specify a suitable trim length: --p-trim-left for qiime dada2 denoise-single, or --p-trim-left-f/--p-trim-left-r for qiime dada2 denoise-paired. If you need a bit more control, check out the q2-cutadapt plugin tutorial, in particular the qiime cutadapt trim-paired method.
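
As a rough sketch of the first option, the snippet below assembles a qiime dada2 denoise-paired command that combines fixed-length 5' trimming with the truncation lengths from the original post (270 and 200). The primer lengths (17 and 21), the file name demux.qza, and the output directory are hypothetical placeholders; substitute the real lengths of your primers. The helper function is only for illustration and does not run anything.

```python
import shlex


def build_denoise_paired_command(demux_path, trim_left_f, trim_left_r,
                                 trunc_len_f, trunc_len_r, output_dir):
    """Assemble (but do not run) a `qiime dada2 denoise-paired` command.

    trim_left_f / trim_left_r: number of 5' bases to drop from forward and
    reverse reads, e.g. the fixed primer lengths.
    trunc_len_f / trunc_len_r: position at which to truncate each read.
    """
    return [
        "qiime", "dada2", "denoise-paired",
        "--i-demultiplexed-seqs", demux_path,
        "--p-trim-left-f", str(trim_left_f),
        "--p-trim-left-r", str(trim_left_r),
        "--p-trunc-len-f", str(trunc_len_f),
        "--p-trunc-len-r", str(trunc_len_r),
        "--output-dir", output_dir,
    ]


# HYPOTHETICAL values: primer lengths of 17 and 21 bases, and the file names.
# The truncation lengths (270 and 200) are the ones from the original post.
command = build_denoise_paired_command(
    demux_path="demux.qza",
    trim_left_f=17,
    trim_left_r=21,
    trunc_len_f=270,
    trunc_len_r=200,
    output_dir="dada2-output",
)

# Print a copy-pasteable shell command.
print(shlex.join(command))
```

The printed line can be pasted into a shell where QIIME 2 is installed. Primers that vary in length or position are better handled with qiime cutadapt trim-paired, as mentioned above.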

As for your feature counts in Q2 vs Q1, those could certainly be affected by adapter sequence as well as by the truncation parameters, but DADA2 is a different method from OTU clustering altogether, so you can expect to see some differences. For more info on that, check out this post from @ebolyen and this post from @jairideout (both posts give general descriptions of DADA2 and ASV methods, though they also go into some other details that probably aren't relevant to you here). Finally, the DADA2 documentation and tutorials are an excellent resource, so please be sure to spend some time catching up there: DADA2: Fast and accurate sample inference from amplicon data with single-nucleotide resolution

Keep us posted with any more questions, thanks!