About joining paired reads and quality control

colinbrislawn · August 12, 2020, 3:31am

Welcome to the QIIME 2 forums! I appreciate your detailed description of your data type and clear questions.

I would like to address your questions in reverse order.

Regarding Denoising

There are many good options for denoising! Deblur is a good option, and DADA2 is another. The choice depends on your input data (read length, region sequenced, overlap, etc.) and is up to you.

Regarding Quality Control

The method used in quality-filter q-score-joined is deprecated because there might be even better options available, depending on the data you have.

Modern methods often perform quality control, joining, and denoising as a single step, or use the process of joining reads to increase the confidence (q-score) of the output data.

If you only have joined reads, then this method could be your best option. But you have unjoined reads, so you may be able to make use of these better methods.

That section is more of a 'Conceptual overview of QIIME 2'...
Compare that to the Atacama soil tutorial, which also uses paired-end Illumina reads from 16S rRNA samples, just like you have! The main difference is that the sample type is Atacama Desert soil instead of human feces, but the steps for this data type are the same.

The Atacama soil tutorial uses DADA2, which is one of the methods that performs joining and denoising as a single step.

Regarding joining paired reads

Using the Atacama soil tutorial as an example, the goal is to trim off the low-quality start and end of your reads so that more reads can join and help error-correct each other.

In that tutorial, they suggest:

  --p-trim-left-f 13 \
  --p-trim-left-r 13 \
  --p-trunc-len-f 150 \
  --p-trunc-len-r 150 \

Based on your quality score plots, I would suggest starting with

  --p-trim-left-f 25

because I see some low quality scores at the start of your forward read.
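
To make the trimming idea concrete, here is a rough Python sketch of how one might turn per-position median quality scores into a trim-left and truncation position. This is only an illustration, not how DADA2 or QIIME 2 picks anything for you: the function name suggest_trim, the threshold of 25, and the example quality values are all hypothetical and are not taken from your plots.

```python
from typing import List, Tuple


def suggest_trim(median_q: List[float], min_q: float = 25.0) -> Tuple[int, int]:
    """Return (trim_left, trunc_len) for one read direction.

    trim_left: number of leading positions whose median quality is below min_q.
    trunc_len: position (counted from the start of the untrimmed read) at
               which the median quality first drops below min_q again.
    """
    n = len(median_q)

    # Skip the low-quality positions at the start of the read.
    trim_left = 0
    while trim_left < n and median_q[trim_left] < min_q:
        trim_left += 1

    # Walk forward until the quality falls below the threshold again.
    trunc_len = trim_left
    while trunc_len < n and median_q[trunc_len] >= min_q:
        trunc_len += 1

    return trim_left, trunc_len


# Hypothetical median quality scores for a very short 12-base read.
example = [12, 18, 30, 34, 36, 36, 35, 33, 30, 22, 15, 10]
trim_left, trunc_len = suggest_trim(example)
print(f"trim-left: {trim_left}, trunc-len: {trunc_len}")
```

In practice you would read these positions off the interactive quality plot for each read direction; the sketch just spells out the logic of "drop the bad start, keep the good middle, cut before the bad end".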

Regarding joining of paired reads

That depends on your forward and reverse read length (300f + 300r) and the total length of the V3-V4 region you sequenced. What primers did you use?
(Here is a short example of how read overlapping works.)

DADA2 does this a little differently than vsearch join-pairs, but knowing the expected area of overlap is still very important!
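
As a back-of-the-envelope check, the expected overlap is roughly the forward truncation length plus the reverse truncation length minus the amplicon length. The Python sketch below works through that arithmetic. The amplicon length of 460 is a hypothetical placeholder until we know your primers, and the minimum overlap of 12 is my assumption about what the merge step requires by default, so please check it against the documentation for your version.

```python
def expected_overlap(trunc_len_f: int, trunc_len_r: int, amplicon_len: int) -> int:
    """Bases shared by the forward and reverse reads after truncation.

    A negative result means the reads leave a gap and cannot be joined.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


def can_merge(trunc_len_f: int, trunc_len_r: int, amplicon_len: int,
              min_overlap: int = 12) -> bool:
    """True if the expected overlap reaches min_overlap (assumed default: 12)."""
    return expected_overlap(trunc_len_f, trunc_len_r, amplicon_len) >= min_overlap


# Hypothetical amplicon length; replace it once the primers are known.
AMPLICON_LEN = 460

for trunc_f, trunc_r in [(300, 300), (270, 220), (240, 200)]:
    overlap = expected_overlap(trunc_f, trunc_r, AMPLICON_LEN)
    verdict = "ok" if can_merge(trunc_f, trunc_r, AMPLICON_LEN) else "too short"
    print(f"trunc-len-f={trunc_f} trunc-len-r={trunc_r} overlap={overlap} ({verdict})")
```

The point of the exercise is that truncating aggressively to remove low-quality tails also eats into the overlap, so the two settings have to be chosen together.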

I think you are off to a good start and are asking all of the right questions. My first question for you is about your primers and their area of overlap. With that answered, we can select a denoising method (DADA2, Deblur, or something else) and start choosing the best settings for your data.

Keep in touch,
Colin