Why do I get more sequences (and therefore recover more taxonomic groups) when I use only my forward reads?

SoilRotifer · September 9, 2021, 4:05pm

Remember that you are using the denoise-paired option of DADA2. That is, both the forward and the reverse read must pass quality filtering, otherwise the read pair is discarded. Of the pairs that pass this stage, any whose two reads cannot be merged are also discarded.
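Here is a minimal Python sketch of the "both reads must pass" rule. It assumes filtering by expected errors (the idea behind DADA2's maxEE-style filtering) with an illustrative threshold of 2.0 per read. The function names and thresholds are hypothetical, and this is not q2-dada2's actual code.

```python
# Sketch of paired-end quality filtering: a pair survives only if BOTH mates pass.
# Assumption: reads are filtered on expected errors (EE); thresholds are illustrative.

def expected_errors(quals):
    """Sum of per-base error probabilities implied by Phred quality scores."""
    return sum(10 ** (-q / 10) for q in quals)


def pair_passes_filter(quals_f, quals_r, max_ee_f=2.0, max_ee_r=2.0):
    """True only if the forward AND the reverse read are under their EE limits."""
    return expected_errors(quals_f) <= max_ee_f and expected_errors(quals_r) <= max_ee_r


def filter_pairs(pairs, max_ee_f=2.0, max_ee_r=2.0):
    """Keep (forward_quals, reverse_quals) pairs where both mates pass."""
    return [p for p in pairs if pair_passes_filter(p[0], p[1], max_ee_f, max_ee_r)]
```

With forward reads only, just the first half of the condition applies, so a pair with a poor reverse read still contributes its forward read. That is one reason a forward-only run keeps more sequences.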

The two main reasons reads fail to merge are:

Low-quality base calls in the region of overlap. That is, there is an increase in mismatching base calls at the same position. For example, the forward read may have a low-quality base call of an A while the reverse read has a low-quality base call of a C at that position. Too many of these mismatches will cause merging to fail.
The reads are not long enough to overlap. For more details see this thread.
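Both failure modes can be illustrated with a toy merge step in Python. This is a simplified sketch, not DADA2's algorithm (DADA2 merges denoised sequences and aligns them). It assumes the overlap length is already known, and the min_overlap=12 / max_mismatch=0 values are illustrative; check the actual settings of the DADA2 / q2-dada2 version you run.

```python
# Toy paired-read merge showing the two failure modes:
# (1) the overlap is too short, (2) too many mismatches in the overlap.
# Assumptions: overlap length is known in advance; thresholds are illustrative.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def try_merge(fwd, rev, overlap, min_overlap=12, max_mismatch=0):
    """Return the merged sequence, or None if the pair fails to merge."""
    if overlap < min_overlap or overlap > min(len(fwd), len(rev)):
        return None  # failure mode 1: reads do not overlap enough
    rc = reverse_complement(rev)  # put the reverse read on the forward strand
    f_tail = fwd[-overlap:]
    r_head = rc[:overlap]
    mismatches = sum(a != b for a, b in zip(f_tail, r_head))
    if mismatches > max_mismatch:
        return None  # failure mode 2: base calls disagree in the overlap
    return fwd + rc[overlap:]
```

For example, a 20 bp forward read and a 20 bp reverse read from a 30 bp fragment overlap by 10 bp. They merge only if the minimum overlap is at most 10 and the disagreements in those 10 positions do not exceed max_mismatch.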

As you can see, these two issues can be related: a short overlap combined with low-quality bases in that overlap region makes read merging much more likely to fail.

I'd suggest truncating your reads a bit more if you can, as long as enough overlap remains for merging. See the thread linked above for guidance on choosing your truncation values. Also, what gene or gene region are you sequencing? The 16S rRNA gene? V3-V4?
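When picking truncation values, a quick back-of-the-envelope check is overlap = forward truncation length + reverse truncation length - amplicon length. Below is a minimal Python sketch of that check. It assumes amplicon_len is the full length spanned by the read pair (including primers if they are still in the reads). The min_overlap of 12 and the optional safety margin are illustrative values, and the 460 bp amplicon in the example is hypothetical.

```python
# Back-of-the-envelope check that truncated read pairs can still overlap.
# Assumptions: amplicon_len covers the whole region spanned by the pair;
# min_overlap and margin are illustrative, not tool defaults.

def expected_overlap(trunc_len_f, trunc_len_r, amplicon_len):
    """Overlap (bp) between truncated forward and reverse reads."""
    return trunc_len_f + trunc_len_r - amplicon_len


def truncation_ok(trunc_len_f, trunc_len_r, amplicon_len_max, min_overlap=12, margin=0):
    """True if even the longest expected amplicon still overlaps enough."""
    overlap = expected_overlap(trunc_len_f, trunc_len_r, amplicon_len_max)
    return overlap >= min_overlap + margin


# Hypothetical 460 bp amplicon:
print(expected_overlap(250, 250, 460))  # 40 bp of overlap
print(truncation_ok(230, 220, 460))     # False: the reads no longer meet
```

Amplicon length varies between taxa, so the check uses the longest length you expect. Truncating more removes low-quality tails (fewer mismatches), but every base trimmed also shortens the overlap, which is the trade-off described above.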