Understanding DADA2 feature selection based on input read counts

Hi QIIME 2 team,

I am working with 2x150 bp HiSeq reads, with roughly 4-5 million reads per sample. I trimmed the first 20 bp from both the forward and reverse reads because they contain the 515F and 806R primers for the V4 region of the 16S rRNA gene. That leaves 2x130 bp, which is more or less the length of the amplicon, so I will probably get only about a 5 bp overlap. I ran bbmerge with default flag values to see how well merging would work, and for most samples only 0.4% of reads were joined. I saw the same result in the DADA2 feature table.
The total number of features from this analysis was fewer than 1,000. I have a few questions:
1- How can I merge the sequences better?
2- What is the average number of features I should expect from high-quality sequences?
3- How much does sequencing depth affect the number of features we ultimately get?

I really appreciate your guidance.


Hi @venkar,

I don’t think you really can merge them with ~5 bp of overlap. Even if the nucleotides were random, there are only 4^5 = 1024 possible 5 bp sequences, so unrelated reads would match by chance fairly often. That means even if you did merge, you would have a lot of chimeric merges.
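
To make the arithmetic concrete, here is a minimal Python sketch. The function names are made up for illustration, and the 255 bp amplicon length is only the value implied by the numbers in the question (2x130 bp reads with ~5 bp of overlap), not a measured one. It also assumes uniformly random, independent bases, which real amplicons are not.

```python
def expected_overlap(read_len: int, amplicon_len: int) -> int:
    """Overlap (in bp) between forward and reverse reads spanning one amplicon."""
    return 2 * read_len - amplicon_len


def chance_match_probability(overlap_bp: int) -> float:
    """Probability that two independent, uniformly random sequences agree
    at every position of an overlap of the given length."""
    return 1.0 / (4 ** overlap_bp)


# Assumed amplicon length of 255 bp (implied by the question, not measured).
overlap = expected_overlap(read_len=130, amplicon_len=255)
print("expected overlap (bp):", overlap)

# Compare a ~5 bp overlap with longer overlaps.
for k in (5, 12, 20):
    print(k, "bp overlap ->", 4 ** k, "possible sequences,",
          "chance match probability", chance_match_probability(k))
```

With only 1024 possible overlap sequences, a chance match between unrelated reads is not rare when there are millions of reads per sample, which is why such a short overlap gives the merger very little to work with.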

I would just use the forward reads on their own.

The number of features to expect depends entirely on the environment. The features here are amplicon sequence variants and represent how many different amplicons exist in the sample (in a perfect world anyway).

Assuming the denoising is working well, you should see the number of features plateau with deeper sequencing as eventually you will have captured every amplicon in the sample (once again, in a perfect world).
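
The plateau can be illustrated with a small simulation. This is only a toy sketch under stated assumptions: a hypothetical community of 200 true amplicons with skewed abundances (weight 1/rank), perfect denoising, and no sequencing error. The function name and all numbers are invented for illustration.

```python
import random


def observed_features(n_true: int, depth: int, seed: int = 0) -> int:
    """Count distinct amplicons seen when drawing `depth` reads from a
    community of `n_true` amplicons with skewed (1/rank) abundances."""
    rng = random.Random(seed)
    weights = [1.0 / (rank + 1) for rank in range(n_true)]
    reads = rng.choices(range(n_true), weights=weights, k=depth)
    return len(set(reads))


# Observed feature count rises with depth, then levels off at n_true.
for depth in (100, 1_000, 10_000, 100_000):
    print(depth, "reads ->", observed_features(200, depth), "features")
```

In this idealized setting the count can never exceed the 200 amplicons that actually exist, so extra depth past the plateau adds reads but not features. Real data can deviate from this if denoising lets errors through.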

Hi @ebolyen -
Thank you for your reply. I appreciate it.
I just have a follow-up question: since my reverse reads are of good quality, can I still use them separately?


Hey @venkar,

Absolutely, just treat them as single-end as far as QIIME 2 is concerned. They can even still be recorded as “reverse” if you care to annotate them that way in something like a FASTQ manifest.
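
As a rough sketch of what that could look like, the Python below builds a manifest for reverse reads imported as single-end data. It assumes the comma-separated manifest layout with `sample-id`, `absolute-filepath`, and `direction` columns. Other QIIME 2 releases may expect a different layout (for example a tab-separated one without a direction column), so please check the importing documentation for your version. The sample IDs and file paths are hypothetical.

```python
import csv
import io


def build_manifest(samples):
    """Return manifest text for (sample_id, absolute_path) pairs,
    annotating every file as a reverse read."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # Assumed header for the comma-separated manifest layout.
    writer.writerow(["sample-id", "absolute-filepath", "direction"])
    for sample_id, path in samples:
        writer.writerow([sample_id, path, "reverse"])
    return buf.getvalue()


# Hypothetical samples and paths, for illustration only.
samples = [
    ("sample-1", "/data/run1/sample-1_R2.fastq.gz"),
    ("sample-2", "/data/run1/sample-2_R2.fastq.gz"),
]
print(build_manifest(samples))
```

The resulting text can be saved to a file and used when importing the reverse reads as single-end sequences.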

Thank you for the reply.
