Handling DADA2 and feature table results

Hello, this is my first time working on a bioinformatics project, and I am having trouble making sense of my feature table results.

The summary showed that over 700,000 features were found, which sounds unreasonable for 16S rRNA data. I thought this was due to primers not being trimmed, but my PI explained that removing primers shouldn't be necessary since they are part of the V3-V4 region we're interested in. Removing them would also significantly reduce the length of each 250 bp sequence.

The quality of the data is high so this is what my PI suggested I do with the parameters rather than trimming:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs .qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-n-threads 30 \
  --p-max-ee-r 5 \
  --p-max-ee-f 5 \
  --p-min-fold-parent-over-abundance 5 \
  --o-representative-sequences asv-sequences-3.qza \
  --o-table feature-table-3.qza \
  --o-denoising-stats dada2-stats3.qza

I understand these parameters make DADA2 more lenient, but I wanted to confirm whether my issue really is the presence of the primers and what options I have to handle it. Thank you!

Hello! Welcome to the forum!

Do you mean 700,000 unique ASVs (very high), or the total count of all ASVs including repeats (not surprising)? I assume that the reads are merged by their overlapping region, not just concatenated (which is possible in the DADA2 R version).
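The difference between the two numbers can be illustrated with a toy feature table (the sample names and counts here are hypothetical, only to show what is being compared):

```python
from collections import Counter

# Toy feature table: per-sample read counts for each ASV (hypothetical data).
feature_table = {
    "sample1": Counter({"ASV_a": 1200, "ASV_b": 300}),
    "sample2": Counter({"ASV_a": 900, "ASV_c": 50}),
}

# Unique ASVs: distinct sequence variants observed across all samples.
unique_asvs = set()
for counts in feature_table.values():
    unique_asvs.update(counts)

# Total frequency: sum of all read counts, repeats included.
total_frequency = sum(sum(c.values()) for c in feature_table.values())

print(len(unique_asvs))   # 3 unique ASVs
print(total_frequency)    # 2450 total reads
```

700,000 in the second sense would be unremarkable; 700,000 in the first sense is what suggests a problem.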

Still, removing primers is recommended: they may introduce false variability, increase false positives at the chimera detection step, and affect DADA2 outputs. I would trim them with cutadapt before running DADA2. Yes, it will reduce the read length, but it will improve the quality of the analyses, and the reduced length will not affect the merging step.
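Conceptually, cutadapt matches the primer (including its degenerate positions) at the 5' end of each read and removes it. A minimal Python sketch of that idea, using the IUPAC codes; the primer sequence is the commonly used 341F and is only an illustration, not necessarily the one used in your study:

```python
import re

# IUPAC degenerate-base codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def strip_primer(read: str, primer: str) -> str:
    """Remove the primer from the 5' end of a read if it matches there."""
    pattern = "".join(IUPAC[base] for base in primer)
    match = re.match(pattern, read)
    return read[match.end():] if match else read

# 341F V3-V4 forward primer (assumed here for illustration).
primer_f = "CCTACGGGNGGCWGCAG"
read = "CCTACGGGAGGCTGCAGTTTTACGT"  # one primer variant + downstream sequence
print(strip_primer(read, primer_f))  # "TTTTACGT"
```

The real tool (`qiime cutadapt trim-paired`) additionally handles mismatches, quality, and paired reads; this sketch only shows why a single degenerate primer can match reads that differ at the degenerate positions.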

The parameters suggested by your PI look good, and you can try them, but I would still remove the primers.


Thank you for your response! Yes, that's the number of unique ASVs. I will go ahead and remove the primers as suggested.

Can you also check if barcodes were removed from the sequences during the demultiplexing step?

Yes, the data was already demultiplexed with barcodes removed before importing to QIIME.

Thank you for your help; removing the primers really helped! The count is now 14,000 ASVs instead. I wanted to confirm a few concepts, though. After primer removal the lengths are 227 bp for the forward read and 224 bp for the reverse read, so the merged length decreased from the expected coverage of 450-480 bp for the V3-V4 region. Is that normal? I understand primer removal is an initial step of the analysis, and the reason I have read is that primers introduce artificial variance and inflate diversity, which is what happened in my case. But aren't primers a natural part of the amplified region, and wouldn't the sequencing facility use one primer set with degenerate bases to match the natural sequence? So my follow-ups are: why do primers introduce this high variance if they are part of the amplified region, and is it acceptable to shorten the merged length to below 450-480 bp? Thanks again!


Looks much better than 700,000 unique ASVs!

It depends on how the coverage region is calculated. If it includes the primers, then yes, trimming them should decrease the length. For me, the merging stats are more important, especially when compared to the number of reads retained after the filtering step.
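The arithmetic can be sketched out. Assuming the commonly used 341F/805R primer pair (17 bp and 21 bp; an assumption, since the thread does not name the primers), trimming removes about 38 bp from the amplicon, so a post-trimming merged length in the low 430s is consistent with a 450-480 bp figure that includes the primers:

```python
# Read lengths after primer removal (from the post above).
len_f, len_r = 227, 224

# A merged pair spans: forward read + reverse read - their overlap.
def merged_length(fwd: int, rev: int, overlap: int) -> int:
    return fwd + rev - overlap

# For example, with a 20 bp overlap the merged amplicon would be:
print(merged_length(len_f, len_r, 20))  # 431

# Expected region including primers (~450-480 bp) minus the primer
# lengths (17 + 21 bp for 341F/805R; assumed) lands in the same range:
print(450 - 17 - 21)  # 412
print(480 - 17 - 21)  # 442
```

So a shorter merged length after trimming is exactly what the arithmetic predicts, provided the merging stats show the pairs still overlap.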

I think the "degenerate bases" part of your question is the key. Yes, primers are part of the biological sequence, but they are designed to target as many bacterial 16S rRNA gene sequences as possible, ideally all of them at the same time. So they include degenerate bases.

This means that a primer is really a mixture of highly similar but distinct sequences. Keeping them in the ASVs, even ones from the same species, may lead to artificial variance because several variants of the same primer can target the same DNA sequence.
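Counting how many concrete sequences one degenerate primer stands for makes this concrete. A sketch expanding the degenerate positions of the 341F primer (used here only as an example primer, not necessarily yours):

```python
from itertools import product

# IUPAC degenerate-base codes and the concrete bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand(primer: str) -> list[str]:
    """Enumerate every concrete sequence a degenerate primer represents."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in primer))]

# 341F has one N (4 options) and one W (2 options): 4 * 2 = 8 sequences.
variants = expand("CCTACGGGNGGCWGCAG")
print(len(variants))  # 8 distinct primer sequences in the mixture
```

Any of those 8 sequences can end up at the start of reads amplified from the same template, so leaving the primer in place can split one biological variant into several ASVs.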

To the first part of your question: primers introduce variance because they contain degenerate bases and are not 100% specific. To the second part: yes, it is acceptable, considering that the length of the V3-V4 region varies between species and that you trimmed the primers. Again, I would check that the merging stats look good.

Best,


Forgot to add that you should compare not only the numbers of unique ASVs but also the total frequencies. A relatively small difference between total frequencies combined with a big difference between unique ASV counts would be a good sign for me, while a big difference between total frequencies should raise an alarm about possible loss of sequences at the merging/filtering steps.
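As a sketch of that comparison (the total-frequency figures below are hypothetical; only the unique ASV counts come from this thread):

```python
# Before/after primer trimming. Unique ASV counts are from the thread;
# total frequencies are made-up numbers to illustrate the comparison.
before = {"unique_asvs": 700_000, "total_frequency": 12_000_000}
after = {"unique_asvs": 14_000, "total_frequency": 11_500_000}

# Large drop in unique ASVs: primer-driven inflation was removed.
asv_drop = 1 - after["unique_asvs"] / before["unique_asvs"]

# Small drop in total frequency: few reads lost at merging/filtering.
freq_drop = 1 - after["total_frequency"] / before["total_frequency"]

print(f"unique ASVs dropped by {asv_drop:.0%}")        # 98%
print(f"total frequency dropped by {freq_drop:.1%}")   # 4.2%
```

That pattern, a huge ASV drop with a nearly unchanged total frequency, is the reassuring one; the reverse would suggest reads are being lost.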


That makes much more sense now, thank you! I checked, and the total frequencies haven't changed much after removing the primers, just the unique features, so I continued the analysis and it looks good!
