Interpretation of results after taxonomy analysis from decontaminated table

Mehrbod_Estaki · November 18, 2021, 10:55pm

Hi @joaomiranda,
Sounds good, keep us posted on the new results.

Based on the quality plots, my gut feeling is that you're still going to lose a lot of reads during the filtering process. But I could be wrong; we'll see after the run is over. For runs like this I like to start with conservative trimming and then, if the results are good, relax my parameters. In your case I would truncate at, say, position 200, at least before that big dip you see on the 3' tail. The "median" marker you may have read about on the forum is a recommended starting point, not a definitive rule. For trimming the 5' end, you may be right that your trim parameter of 45 is just fine, but remember that your quality plot is based on a random subsample of only 10,000 of your total reads. When I see that odd second dip, I suspect there are many more reads with that same dip in quality, and it's possible all of them will get filtered out before they are denoised. So again, I would start with conservative parameters in DADA2, then relax them if you think your depth is sufficient.
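To make the effect of those two parameters concrete, here is a minimal Python sketch of what trimming and truncating do to individual reads. It is only an illustration, not DADA2 itself: the function name `trim_and_truncate` and the toy reads are made up, and it assumes the usual DADA2 behaviour where reads shorter than the truncation length are discarded and the trim is taken off the 5' end.

```python
from typing import Optional


def trim_and_truncate(read: str, trim_left: int, trunc_len: int) -> Optional[str]:
    """Toy model of single-end trimming/truncation (hypothetical helper).

    Reads shorter than trunc_len are dropped (None); otherwise the read is
    cut at trunc_len and the first trim_left bases are removed.
    """
    if len(read) < trunc_len:
        return None
    return read[trim_left:trunc_len]


# Three toy reads of length 250, 180 and 200 (placeholder sequences).
reads = ["A" * 250, "C" * 180, "G" * 200]

kept = []
for read in reads:
    result = trim_and_truncate(read, trim_left=45, trunc_len=200)
    if result is not None:
        kept.append(result)

# Every surviving read has length trunc_len - trim_left.
print(len(kept), [len(r) for r in kept])
```

With a trim of 45 and a truncation at 200, every read that survives ends up 155 bases long, and anything shorter than 200 is lost before denoising even starts. That is one reason to check the denoising stats after each run before relaxing the values.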

Note that that discussion is about working with paired-end reads. PE reads are somewhat different because after merging you can have variable-length amplicons, so we can't cut to a constant length. In your case, because you are using single-end reads, you should just make sure you extract the same length as your DADA2 trim/truncate values. You can always compare your results against those from a "full length" classifier, which is freely available on the data resources page.

You already have that information. When you use your primers in amplification, you target a specific region. Then later on, you use the same primer sequences to extract reads from a reference database. So essentially you end up with a reference database whose reads cover the exact same region that your primers amplify in real life. But since you are further trimming and truncating your reads during DADA2, it makes sense to also trim and truncate your reference reads to the same length. Again, this isn't necessary per se, as long as your reference reads fully encompass your primer region (which is why you can use a full-length classifier), but extracting your reference reads to the exact same region as your primers has been shown to improve classification a bit.
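If it helps to picture it, here is a rough Python sketch of the idea: find the primer-bounded region in a reference sequence, then cut it to the same window you used for your reads. This is a toy, not what the QIIME 2 plugin does internally. It assumes exact primer matches (no degenerate bases or mismatches), assumes read coordinates start at the first base after the forward primer, and the sequences and function names are all made up for illustration.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse-complement a plain A/C/G/T sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def extract_amplicon(reference: str, fwd_primer: str, rev_primer: str) -> Optional[str]:
    """Return the region between the primers (primers excluded), or None.

    Hypothetical helper: exact string matching only.
    """
    start = reference.find(fwd_primer)
    if start == -1:
        return None
    region_start = start + len(fwd_primer)
    end = reference.find(reverse_complement(rev_primer), region_start)
    if end == -1:
        return None
    return reference[region_start:end]


def match_read_window(amplicon: str, trim_left: int, trunc_len: int) -> Optional[str]:
    """Cut the extracted region to the same window applied to the reads."""
    if len(amplicon) < trunc_len:
        return None
    return amplicon[trim_left:trunc_len]


# Toy reference: flank + forward primer + target region + rev-comp of reverse primer + flank.
reference = "TTTT" + "ACGT" + "GATTACAGAT" + "TGCAA" + "CCC"
amplicon = extract_amplicon(reference, fwd_primer="ACGT", rev_primer="TTGCA")
window = match_read_window(amplicon, trim_left=2, trunc_len=8)
print(amplicon, window)
```

The point is just that the reference ends up covering the same stretch of the gene as your denoised reads, so the classifier is trained on sequence that looks like what it will be asked to classify.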

Hope that makes sense?