Adding read number tracking and amplicon size overview to FeatureTable summary

jairideout · October 13, 2017, 7:17pm

Hi @yanxianl, thanks for these suggestions!

This type of DADA2 reporting was discussed in this forum topic. We'll follow up here when the feature is available in a QIIME 2 release!

@wasade does q2-deblur log this kind of filtering information?

@Nicholas_Bokulich, would this type of filtering make sense in the quality-control plugin you're working on?

I agree that tracking this type of filtering information would be useful, though I'm not sure it'd be possible to display it in the feature-table summarize visualization. The reason is that QIIME 2 is decentralized: there is no fixed set of sequential steps in an analysis like we had in QIIME 1. QIIME 2 is more of a "choose-your-own-adventure", where you could, for example, choose to denoise your data with DADA2, or perform quality-score-based filtering with q2-quality-filter and then denoise the data with Deblur. You can even avoid denoising algorithms altogether and cluster your sequences into OTUs, similar to QIIME 1.

Each of these steps could perform filtering in different ways, and track that information differently too (it's up to the plugin developer how they want to do that). Thus, by the time a user generates a feature table summary with feature-table summarize, we don't have access to the results of any of those "upstream" quality-filtering steps; all we have is a feature table that tells us how many features we have in our data set, and how abundant those features are. An artifact's provenance tells us which actions were executed to create the feature table, but not how many sequences those "upstream" actions filtered out.
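To make that concrete, here is a minimal sketch in plain Python of what a feature table can and can't tell you. The nested-dict layout, the sample and feature IDs, the counts, and the summarize_table helper are all made up for illustration; this is not how QIIME 2 stores or summarizes a table.

```python
# Hypothetical toy feature table: sample ID -> {feature ID -> frequency}.
# The IDs and counts are invented for illustration only.
feature_table = {
    "sample-1": {"feat-a": 120, "feat-b": 30},
    "sample-2": {"feat-a": 45, "feat-c": 5},
}


def summarize_table(table):
    """Summarize a toy feature table using only the counts it contains."""
    observed_features = set()
    for counts in table.values():
        # Only features with a nonzero count are considered observed.
        observed_features.update(fid for fid, n in counts.items() if n > 0)
    per_sample = {sid: sum(counts.values()) for sid, counts in table.items()}
    return {
        "num_samples": len(table),
        "num_features": len(observed_features),
        "total_frequency": sum(per_sample.values()),
        "per_sample_frequency": per_sample,
    }


summary = summarize_table(feature_table)
print(summary)
```

Everything in the summary is derived from the counts that survived filtering. Nothing in the table says how many sequences each sample started with, so that number has to come from somewhere else.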

Besides the DADA2 filtering reporting I linked to above, you can accomplish what you're looking for by using demux summarize to inspect how many sequences each sample has prior to quality-filtering / denoising. You can then apply whatever denoising or clustering analyses you'd like, and use feature-table summarize to see how many sequences remain in your feature table (that info is listed as Total frequency under the Table summary heading).

So while it would be difficult to display all of the information you're describing in feature-table summarize, it's possible to find that info by comparing demux summarize to feature-table summarize.
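If you want to do that comparison for more than a handful of samples, a rough sketch like the following might help. It assumes you've copied the per-sample sequence counts out of the two visualizations by hand into Python dicts; the sample IDs, the counts, and the retention_report helper are hypothetical and not part of any QIIME 2 API.

```python
def retention_report(input_counts, retained_counts):
    """Compare per-sample counts before and after denoising/clustering.

    Both arguments map sample ID -> number of sequences. Samples missing
    from retained_counts are treated as having zero sequences left.
    """
    report = {}
    for sample_id, n_in in input_counts.items():
        n_out = retained_counts.get(sample_id, 0)
        # Guard against division by zero for samples with no input sequences.
        fraction = n_out / n_in if n_in else 0.0
        report[sample_id] = {
            "input": n_in,
            "retained": n_out,
            "filtered": n_in - n_out,
            "fraction_retained": fraction,
        }
    return report


# Hypothetical counts, as if read off demux summarize (before) and
# feature-table summarize (after). These are NOT real data.
demux_counts = {"sample-1": 200, "sample-2": 100, "sample-3": 50}
table_counts = {"sample-1": 150, "sample-2": 50}

report = retention_report(demux_counts, table_counts)
for sample_id, row in report.items():
    print(sample_id, row["input"], row["retained"], f"{row['fraction_retained']:.2f}")
```

A sample that appears before filtering but is absent afterwards (sample-3 here) shows up with nothing retained, which is often the most useful thing to spot.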