Interpretation of summary statistics - Deblur

Mehrbod_Estaki · May 6, 2020, 7:00am

Hi @ptalebic,
Check out this previous post, which may give more detailed insight into what the deblur stats mean. Answers to some of your other questions are below:

Nope. It just depends on what your deblur input was. From the looks of it, you didn't do any truncating, so none of the reads were truncated, hence the 0s.

This first visualizer really only describes the number of reads you had (total-input-reads) and how many were retained after truncating/length filtering (total-retained-reads), while the other 3 columns explain what happened in the other filtering steps.
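If you want a quick per-sample view of how much survived this step, the fraction retained is just total-retained-reads divided by total-input-reads. A minimal sketch, assuming you've already pulled those two columns into a dict (the sample names and numbers below are made up, and `retention_fraction` is a hypothetical helper, not part of QIIME 2):

```python
# Hypothetical per-sample rows using two column names from the stats table.
# The sample IDs and counts are invented for illustration only.
stats = {
    "sample-1": {"total-input-reads": 20000, "total-retained-reads": 18500},
    "sample-2": {"total-input-reads": 15000, "total-retained-reads": 9000},
}


def retention_fraction(row):
    """Fraction of input reads that were retained after filtering."""
    total = row["total-input-reads"]
    # Guard against samples with no input reads at all.
    return row["total-retained-reads"] / total if total else 0.0


for sample, row in stats.items():
    print(f"{sample}: {retention_fraction(row):.1%} retained")
```

A sample that keeps far less than the others (like sample-2 here) is usually the one worth a closer look.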

The reads-hit-reference column shows the total number of reads you retained after deblur. This should match the number of reads per sample you see if you summarize your feature table with feature-table summarize.
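To sanity-check that, you can sum each sample's counts in the feature table and compare them to reads-hit-reference. Here's a rough sketch, assuming both have already been loaded into plain dicts (the data and the `per_sample_totals` helper are hypothetical):

```python
# Hypothetical deblur stats: sample ID -> reads-hit-reference.
deblur_stats = {"sample-1": 1200, "sample-2": 850}

# Hypothetical feature table: feature ID -> {sample ID -> count}.
feature_table = {
    "feature-a": {"sample-1": 700, "sample-2": 500},
    "feature-b": {"sample-1": 500, "sample-2": 350},
}


def per_sample_totals(table):
    """Sum the counts of every feature within each sample."""
    totals = {}
    for counts in table.values():
        for sample, n in counts.items():
            totals[sample] = totals.get(sample, 0) + n
    return totals


totals = per_sample_totals(feature_table)
# Collect any sample where the two numbers disagree.
mismatches = {
    s: (deblur_stats[s], totals.get(s, 0))
    for s in deblur_stats
    if deblur_stats[s] != totals.get(s, 0)
}
print(mismatches or "all samples match")
```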

This refers to the positive filter Deblur uses, which is the Greengenes database (88% clustered OTUs by default) with some very permissive inclusion criteria (65% identity with 50% coverage). Basically, if your reads don't look anything like something in this database, they will be tossed.
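The keep/toss decision itself is just a threshold check on how well a read aligned to the reference. Here's a toy sketch of that logic only. Deblur does the actual alignment with its own tooling, and the `Hit` type, the `positive_filter` function, and the inclusive `>=` comparisons are all assumptions for illustration:

```python
from dataclasses import dataclass


@dataclass
class Hit:
    read_id: str
    identity: float  # fraction of aligned positions that match (0-1)
    coverage: float  # fraction of the read covered by the alignment (0-1)


def positive_filter(hits, min_identity=0.65, min_coverage=0.5):
    """Return IDs of reads with at least one reference hit passing both
    thresholds. Reads that never hit the reference are never kept."""
    kept = set()
    for hit in hits:
        if hit.identity >= min_identity and hit.coverage >= min_coverage:
            kept.add(hit.read_id)
    return kept


# Hypothetical alignment results; r4 had no hit at all.
hits = [
    Hit("r1", 0.97, 1.0),  # clearly 16S-like: kept
    Hit("r2", 0.60, 0.9),  # identity too low: tossed
    Hit("r3", 0.80, 0.3),  # coverage too low on this hit...
    Hit("r3", 0.70, 0.6),  # ...but this hit passes, so r3 is kept
]
print(sorted(positive_filter(hits)))
```

Because the thresholds are so low, only reads that look nothing like the reference (e.g. host or off-target sequences) tend to fall out here.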

Basically, if you have 1000 identical reads, it makes sense to denoise only one of them and apply the result to the rest, instead of doing it 1000 times, which is time-consuming, computationally expensive, and redundant. Dereplication is what gives you the rep-seqs.qza, i.e. the "representative sequences".
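The idea is easy to see in a few lines: collapse identical reads into unique sequences with abundances, do the expensive work once per unique sequence, then carry the abundance along. A minimal sketch; `expensive_denoise` is a hypothetical stand-in, not what deblur actually does to a sequence:

```python
from collections import Counter

reads = ["ACGT", "ACGT", "ACGA", "ACGT", "TTGA", "ACGA"]


def dereplicate(seqs):
    """Collapse identical sequences into (unique sequence, abundance)
    pairs, most abundant first."""
    return Counter(seqs).most_common()


def expensive_denoise(seq):
    # Hypothetical placeholder for the real per-sequence denoising work.
    return seq.lower()


unique = dereplicate(reads)

# Denoise each unique sequence once and add its abundance to the result,
# summing in case two inputs end up as the same output sequence.
denoised = Counter()
for seq, count in unique:
    denoised[expensive_denoise(seq)] += count

print(unique)
print(f"{len(unique)} denoising calls instead of {len(reads)}")
```

With real data the savings are much bigger than 3 vs. 6, since amplicon reads are often highly redundant.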

Your questions are great! We're happy to help. I'd also recommend searching the forum for keywords from your questions, since chances are they have been asked before. This forum has mounds of useful info buried in it!
Best