Deblur-denoise results

bioinfo_C · June 27, 2018, 4:08pm

Hi,

I am a little confused by the results of deblur denoise.
After running deblur denoise, I am not sure which numbers are used in the downstream analysis.
Here is my result table:

I am confused by fraction-artifact-with-minsize: if this percentage of sequences is dropped, then I am losing more than 50% of the sequences in a few of my samples.

Also, when I checked the number of sequences per sample in the table, it is very low compared to the raw sequences I started my analysis with (40-50 times lower in some cases).

Please help me figure out whether I am doing something wrong.

Thanks!
PS: the command I used:

qiime deblur denoise-16S \
  --i-demultiplexed-seqs joined-filtered.qza \
  --p-trim-length 400 \
  --o-representative-sequences rep-seqs-deblur.qza \
  --o-table table-deblur.qza \
  --p-sample-stats \
  --o-stats deblur-stats.qza \
  --p-jobs-to-start 20 \
  --verbose

I am using demultiplexed, merged data.

thermokarst · June 28, 2018, 1:22pm

Hey there @bioinfo_C!

Did you happen to see the note at the top of the visualization about column descriptions (I personally didn't know about this feature)? Check this out!

The mouseover description is here:

The fraction of reads which appear to be artifactual including those below the min-size threshold. This is computed as (reads-hit-artifact + (reads-raw - reads-derep)) / reads-raw

So, that sounds to me like this metric is an aggregate of reads dropped either for being marked as "artifactual" or for not passing the min-size setting. My next question is, "what is the min-size"? I pulled up the docs and found this:

  --p-min-size INTEGER            In each sample, discard all features with an
                                  abundance less than min_size.  [default: 2]
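
To make the `--p-min-size` description concrete, here is a minimal Python sketch of per-sample abundance filtering. It only illustrates the rule quoted above and is not deblur's actual implementation; the function name `filter_min_size` and the toy reads are made up for this example.

```python
from collections import Counter


def filter_min_size(reads, min_size=2):
    """Dereplicate one sample's reads and drop features seen fewer than min_size times.

    Illustrative only: `reads` is a hypothetical list of sequence strings.
    """
    # Dereplication: identical sequences collapse into one feature with a count.
    counts = Counter(reads)
    return {seq: n for seq, n in counts.items() if n >= min_size}


# Toy sample: two sequences occur more than once, two are singletons.
sample_reads = ["ACGT", "ACGT", "ACGT", "TTGA", "TTGA", "GGCC", "CATG"]
kept = filter_min_size(sample_reads)
reads_kept = sum(kept.values())
print(kept)
print(f"{reads_kept} of {len(sample_reads)} reads survive min_size=2")
```

With the default of 2, every sequence observed only once in a sample is discarded, so a sample dominated by singletons loses a large share of its reads at this step.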

So, my interpretation of this is that 0% of your reads were "found to be artifactual" (I am getting that from the next column to the right, "fraction-artifact"), which means the ~53% you see in "fraction-artifact-with-minsize" are effectively the fraction of reads dropped due to being observed just one time (since it doesn't look like you changed this setting at all, so the default applies).
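
As a rough worked example of the formula from the mouseover text, here is a small Python sketch. The read counts are hypothetical (they are not taken from the table in this thread), and it assumes that reads-derep counts the reads left after dereplication and min-size filtering, which is what the formula implies.

```python
def fraction_artifact_with_minsize(reads_raw, reads_derep, reads_hit_artifact):
    """(reads-hit-artifact + (reads-raw - reads-derep)) / reads-raw, per the mouseover text."""
    return (reads_hit_artifact + (reads_raw - reads_derep)) / reads_raw


# Hypothetical sample: 1,000,000 raw reads, 470,000 left after dereplication
# with min-size filtering, and no reads flagged as artifacts.
frac = fraction_artifact_with_minsize(1_000_000, 470_000, 0)
print(f"{frac:.2%}")
```

With zero reads hitting the artifact filter, the whole fraction comes from the difference between reads-raw and reads-derep, which matches the reading above that the ~53% is driven by the min-size setting.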

Does that help clear things up at all? I think I learned something about deblur today, so thanks for the opportunity! :qiime2:

bioinfo_C · June 28, 2018, 1:51pm

Thanks thermokarst for the explanation.

So, my interpretation of this is that 0% of your reads were “found to be artifactual” (I am getting that from the next column to the right, “fraction-artifact”), which means the ~53% you see in “fraction-artifact-with-minsize” are effectively the fraction of reads dropped due to being observed just one time (since it doesn’t look like you changed this setting at all, so the default applies)

I understand that 53% of my reads were singletons and were dropped from further analysis.
My question is: which column from the table is used for downstream analysis? When I look at the 'Interactive sample detail' in the table file, the number of reads in the same sample is only 1.6% of the raw reads I started my analysis with.

So, my raw reads are around 1 million and after deblur, in the interactive table, it is just 16,000.

I just want to make sure that this drop in the number of reads is normal and not because I am doing something wrong. Also, my concern is that I will be using only 2-4% of my sequenced data. Will this give me the right results?

NOTE:
I used 250 bp paired-end reads. After merging (amplicon size ~450 bp) and quality filtering (quality > 30), I ran the qiime deblur denoise-16S step with --p-trim-length 400. Should I lower this length to get more sequences?

Thanks again for your help.

thermokarst · July 2, 2018, 7:43pm

16,000 unique features, or 16,000 total features?
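
The two readings of "16,000" are quite different, so here is a small Python sketch of the distinction. The table below is entirely hypothetical (made-up sample and feature IDs and counts); it only shows how the number of unique features and the total read count in a sample are computed from the same data.

```python
# Hypothetical feature table: sample -> {feature id -> read count}.
table = {
    "sample-1": {"feat-a": 9000, "feat-b": 6500, "feat-c": 500},
    "sample-2": {"feat-a": 12000, "feat-d": 4000},
}


def summarize(sample_counts):
    """Return (number of unique features, total read count) for one sample."""
    unique_features = sum(1 for n in sample_counts.values() if n > 0)
    total_reads = sum(sample_counts.values())
    return unique_features, total_reads


for sample_id, counts in table.items():
    unique_features, total_reads = summarize(counts)
    print(sample_id, unique_features, total_reads)
```

In this toy table both samples hold 16,000 reads in total, spread over only a handful of unique features, which is a very different situation from a sample with 16,000 distinct features.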

Have you had a chance to read the deblur paper for more details on the method, including some basic benchmarks?

Hope that helps!