Why does Deblur denoising lead to the complete removal of reads from the largest samples?

Dear qiime forum,

I am finding that Deblur denoising might be too aggressive for my dataset, as almost 99% of the input reads end up being removed.

  1. Reads per sample after q-score removal (column reads-raw from deblur-stats.qzv)
  2. Reads per sample after deblur filtering (column reads-deblur from deblur-stats.qzv)
  3. Change in the number of reads at each step of sample processing.

Please see the images attached to this post.

I can see that 6 of the 7 most abundant samples end up with 0 reads after the Deblur step.
deblur-stats.qzv (223.1 KB)

I also see that over 95% of the remaining reads (after host-read removal and q-score filtering) are deemed artifacts, which I feel might be too strict.

Could I please ask two questions based on these observations?
a) Is the Deblur denoising step appropriate for my input data? I understand it has some limitations when there are large differences in read counts between samples, which could be a factor here, so perhaps the DADA2 step would be more appropriate.
b) Why are the most abundant samples being reduced to 0 reads? This seems odd to me.

Many thanks for any help on this topic.


Hello @KpatelBio,

Apologies for the delayed response.

Re a) We generally prefer DADA2 over Deblur, so yes, I would recommend using DADA2 if possible.
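For reference, a minimal sketch of switching to DADA2 for single-end reads. The input artifact name and the trim/truncation values below are placeholders (assumptions, not taken from your data); choose them from your own quality plots, and use `denoise-paired` instead if your reads are paired-end:

```shell
# Hypothetical example: denoise single-end reads with DADA2 instead of Deblur.
# demux.qza and the trim/trunc values are placeholders -- pick values from
# your own quality profile (qiime demux summarize).
qiime dada2 denoise-single \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left 0 \
  --p-trunc-len 150 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --o-denoising-stats dada2-stats.qza

# Tabulate the per-sample denoising stats so you can compare read retention
# against what you saw in deblur-stats.qzv.
qiime metadata tabulate \
  --m-input-file dada2-stats.qza \
  --o-visualization dada2-stats.qzv
```

The denoising-stats output is the place to look first: if DADA2 also discards most reads, the problem is more likely upstream (primer/adapter carryover or low-quality tails) than the denoiser itself.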

Re b) That I cannot answer from the stats alone. If you would like a deeper dive, please upload (or DM) your input data and the exact command you ran.

Thank you