Do Deblur parameters introduce bias?

I have two samples, each sequenced with Illumina paired-end reads. I imported them and joined the paired-end reads with
qiime vsearch join-pairs --i-demultiplexed-seqs demux.qza --o-joined-sequences demux-joined.qza
then converted the .qza to a .qzv visualization:
demux-joined.qzv (286.5 KB)
Then I did the quality filtering:
qiime quality-filter q-score-joined --i-demux demux-joined.qza --o-filtered-sequences demux-joined-filtered.qza --o-filter-stats demux-joined-filtered-stats.qza
Then I ran the Deblur step. Based on demux-joined.qzv, I trimmed the reads to a length of 400. I tried three sets of parameters:
--p-min-reads 10 --p-min-size 2 (default parameters)
--p-min-reads 0 --p-min-size 0 (I did this because I do not want to delete any low-count sequences)
--p-min-reads 1 --p-min-size 2 (thanks to @wasade, changing --p-min-size to 2 may be better than 0)
The outcomes are:
deblur-stats400-default.qzv (191.7 KB)

deblur-stats400-00.qzv (191.7 KB)

deblur-stats400-12.qzv (191.7 KB)

Things get weird in deblur-stats400-00.qzv. In sample SRR1460604, none of the sequences passed the positive filter, which seems impossible. In deblur-stats400-default.qzv, even though low-count sequences are deleted, some sequences still passed the positive filter. Why?

In the third .qzv file, everything seems to have gone smoothly. I think that if I want to retain low-count sequences, setting the parameters to --p-min-reads 1 --p-min-size 2 will be better. But solving this problem would put my mind at ease...

Hi @wym199633, to be honest, I’m not sure whether a value of 0 for --p-min-size is even valid. It is true that changing parameters will change the results. In this case, if low-count sequences are important for your study design and question, then I think that setting --p-min-reads to 1 makes sense.


It seems that changing the value of --p-min-size can cause chaos, but I don’t know why… What puzzles me most in this post is that when both parameters are set to 0, one sample has no results :laughing:
I hope someone can explain this…
If --p-min-size must be set to 2 (so that it runs smoothly), then it’s OK to delete single-count sequences in each sample (reluctantly…)

--p-min-size affects which sequences are considered during the execution of Deblur, which implicitly impacts singletons because Deblur is subtractive. --p-min-reads filters low-abundance sequences after the execution of Deblur. It may help to look at the algorithm, which is given in mathematical and pseudocode form in Text S1 of Amir et al.
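
To make the difference between the two thresholds concrete, here is a toy Python sketch. It is not Deblur's code: the function names (dereplicate, filter_table, toy_pipeline) and the reads are made up for illustration, and the actual subtractive denoising step is left out entirely. It assumes, as I understand the plugin, that --p-min-size is applied per sample before denoising and --p-min-reads is applied to the totals across all samples afterwards.

```python
from collections import Counter


def dereplicate(reads, min_size):
    """Per-sample step (hypothetical stand-in): count identical reads and
    drop any sequence seen fewer than min_size times in this sample."""
    counts = Counter(reads)
    return {seq: n for seq, n in counts.items() if n >= min_size}


def filter_table(per_sample_counts, min_reads):
    """Post-processing step (hypothetical stand-in): drop any sequence whose
    total count across all samples is below min_reads."""
    totals = Counter()
    for counts in per_sample_counts.values():
        totals.update(counts)  # Counter.update adds counts from a mapping
    keep = {seq for seq, n in totals.items() if n >= min_reads}
    return {
        sample: {seq: n for seq, n in counts.items() if seq in keep}
        for sample, counts in per_sample_counts.items()
    }


def toy_pipeline(samples, min_size, min_reads):
    # 1) min_size acts first, inside each sample.
    per_sample = {name: dereplicate(reads, min_size) for name, reads in samples.items()}
    # (The real denoising/subtraction would happen here; it is omitted.)
    # 2) min_reads acts last, on the combined table.
    return filter_table(per_sample, min_reads)


# Made-up reads, purely for illustration.
samples = {
    "sample_A": ["AAAA"] * 8 + ["AAAT"] * 1 + ["CCCC"] * 2,
    "sample_B": ["AAAA"] * 3 + ["CCCC"] * 1 + ["GGGG"] * 1,
}

default_like = toy_pipeline(samples, min_size=2, min_reads=10)
keep_low = toy_pipeline(samples, min_size=2, min_reads=1)

print(default_like)  # only AAAA survives (total 11 >= 10)
print(keep_low)      # AAAA and CCCC survive; per-sample singletons are still gone
```

Note that in this toy, the single CCCC read in sample_B is discarded by min_size even though CCCC is retained from sample_A, which is the kind of per-sample effect that lowering --p-min-reads alone cannot undo.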


Great! Thanks for the explanation!

