Do Deblur parameters introduce bias?

I had two samples, each sequenced with Illumina paired-end reads. I imported them and joined the paired-end reads with
qiime vsearch join-pairs --i-demultiplexed-seqs demux.qza --o-joined-sequences demux-joined.qza
then generated a .qzv visualization from the .qza:
demux-joined.qzv (286.5 KB)
Then I did the quality filtering:
qiime quality-filter q-score-joined --i-demux demux-joined.qza --o-filtered-sequences demux-joined-filtered.qza --o-filter-stats demux-joined-filtered-stats.qza
Then I ran the Deblur step. Based on demux-joined.qzv, I trimmed the reads to a length of 400. I tried three sets of parameters:
--p-min-reads 10 --p-min-size 2 (default parameter)
--p-min-reads 0 --p-min-size 0 (I did this because I do not want to delete any low-count sequences)
--p-min-reads 1 --p-min-size 2 (thanks to @wasade, changing --p-min-size to 2 may be better than 0)
The outcomes are:
deblur-stats400-default.qzv (191.7 KB)

deblur-stats400-00.qzv (191.7 KB)

deblur-stats400-12.qzv (191.7 KB)

Things get weird in deblur-stats400-00.qzv. In sample SRR1460604, none of the sequences passed the positive filter, which seems impossible. In deblur-stats400-default.qzv, even though low-count sequences are deleted, some sequences still passed the positive filter. Why?

In the third .qzv file, everything seems to have gone smoothly. I think that if I want to retain low-count sequences, setting the parameters to --p-min-reads 1 --p-min-size 2 will be better. But solving this problem would put my mind at ease...

Hi @wym199633, to be honest, I’m not sure whether a value of 0 for --p-min-size is even valid. It is true that changing parameters will change the results. In this case, if low-count sequences are important for your study design and question, then I think that setting --p-min-reads to 1 makes sense.

Best,
Daniel

It seems that changing the value of --p-min-size can cause chaos, but I don’t know why… What puzzles me most in this post is that when both parameters are set to 0, one sample has no results :laughing:
I hope someone can explain this…
If --p-min-size must be set to 2 (so that it runs smoothly), then it’s OK to delete single-count sequences in each sample (reluctantly…).

--p-min-size affects which sequences are considered during the execution of Deblur, which implicitly impacts singletons, as Deblur is subtractive. --p-min-reads filters low-abundance sequences after the execution of Deblur. It may make sense to look at the algorithm, which can be found in mathematical and pseudocode form in Text S1 of Amir et al.
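
To make the difference between the two thresholds concrete, here is a toy Python sketch. It is not the Deblur algorithm: the denoising step itself is left out entirely, and the function names and counts are hypothetical. It only assumes what is described above, namely that a --p-min-size style threshold is applied per sample before denoising and a --p-min-reads style threshold is applied to totals across samples afterwards. It does not model what a value of 0 does and does not explain the empty sample.

```python
from collections import Counter


def prefilter_by_min_size(sample_counts, min_size):
    """Per sample, before denoising: drop unique sequences seen fewer than min_size times."""
    return {seq: n for seq, n in sample_counts.items() if n >= min_size}


def postfilter_by_min_reads(table, min_reads):
    """After denoising: keep features whose total count across all samples is >= min_reads."""
    totals = Counter()
    for counts in table.values():
        totals.update(counts)  # adds per-sample counts to the running totals
    keep = {seq for seq, total in totals.items() if total >= min_reads}
    return {
        sample: {seq: n for seq, n in counts.items() if seq in keep}
        for sample, counts in table.items()
    }


def run(raw, min_size, min_reads):
    """Toy pipeline: pre-filter each sample, (denoising omitted), then post-filter."""
    prefiltered = {s: prefilter_by_min_size(c, min_size) for s, c in raw.items()}
    # The real tool would denoise here; this sketch passes the counts through unchanged.
    return postfilter_by_min_reads(prefiltered, min_reads)


# Hypothetical dereplicated counts for two samples.
raw = {
    "S1": {"AAAA": 50, "AAAT": 1, "CCCC": 6},
    "S2": {"AAAA": 30, "CCCC": 3, "GGGG": 1},
}

print(run(raw, min_size=2, min_reads=10))
print(run(raw, min_size=2, min_reads=1))
```

In this toy data, the singletons are removed by the per-sample threshold in both runs, while the moderately abundant sequence survives only when the across-sample threshold is lowered to 1.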

Best,
Daniel

Great! Thanks for the explanation!
