Dear all,
I want to use the q2-breakaway plugin (qiime2-2020.8).
Here is my state of knowledge:
- I need to retain singletons for breakaway to work its magic
- The results of breakaway depend on the number of singletons
- I can use the output of DADA2 with pool=TRUE
- I can use the output of Deblur
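To make "number of singletons" concrete: breakaway works on per-sample frequency count tables, i.e. f1 features seen exactly once, f2 seen exactly twice, and so on. The following is a minimal sketch (not part of q2-breakaway) that builds that table for one sample from a feature table exported to TSV. It assumes the table was exported with qiime tools export and then converted with biom convert --to-tsv, so header lines start with # and the #OTU ID line holds the sample names. The function name freq_counts is hypothetical.

```shell
# Hypothetical helper: print "k<TAB>f_k" (how many features have abundance k)
# for one sample of a TSV-exported feature table read from stdin.
# ASSUMPTION: TSV produced by "biom convert ... --to-tsv" (values may be "1.0").
freq_counts() {
  local sample="$1"
  awk -F'\t' -v sample="$sample" '
    # locate the column of the requested sample in the "#OTU ID" header
    /^#OTU ID/ { for (i = 2; i <= NF; i++) if ($i == sample) col = i; next }
    # skip any other comment/header lines
    /^#/ { next }
    # tally features by their abundance k in this sample (zeros ignored)
    col && $col > 0 { f[$col + 0]++ }
    END { for (k in f) print k "\t" f[k] }
  ' | sort -n   # order rows by k, so the singleton row (k = 1) comes first
}
```

Usage: freq_counts S1 < table.tsv. The k = 1 row is the singleton count f1 that the choice of denoising settings changes.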
I have some ideas on how I would like to perform this analysis, but I have doubts:
a) First, I am considering q2-dada2 with pseudo-pooling (--p-pooling-method 'pseudo'). It retains singletons but, I would assume, not as many as the full-blown pool=TRUE option. How big of an issue is this for breakaway?
b) Second, I am considering Deblur. With default options, Deblur removes singletons, so I assume I need to run it in a special way. I figured I could either retain all possible singletons using the --p-min-reads 1 and --p-min-size 1 arguments...
qiime deblur denoise-16S \
--i-demultiplexed-seqs 'qiime_analysis/qa_joined_noadapter_for_deblur_SampleData[SequencesWithQuality].qza' \
--p-min-reads 1 \
--p-min-size 1 \
--p-trim-length 401 \
--p-left-trim-len 0 \
--p-sample-stats \
--p-jobs-to-start 4 \
--p-no-hashed-feature-ids \
--o-table 'qiime_analysis/deblur_singletons_FeatureTable[Frequency].qza' \
--o-representative-sequences 'qiime_analysis/deblur_singletons_FeatureData[Sequence].qza' \
--o-stats 'qiime_analysis/deblur_singletons_SampleData[DeblurStats].qza'
... OR retain only "biological" singletons, i.e. features that occur once in a given sample but are not singletons in the dataset as a whole (--p-min-size 1 with the default --p-min-reads 10):
qiime deblur denoise-16S \
--i-demultiplexed-seqs 'qiime_analysis/qa_joined_noadapter_for_deblur_SampleData[SequencesWithQuality].qza' \
--p-min-reads 10 \
--p-min-size 1 \
--p-trim-length 401 \
--p-left-trim-len 0 \
--p-sample-stats \
--p-jobs-to-start 4 \
--p-no-hashed-feature-ids \
--o-table 'qiime_analysis/deblur_singletons_FeatureTable[Frequency].qza' \
--o-representative-sequences 'qiime_analysis/deblur_singletons_FeatureData[Sequence].qza' \
--o-stats 'qiime_analysis/deblur_singletons_SampleData[DeblurStats].qza'
I cannot find out whether breakaway expects all singletons or only the biological ones in its input.
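To see how the two Deblur settings differ in principle, here is a rough, hypothetical sketch that applies the two documented q2-deblur thresholds to an already-built, TSV-exported table: --p-min-size discards, within each sample, features with abundance below the threshold, and --p-min-reads keeps only features whose total across all samples reaches the threshold. This is only an approximation, assuming Deblur applies min-size per sample during dereplication (before denoising), so real outputs will differ. The function name singletons_after_filter is made up for illustration.

```shell
# Hypothetical helper: approximate Deblur's min-size / min-reads filters on a
# TSV-exported feature table (stdin) and print "sample<TAB>singleton count".
# ASSUMPTION: filters applied post hoc to a finished table, not inside Deblur.
singletons_after_filter() {
  local min_size="$1" min_reads="$2"
  awk -F'\t' -v min_size="$min_size" -v min_reads="$min_reads" '
    # remember sample names from the "#OTU ID" header line
    /^#OTU ID/ { for (i = 2; i <= NF; i++) name[i] = $i; ns = NF; next }
    /^#/ { next }
    {
      total = 0
      for (i = 2; i <= NF; i++) {
        # per-sample filter: zero out abundances below min_size
        c[i] = ($i + 0 >= min_size) ? $i + 0 : 0
        total += c[i]
      }
      # across-sample filter: keep the feature only if its total >= min_reads
      if (total >= min_reads)
        for (i = 2; i <= NF; i++) if (c[i] == 1) f1[i]++
    }
    END { for (i = 2; i <= ns; i++) print name[i] "\t" (f1[i] + 0) }
  '
}
```

Usage: singletons_after_filter 1 1 < table.tsv mimics the "all singletons" run, singletons_after_filter 1 10 < table.tsv the "biological singletons" run, and singletons_after_filter 2 10 < table.tsv the defaults, which leave no singletons at all.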
c) Finally, options a) and b) return very different numbers of singletons. As far as I understand the method, this may influence the results, and I am unsure how to proceed with the analysis given this problem. I believe I could use breakaway_nof1 to check whether the number of singletons is too big, but even if it is, what then? And what if the number of singletons is too small?
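To illustrate why the singleton count matters so much, here is a sketch using a simpler, different estimator: the bias-corrected Chao1, S_obs + f1(f1 - 1) / (2(f2 + 1)). This is NOT what breakaway does (breakaway fits a model to ratios of successive frequency counts); it only shows how directly a richness estimate moves with f1. The function name chao1_bc is hypothetical.

```shell
# Hypothetical helper: bias-corrected Chao1 from one sample's abundances,
# one value per line on stdin (zeros ignored).
# Prints: S_obs, f1 (singletons), f2 (doubletons), estimate.
chao1_bc() {
  awk '
    $1 > 0  { s++ }    # observed features
    $1 == 1 { f1++ }   # singletons
    $1 == 2 { f2++ }   # doubletons
    END { printf "%d\t%d\t%d\t%.2f\n", s, f1, f2, s + f1 * (f1 - 1) / (2 * (f2 + 1)) }
  '
}
```

For example, printf '1\n1\n1\n2\n5\n' | chao1_bc reports 5 observed features and an estimate of 6.50, while dropping two of the singletons (printf '1\n2\n5\n' | chao1_bc) gives 3 observed features and an estimate of 3.00.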
Help me, smart people, you're my only hope! Perhaps @Pauline_Trinh could be my Obi-Wan Kenobi...?