Singletons vs. breakaway

Dear all,

I want to use the q2-breakaway plugin (qiime2-2020.8).

Here is my state of knowledge:

  1. I need to retain singletons for breakaway to work its magic
  2. Results of breakaway depend on the number of singletons
  3. I can use the output of DADA2 run with pool=TRUE
  4. I can use the output of Deblur

I have some ideas on how I would like to perform this analysis, but I have doubts:

a) First, I am considering q2-dada2 with pseudo-pooling (--p-pooling-method 'pseudo'). It retains singletons but, I assume, not as many as the full pool=TRUE option. How big an issue is this for breakaway?

b) Second, I am considering Deblur. With default options Deblur removes singletons, so I assume I need to run it in a special way. I figured I could either retain all possible singletons using the --p-min-reads 1 and --p-min-size 1 arguments…

qiime deblur denoise-16S \
	--i-demultiplexed-seqs qiime_analysis/qa_joined_noadapter_for_deblur_SampleData[SequencesWithQuality].qza \
	--p-min-reads 1 \
	--p-min-size 1 \
	--p-trim-length 401 \
	--p-left-trim-len 0 \
	--p-sample-stats \
	--p-jobs-to-start 4 \
	--p-no-hashed-feature-ids \
	--o-table qiime_analysis/deblur_singletons_FeatureTable[Frequency].qza \
	--o-representative-sequences qiime_analysis/deblur_singletons_FeatureData[Sequence].qza \
	--o-stats qiime_analysis/deblur_singletons_SampleData[DeblurStats].qza

… OR only “biological” singletons, i.e. the ones that occur once in a given sample but more than once in the entire dataset (--p-min-size 1, with --p-min-reads 10):

qiime deblur denoise-16S \
	--i-demultiplexed-seqs qiime_analysis/qa_joined_noadapter_for_deblur_SampleData[SequencesWithQuality].qza \
	--p-min-reads 10 \
	--p-min-size 1 \
	--p-trim-length 401 \
	--p-left-trim-len 0 \
	--p-sample-stats \
	--p-jobs-to-start 4 \
	--p-no-hashed-feature-ids \
	--o-table qiime_analysis/deblur_singletons_FeatureTable[Frequency].qza \
	--o-representative-sequences qiime_analysis/deblur_singletons_FeatureData[Sequence].qza \
	--o-stats qiime_analysis/deblur_singletons_SampleData[DeblurStats].qza

I cannot find out whether breakaway expects all singletons in the input or just the biological ones.
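
To make the distinction concrete, here is a minimal sketch of how the two kinds of singletons could be told apart in a feature table. It assumes the table has been exported to a tab-separated file in the layout that `biom convert --to-tsv` writes (a comment line, a `#OTU ID` header, one row per feature, one column per sample). The function name `summarize_singletons` and the demo data are made up for illustration and are not part of QIIME 2 or breakaway.

```shell
# Sketch only: the TSV layout, function name, and demo data are assumptions.
summarize_singletons() {
    # $1: feature table as TSV (rows = features, columns = samples)
    awk -F'\t' '
        /^#/ { next }   # skip the comment line and the "#OTU ID" header
        {
            total = 0; has_one = 0
            for (i = 2; i <= NF; i++) {
                total += $i
                if ($i + 0 == 1) has_one = 1
            }
            if (total == 1) dataset++          # one read in the whole dataset
            else if (has_one) sample_only++    # one read in some sample, more overall
        }
        END {
            printf "dataset_singletons\t%d\n", dataset + 0
            printf "sample_only_singletons\t%d\n", sample_only + 0
        }
    ' "$1"
}

# Hypothetical demo table: four features, two samples.
demo_table="$(mktemp)"
trap 'rm -f "$demo_table"' EXIT
printf '# Constructed from biom file\n#OTU ID\tS1\tS2\n' > "$demo_table"
printf 'seqA\t1.0\t0.0\nseqB\t1.0\t1.0\nseqC\t12.0\t0.0\nseqD\t1.0\t15.0\n' >> "$demo_table"
summarize_singletons "$demo_table"
```

For the demo table this reports 1 dataset-wide singleton (seqA) and 2 features that are singletons only within a sample (seqB and seqD). The first kind is what --p-min-reads filters on, the second what --p-min-size filters on.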

c) Finally: options a) and b) return very different numbers of singletons. As far as I understand the method, this may influence the results, and I am unsure how to proceed with the analysis given this problem. I believe I could use breakaway_nof1 to check whether the number of singletons is too big, but even if it is, what then? And what if the number of singletons is too small?
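
As far as I understand, breakaway works from per-sample frequency counts, i.e. how many features were observed once, twice, and so on. A rough way to compare options a) and b) on that basis would be to tabulate those counts for the same sample from each table. The sketch below assumes a tab-separated export in the `biom convert --to-tsv` layout; the function name `frequency_counts` and the demo data are hypothetical.

```shell
# Sketch only: the TSV layout, function name, and demo data are assumptions.
frequency_counts() {
    # $1: feature table as TSV (rows = features, columns = samples)
    # $2: name of the sample column to summarise
    awk -F'\t' -v sample="$2" '
        /^#OTU ID/ { for (i = 2; i <= NF; i++) if ($i == sample) col = i; next }
        /^#/       { next }                       # skip other comment lines
        col && $col + 0 > 0 { f[$col + 0]++ }     # f[j] = features seen j times
        END { for (j in f) printf "%d\t%d\n", j, f[j] }
    ' "$1" | sort -n
}

# Hypothetical demo table: five features, two samples.
demo_table="$(mktemp)"
trap 'rm -f "$demo_table"' EXIT
printf '# Constructed from biom file\n#OTU ID\tS1\tS2\n' > "$demo_table"
printf 'seqA\t1.0\t0.0\nseqB\t1.0\t1.0\nseqC\t12.0\t0.0\n' >> "$demo_table"
printf 'seqD\t2.0\t5.0\nseqE\t2.0\t0.0\n' >> "$demo_table"
frequency_counts "$demo_table" S1
```

The first output column is the count j and the second is the number of features seen j times, so the row starting with 1 is the singleton count for that sample. Running this on the DADA2 and Deblur tables for the same sample would show how far apart the two singleton counts really are.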

Help me, smart people, you’re my only hope! Perhaps @Pauline_Trinh could be my Obi-Wan Kenobi…?


Hi @AdrianS85! This is on my radar to respond to, but I am a little swamped at the moment! If I don’t address this by the weekend, bug me again. Sorry for the delay, but in the meantime I would suggest having a read through this if you haven’t already: https://arxiv.org/pdf/1604.02598.pdf

Sorry about the delay!!
Pauline
