Deblur --p-trim-length

Hi, community,

I have a question about --p-trim-length in Deblur. I merged my paired-end reads with vsearch join-pairs, but I have no idea at which length it is best to trim the sequences, so I tried several different lengths.
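
For reference, the join step looked something like this (the file names are just placeholders for my artifacts):

qiime vsearch join-pairs --i-demultiplexed-seqs demux.qza --o-joined-sequences demux-joined.qza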

I cannot understand why almost all of my sequences were dropped; only a few thousand remain in the end. Based on the Moving Pictures tutorial, I know the trim length is chosen subjectively, based on the point where the quality scores drop. I have also checked some related topics on the forum, but still cannot find the answer.

Why is this the opposite of what I see in my own results? I am looking forward to an answer about which length is appropriate. It is confusing.

Many thanks.
Enclosed are my results at the different lengths.
Archive.zip (3.0 MB)

Hi @Brandon,

Any sequence that contains fewer nucleotides than the trim length will be dropped. The trim length matters if read lengths are heterogeneous in your data, as the Deblur algorithm requires that all reads be the same length, or if you are performing a meta-analysis between studies with different read lengths and you want to normalize that study effect. You should be able to get an idea of the read lengths through qiime demux summarize if that is unclear. Does that help?
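
For example (demux.qza here stands in for your demultiplexed artifact):

qiime demux summarize --i-data demux.qza --o-visualization demux.qzv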

Best,
Daniel


Hi, @wasade ,

Thanks for your answer. Indeed, I want to do a meta-analysis of reads generated with the same primer. Some reads are from the EMP dataset and some are from other datasets; some are paired-end and some are single-end, so some reads are longer while others are shorter.

I plan to use Deblur to analyze them separately, merge the rep-seqs.qza files into one big file, and likewise merge the table.qza files. Then I would use q2-fragment-insertion to create the tree and assign taxonomy on the combined data. In this procedure, the different lengths and qualities of the different datasets led me to use a different --p-trim-length for each dataset, but I am not sure whether that is reasonable. A sketch of the merging steps I have in mind is below.
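
The artifact names here are placeholders, and I understand the exact merge syntax differs between releases (older releases take --i-table1/--i-table2 instead of repeated --i-tables, and fragment-insertion may also require a reference database argument in some versions):

qiime feature-table merge --i-tables table-1.qza --i-tables table-2.qza --o-merged-table table.qza
qiime feature-table merge-seqs --i-data rep-seqs-1.qza --i-data rep-seqs-2.qza --o-merged-data rep-seqs.qza
qiime fragment-insertion sepp --i-representative-sequences rep-seqs.qza --o-tree insertion-tree.qza --o-placements insertion-placements.qza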

You mentioned normalizing the study effect: does that mean trimming all datasets at the same length? Or is there another benchmark by which to evaluate it?

I would really appreciate hearing more suggestions from you.

Thanks.


Thank you for the additional detail! I recommend trimming reads to the same length (i.e., to the shortest study) as the read length can contribute to study bias. There is some discussion of this in Debelius et al., which may be of interest if you haven't seen it, as the manuscript concerns meta-analysis. If your data span multiple primers, then it may be worth exploring trimming followed by closed-reference OTU picking. You may also want to track the sample -> study associations in the sample metadata so that you can test whether, for instance, the effect size of sample relationships is better explained by study than by a covariate of interest.
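
For example, if 120 nt were the length supported by the shortest study, each study could be run with the same trim length (the file names and the value 120 are only illustrative):

qiime deblur denoise-16S --i-demultiplexed-seqs demux-joined-filtered.qza --p-trim-length 120 --p-sample-stats --o-representative-sequences rep-seqs.qza --o-table table.qza --o-stats deblur-stats.qza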

Best,
Daniel

Hi, @wasade,

I see. Thanks for the paper. I have read it, and it makes sense to me. Really helpful.

I tested closed-reference OTU picking on a dataset containing two primer sets (341F-785R and 357F-806R), but the results do not make sense. I am wondering whether I went wrong at some point. My procedure is as follows.

(1) I followed the Moving Pictures protocol for Deblur to produce rep-seqs.qza and table.qza.

(2) I trained the classifier following "Training feature classifiers with q2-feature-classifier" (QIIME 2 2017.10.0 documentation), choosing the outer boundaries of the two primer sets (341-806).
This gave me classifier-341-806.qza and ref-seqs.qza. Roughly, the training commands were as follows.
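
The primer sequences below are the common 341F/806R variants and may not match mine exactly; the reference file names are placeholders:

qiime feature-classifier extract-reads --i-sequences 99_otus.qza --p-f-primer CCTACGGGNGGCWGCAG --p-r-primer GGACTACHVGGGTWTCTAAT --o-reads ref-seqs.qza
qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads ref-seqs.qza --i-reference-taxonomy ref-taxonomy.qza --o-classifier classifier-341-806.qza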
(3)

qiime vsearch cluster-features-closed-reference --i-table table.qza --i-sequences rep-seqs.qza --i-reference-sequences ref-seqs.qza --p-perc-identity 1 --o-clustered-table table_99.qza --o-unmatched-sequences unmatched.qza --o-clustered-sequences seqs_99.qza --p-threads 4

(4)

qiime feature-classifier classify-sklearn --i-classifier classifier-341-806.qza --i-reads seqs_99.qza --o-classification taxonomy_99_vsearch.qza

In the end I got a BIOM file with only 52 representative sequences (52 OTUs), and 10%~30% unknown bacteria.

May I know the reason? Is my procedure wrong?

Thanks.

Hi @Brandon,

I just want to make sure I understand the flow here.

First, is each sample composed of sequence data from multiple primers, or do samples use a single primer where some samples used 341f-785r and some used 357f-806r?

Second, just to verify, the input data were deblur’d. The same input sequence data were then run through a closed reference OTU picking, using the representative sequences of the Deblur process as the reference database. Is that accurate? If so, then it may be worth testing an alternative process. For instance, one strategy would be to take all of your input data and run them through a closed reference approach against an existing 16S reference database like SILVA or Greengenes.
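
As a minimal sketch, assuming you have the Greengenes 99% reference sequences as a FASTA file (the file names are placeholders), importing the reference would look something like:

qiime tools import --type 'FeatureData[Sequence]' --input-path 99_otus.fasta --output-path 99_otus.qza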

The reason I'm inquiring about the first one is that if the samples are single-primer, then you should be set to test for primer effects down the line by using the primer as a categorical variable. The second is because I don't think the nesting of the OTU methods is necessary. For instance, in Qiita when we integrate across primers, we just use closed reference OTU picking against Greengenes at 97%, which was the strategy used in Debelius et al for the HMP data. There will be a primer effect, so I recommend testing for it if feasible. But hopefully the biological question you're asking has a stronger effect than the primers.

Best,
Daniel

Hi, @wasade,

Thanks for your suggestions.

First, is each sample composed of sequence data from multiple primers, or do samples use a single primer where some samples used 341f-785r and some used 357f-806r?

Yes, half of the samples were sequenced with the 341F-785R primers, while the rest were sequenced with the 357F-806R primers.

Second, just to verify, the input data were deblur’d. The same input sequence data were then run through a closed reference OTU picking, using the representative sequences of the Deblur process as the reference database. Is that accurate?

Yes, that is right. The input data were deblur'd: rep-seqs-1.qza and table-1.qza were produced from the 341F-785R data, and rep-seqs-2.qza and table-2.qza from the 357F-806R data.

If so, then it may be worth testing an alternative process. For instance, one strategy would be to take all of your input data and run them through a closed reference approach against an existing 16S reference database like SILVA or Greengenes.
For instance, in Qiita when we integrate across primers, we just use closed reference OTU picking against Greengenes at 97%, which was the strategy used in Debelius et al for the HMP data.

I want to make sure I understand your suggestions. I would import the raw reads into QIIME 2, join the reads from each primer set separately, run qiime quality-filter q-score on each, run qiime vsearch dereplicate-sequences, merge seqs1.qza and seqs2.qza into one seqs.qza, likewise merge table1.qza and table2.qza, and then do the vsearch closed-reference OTU picking with the following command:

qiime vsearch cluster-features-closed-reference --i-table table.qza --i-sequences rep-seqs.qza --i-reference-sequences 99_otus.qza --p-perc-identity 0.97 --o-clustered-table table-cr-99.qza --o-unmatched-sequences unmatched.qza --o-clustered-sequences seqs_99.qza --p-threads 4

Is this procedure correct? Do I need to check for chimeras with qiime vsearch uchime-denovo BEFORE the closed-reference OTU picking?
Also, there are three taxonomy assignment methods: (1) classify-consensus-blast: BLAST+ consensus taxonomy classifier, (2) classify-consensus-vsearch: VSEARCH consensus taxonomy classifier, and (3) classify-sklearn: pre-fitted sklearn-based taxonomy classifier.
I have tried

qiime feature-classifier classify-consensus-vsearch --i-query rep-seqs.qza --i-reference-taxonomy ref-taxonomy-99.qza --i-reference-reads 99_otus.qza --p-maxaccepts 2 --p-perc-identity 0.97 --o-classification taxonomy-consensus.qza --p-threads 4

However, in a test with only 6 samples, 5 hours have passed and it is still running.
I have seen different suggestions in different places, like here and here.

May I get some suggestions on which taxonomy assignment method I should use? What are the differences between them?

There will be a primer effect, so I recommend testing for it if feasible.

May I know how to test for the primer effect? What does that mean?

Thanks so much for your patience. :stuck_out_tongue:

Best.

Brandon

Hi @Brandon,

Thank you for the additional information. I just looked back at qiime vsearch cluster-features-closed-reference and saw that it is not a general-purpose clustering method, and is different from classic closed-reference OTU picking in QIIME 1, which is what I'm more used to. Anyway, I think your earlier approach largely makes sense, but the two changes you may want to employ are a) using a relaxed similarity threshold when clustering if you're obtaining far fewer features than you expect, and b) assigning taxonomy per primer as opposed to on the merged table. I do not think you need to check for chimeras since DADA2 and Deblur both already implement a bimera filter.

An example of the primer effect can be found in figure 1 of Debelius et al. I recommend describing in your mapping file which sample used which primer, and making sure to assess that categorical variable when exploring for significant differences in your data.
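
As a sketch, once a primer column exists in the sample metadata, a test along these lines could be run (the artifact and column names are placeholders; depending on your release the option may be --m-metadata-category rather than --m-metadata-column):

qiime diversity beta-group-significance --i-distance-matrix unweighted-unifrac-distance-matrix.qza --m-metadata-file sample-metadata.tsv --m-metadata-column primer --o-visualization primer-effect.qzv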

Best,
Daniel

Hi, @wasade,

Thanks for the additional suggestions. I appreciate it! Some more questions:

b) assigning taxonomy per primer as opposed to the merged table.

Does that mean assigning the taxonomy separately and then comparing the results?
May I ask how I can get matching OTU IDs and taxonomy if I want to compare these data? Just for my understanding: if I assign taxonomy for the two primers separately, won't the OTU IDs corresponding to the taxonomy be different?

May I know which taxonomy assignment method is better for me to use: (1) classify-consensus-blast: BLAST+ consensus taxonomy classifier, (2) classify-consensus-vsearch: VSEARCH consensus taxonomy classifier, or (3) classify-sklearn: pre-fitted sklearn-based taxonomy classifier, as we did in the Moving Pictures tutorial?

Thanks so much.
Again, thanks for your patience.

Best.

Brandon

I'm not sure whether a classifier trained on both regions will impact sensitivity. You should be able to assign taxonomy separately and merge the taxonomy files, but I confess I'm not entirely sure how that would be done using QIIME 2; @jairideout, do you know by chance? It is entirely possible that different OTU IDs will be observed when assigning taxonomy, but that does not mean the lineage information will differ dramatically.
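
If your release includes an action for merging taxonomy artifacts, an unverified sketch would be something like (the artifact names are placeholders):

qiime feature-table merge-taxa --i-data taxonomy-1.qza --i-data taxonomy-2.qza --o-merged-data taxonomy.qza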

I’m assuming the naive Bayes classifier makes the most sense here, but I’m not sure if @Nicholas_Bokulich would like to weigh in or not.

Best,
Daniel


It would probably have a small effect (for 16S... other marker genes might matter more), but I'm really not sure. It's going to vary case by case so I don't know about these regions specifically.

While I'd predict a small effect, I'd expect it to be strongest for the naive Bayes classifier (since that's the one that benefits slightly from trimming reference sequences to the target gene).

So I think that classify-consensus-blast or classify-consensus-vsearch would probably be the easiest/most transparent to use here @Brandon . It would still help to trim your reference sequences (to cut down on runtime, mostly).

Honestly, that's probably the easier thing to do, rather than splitting by sub-region/classifying separately, then merging back together.

I hope that helps!


Hi, @wasade and @Nicholas_Bokulich,

Thanks so much for all the help along the way. I will try that and see how it goes.

Best.

Brandon

