Three questions regarding qiime2 dada2 denoise

Hello

1. We purchased the Zymo standard, which contains 10.1% E. coli and 10.4% Salmonella. When assigning taxonomy at level 6, we noticed that qiime2 classifies Salmonella very well, but the majority of E. coli reads are classified only as Enterobacteriaceae rather than Escherichia. Is there a way to solve this problem? (I am using Greengenes to train my classifier.)

2. For qiime2 dada2 denoise, there are three chimera-method options [pooled|consensus|none] to choose from. Where can I find information about the algorithms behind these three methods? And how do they differ from VSEARCH borderline and VSEARCH exclude borderline?

3. For qiime2 dada2 denoise, I was trying to compare the two commands at the end of this post. The first command has --p-trunc-len-r 0 and --p-trunc-len-f 0, which means qiime2 should not do any trimming on the data, right? So why am I getting fewer total reads from the first command than from the second?

Thanks for your time

Bowen

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs STDS-demux-paired-end.qza \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-n-threads 8 \
  --p-chimera-method none \
  --o-table STDS-trim-table.qza \
  --o-representative-sequences STDS-rep-non-trim-seqs.qza \
  --o-denoising-stats STDS-denoising-non-trim-stats.qza &

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs STDS-demux-paired-end.qza \
  --p-trunc-len-f 4 \
  --p-trim-left-f 300 \
  --p-trunc-len-r 5 \
  --p-trim-left-r 224 \
  --p-n-threads 8 \
  --p-chimera-method none \
  --o-table STDS-trim-table.qza \
  --o-representative-sequences STDS-rep-non-trim-seqs.qza \
  --o-denoising-stats STDS-denoising-non-trim-stats.qza &

The clawback plugin improves species-level classification, but it is meant for realistic datasets, so it may not help in your case with mock communities. You could build taxonomic weights based on the expected composition of that community, but that would overfit to that mock community and would not perform well on real samples.

QIIME 2 is just wrapping these other methods, so I suggest that you read the documentation for vsearch, dada2, and deblur to understand how their chimera checking methods differ.

No trimming means the sequences will contain more errors, meaning that dada2 is more likely to throw out those sequences during the pre-filtering step. You should always check out the denoising stats artifact to determine how many reads are being lost at each step.
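If it helps, here is a rough sketch of one way to see where reads are lost, once the denoising stats have been exported to a tab-separated file (for example with qiime tools export). The function name summarize_stats is my own, and I am assuming the exported table has a header row with columns named input, filtered, denoised, merged, and non-chimeric, plus comment rows starting with #; please check those names against your own file.

```shell
# Sketch only: summarize_stats is a hypothetical helper, not part of QIIME 2.
# Assumed input: a TSV whose header row names the columns "input", "filtered",
# "denoised", "merged" and "non-chimeric" (adjust if your file differs).
summarize_stats() {
  awk -F'\t' '
    # First row: remember the position of every column by its name.
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    # Skip comment/directive rows such as "#q2:types".
    /^#/ { next }
    {
      inp = $(col["input"])
      printf "%s", $1
      n = split("filtered denoised merged non-chimeric", steps, " ")
      # Report each step as a percentage of the input read count.
      for (s = 1; s <= n; s++) {
        pct = (inp > 0) ? 100 * $(col[steps[s]]) / inp : 0
        printf "\t%s=%.1f%%", steps[s], pct
      }
      printf "\n"
    }
  ' "$1"
}
```

Calling summarize_stats on the exported stats file for each of the two runs and comparing the "filtered" percentages should show whether the extra loss happens at the filtering step.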

I hope that helps!

Hello Nicholas

Thanks for your answers. When I use mothur, it looks like mothur can classify E. coli to the species level, and I am using Greengenes to train both classifiers. I was just wondering: apart from the classifier, are there other factors that could influence whether E. coli is classified to the genus level rather than the family level? For example, VSEARCH clustering, which is not a method used in the “moving picture” tutorial.

Thanks for your time
Bowen

Hi @zhang_sonic,
Yes, you could try another classifier, like the vsearch-based classifier.

You can also adjust the parameters used by the sklearn classifier in q2-feature-classifier, primarily the confidence parameter. That classifier should operate very similarly to the RDP classifier (it is more or less the same method under the hood), but the default confidence setting is different. You could use confidence=0.5 or another setting to see what optimizes detection of E. coli in your mock community.
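As a rough sketch of trying a few confidence settings, something like the following could work. The helper name sweep_confidence and the DRY_RUN switch are my own inventions, and the .qza file names are placeholders for your own classifier and representative sequences. By default it only prints the classify-sklearn commands so you can check them before running anything.

```shell
# Sketch only: sweep_confidence and DRY_RUN are hypothetical helpers.
# gg-classifier.qza and STDS-rep-seqs.qza are placeholder file names.
sweep_confidence() {
  for conf in "$@"; do
    # DRY_RUN=1 (the default) prints each command; DRY_RUN=0 runs it.
    if [ "${DRY_RUN:-1}" = "1" ]; then run=echo; else run=; fi
    $run qiime feature-classifier classify-sklearn \
      --i-classifier gg-classifier.qza \
      --i-reads STDS-rep-seqs.qza \
      --p-confidence "$conf" \
      --o-classification "taxonomy-conf-${conf}.qza"
  done
}
```

For example, sweep_confidence 0.5 0.7 prints two commands, one per threshold; setting DRY_RUN=0 first would run them instead. You can then compare how E. coli is assigned in each resulting taxonomy.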
