Variation between Mothur and QIIME2 analysis

(steffi) #1

Dear All,
I started my taxonomic analysis using the QIIME 2 pipeline.
I ran the following steps:

qiime tools import --type 'SampleData[SequencesWithQuality]' --input-path se-33-manifest --output-path single-end-demux.qza --source-format SingleEndFastqManifestPhred33

qiime dada2 denoise-single --i-demultiplexed-seqs single-end-demux.qza --p-trim-left 0 --p-trunc-len 220 --o-representative-sequences rep-seqs.qza --o-table table.qza --o-denoising-stats stats-dada2.qza --p-n-threads 4

qiime feature-classifier classify-sklearn --i-classifier gg-13-8-99-nb-classifier.qza --i-reads rep-seqs.qza --o-classification taxonomy_gg.qza

qiime taxa barplot --i-table table.qza --i-taxonomy taxonomy_gg.qza --m-metadata-file sample_meta.tsv --o-visualization taxa-bar-plots_gg.qzv

The same analysis was done by a company using the mothur pipeline. Compared with the mothur results, my QIIME 2 results contain more taxonomic classifications, and at higher abundance.

Note: We used control (buffer) data for this analysis, so we expect very few or no taxonomic classifications.

Attached: rep-seqs.qza (10.2 KB), table.qza (10.8 KB), taxonomy_silva_200.qza (25.5 KB)

What may be the reason?

(Justine) #2

Hi @steffi,

I want to make sure I understand the problem. You ran the same pipeline in QIIME 2 and Mothur. So, you did your denoising via DADA2 in QIIME 2. From my understanding, you also ran de novo OTU picking in Mothur? How did you perform taxonomic assignment in Mothur? Which algorithm and which reference database did you use?

Is your total number of features different between the DADA2 table and the Mothur table? I’d expect DADA2 to result in more features overall (since it resolves ASVs, which give higher resolution than OTU-based clustering at 99%), but possibly fewer counts. (I am, to be honest, less familiar with the QC protocols under the hood of each algorithm.)
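
To make that comparison concrete, here is a minimal Python sketch of what I mean by "features" versus "counts". The two tables, the sample name, and the feature IDs are made up for illustration; in practice you would export your own tables and load them instead.

```python
# Hypothetical toy tables: sample -> {feature ID -> read count}.
# These numbers are invented; they are not taken from the attached files.
dada2_table = {
    "buffer_ctrl": {"asv1": 40, "asv2": 25, "asv3": 10, "asv4": 5},
}
mothur_table = {
    "buffer_ctrl": {"otu1": 70, "otu2": 30},
}


def summarize(table):
    """Return (number of observed features, total read count) for a table."""
    features = {
        feature
        for counts in table.values()
        for feature, n in counts.items()
        if n > 0
    }
    total_reads = sum(n for counts in table.values() for n in counts.values())
    return len(features), total_reads


for name, table in (("DADA2", dada2_table), ("mothur", mothur_table)):
    n_features, n_reads = summarize(table)
    print(f"{name}: {n_features} features, {n_reads} reads")
```

In this toy case the DADA2 table has more features but fewer total reads, which is the pattern I would expect to see if the difference comes from resolution plus stricter filtering.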

If you look at the overall taxonomy plots, how do they compare? If you compute a metric like Bray-Curtis or Jaccard distance for each method and run a Procrustes analysis on the results, does the relationship between samples stay similar across the two methods, or do you see large-scale differences in the inter-sample community structure?
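
In case the metrics are unfamiliar, here is a rough pure-Python sketch of the two distances, applied to the same sample as profiled by each pipeline. The genus names and counts are invented for illustration, and the sketch works on raw counts, whereas in practice you would normally rarefy or convert to relative abundance first. It covers only the distance step; the Procrustes step needs an ordination, for which a library such as scikit-bio or the QIIME 2 diversity plugin would be the usual route.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {taxon: count} profiles."""
    taxa = set(a) | set(b)
    diff = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return diff / total if total else 0.0


def jaccard(a, b):
    """Jaccard distance based on presence/absence of taxa."""
    present_a = {t for t, n in a.items() if n > 0}
    present_b = {t for t, n in b.items() if n > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1 - len(present_a & present_b) / len(union)


# Hypothetical genus-level profiles of one sample from each pipeline.
qiime2_profile = {"Pseudomonas": 50, "Ralstonia": 30, "Bacillus": 20}
mothur_profile = {"Pseudomonas": 60, "Ralstonia": 40}

print("Bray-Curtis:", bray_curtis(qiime2_profile, mothur_profile))
print("Jaccard:", jaccard(qiime2_profile, mothur_profile))
```

A value near 0 means the two pipelines describe the sample almost identically; a value near 1 means they share little. Bray-Curtis is sensitive to abundance differences, while Jaccard only reflects which taxa are present.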