Low diversity... but maybe too low?

jairideout · August 22, 2017, 1:06am

Hi @borgesrodrigo! I merged your other topic with this one since they're similar questions. There are several differences in the processing steps taken by QIIME 1 vs QIIME 2 to get from raw sequences to taxonomic classifications, so there are a few variables you'll need to control for when comparing results:

Quality filtering: QIIME 1 uses PHRED-score based filtering and trimming, while QIIME 2 uses DADA2 to denoise the data.
Clustering/feature selection: QIIME 1 performs OTU picking with uclust by default (closed-reference, open-reference, or de novo), while QIIME 2 uses DADA2 or Deblur to find sequence variants.
Taxonomy assignment: QIIME 1 uses a uclust consensus assigner by default to perform taxonomic classification, while QIIME 2 has a scikit-learn naive Bayes classifier, along with consensus vsearch and blast classifiers.

If you just want to compare the taxonomy classifiers, you could generate your representative sequences using a QIIME 1 OTU picking script, assign taxonomy to them with assign_taxonomy.py, and then import the same representative sequences into QIIME 2 and assign taxonomy to them there. You could then compare the results between the two classifiers.
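Here is a rough sketch of how that comparison could be scripted once both sets of assignments are on disk. It assumes each taxonomy file is tab-separated with the sequence ID in the first column and a semicolon-delimited taxonomy string in the second, and that any header line starts with "#" or "Feature ID". The function names and the example IDs and lineages are made up for illustration, so adjust them to your actual files.

```python
def load_taxonomy(lines):
    """Map sequence ID -> taxonomy string from tab-separated lines.

    Assumption: column 1 is the ID and column 2 is the taxonomy string;
    header lines start with '#' or 'Feature ID' and are skipped.
    """
    table = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or fields[0].startswith("#") or fields[0] == "Feature ID":
            continue
        table[fields[0]] = fields[1]
    return table


def split_ranks(taxonomy):
    """Split 'k__Bacteria; p__Firmicutes' into a list of stripped ranks."""
    return [rank.strip() for rank in taxonomy.split(";") if rank.strip()]


def agreement_by_depth(tax_a, tax_b, max_depth=7):
    """Fraction of shared IDs whose lineages match through each depth.

    A sequence counts as a match at a given depth only if both
    classifiers assigned it at least that deep and every rank agrees.
    """
    shared = sorted(set(tax_a) & set(tax_b))
    result = {}
    for depth in range(1, max_depth + 1):
        matches = 0
        for seq_id in shared:
            a = split_ranks(tax_a[seq_id])[:depth]
            b = split_ranks(tax_b[seq_id])[:depth]
            if len(a) == depth and a == b:
                matches += 1
        result[depth] = matches / len(shared) if shared else 0.0
    return result


# Hypothetical assignments for the same two representative sequences.
qiime1_lines = [
    "otu1\tk__Bacteria; p__Firmicutes; c__Bacilli\t0.95",
    "otu2\tk__Bacteria; p__Proteobacteria\t0.80",
]
qiime2_lines = [
    "Feature ID\tTaxon\tConfidence",
    "otu1\tk__Bacteria;p__Firmicutes;c__Clostridia\t0.91",
    "otu2\tk__Bacteria; p__Proteobacteria\t0.88",
]

qiime1_tax = load_taxonomy(qiime1_lines)
qiime2_tax = load_taxonomy(qiime2_lines)
print(agreement_by_depth(qiime1_tax, qiime2_tax, max_depth=3))
```

To run it on real files, pass open file handles to load_taxonomy instead of the in-memory lists. Looking at agreement depth by depth should show whether the two classifiers diverge only at the finer ranks or already near the top of the lineage.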

You'll need to make sure you're performing taxonomy assignment using the same representative sequences and reference database files. Since you're using the naive Bayes classifier in QIIME 2, I recommend performing taxonomy assignment with the RDP classifier in QIIME 1, since those methods are similar. QIIME 1 doesn't use the RDP classifier by default, so you'll need to configure that in your analyses.

You could export the sequence data from the output of qiime feature-classifier extract-reads (i.e. ref-seqs_silva.qza) and see how many sequences were successfully extracted. You can check that their length makes sense and compare how many extracted reads you have to the number of full-length reference sequences.
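As a minimal sketch of that check, the snippet below counts the records in a FASTA file and summarizes their lengths. It assumes you've already exported the artifact to a plain FASTA file (for example with qiime tools export). The function names and example sequences are placeholders, not anything from your data.

```python
from statistics import median


def fasta_lengths(lines):
    """Return the length of each sequence in FASTA-formatted lines."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            # Sequence data may be wrapped across several lines.
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def summarize(lengths):
    """Report how many sequences there are and their length range."""
    if not lengths:
        return {"count": 0}
    return {
        "count": len(lengths),
        "min": min(lengths),
        "median": median(lengths),
        "max": max(lengths),
    }


# Tiny in-memory example standing in for an exported FASTA file.
example = """>ref1
ACGTACGT
ACGT
>ref2
ACGTAC
""".splitlines()

example_lengths = fasta_lengths(example)
print(summarize(example_lengths))
```

On real data you would open the exported extracted-reads file and the full-length reference file, pass each file handle to fasta_lengths, and compare the two summaries. A much smaller count for the extracted reads, or lengths far from your expected amplicon size, would suggest the primers didn't match many of the reference sequences.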

You might also try performing taxonomy assignment using the full-length reference sequences (i.e. skipping qiime feature-classifier extract-reads). That will be the fairest comparison to QIIME 1's invocation of the RDP classifier.

Let us know what you find, and good luck!