Hi all! I am new to bioinformatics. I am trying to create a QIIME 2 pipeline for our runs and am comparing it to a QIIME 1 pipeline (created by another student in the lab).
My QIIME 2 pipeline detects just a few species in my samples, while the QIIME 1 pipeline detects a lot of different species.
Hi! I am just starting to work in microbiology and I am very new to this.
I am comparing a QIIME 1 pipeline with my new QIIME 2 pipeline.
When I analyze sample1 with QIIME 1, I can classify several different organisms. But in QIIME 2, 50% of my sequences are classified at the Domain level only.
I suspect I am training the classifier incorrectly?
Hi @borgesrodrigo! I merged your other topic with this one since they're similar questions. There are several differences in the processing steps taken by QIIME 1 vs QIIME 2 to get from raw sequences to taxonomic classifications, so there are a few variables you'll need to control for when comparing results:
Quality filtering: QIIME 1 uses PHRED-score based filtering and trimming, while QIIME 2 uses DADA2 to denoise the data.
Clustering/feature selection: QIIME 1 performs OTU picking with uclust by default (closed-reference, open-reference, or de novo), while QIIME 2 uses DADA2 or Deblur to find sequence variants.
Taxonomy assignment: QIIME 1 uses a uclust consensus assigner by default to perform taxonomic classification, while QIIME 2 has a scikit-learn naive Bayes classifier, along with consensus vsearch and blast classifiers.
If you want to just compare differences between the taxonomy classifiers, you could generate your representative sequences using a QIIME 1 OTU picking script, perform taxonomy assignment on them with assign_taxonomy.py, and then import the representative sequences into QIIME 2 and perform taxonomy assignment on them. You could then compare the results between the two classifiers.
You'll need to make sure you're performing taxonomy assignment using the same representative sequences and reference database files. Since you're using the naive Bayes classifier in QIIME 2, I recommend performing taxonomy assignment with the RDP classifier in QIIME 1, since those methods are similar. QIIME 1 by default doesn't use the RDP classifier so you'll need to configure that in your analyses.
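Once both classifiers have been run on the same representative sequences, one rough way to compare them is to look at how deep each assignment goes per sequence. Here is a minimal Python sketch of that idea. It assumes both results are tab-separated with the feature ID in the first column and the taxonomy string in the second, and that ranks are separated by semicolons; the function names and the example rows are hypothetical and only there to illustrate.

```python
def assigned_depth(taxon):
    """Count the leading ranks that carry a real name (0 means unassigned)."""
    depth = 0
    for level in taxon.split(";"):
        # Drop a rank prefix such as "k__" or "D_0__" if one is present.
        name = level.strip().split("__", 1)[-1].strip()
        if not name or name.lower() in {"unassigned", "unclassified"}:
            break
        depth += 1
    return depth


def load_assignments(lines):
    """Parse 'feature id <TAB> taxon ...' lines into a dict of id -> taxon."""
    assignments = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip comments, short lines, and a "Feature ID" header row.
        if len(fields) < 2 or fields[0].startswith("#") or fields[0] == "Feature ID":
            continue
        assignments[fields[0]] = fields[1]
    return assignments


# Made-up rows standing in for the two taxonomy tables.
qiime1 = load_assignments([
    "otu1\tk__Bacteria; p__Firmicutes; c__Bacilli\t0.98",
    "otu2\tk__Bacteria; p__Proteobacteria\t0.91",
])
qiime2 = load_assignments([
    "Feature ID\tTaxon\tConfidence",
    "otu1\tk__Bacteria; p__Firmicutes; c__Bacilli\t0.99",
    "otu2\tk__Bacteria\t0.75",
])

# Only compare sequences that were classified by both methods.
for feature_id in sorted(set(qiime1) & set(qiime2)):
    print(feature_id, assigned_depth(qiime1[feature_id]), assigned_depth(qiime2[feature_id]))
```

Sequences where one method stops at depth 1 (Domain) while the other goes much deeper are the ones worth inspecting by hand.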
You could export the sequence data from the output of qiime feature-classifier extract-reads (i.e. ref-seqs_silva.qza) and see how many sequences were successfully extracted. You can check that their length makes sense and compare how many extracted reads you have to the number of full-length reference sequences.
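If it helps, the read count and length check can be done with a few lines of Python once the sequences are exported to FASTA. This is only a sketch: the helper names are hypothetical, the example records are made up, and the file path in the closing comment is an assumption about where you exported the data.

```python
from statistics import median


def fasta_lengths(lines):
    """Return the sequence length of every record in FASTA-formatted lines."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def summarize(lengths):
    """Summarize record count and length spread."""
    if not lengths:
        return {"count": 0}
    return {
        "count": len(lengths),
        "min": min(lengths),
        "median": median(lengths),
        "max": max(lengths),
    }


# Made-up records standing in for extracted and full-length reference reads.
extracted = fasta_lengths([">ref1", "ACGTAC", ">ref2", "ACGT", "ACGT"])
full_length = fasta_lengths([">ref1", "ACGTACGTACGT", ">ref2", "ACGTACGTAC", ">ref3", "ACGTACGT"])

print(summarize(extracted))
print("fraction of references extracted:", len(extracted) / len(full_length))

# With real data you would pass an open file instead, for example:
#   with open("exported/dna-sequences.fasta") as handle:  # assumed path
#       print(summarize(fasta_lengths(handle)))
```

A very low fraction, or lengths far from your expected amplicon size, would suggest the primers or trim settings given to extract-reads don't match your data.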
You might also try performing taxonomy assignment using the full-length reference sequences (i.e. skipping qiime feature-classifier extract-reads). That will be the fairest comparison to QIIME 1's invocation of the RDP classifier.
Thank you @jairideout! I thought that something went wrong with my initial post, so I wrote a second one. Sorry for the trouble.
Your answers are everything I need (I believe) for now. Thank you so much for your time. You have also helped me narrow down what I should read (I have a lot of papers to read about everything related to microbiology).
It will take me some time to do all of this, but I will come back here and report my progress.