I am new to running bacterial analyses. I have been running 16S pipeline v1 using the Greengenes database. I was unable to get species-specific data (most of my outputs looked like this, for example: k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f___;g___;s____). I was advised to switch to the MicrobiomeHelper QIIME2 pipeline, as it uses the SILVA database.
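In a Greengenes-style string like the one above, empty fields such as `f___` mean no name was assigned at that rank, so that example only reaches order. A minimal sketch, assuming semicolon-separated ranks with `k__` to `s__` prefixes, that reports the deepest rank that actually has a name (the function name is made up for illustration):

```python
from typing import Tuple

# Rank order of Greengenes-style taxonomy strings (k__, p__, c__, o__, f__, g__, s__).
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]

def deepest_assigned_rank(taxonomy: str) -> Tuple[str, str]:
    """Return (rank, name) for the deepest rank with a non-empty name."""
    deepest = ("unassigned", "")
    for rank, field in zip(RANKS, taxonomy.split(";")):
        # "f___" splits into "f" and "_"; stripping underscores leaves an empty name,
        # which means classification stopped above this rank.
        name = field.strip().split("__", 1)[-1].strip("_")
        if not name:
            break
        deepest = (rank, name)
    return deepest

example = "k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f___;g___;s____"
print(deepest_assigned_rank(example))  # ('order', 'Clostridiales')
```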
However, I ran into the same limitations in the output. Furthermore, I noticed that the SILVA and Greengenes outputs were completely different. Some of the absolute abundances had similar numeric values, but none of the order or family names overlapped between the two. It looked almost as if I were using two separate data sets (I checked that I was not).
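When checking whether order or family names overlap, it may help to compare the names with rank prefixes removed, since taxonomy files built from different databases can label ranks differently. The sketch below assumes Greengenes-style `o__` prefixes and the `D_3__`-style prefixes used in some QIIME-formatted SILVA releases (check your own taxonomy files); the two taxonomy strings are invented for illustration and are not from this data set.

```python
import re
from typing import Iterable, Set

# Strips "o__"-style (Greengenes) or "D_3__"-style (assumed SILVA release) rank prefixes.
PREFIX = re.compile(r"^(?:[kpcofgs]|D_\d+)__")

def names_at_rank(taxonomies: Iterable[str], level: int) -> Set[str]:
    """Collect non-empty names at a 0-based rank index (3 = order, 4 = family)."""
    names: Set[str] = set()
    for taxonomy in taxonomies:
        fields = [field.strip() for field in taxonomy.split(";")]
        if level < len(fields):
            name = PREFIX.sub("", fields[level]).strip("_")
            if name:  # skip empty ranks such as "f___"
                names.add(name)
    return names

# Made-up example strings, one per database style.
greengenes = ["k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f___"]
silva = ["D_0__Bacteria;D_1__Firmicutes;D_2__Clostridia;D_3__Clostridiales;D_4__Lachnospiraceae"]

# Comparing raw fields finds nothing in common; comparing cleaned names does.
raw_overlap = {t.split(";")[3] for t in greengenes} & {t.split(";")[3] for t in silva}
clean_overlap = names_at_rank(greengenes, 3) & names_at_rank(silva, 3)
print(raw_overlap)    # set()
print(clean_overlap)  # {'Clostridiales'}
```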
Could anyone suggest what I may have been doing incorrectly based on this information?
Could you clarify a few things for us first.
The title of this thread implies a comparison of Qiime1 vs Qiime2 workflows, though you only mention the difference between the Silva and Greengenes databases in your analysis.
Did you perform your first analysis in Qiime1, classifying with Greengenes, and then classify with the Silva set using Qiime2 (through MicrobiomeHelper)? If so, these are very, very different pipelines, and comparing them would be like comparing apples to some sort of pineapple-banana hybrid. I wouldn't even bother trying to figure out the reason for the differences; just stick with the Qiime2 pipeline, or at the very least use the methods wrapped in Qiime2 if you want to compare approaches (i.e. OTU picking vs ASV inference). Qiime1 is also no longer officially supported.
Next, not being able to get taxonomic assignments down to species, or even genus, level is very common in 16S short-read amplicon sequencing pipelines. This is a limitation of both the length of the reads and the completeness of the databases. Greengenes has not been updated in years, while Silva is much more comprehensive and is updated frequently. You would expect some differences in your assignments when comparing the two, but if you have sequenced a novel taxon that is not in either database, there simply isn't a name to give that feature, so it will be left at the highest known rank. But I'm guessing the vast difference you are seeing is due to upstream differences in how your OTU tables were created.
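To illustrate how a feature ends up left at a higher rank: classifiers such as q2-feature-classifier's classify-sklearn only report ranks whose confidence meets a cutoff (0.7 by default). The sketch below is a simplified, hypothetical version of that truncation, not the plugin's actual code, and the per-rank confidences are invented:

```python
from typing import List, Tuple

def truncate_by_confidence(ranked: List[Tuple[str, float]],
                           threshold: float = 0.7) -> List[str]:
    """Keep ranks from the top down while confidence stays at or above threshold."""
    kept: List[str] = []
    for name, confidence in ranked:
        if confidence < threshold:
            break  # this rank and everything below it is left unassigned
        kept.append(name)
    return kept

# Hypothetical per-rank confidences for one feature; not from real data.
feature = [("k__Bacteria", 1.0), ("p__Firmicutes", 0.99), ("c__Clostridia", 0.97),
           ("o__Clostridiales", 0.91), ("f__Lachnospiraceae", 0.55),
           ("g__Blautia", 0.40), ("s__", 0.10)]
print(";".join(truncate_by_confidence(feature)))
# k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales
```

Lowering the threshold reports deeper ranks, but with less certainty that those assignments are correct.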
Thank you for your excellent answer!!!
Yes. We ran the pipeline with Qiime1 because that's what our lab had been using previously. The data I ran recently wasn't giving complete results, so I figured I might as well switch to SILVA to see if we could get more complete assignments.
This is the GitHub link to the pipeline we are currently using to assign taxonomy: [Microbiome Helper Amplicon SOP v2 (qiime2-2018.6)](https://github.com/LangilleLab/microbiome_helper/wiki/Amplicon-SOP-v2-(qiime2-2018.6)). From this protocol, we are using steps 1, 2, 3 and 6. We are not using the Phylogeny FastTree or Rarefaction Curves steps, or any of the steps after step 6. Would this cause a major problem?
Hmm, it's possible that it is due to the OTU tables. I am fairly new to this whole process. Those are generated within the pipeline, right?
Thank you very much for your time!
Not sure what you mean exactly by:
But regardless, as I mentioned before, comparing those two pipelines is pretty useless here since they use such different methods.
The OTU tables you used to get from Qiime1 were built by clustering sequences at arbitrary identity thresholds (typically 97%). Newer methods are much better than OTU picking and replace OTUs with higher-resolution analogues called amplicon sequence variants (ASVs). Have a read through this paper for more details.
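A toy contrast between OTU clustering and exact sequence variants, assuming equal-length, pre-aligned reads and a 97% identity cutoff. Real ASV methods such as DADA2 or Deblur use error models to decide whether a rare variant is a sequencing error; the plain dereplication here only shows the difference in resolution. All function names and sequences are hypothetical.

```python
from typing import List

def identity(a: str, b: str) -> float:
    """Fraction of matching positions; assumes equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("toy example requires equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otus(seqs: List[str], threshold: float = 0.97) -> List[List[str]]:
    """Greedy clustering: join the first centroid within threshold, else start a new OTU."""
    clusters: List[List[str]] = []
    for s in seqs:
        for cluster in clusters:
            if identity(cluster[0], s) >= threshold:  # cluster[0] is the centroid
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters

def exact_variants(seqs: List[str]) -> List[str]:
    """Distinct sequences in first-seen order (dereplication only, no error model)."""
    return list(dict.fromkeys(seqs))

base = "ACGT" * 25          # hypothetical 100 bp sequence
variant = "T" + base[1:]    # differs from base at one position (99% identity)
reads = [base, base, variant]
print(len(greedy_otus(reads)))     # 1: both sequences fall into one 97% OTU
print(len(exact_variants(reads)))  # 2: the single-base difference is kept
```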
As for what you are doing in MicrobiomeHelper, those steps are essentially all Qiime2 commands, so yes, you will be getting an ASV feature table. Just carry on and trust this data more than the output of your OTU methods.
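An ASV feature table is a count matrix of features (ASVs) by samples. To get family-level abundances like the ones compared earlier in this thread, counts are summed over all features that share a taxonomy label truncated to that rank, similar in spirit to `qiime taxa collapse`. A hypothetical sketch with invented feature IDs, taxonomy strings and counts:

```python
from collections import defaultdict
from typing import Dict

def collapse(table: Dict[str, Dict[str, int]],
             taxonomy: Dict[str, str],
             level: int) -> Dict[str, Dict[str, int]]:
    """Sum per-sample counts of features sharing a taxonomy truncated to `level` ranks."""
    collapsed: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for feature, counts in table.items():
        label = ";".join(taxonomy[feature].split(";")[:level])
        for sample, count in counts.items():
            collapsed[label][sample] += count
    # Convert nested defaultdicts back to plain dicts.
    return {label: dict(counts) for label, counts in collapsed.items()}

# Hypothetical ASV IDs, taxonomy strings and counts.
table = {"asv1": {"S1": 10, "S2": 0}, "asv2": {"S1": 5, "S2": 7}, "asv3": {"S1": 1, "S2": 2}}
taxonomy = {
    "asv1": "k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Blautia",
    "asv2": "k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Roseburia",
    "asv3": "k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides",
}
for label, counts in collapse(table, taxonomy, level=5).items():
    print(label.split(";")[-1], counts)
# f__Lachnospiraceae {'S1': 15, 'S2': 7}
# f__Bacteroidaceae {'S1': 1, 'S2': 2}
```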
Thanks again for the great reply, and thank you for including that paper.