My colleagues and I are analyzing some low-biomass samples with QIIME 2 (v2021.4). This is our first microbiota analysis, and we have found some strange results. Our feature table contains a total of 56 samples, 4 replicates of a commercial mock community, and some negative control samples. We trained the classifier on the SILVA database following the instructions in the “Moving Pictures” and “Parkinson’s Mouse” tutorials, and then taxonomically classified all the samples.
We started with the mock community samples to check whether the analysis had worked, and found that approximately 20% of the reads are unassigned and almost 30% are assigned only at the domain level (d__Bacteria), which for practical purposes means they have not been classified either. So about half of the reads are unclassified even at the phylum level. The commercial mock community is not low biomass, so we expected its results to be more accurate than those for our samples. We also reviewed our own samples and obtained similar results, with 50% unassigned.
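To make clear how we are counting "unclassified at the phylum level", here is a minimal Python sketch of the tally. The taxonomy labels and read counts are made-up placeholders chosen to mirror the percentages above, not our real data, and the helper names `deepest_rank` and `summarize` are hypothetical.

```python
from collections import defaultdict

# Hypothetical read counts per taxonomy label for one mock sample.
# Labels and numbers are illustrative only, not real data.
counts = {
    "Unassigned": 2000,
    "d__Bacteria": 3000,
    "d__Bacteria;p__Firmicutes": 3500,
    "d__Bacteria;p__Proteobacteria": 1500,
}


def deepest_rank(label):
    """Return 'unassigned', 'domain', or 'phylum+' for a taxonomy label."""
    if label.strip().lower() == "unassigned":
        return "unassigned"
    ranks = [r.strip() for r in label.split(";") if r.strip()]
    # Ignore empty rank placeholders such as "p__".
    ranks = [r for r in ranks if not r.endswith("__")]
    return "domain" if len(ranks) <= 1 else "phylum+"


def summarize(counts):
    """Return the fraction of reads in each classification-depth bin."""
    total = sum(counts.values())
    by_rank = defaultdict(int)
    for label, n in counts.items():
        by_rank[deepest_rank(label)] += n
    return {rank: n / total for rank, n in by_rank.items()}


fractions = summarize(counts)
# Reads with no phylum: fully unassigned plus domain-only assignments.
unclassified_at_phylum = fractions.get("unassigned", 0) + fractions.get("domain", 0)
print(fractions)
print(f"Unclassified at phylum level: {unclassified_at_phylum:.0%}")
```

In other words, we add the fully unassigned reads to the reads that stop at the domain label, and treat both as unclassified.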
As a cross-check, we ran the Kraken software available in Illumina BaseSpace to find out whether the problem lies in our lab processing or in the bioinformatics analysis. Kraken reported the bacterial proportions described by the manufacturer of the mock community, so we think we are doing something wrong in QIIME 2. These are the commands we used:
qiime feature-classifier extract-reads
qiime feature-classifier fit-classifier-naive-bayes
qiime feature-classifier classify-sklearn
How can we improve our analysis to obtain a higher number of assignments?
Thank you very much for your help!