Improving taxonomic classification

Hello everyone,

My colleagues and I are analyzing some low-biomass samples with QIIME 2 (v2021.4). This is our first microbiota analysis, and we have found some strange results. Our feature table contains a total of 56 samples, 4 replicates of a commercial mock community, and some negative control samples. We trained the classifier on the SILVA database following the instructions in the “Moving Pictures” and “Parkinson’s Mouse” tutorials, and then taxonomically classified all the samples.

We started by analyzing the mock community samples to check whether the analysis had gone well, and found that approximately 20% of the reads are unassigned and almost 30% are assigned only at the domain level (d__Bacteria), which for practical purposes means they are not classified either. Thus, half of the reads are unclassified even at the phylum level. The commercial mock community is not low biomass, so we expected its analysis to be more accurate than that of our samples. We have also reviewed our own samples and obtained similar results, with 50% unassigned.
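The “half unclassified at phylum level” figure is the sum of the fully unassigned reads and the reads that stop at the domain rank. A minimal Python sketch of that per-sample check, assuming per-feature read counts and SILVA-style taxonomy strings have already been loaded into dictionaries; the function name and the example counts are hypothetical, chosen only to mirror the ~20% + ~30% split above.

```python
def unresolved_fraction(read_counts, taxonomy, rank_prefix="p__"):
    """Fraction of reads whose taxonomy has no named entry at the given rank.

    read_counts: feature ID -> number of reads (hypothetical input format)
    taxonomy: feature ID -> SILVA-style string, e.g. "d__Bacteria; p__Firmicutes"
    """
    total = sum(read_counts.values())
    if total == 0:
        return 0.0
    unresolved = 0
    for feature_id, count in read_counts.items():
        # Features missing from the taxonomy are treated as "Unassigned"
        levels = [lvl.strip() for lvl in taxonomy.get(feature_id, "Unassigned").split(";")]
        # The rank counts as resolved only if its prefix is followed by an actual name
        resolved = any(
            lvl.startswith(rank_prefix) and len(lvl) > len(rank_prefix) for lvl in levels
        )
        if not resolved:
            unresolved += count
    return unresolved / total


# Hypothetical example: 20 reads unassigned, 30 stop at d__Bacteria, 50 reach phylum
counts = {"f1": 20, "f2": 30, "f3": 50}
taxonomy = {
    "f1": "Unassigned",
    "f2": "d__Bacteria",
    "f3": "d__Bacteria; p__Firmicutes; c__Bacilli",
}
print(unresolved_fraction(counts, taxonomy))  # 0.5
```

Changing `rank_prefix` (e.g. to `"g__"`) gives the same check at a deeper rank.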

To check whether the problem lies in our lab processing or in the bioinformatics analysis, we also used the Kraken software available in Illumina BaseSpace. It reported the bacterial proportions described by the manufacturer of the mock community, so we think we are doing something wrong in QIIME 2. This is the code we used:

qiime feature-classifier extract-reads \
  --i-sequences silva-138-99-seqs.qza \
  --p-min-length 200 \
  --p-max-length 600 \
  --o-reads ref-seqs.qza

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads ref-seqs.qza \
  --i-reference-taxonomy ref-taxonomy.qza \
  --o-classifier classifier.qza

qiime feature-classifier classify-sklearn \
  --i-classifier classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy.qza

How can we improve our analysis to obtain a higher number of assignments?

Thank you very much for your help!

Welcome to the forum, @Sergio_Garcia_Segura!

This often indicates that your sequences are in mixed orientation (some reads forward and some reverse-complemented in the same dataset). Take a look at the RESCRIPt tutorial. Its orient-seqs command compares your sequences against a reference and reverse-complements those in the opposite orientation, which may take care of the issue for you. Let us know how it goes!
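To illustrate why orientation matters: a read in the opposite orientation shares almost nothing with forward-oriented reference sequences, so it tends to stay unassigned or stop at the domain level, while its reverse complement matches well. The toy Python sketch below shows only that idea: score a read and its reverse complement by shared k-mers with the reference and keep the better strand. It is not RESCRIPt's implementation (orient-seqs relies on VSEARCH); the k-mer scoring, k=8, the function names, and the toy reference are all assumptions for illustration.

```python
def reverse_complement(seq):
    # Assumes the sequence contains only A, C, G, T, N
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def kmers(seq, k):
    # Set of all overlapping k-mers (empty if the sequence is shorter than k)
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient(read, reference, k=8):
    """Return (oriented_read, flipped), or (None, None) if neither strand matches."""
    ref_kmers = set()
    for ref in reference:
        ref_kmers |= kmers(ref.upper(), k)
    forward_hits = len(kmers(read.upper(), k) & ref_kmers)
    rc_read = reverse_complement(read)
    reverse_hits = len(kmers(rc_read, k) & ref_kmers)
    if forward_hits == 0 and reverse_hits == 0:
        return None, None  # unmatched: would likely end up unassigned
    if reverse_hits > forward_hits:
        return rc_read, True  # read was in reverse orientation
    return read.upper(), False


# Toy reference and a read taken from it, then reverse-complemented
reference = ["ACCACAACCCAACACCAAACCACCA"]
read = reverse_complement(reference[0][3:21])
oriented, flipped = orient(read, reference)
print(flipped)  # True
```

A real pipeline would orient reads before classification (or exclude unmatched ones) rather than scoring k-mers by hand.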