I am processing multi-amplicon sequencing data from the Ion GS S5 sequencing platform. My basic approach is DADA2 denoising followed by feature classification with classify-consensus-blast/vsearch.
After classification, I am getting only partial taxonomic assignments (down to family or class level), e.g. "d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;;;__".
I understand this might be due to poor primer trimming, short amplicon length, or some other issue with sequence quality. However, when I take the ASV sequence and run it through BLAST (NCBI nr), I am able to pick up the genus-level taxonomy correctly.
What I want to know is whether there is any other way to improve the taxonomic assignment, so that I get a genus-level assignment (even with slightly lower confidence) for most of my ASVs.
I tried running vsearch separately on these ASVs with a 0.8 cutoff, and the results are not very convincing.
I assume you are running with default settings against either the SILVA or the Greengenes reference database?
A few things to note:
1. You could be observing a limitation of the reference database being used (i.e. SILVA / Greengenes).
2. Be wary of how BLAST hits are displayed on NCBI. Equivalent BLAST hits are sorted arbitrarily, and if you scroll down far enough you may find an identical "hit" to a very different organism.
3. Given 2, this is why we have classify-consensus-blast and classify-consensus-vsearch. When the top hits disagree at some rank, the taxonomy is truncated to their last common ancestor. This applies to classify-sklearn too.
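To make the truncation idea concrete, here is a minimal Python sketch of consensus assignment over a set of equally good hits. It is a simplification for illustration, not the plugin's actual code: the function name, the `min_consensus` threshold of 0.51, and the example lineages are all assumptions.

```python
from collections import Counter


def consensus_taxonomy(hit_taxonomies, min_consensus=0.51):
    """Walk down the ranks and stop where the hits no longer agree.

    hit_taxonomies: semicolon-delimited lineages, one per top hit.
    min_consensus: fraction of all hits that must share a label
                   (hypothetical default, chosen for illustration).
    """
    lineages = [[rank.strip() for rank in tax.split(";")] for tax in hit_taxonomies]
    consensus = []
    while True:
        depth = len(consensus)
        # Only hits that agree with the consensus so far get a vote at this rank.
        labels = [
            lineage[depth]
            for lineage in lineages
            if len(lineage) > depth and lineage[:depth] == consensus
        ]
        if not labels:
            break
        label, count = Counter(labels).most_common(1)[0]
        if count / len(lineages) < min_consensus:
            break  # too much disagreement: truncate here
        consensus.append(label)
    return ";".join(consensus) if consensus else "Unassigned"


# Illustrative lineages only: three equally good hits that differ below order level.
prefix = "d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales"
hits = [
    prefix + ";f__Enterobacteriaceae;g__Escherichia",
    prefix + ";f__Enterobacteriaceae;g__Klebsiella",
    prefix + ";f__Yersiniaceae;g__Yersinia",
]
print(consensus_taxonomy(hits))
print(consensus_taxonomy(hits, min_consensus=1.0))
```

With the default threshold the result stops at family, because no single genus is shared by a majority of the hits. Requiring full agreement truncates one rank higher, at order. Looking only at the first hit on the NCBI results page would have reported a genus that the other hits contradict.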
More information can be found here:
You can also try your hand at using RESCRIPt to make your own reference database for classifying your sequences:
This is the limitation of assigning taxonomy using short reads. However, you can use tools like q2-clawback to help improve things:
What is the length of sequences you've generated from this platform? Quality?
It is quite common for many reads to be classified only to upper-level taxonomy. Have you tried using feature-classifier classify-sklearn?
Can you share your taxonomy barplot qzv file? You can DM me this file if you do not wish to share it publicly. That way I can also look through the provenance and try to piece together what your processing steps are.
I am using data from the Thermo 16S multi-amplicon kit.
It consists of 6 amplicons of length 200-250 base pairs covering 7 variable regions.
One issue is that the primers are unknown. We are dealing with this by either 1) trimming 20 base pairs from both ends, or 2) working out the primer sequences to the extent possible and using cutadapt to trim them (however, the reads come off the sequencing machine in mixed orientation, so I feel this is sometimes less efficient).
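The two options can be sketched in plain Python as follows. This is only an illustration under stated assumptions: `forward_anchor` stands for whatever partial primer sequence has been recovered (the value used below is made up), matching is exact, and the 40-base search window is an arbitrary choice.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_ends(seq, n=20):
    """Option 1: drop n bases from each end; return '' if the read is too short."""
    return seq[n:len(seq) - n] if len(seq) > 2 * n else ""


def orient(seq, forward_anchor, window=40):
    """Option 2 helper: put a read into forward orientation.

    Looks for the (hypothetical) partial primer near the start of the read
    or of its reverse complement. Returns None if it is found in neither.
    """
    if forward_anchor in seq[:window]:
        return seq
    flipped = reverse_complement(seq)
    if forward_anchor in flipped[:window]:
        return flipped
    return None


# Made-up read and anchor, for illustration only.
read = "GATTACAGGC"
print(orient(read, "GATTAC"))
print(orient(reverse_complement(read), "GATTAC"))
```

Both calls return the read in the same orientation, so mixed-orientation input ends up on one strand before trimming. Reads where the anchor is not found come back as None and can be set aside rather than trimmed blindly.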
Since I want the classification to happen without separating the reads by variable region, it's not possible to use classify-sklearn.
I am attaching one of the barplots I generated after classify-consensus-blast taxonomy assignment. If you scroll down, you will find many taxa classified only to order or family level. I am certainly not expecting species-level resolution, but it would be good to have genus level.
The only reason I am a little greedy about classifying them to genus level is that I realized some of the partially classified reads could change the differential abundance statistics if we re-classify them using vsearch or NCBI BLAST and add them to their original taxa.
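A small Python sketch of why this matters: when counts are collapsed to genus level, a partially classified ASV forms its own group, and moving it into a genus changes that genus's total. The ASV IDs, counts, and the genus used here are invented for illustration, and `collapse_to_level` is a hypothetical helper, not a QIIME 2 function.

```python
from collections import defaultdict


def collapse_to_level(counts, taxonomy, level=6):
    """Sum per-ASV counts by lineage truncated to `level` ranks.

    Empty ranks and bare prefixes such as '__' or 'g__' are dropped, so a
    partially classified ASV is grouped under its deepest resolved rank.
    """
    collapsed = defaultdict(int)
    for asv_id, count in counts.items():
        ranks = [rank.strip() for rank in taxonomy[asv_id].split(";")]
        ranks = [rank for rank in ranks if rank and not rank.endswith("__")]
        collapsed[";".join(ranks[:level])] += count
    return dict(collapsed)


# Invented example data.
order = "d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales"
genus = order + ";f__Enterobacteriaceae;g__Escherichia"
counts = {"asv1": 10, "asv2": 5, "asv3": 7}
taxonomy = {"asv1": genus, "asv2": order + ";;;__", "asv3": genus}

before = collapse_to_level(counts, taxonomy)

# Suppose asv2 is re-classified to the same genus by another method.
reassigned = dict(taxonomy, asv2=genus)
after = collapse_to_level(counts, reassigned)

print(before)
print(after)
```

In this toy case the genus total goes from 17 to 22 once asv2 is reassigned, which is the kind of shift that could move a differential abundance result.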
I am trying to use classify-sklearn now by separating the reads region by region. I hope it works well.