A problem with taxonomy annotation

Hello everyone!
When annotating the taxonomy of a 16S dataset, I ran into a problem. Could anyone help me? Thanks!
I have a 16S sequence dataset that includes sequences amplified with 3 primer pairs. First, I imported the data into QIIME 2, then I used q2-cutadapt to trim the primers. After joining the paired-end sequences and quality filtering, I used deblur to obtain rep-seqs.qza and table.qza. Up to that point, everything was OK! But when I used q2-feature-classifier to annotate the taxonomy with [Greengenes 13_8 99% OTUs full-length sequences](https://data.qiime2.org/2019.7/common/gg-13-8-99-nb-classifier.qza), I found that many sequences were unassigned and many were assigned only to k__Bacteria, especially in several samples that were amplified with the same primer pair, called a. Look at the picture below:

I wanted to know why this happens, so I divided the 16S data into three parts according to primer. After processing the 3 parts with the same workflow, I found that the annotation of each part looked right. Look at the picture below:
Then I used merge-taxa to merge the three parts. But something looks wrong: there are still many unassigned sequences and many sequences assigned to k__Bacteria. Look at the picture:

It is difficult for me to solve this problem. I want to obtain the correct annotation and find the differential taxa between two groups. Could anyone help me? Thanks!

The issue here is that the sequences are not in the same orientation. The classify-sklearn classifier currently only works with sequences that are all in the same orientation. Evidently primer “a” is in reverse orientation relative to the other primers, so you can achieve good classification when you classify these sequences alone, but the classifier is getting confused (due to mixed orientations) when you attempt to classify together with the other amplicons.
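To make the orientation point concrete, here is a minimal Python sketch of what "reverse orientation" means for a read. It is not what classify-sklearn does internally; the function name and the example reads are made up for illustration.

```python
# Complement table for DNA bases, including common IUPAC ambiguity codes.
# S and W are their own complements, so they are left untranslated.
_COMPLEMENT = str.maketrans(
    "ACGTRYKMBDHVNacgtrykmbdhvn",
    "TGCAYRMKVHDBNtgcayrmkvhdbn",
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Hypothetical reads: the second is the first one read from the opposite strand.
forward_read = "ATGCCGTAAC"
reversed_read = reverse_complement(forward_read)

# Flipping the reversed read restores the shared orientation.
assert reverse_complement(reversed_read) == forward_read
```

A classifier that expects every read in one orientation would see `forward_read` and `reversed_read` as unrelated sequences, even though they come from the same template.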

I am assuming you are using merge-taxa with the original classification, which already contains (incorrect) classifications for those sequences. There is an easy fix: see the --help docs for that method. The order in which you list the taxonomy files determines which takes priority as the “correct” classification to keep whenever there are duplicate IDs. This will allow you to merge the taxonomies and keep the correct classification, i.e., the one obtained when primer “a” sequences are classified on their own.
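The priority rule can be sketched in plain Python as below. This is only an illustration of the idea, not the merge-taxa implementation. It assumes that the first-listed taxonomy wins on duplicate IDs, which you should confirm in the --help text. The feature IDs and taxon strings are hypothetical.

```python
def merge_taxonomies(*taxonomies):
    """Merge feature-ID -> taxon mappings; earlier arguments win on duplicate IDs.

    Assumption: "first listed takes priority" mirrors the behaviour described
    in the merge-taxa help text. Check the docs for your QIIME 2 version.
    """
    merged = {}
    for taxonomy in taxonomies:
        for feature_id, taxon in taxonomy.items():
            # setdefault keeps the value from the earliest taxonomy that has this ID
            merged.setdefault(feature_id, taxon)
    return merged


# Hypothetical data: primer "a" features classified on their own ...
primer_a_only = {"feat1": "k__Bacteria; p__Firmicutes"}
# ... and the original run on the mixed-orientation data.
combined_run = {
    "feat1": "Unassigned",
    "feat2": "k__Bacteria; p__Proteobacteria",
}

merged = merge_taxonomies(primer_a_only, combined_run)
```

Here `feat1` keeps the classification from `primer_a_only`, while `feat2`, which only exists in `combined_run`, is carried over unchanged. Swapping the argument order would bring back the "Unassigned" label for `feat1`.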

Good luck!

@Nicholas_Bokulich Ok, I got it! Thank you very much!

Hi @Zhanzhan,
I agree with @Nicholas_Bokulich, but I'm not sure I understand your dataset well. Did you amplify different 16S regions using 3 different primer pairs? If so, you should separate them before the analysis. Also, you should probably train a classifier for each of your regions; I would not assume the available classifier is optimised for all of them.
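As a rough idea of what separating reads by primer looks like, here is a small Python sketch. The primer sequences and labels are placeholders, not real 16S primers, and it only does exact prefix matching. Real data would need degenerate bases and mismatches handled, which dedicated tools do far better.

```python
from collections import defaultdict

# Placeholder primer sequences keyed by a hypothetical label; NOT real 16S primers.
PRIMERS = {
    "a": "ACGTACGT",
    "b": "TTGGCCAA",
    "c": "GGATCCTA",
}


def bin_by_primer(reads, primers=PRIMERS):
    """Group reads by the primer they start with (exact prefix match only)."""
    bins = defaultdict(list)
    for read in reads:
        # Pick the first primer label whose sequence is a prefix of the read.
        label = next(
            (name for name, primer in primers.items() if read.startswith(primer)),
            "unmatched",
        )
        bins[label].append(read)
    return dict(bins)


# Hypothetical reads: one per primer, plus one that matches none of them.
reads = ["ACGTACGTAAAT", "TTGGCCAAGGGC", "GGATCCTATTTA", "CCCCCCCCCCCC"]
bins = bin_by_primer(reads)
```

Each bin could then go through the same workflow separately, which matches the per-primer runs described earlier in the thread.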

@llenzi Yes, you are right! Thank you for your advice. But I wonder if there is any other taxonomy annotation method that does not require dividing the dataset? It is a little difficult to continue with downstream analysis after dividing it.

Hi @Zhanzhan,

I have not done much analysis with different amplicons, but I usually treat them as separate analyses, then maybe merge the final table.

That is because, as I understand it, the denoising tools start from the assumption that all the sequences cover the same region. I also don’t think it is possible to train a classifier on different regions (but you may use blast or vsearch for this instead).
As an alternative, you may try to create a feature table with a closed-reference approach using:

@llenzi OK, I will try! Thank you very much!
