Taxonomy assignment error


I added two of my target genomes to the existing Greengenes database after converting them into single-line FASTA sequences.
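Converting to single-line FASTA just means joining each record’s wrapped sequence lines into one line. Below is a minimal Python sketch of that step. The function name `to_single_line_fasta` is hypothetical. The function only reformats records: it does not check whether a record is a 16S gene or a whole genome.

```python
def to_single_line_fasta(text: str) -> str:
    """Join each record's wrapped sequence lines into a single line."""
    out = []        # output lines: header, sequence, header, sequence, ...
    seq_parts = []  # sequence lines of the record currently being read
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if seq_parts:  # flush the previous record's sequence
                out.append("".join(seq_parts))
                seq_parts = []
            out.append(line)
        else:
            seq_parts.append(line)
    if seq_parts:  # flush the last record
        out.append("".join(seq_parts))
    return "\n".join(out) + "\n"


wrapped = ">seq1 demo record\nACGTACGT\nTTGA\n>seq2\nGGCC\nAA\n"
print(to_single_line_fasta(wrapped), end="")
```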

After the taxonomy analysis, I got a taxa plot like this.

The sequence I added appeared alone in the taxa plot (100% identity).

I don’t know why I am getting this error. Could anyone tell me why it happens and how to fix it?

Did you add the full genome sequences or just the 16S rRNA gene sequences? Add only the 16S sequences, not the full genomes: full genome sequences will totally tangle up the k-mer frequencies used for classification.

Let us know if that fixes the problem!
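The following toy Python example shows why whole genomes cause trouble. It is not the classifier’s actual feature extraction. The code counts overlapping k-mers and measures how much of a reference’s k-mer profile is shared with a 16S query. Random sequences stand in for a real 16S gene and genome, and k = 7 is an assumed value. When the 16S gene is embedded in a much longer genome, its k-mers make up only a small fraction of that reference’s profile.

```python
import random
from collections import Counter


def kmer_counts(seq: str, k: int = 7) -> Counter:
    """Count every overlapping k-mer (substring of length k) in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shared_fraction(query: str, reference: str, k: int = 7) -> float:
    """Share of the reference's k-mer profile made up of k-mers also present in the query."""
    query_kmers = set(kmer_counts(query, k))
    ref_counts = kmer_counts(reference, k)
    total = sum(ref_counts.values())
    shared = sum(n for kmer, n in ref_counts.items() if kmer in query_kmers)
    return shared / total


# Toy data: random sequences stand in for a 16S gene and the rest of a genome.
rng = random.Random(42)
rrna_16s = "".join(rng.choice("ACGT") for _ in range(300))
rest_of_genome = "".join(rng.choice("ACGT") for _ in range(10_000))
genome = rest_of_genome[:5_000] + rrna_16s + rest_of_genome[5_000:]

print(round(shared_fraction(rrna_16s, rrna_16s), 3))  # 16S-only reference
print(round(shared_fraction(rrna_16s, genome), 3))    # whole-genome reference
```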


Thank you so much for your prompt reply, sir. I will check and let you know the outcome.


Hi sir,

You are absolutely correct. I owe you and Justine ma’am a lot.

I had taken whole-genome sequences from the BLAST results, which is why I got a weird taxa plot.

I removed those sequences and reran the taxonomy analysis. The problem I have now is that my taxa plot is not what I expected.

When I ran the taxonomy analysis with the customized database (13 sequences) alone, the taxa plot looked good.

When I add those sequences to the existing Greengenes database, I don’t get a similar plot.
I see only genus-level classification.

For instance, it showed only Streptococcus in sample 7 and Moraxella in sample 2, with no species-level classification. With the customized database, however, I could see species-level classification. The frequency of occurrence was the same in both cases.

I performed open-reference clustering with different threshold values, such as 99, 75, and so on, but saw no change.

Every identity value gives the same taxa plot, and I don’t know why.

I also ran classify-consensus-vsearch as you suggested. The taxa plot was almost the same, with no drastic change.

Here is the command I used:

qiime feature-classifier classify-consensus-vsearch --i-query rep-seqs-dada2.qza --i-reference-reads si_cs.qza --i-reference-taxonomy si_cs_txt.qza --p-maxaccepts 10 --p-perc-identity 0.99 --p-query-cov 0.8 --p-strand 'both' --p-min-consensus 0.51 --p-unassignable-label 'Unassigned' --p-threads 100 --o-classification 99bacterialsequenceclassifyfile
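Here is a rough Python model of what these parameters do. It is a simplification and not q2-feature-classifier’s code. The sketch first keeps up to `maxaccepts` hits that pass `perc_identity` and `query_cov`. It then walks the taxonomy rank by rank and stops at the first rank where the most common label falls below `min_consensus`. The hit list and taxonomy labels are made-up toy data.

```python
from collections import Counter


def accepted_hits(alignments, perc_identity=0.99, query_cov=0.8, maxaccepts=10):
    """alignments: (taxonomy, identity, coverage) tuples, best hit first.
    Keep hits passing both thresholds, up to maxaccepts of them."""
    kept = [tax for tax, ident, cov in alignments
            if ident >= perc_identity and cov >= query_cov]
    return kept[:maxaccepts]


def consensus_taxonomy(hits, min_consensus=0.51, unassignable_label="Unassigned"):
    """Walk the ranks in order, keeping the most common label while at least
    min_consensus of the hits agree on it; stop at the first rank where they don't."""
    if not hits:
        return unassignable_label, 0.0
    ranks = [[r.strip() for r in h.split(";")] for h in hits]
    consensus, support = [], 0.0
    for level in range(max(len(r) for r in ranks)):
        labels = [r[level] for r in ranks if len(r) > level]
        label, count = Counter(labels).most_common(1)[0]
        fraction = count / len(hits)
        if fraction < min_consensus:
            break
        consensus.append(label)
        support = fraction
    if not consensus:
        return unassignable_label, 0.0
    return "; ".join(consensus), support


alignments = [
    ("k__Bacteria; p__Firmicutes; g__Streptococcus; s__mitis", 1.000, 1.00),
    ("k__Bacteria; p__Firmicutes; g__Streptococcus; s__oralis", 0.995, 1.00),
    ("k__Bacteria; p__Firmicutes; g__Streptococcus; s__pneumoniae", 0.992, 0.95),
    ("k__Bacteria; p__Firmicutes; g__Lactococcus; s__lactis", 0.900, 1.00),  # fails perc_identity
]
hits = accepted_hits(alignments)
print(consensus_taxonomy(hits))
```

When the accepted hits agree on the genus but split across several species, the assignment stops at genus.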

I am still experimenting with different identity and query-coverage values. Could you please suggest which values would be good for species-level classification?

Could you please tell me how to fix this problem?

Thank you once again for your support and help. Looking forward to your reply.

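Classifying against your customized database (13 sequences) alone is “cheating”: the classifier can only choose among those 13 references, so reads can only be assigned to those taxa.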

Adding your sequences to the full Greengenes database is the correct way. You are getting genus-level classification because species-level classification is very difficult to achieve with short sequence fragments: essentially, the species in that genus cannot be distinguished based on genetic content alone.
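Here is a toy illustration of the short-fragment problem in Python. Two made-up "species" differ outside the amplified region but are identical inside it, so an amplicon matches both equally well. The primer sites and sequences are hypothetical. The amplicon extraction uses exact string matching only, although real primers are often degenerate.

```python
FWD = "GTGCCAGC"     # hypothetical forward primer site
REV_RC = "ATTAGATA"  # hypothetical reverse primer site, already reverse-complemented


def extract_amplicon(seq, fwd=FWD, rev_rc=REV_RC):
    """Return seq from the forward primer site through the reverse primer site
    (exact matches only), or None if either site is missing."""
    start = seq.find(fwd)
    if start == -1:
        return None
    end = seq.find(rev_rc, start + len(fwd))
    if end == -1:
        return None
    return seq[start:end + len(rev_rc)]


def identity(a, b):
    """Fraction of identical positions; assumes the sequences are already aligned."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


shared_region = FWD + "ACGGTTCAGGACTTAC" + REV_RC
species_a = "AAAAATTTTT" + shared_region + "CCCCCGGGGG"
species_b = "TTTTTAAAAA" + shared_region + "GGGGGCCCCC"

print(round(identity(species_a, species_b), 3))  # full length: the two differ
amp_a, amp_b = extract_amplicon(species_a), extract_amplicon(species_b)
print(identity(amp_a, amp_b))  # amplicon only: identical
```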

You could add the --p-top-hits option.

Species might simply be unobtainable from genetic content alone. However, depending on your sample type, you may find that q2-clawback, a method that helps identify species based on habitat occurrence, could help. See the README here:
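Here is the idea behind q2-clawback in simplified form. The naive Bayes classifier combines two things: how well a read matches each taxon (the likelihood) and a prior class weight for that taxon. q2-clawback supplies habitat-specific weights instead of uniform ones. The Python sketch below uses made-up likelihoods and weights. The 0.7 threshold mirrors the default of classify-sklearn’s --p-confidence, but it is only an assumption here. The sketch is not q2-clawback’s code.

```python
def posterior(likelihoods, class_weights=None):
    """Bayes' rule over candidate taxa: P(taxon | read) is proportional to
    P(read | taxon) * weight(taxon). No weights means a uniform prior."""
    if class_weights is None:
        class_weights = {taxon: 1.0 for taxon in likelihoods}
    scores = {t: likelihoods[t] * class_weights.get(t, 0.0) for t in likelihoods}
    total = sum(scores.values())
    return {t: s / total for t, s in scores.items()}


def call_species(post, confidence=0.7):
    """Report the best taxon only if its posterior reaches the confidence threshold."""
    best = max(post, key=post.get)
    return best if post[best] >= confidence else None


# Made-up numbers: the read matches both species equally well.
likelihoods = {"s__mitis": 0.8, "s__pneumoniae": 0.8}
habitat_weights = {"s__mitis": 0.9, "s__pneumoniae": 0.1}  # hypothetical habitat frequencies

uniform = posterior(likelihoods)
weighted = posterior(likelihoods, habitat_weights)
print(call_species(uniform), call_species(weighted))
```

With a uniform prior, the tie stays a tie and no species is called. The habitat weights are what break it.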


Hello sir,

Thank you so much for your prompt reply. I just wanted to see how the taxa plot looks when I run it with the customized database, that’s all. I won’t publish a paper based on the customized-database taxonomy results alone.

I will check the q2-clawback plugin and get back to you soon.

I ran the classify-consensus-vsearch command with --p-top-hits, but got “Error-No such word”.

Did you mean --p-min-consensus when you said top hits?

Could you please tell me why the taxa plot did not change after open-reference clustering with different identity values?

Here are the commands I used for open-reference clustering:

qiime vsearch cluster-features-open-reference --i-table table-dada2.qza --i-sequences rep-seqs-dada2.qza --i-reference-sequences 99_otus.qza --p-perc-identity 0.99 --o-clustered-table table-or-99.qza --o-clustered-sequences rep-seqs-or-99.qza --o-new-reference-sequences new-ref-seqs-or-99.qza

qiime vsearch cluster-features-open-reference --i-table table-dada2.qza --i-sequences rep-seqs-dada2.qza --i-reference-sequences 75_otus.qza --p-perc-identity 0.75 --o-clustered-table table-or-75.qza --o-clustered-sequences rep-seqs-or-75.qza --o-new-reference-sequences new-ref-seqs-or-75.qza

qiime vsearch cluster-features-open-reference --i-table table-dada2.qza --i-sequences rep-seqs-dada2.qza --i-reference-sequences 99_otus.qza --p-perc-identity 0.06 --o-clustered-table table-or-0.06.qza --o-clustered-sequences rep-seqs-or-0.06.qza --o-new-reference-sequences new-ref-seqs-or-0.06.qza

I couldn’t see any difference between the taxa plots run with different percent identity values. Why is that?

Looking forward to your reply.

Check the help documentation: this should be --p-top-hits-only.

Because the same general taxonomic patterns were present. Open-reference clustering performs closed-reference clustering as its first step, so your sequences still hit the closest reference sequence no matter what similarity % you set. This is a little surprising, but strange things do happen :smile:
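Here is a simplified Python model of the closed-reference first step. Each feature goes to its best-matching reference if that match reaches `perc_identity`. Any feature without such a match is left for de novo clustering, which is not implemented here. Identity is a plain position-by-position comparison of equal-length toy sequences, not vsearch’s alignment. Note that `--p-perc-identity` is a proportion, so 0.06 in the third command asks for only 6% identity.

```python
def identity(a, b):
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def open_reference(features, references, perc_identity):
    """Closed-reference step: map each feature to its best reference if that hit
    reaches perc_identity. Unmatched features are returned for de novo clustering."""
    assigned, leftovers = {}, []
    for fid, seq in features.items():
        best_ref = max(references, key=lambda rid: identity(seq, references[rid]))
        if identity(seq, references[best_ref]) >= perc_identity:
            assigned[fid] = best_ref
        else:
            leftovers.append(fid)
    return assigned, leftovers


references = {"ref1": "ACGTACGTAC", "ref2": "TTGCATTGCA"}
features = {"f1": "ACGTACGTAC", "f2": "ACGTACGTAA"}

for threshold in (0.99, 0.75, 0.06):
    print(threshold, open_reference(features, references, threshold))
```

Lowering the threshold only lets more features join a reference. A feature that already matched closely keeps the same best reference, and therefore the same taxonomy.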

Thank you so much for your reply, sir. I couldn’t find that --p-top-hits parameter.


I tried q2-clawback for taxonomy assignment, but I couldn’t see species-level classification in this analysis either.

If you find any mistakes in the commands, please let me know.

Here are the commands I used:

qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads readytowear/data/gg_13_8/515f-806r/ref-seqs-v4.qza --i-reference-taxonomy readytowear/data/gg_13_8/515f-806r/ref-tax.qza --i-class-weight readytowear/data/gg_13_8/515f-806r/soil-non-saline.qza --o-classifier gg138_v4_soil-non-saline_classifier.qza

qiime feature-classifier classify-sklearn --i-reads rep-seqs-dada2.qza --i-classifier gg138_v4_soil-non-saline_classifier.qza --o-classification bespoke-classifier-results.qza

qiime metadata tabulate --m-input-file bespoke-classifier-results.qza --m-input-file rep-seqs-dada2.qza --o-visualization bespoke-classifier-results.qzv
qiime taxa barplot --i-table table-dada2.qza --i-taxonomy bespoke-classifier-results.qza --m-metadata-file v3-v4sample_meta.tsv --o-visualization tc_taxa-bar-plots.qzv

Is sequencing the V3-V4 region of the 16S rRNA gene not enough for species-level classification of microorganisms?

Looking forward to your reply

Hello sir,

You told me not to use complete genomes for the taxonomy analysis. But the BLAST results for my OTU sequences showed 100% identity with complete genomes. Can I use these sequences for the taxonomy analysis or not?

Looking forward to your reply.

I already answered this:

You need to pull out just the 16S sequences from the genomes. Alternatively, q2-feature-classifier does have alignment-based classifiers that you can use here, including one based on BLAST.
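If the genome comes with an annotation, pulling out the 16S genes means slicing the annotated coordinates. The Python sketch below assumes a GFF3 annotation in which the 16S features have type `rRNA` and mention "16S" in the attributes column. Attribute wording varies between annotation sources. `extract_16s` is a hypothetical helper, not part of QIIME 2, and the genome and annotation lines are toy data.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def extract_16s(genome, gff_lines):
    """genome: dict of seqid -> sequence; gff_lines: GFF3 feature lines.
    Returns (label, sequence) pairs for rRNA features whose attributes mention 16S."""
    out = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        seqid, _, ftype, start, end, _, strand, _, attrs = cols
        if ftype != "rRNA" or "16S" not in attrs:
            continue
        region = genome[seqid][int(start) - 1:int(end)]  # GFF3 coordinates are 1-based, inclusive
        if strand == "-":
            region = reverse_complement(region)
        out.append((f"{seqid}:{start}-{end}({strand})", region))
    return out


genome = {"chr1": "AAAACCGTAGGCTTTT"}
gff = [
    "##gff-version 3",
    "chr1\tdemo\trRNA\t5\t8\t.\t+\t.\tproduct=16S ribosomal RNA",
    "chr1\tdemo\trRNA\t9\t12\t.\t-\t.\tproduct=16S ribosomal RNA",
    "chr1\tdemo\tCDS\t1\t4\t.\t+\t0\tproduct=hypothetical protein",
]
for label, seq in extract_16s(genome, gff):
    print(f">{label}\n{seq}")
```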


Hello sir,

First of all, I owe you a lot for your suggestions and help. I wouldn’t have reached this stage without it.

I couldn’t run the --p-top-hits-only parameter on QIIME 2 2019.1, so I installed QIIME 2 2019.7 and ran the command with --p-top-hits-only.

This is the error I got:

Could you help me to solve this error? Thanking you in advance.

Hi @Asha1 - this error message has been improved in QIIME 2 2019.10. The issue is this part of your command:

--p-top-hits-only 2. The --p-top-hits-only parameter is a boolean flag, so it can’t accept a number as input:

  --p-top-hits-only / --p-no-top-hits-only
                        Only the top hits between the query and reference
                        sequence sets are reported. For each query, the top
                        hit is the one presenting the highest percentage of
                        identity. Multiple equally scored top hits will be
                        used for consensus taxonomic assignment if maxaccepts
                        is greater than 1.                    [default: False]

Simply drop the 2 and you should be good to go! :qiime2:
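Here is a simplified Python model of the behaviour described in the help text above. Top-hits-only is an on/off switch that keeps only the hits tied for the highest identity. `maxaccepts`, not this flag, limits how many hits are used. The hits are made-up toy data, and this is not vsearch’s internal logic.

```python
def top_hits_only(alignments, maxaccepts=10):
    """alignments: (taxonomy, identity) pairs. Keep only hits tied for the highest
    identity, capped at maxaccepts."""
    if not alignments:
        return []
    best = max(ident for _, ident in alignments)
    return [tax for tax, ident in alignments if ident == best][:maxaccepts]


alignments = [
    ("g__Streptococcus; s__mitis", 1.0),
    ("g__Streptococcus; s__oralis", 1.0),
    ("g__Streptococcus; s__pneumoniae", 0.995),
]
print(top_hits_only(alignments))                # both equally scored top hits
print(top_hits_only(alignments, maxaccepts=1))  # only one hit is kept
```

The flag itself only switches this filtering on or off, which is why a number after it is rejected.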