Q2 fragment insertion taxonomy error

I am trying to create a taxonomy file using q2 fragment insertion but I keep getting an error that says:

Not all OTUs in the provided insertion tree have mappings in the provided reference taxonomy.

I am using a different taxonomy (based on the groEL gene), and I have a taxonomy file from the cpn database. I'm not sure why I am getting this error. I've attached the files I'm using below: the cpn rep seqs file contains the sequences I built my base tree from, and the cpn taxonomy file contains the taxonomy I'm trying to use to create my new taxonomy file. Thanks!

correct_no_spaces_only_bifido_and_gardnerella_cpn_rep_seqs_upper_no_replicates.qza (10.7 KB)
insertion-tree.qza (60.1 KB)
dada2_rep-seqs.qza (50.0 KB)
correct_no_spaces_only_bifido_and_gardnerella_cpn_taxonomy_to_taxonomy_upper_with_confidence_no_replicates.qza (7.5 KB)

Hi @Stephanieorch,
What is the command that you are using?

Does this command report which OTUs are missing? Have you cross-checked the files to make sure that all rep seq IDs are present in the tree and vice versa?
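A cross-check like this can be sketched in a few lines of plain Python. This is only an illustration, assuming the artifacts have already been exported (for example with `qiime tools export`) and that the tip names have been collected from the tree with whatever tree library you prefer. The helper names `fasta_ids` and `cross_check` and the toy data are hypothetical and not part of QIIME 2.

```python
def fasta_ids(fasta_text):
    """Collect sequence IDs (first token after '>') from FASTA-formatted text."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">") and line[1:].strip():
            ids.add(line[1:].split()[0])
    return ids


def cross_check(rep_seq_ids, tip_ids):
    """Return (IDs only in the rep seqs, IDs only in the tree), both sorted."""
    rep_seq_ids, tip_ids = set(rep_seq_ids), set(tip_ids)
    return sorted(rep_seq_ids - tip_ids), sorted(tip_ids - rep_seq_ids)


# Toy data standing in for an exported FASTA file and a list of tree tip names.
toy_fasta = ">seqA some description\nACGT\n>seqB\nGGCC\n"
toy_tips = ["seqA", "seqC"]

only_in_seqs, only_in_tree = cross_check(fasta_ids(toy_fasta), toy_tips)
print(only_in_seqs)  # ['seqB']
print(only_in_tree)  # ['seqC']
```

Two empty lists would mean the two files agree on their IDs. The same comparison works for tree tips against taxonomy IDs.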

I am pinging @Stefan who developed this method and could perhaps take a look.

Thanks!

Hi Stephanie,
You have 569 tips in your tree, but only 92 of those tips have taxonomy lineages in your taxonomy file. My method therefore finds that the remaining 477 tips cannot be resolved into taxonomic names, reports that, and fails.
You need to add the 477 tips into your taxonomy file - maybe with fake lineages like k__; p__; ...
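Padding the taxonomy could look roughly like the sketch below. It assumes you already have the list of tip IDs and the existing ID-to-lineage mapping in memory. The seven-rank placeholder string, the `Feature ID` / `Taxon` header, and the function names `pad_taxonomy` and `to_tsv` are my assumptions for illustration, so adapt them to whatever your taxonomy file actually uses before re-importing it.

```python
import csv
import io

# Assumed placeholder: an empty lineage at every rank, in the k__; p__; ... style.
PLACEHOLDER = "k__; p__; c__; o__; f__; g__; s__"


def pad_taxonomy(tip_ids, taxonomy, placeholder=PLACEHOLDER):
    """Return (padded taxonomy, list of tips that had no lineage)."""
    padded = dict(taxonomy)  # copy so the original mapping is left untouched
    missing = [tip for tip in tip_ids if tip not in padded]
    for tip in missing:
        padded[tip] = placeholder
    return padded, missing


def to_tsv(taxonomy):
    """Serialise an ID -> lineage mapping as tab-separated text with a header."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["Feature ID", "Taxon"])
    for feature_id in sorted(taxonomy):
        writer.writerow([feature_id, taxonomy[feature_id]])
    return buf.getvalue()


# Toy example: three tips, only one of which has a lineage.
tips = ["tip1", "tip2", "tip3"]
known = {"tip1": "k__Bacteria; p__Actinobacteria"}

padded, missing = pad_taxonomy(tips, known)
print(missing)  # ['tip2', 'tip3']
print(to_tsv(padded))
```

In your case `missing` should list the 477 tips. Looking at which IDs end up there is also a quick way to see where they come from.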


My goal is to look at one particular genus. To do this, I ran qiime fragment-insertion sepp with a tree and an alignment that contain only my genus of interest. I then ran qiime fragment-insertion filter-features to filter my table, hoping it would then contain only my genus of interest. Finally, I ran qiime fragment-insertion classify-otus-experimental with a taxonomy that contains only my genus of interest, to classify the rest of the taxa in the tree.

Are these the right commands to run if I want my insertion tree, taxonomy, and table qza files to contain only my genus of interest? Thank you.

@Stephanieorch this sounds like where your analysis is going off-track. The tree you are inputting should contain all of the features you are inputting, not only the genus of interest. Perhaps @Stefan can clarify if I am wrong.


I tend to agree with @Nicholas_Bokulich that this kind of analysis might have some conceptual shortcomings. If you already know that your set of sequences belongs to one specific genus, why would you want to assign taxonomy to them again?
You might first want to use one of the default assignment tools, like the naive Bayes classifier.
I think inserting the "rest" of your sequences into the default reference tree (which is GG 13.8 99%) is a good idea. You might want to look at the generated tree manually to "see" whether those sequences were inserted below your genus of interest. (You can also identify the node ID of the LCA of your genus and check in the placement file whether the sequences are inserted at this node.) But I would not manually limit the tree itself.
