Taxonomy analysis result error

Hello

I am getting different taxa plots from one run to the next. First, I performed a taxonomy analysis with a customized database. I then added some more sequences to the database and ran the taxonomy analysis again, but the resulting taxa plot differs from the first one.

In the first run, sample 2 contained a Moraxella species (82%), but in the second run sample 2 contained Acinetobacter indicus (50%). In the second run, sample 7 contained Streptococcus;s__pneumonia (94%), whereas in the first run only k__Bacteria was seen.

According to my culture report, sample 2 contains a Moraxella species and sample 7 contains Streptococcus pneumoniae.

I have repeated the taxonomy analysis many times. Whenever I get the Moraxella species in sample 2, sample 7 shows nothing in the taxa plot. Whenever I get Streptococcus pneumoniae in sample 7, the Moraxella species is missing from the taxa plot.

Whenever I add any new sequence, the taxa plot looks different from the previous one. My biggest surprise is that I did not include a Pseudomonas aeruginosa sequence in my taxonomy file the first time, yet that species appeared in my taxa plot.

Could anyone tell me why the taxonomy analysis results are coming out like this? I have attached taxa plot snapshots for your reference.

First time taxonomy analysis result

Second time taxonomy analysis result

cd.txt (4.3 KB)

Results that change when you add sequences are to be expected. Adding more sequences to the reference can significantly alter the outcome, especially if the new sequences are more similar to your query sequences!
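To make that concrete, here is a toy sketch of how adding one closer reference sequence can change which label a query receives. This is not how q2-feature-classifier works internally; it is just a simple best-match-by-identity illustration, and every sequence and label in it is made up.

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy example only compares equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def best_hit(query: str, reference: dict[str, str]) -> tuple[str, float]:
    """Return (label, identity) for the reference sequence closest to the query."""
    return max(
        ((label, percent_identity(query, seq)) for label, seq in reference.items()),
        key=lambda hit: hit[1],
    )


# Hypothetical query read and hypothetical reference sequences.
query = "ACGTACGTAC"

reference_v1 = {
    "g__GenusA": "ACGTACGTTT",  # matches the query at 8 of 10 positions
    "g__GenusB": "TTTTACGTAC",  # matches the query at 7 of 10 positions
}

# Same database plus one new sequence that is closer to the query.
reference_v2 = {**reference_v1, "g__GenusC": "ACGTACGTAA"}  # 9 of 10 positions

print(best_hit(query, reference_v1))  # ('g__GenusA', 0.8)
print(best_hit(query, reference_v2))  # ('g__GenusC', 0.9)
```

The query has not changed between the two calls; only the reference has, and that alone is enough to move the assignment.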

Culture reports (especially those using specific media) are notoriously blind to the greater diversity present. So while I agree that you should expect to see species X if you observe it in culture, in no way should you hold up the culture report as the gold standard for what the community composition should be.

Okay, a species showing up that you never included is raising red flags: it suggests that you are inputting the wrong files. q2-feature-classifier simply cannot report a species unless it is present in the reference database.
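One quick sanity check is to search the taxonomy file you actually passed in for the names you see in the plot. Below is a minimal sketch, assuming a tab-separated taxonomy file with a header row containing `Feature ID` and `Taxon` columns; the helper name and the example rows are hypothetical, so adjust them to your own file.

```python
import csv
import io


def taxa_missing_from_reference(taxonomy_tsv, names):
    """Return the names that never appear in the Taxon column.

    taxonomy_tsv: an open text file (or file-like object) holding a
    tab-separated table with a 'Taxon' column (assumed layout).
    names: substrings to look for, e.g. genus names; matching ignores case.
    """
    reader = csv.DictReader(taxonomy_tsv, delimiter="\t")
    taxon_strings = [row["Taxon"].lower() for row in reader]
    return [n for n in names if not any(n.lower() in t for t in taxon_strings)]


# Made-up stand-in for a reference taxonomy file.
EXAMPLE_TSV = (
    "Feature ID\tTaxon\n"
    "ref1\tk__Bacteria; g__Moraxella; s__catarrhalis\n"
    "ref2\tk__Bacteria; g__Streptococcus; s__pneumoniae\n"
)

missing = taxa_missing_from_reference(
    io.StringIO(EXAMPLE_TSV), ["Moraxella", "Pseudomonas"]
)
print(missing)  # ['Pseudomonas']
```

With a real file you would pass `open("your-taxonomy.tsv", newline="")` instead of the `io.StringIO` object. If a name from your plot comes back as missing, the classifier was most likely trained on or run against a different file than the one you think.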

It sounds like you are perhaps playing too many games with adding and excluding different species and “trying again” in the hope of a better result. Custom databases are fine, but you should always use a comprehensive reference containing all species that might be present in the target environment, not just the sequences that you expect in a specific sample based on culture analysis. (At the very least, that is an approach that would not hold up under peer review!)

All in all, this is not an error per se: the classifier will only perform as well as the quality of its input data allows, and switching around the input sequences will dramatically alter the results.

You may also want to use the classify-consensus-vsearch method. It lets you search based on sequence identity and keep only the top hits (if the appropriate option is set), which may be more suitable for your use case, given that you have very specific known targets that you are attempting to match.
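For intuition only, here is a simplified sketch of the idea behind an identity-based consensus assignment with a "top hits only" switch. It is not the plugin's code: the function, its parameter names, its default values, and the hit list are all illustrative assumptions, so check the method's own help text for the real options and defaults.

```python
from collections import Counter


def consensus_taxonomy(hits, perc_identity=0.97, top_hits_only=True, min_consensus=0.51):
    """Toy consensus over (identity, taxonomy string) hits for one query.

    Hits below perc_identity are dropped. With top_hits_only, only hits tied
    for the best identity are kept. The lineage is then built rank by rank
    for as long as at least min_consensus of the kept hits agree.
    """
    kept = [(ident, tax) for ident, tax in hits if ident >= perc_identity]
    if not kept:
        return "Unassigned"
    if top_hits_only:
        best = max(ident for ident, _ in kept)
        kept = [(ident, tax) for ident, tax in kept if ident == best]
    lineages = [[rank.strip() for rank in tax.split(";")] for _, tax in kept]
    consensus = []
    for ranks in zip(*lineages):
        label, count = Counter(ranks).most_common(1)[0]
        if count / len(lineages) < min_consensus:
            break  # too much disagreement at this rank, stop here
        consensus.append(label)
    return "; ".join(consensus) if consensus else "Unassigned"


# Hypothetical hits for a single query sequence.
hits = [
    (0.99, "k__Bacteria; g__Streptococcus; s__pneumoniae"),
    (0.98, "k__Bacteria; g__Streptococcus; s__mitis"),
    (0.98, "k__Bacteria; g__Streptococcus; s__oralis"),
]

print(consensus_taxonomy(hits, top_hits_only=True))
# k__Bacteria; g__Streptococcus; s__pneumoniae
print(consensus_taxonomy(hits, top_hits_only=False))
# k__Bacteria; g__Streptococcus
```

In this toy case, keeping only the top hit yields a species-level call, while taking the consensus across all close hits stops at genus because the three species disagree.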