Many seqs are only annotated to Domain!

I downloaded some amplicon data from NCBI to perform annotation training myself. At the end of the annotation, I found that many seqs were only annotated to the Domain level, and some only to the Class level. I want to know which part of my processing caused this.
taxa-bar-plots.qzv (586.3 KB)


What region was sampled to produce the data you are using? Unfortunately, sometimes there simply is not enough information in the data to classify more precisely, for example if only the V3 region was targeted. This can also be a reference database issue, where the database itself does not have information about the organisms found in your samples. If that is the issue, you can often improve classification by using a database tuned to the particular environment you are sampling from. Even with the same database, providing weights based on the environment can drastically improve your classification.
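
Before changing regions or databases, it can help to put a number on how shallow the assignments are by tallying the deepest rank each feature reached. Here is a minimal Python sketch, assuming semicolon-separated taxonomy strings with rank prefixes like `d__` and `g__` (as in SILVA-formatted taxonomies). The function names and the example strings are made up for illustration and are not part of any QIIME 2 API:

```python
from collections import Counter

# Assumed rank order for a seven-level, SILVA-style taxonomy string.
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def deepest_rank(taxon: str) -> str:
    """Return the name of the deepest rank that carries a non-empty label."""
    depth = 0
    for level in taxon.split(";"):
        label = level.strip()
        # Drop a rank prefix such as "d__" or "g__" if one is present.
        if "__" in label:
            label = label.split("__", 1)[1]
        # Stop at the first empty or explicitly unassigned level.
        if not label or label.lower() == "unassigned":
            break
        depth += 1
    if depth == 0:
        return "unassigned"
    return RANKS[min(depth, len(RANKS)) - 1]


def depth_summary(taxa):
    """Return the fraction of taxonomy strings that stop at each rank."""
    counts = Counter(deepest_rank(t) for t in taxa)
    total = sum(counts.values())
    return {
        rank: counts[rank] / total
        for rank in ["unassigned"] + RANKS
        if counts[rank]
    }


# Hypothetical taxonomy strings, for illustration only.
example_taxa = [
    "d__Bacteria",
    "d__Bacteria; p__; c__",
    "d__Bacteria; p__Firmicutes; c__Bacilli",
    "d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
    "o__Enterobacterales; f__Enterobacteriaceae; g__Escherichia-Shigella",
]

summary = depth_summary(example_taxa)
for rank, fraction in summary.items():
    print(f"{rank}: {fraction:.0%}")
```

If most features stop at domain, the wrong region or reference reads are a likely culprit. If they mostly stop at class or order, a short target region or gaps in the database may be the better explanation.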

These are good lessons to learn using someone else's data before you go through the cost and effort of your own sequencing runs. If you are designing your own experiment, it would definitely be worth searching on here for issues that others have run into, as well as asking for feedback on your design before you begin. There are lots of people on here who are happy to help set you up for success, plus lots of knowledge of pitfalls to avoid and resources to point you to, so you can make sure you can actually answer your questions with the data you generate.
