What does the NA represent in the legend?

arlandan · July 18, 2019, 7:09pm

Hi All,
I generated a bar plot using "qiime taxa barplot" and visualized it in R. I did not specify anything special in either step, but the legend has an "NA" taxon at the very end. Could anyone tell me what it actually represents? Thank you for your attention.

Best.

Nicholas_Bokulich · July 19, 2019, 11:21am

Hi @arlandan,
That "NA" is being added by R, not by QIIME 2, so we can surmise that it most likely represents missing values: all sequences that do not have a genus-level annotation.
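To illustrate where such an "NA" can come from, here is a minimal Python sketch. The R code used in this thread is not shown, so this is only an assumption about the workflow: it assumes semicolon-separated taxonomy strings where each rank carries a prefix such as `g__`, and where a string stops at the deepest rank the classifier was confident about. The helper `split_ranks` and the example features are hypothetical. Splitting these strings into rank columns leaves the genus slot empty for anything not classified to genus; Python shows that as `None`, where R would show `NA`.

```python
# Rank prefixes in Greengenes/SILVA-style taxonomy strings (assumed format).
RANKS = ["k__", "p__", "c__", "o__", "f__", "g__", "s__"]


def split_ranks(taxon):
    """Map each rank prefix to its name, or None if that rank is missing/empty."""
    parts = [p.strip() for p in taxon.split(";")]
    out = {rank: None for rank in RANKS}
    for part in parts:
        for rank in RANKS:
            # A bare prefix such as "g__" is treated as missing, like an empty cell.
            if part.startswith(rank) and len(part) > len(rank):
                out[rank] = part[len(rank):]
    return out


# Hypothetical example features.
taxa = {
    "asv1": "k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; "
            "f__Lachnospiraceae; g__Blautia",
    "asv2": "k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; "
            "f__Lachnospiraceae",
    "asv3": "Unassigned",
}

for feature, taxon in taxa.items():
    print(feature, split_ranks(taxon)["g__"])
```

Both the family-level feature (`asv2`) and the completely unassigned one (`asv3`) end up with a missing genus, so a single "NA" bar lumps the two kinds together.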

arlandan · July 19, 2019, 2:15pm

Thank you @Nicholas_Bokulich. The NA accounts for a large proportion of most of my samples. If these ASVs cannot be assigned to genus level in R, I guess they cannot be in QIIME 2 either. Would you recommend filtering these ASVs out using "--p-include g_" before making the taxa bar plot? My goal is to compare community composition between samples, and there seem to be too many of these unassigned ASVs. Thank you in advance!

All the best,
Arlandan

Nicholas_Bokulich · July 19, 2019, 2:30pm

This is not an uncommon problem. Your sequences may be too short or otherwise lack enough information to be confidently classified at genus level. You can try adjusting the classifier or the reference database, try something like q2-clawback if you are sampling a well-studied sample type (search the forum's community tutorials for more detail), or just accept that family-level classification may be the best you can achieve with the information you have.

No. These are probably real biological organisms! You could use something like --p-include p__ to exclude features that are totally unassigned (they are most likely non-target DNA and other garbage), but sequences that can only be classified to family level are common, simply because it is difficult (or impossible!) to resolve genus- and species-level affiliation for clades that have (near-)identical sequences in some common marker-gene targets (e.g., the V4 region of the 16S rRNA gene). So excluding these will SEVERELY skew your taxonomic results.
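The skew described above can be sketched with a toy count table in Python. The `include` helper below is a hypothetical stand-in that mimics substring-style matching like `qiime taxa filter-table --p-include` (an assumption; check the plugin's help for its exact matching options), and the counts and abbreviated taxonomy strings are made up. Keeping features that contain `p__` drops only the unassigned reads, while keeping only features with a genus prefix also drops the family-level reads, which inflates the relative abundance of whatever remains.

```python
# Toy feature counts for one sample (hypothetical numbers).
counts = {"asv1": 30, "asv2": 50, "asv3": 20}
# Abbreviated taxonomy strings, truncated at the deepest confident rank (assumed).
taxonomy = {
    "asv1": "k__Bacteria; p__Firmicutes; f__Lachnospiraceae; g__Blautia",
    "asv2": "k__Bacteria; p__Firmicutes; f__Lachnospiraceae",
    "asv3": "Unassigned",
}


def include(counts, taxonomy, term):
    """Keep features whose taxonomy string contains `term` (substring match)."""
    return {f: n for f, n in counts.items() if term in taxonomy[f]}


def relative(counts):
    """Convert counts to relative abundances that sum to 1."""
    total = sum(counts.values())
    return {f: n / total for f, n in counts.items()}


print(relative(counts))                            # all features
print(relative(include(counts, taxonomy, "p__")))  # drops only "Unassigned"
print(relative(include(counts, taxonomy, "g__")))  # also drops family-level asv2
```

With these made-up numbers, asv1 (Blautia) goes from 30% of the sample to 37.5% after the phylum filter, but to 100% after the genus-only filter, even though nothing about the community itself changed.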

Rather, I recommend changing the "NA" label to "Other", or else making family-level barplots.
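Both options can be sketched in Python on hypothetical per-feature labels, where `None` marks a missing rank (what R displays as NA). The `collapse` helper is illustrative, not part of any QIIME 2 or R package; within QIIME 2 itself, `qiime taxa collapse` with `--p-level` produces a table collapsed to a chosen rank (level 5 is family for a seven-rank taxonomy).

```python
from collections import Counter

# Hypothetical per-feature counts and parsed labels; None marks a missing rank.
counts = {"asv1": 30, "asv2": 50, "asv3": 15, "asv4": 5}
genus = {"asv1": "Blautia", "asv2": None, "asv3": "Roseburia", "asv4": None}
family = {"asv1": "Lachnospiraceae", "asv2": "Lachnospiraceae",
          "asv3": "Lachnospiraceae", "asv4": "Ruminococcaceae"}


def collapse(counts, labels, missing="Other"):
    """Sum counts per label, pooling features with no label under `missing`."""
    totals = Counter()
    for feature, n in counts.items():
        label = labels[feature]
        totals[label if label is not None else missing] += n
    return dict(totals)


print(collapse(counts, genus))   # genus level, missing genus relabelled "Other"
print(collapse(counts, family))  # family level: every feature keeps a label
```

Relabelling only renames the bar: "Other" still pools family-level features from different families (here Lachnospiraceae and Ruminococcaceae), whereas the family-level table keeps them apart.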

Good luck!

arlandan · July 19, 2019, 3:27pm

Hi @Nicholas_Bokulich,

Thank you very much for your reply. Very informative and helpful. I would like to go back and check my steps to figure out why there are so many NAs this time. I used mothur the first time I analyzed this data set, but later, for various reasons, I switched to QIIME 2 and decided to stick with it. The first time I went through the basic tutorial, I did not have that many NAs in my taxa bar plot, nor did I with mothur. But in this second attempt many NAs showed up, which is a bit unexpected. The only change I can recall is that I grouped the samples, and grouping is very unlikely to cause this. Thanks again! I really appreciate your help.

Have a great day