Why do most of my samples have uncategorized bacteria?

Hi @amm59063,

Note that d__Bacteria;__;__;__;__;__;__ only appears in that form within the visualizer, so searching for that pattern will not work. The ;__;__;__;__;__;__ placeholders are simply filled in as you view deeper ranks in the visualizer. In the underlying taxonomy, the label may contain only d__Bacteria.
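To make the padding concrete, here is a minimal shell sketch of the idea. The pad_taxon function is hypothetical and not part of QIIME 2, and the visualizer's own code may differ in its details. It splits a stored label on semicolons, trims the spaces around each rank, keeps the first LEVEL ranks and fills any missing ones with __. This is how a bare d__Bacteria ends up displayed as d__Bacteria;__;__;__;__;__;__ at level 7.

```shell
#!/bin/sh
# pad_taxon LABEL LEVEL: hypothetical helper, not part of QIIME 2.
# Splits LABEL on ';', trims spaces around each rank, keeps the first LEVEL
# ranks and fills any missing ranks with "__", joining them with ';'.
pad_taxon() {
  awk -v label="$1" -v level="$2" 'BEGIN {
    n = split(label, ranks, ";")
    out = ""
    for (i = 1; i <= level; i++) {
      r = (i <= n) ? ranks[i] : "__"     # missing rank -> "__"
      gsub(/^[ \t]+|[ \t]+$/, "", r)     # trim surrounding whitespace
      out = (i == 1) ? r : (out ";" r)
    }
    print out
  }'
}

pad_taxon 'd__Bacteria' 7                  # prints d__Bacteria;__;__;__;__;__;__
pad_taxon 'd__Bacteria; p__Firmicutes' 3   # prints d__Bacteria;p__Firmicutes;__
```

The padded string only exists in the display, which is why it never matches anything in the stored taxonomy.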

The best way to figure out what to search for is to make a sample_taxonomy_paired.qzv file from your taxonomy (for example, with qiime metadata tabulate) and look at the labels it actually contains.
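If you prefer the command line, here is a minimal sketch. It assumes you have already exported the taxonomy artifact to a taxonomy.tsv file (for example with qiime tools export, which writes Feature ID, Taxon and Confidence columns). The hypothetical count_taxon_labels function lists every distinct label exactly as stored, along with the number of features that carry it. You can then copy the real strings into --p-include / --p-exclude.

```shell
#!/bin/sh
# count_taxon_labels FILE: hypothetical helper, not part of QIIME 2.
# FILE is an exported taxonomy.tsv with a single header row and the
# Taxon label in the second tab-separated column.
count_taxon_labels() {
  # skip the header, take the Taxon column, tally identical labels,
  # and list the most common labels first
  tail -n +2 "$1" | cut -f2 | sort | uniq -c | sort -rn
}
```

For example, count_taxon_labels exported/taxonomy.tsv, where exported/ is a placeholder for whatever output directory you gave to qiime tools export. A bare d__Bacteria near the top of the list is the label the visualizer shows as d__Bacteria;__;__;__;__;__;__.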

I usually run something like this when filtering SILVA taxonomy:

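  # placeholder file names; the same --p-* flags also work with qiime taxa filter-seqs
  qiime taxa filter-table --i-table table.qza --i-taxonomy taxonomy.qza \
    --p-mode 'contains' \
    --p-include 'p__' \
    --p-exclude 'p__;,Eukaryota,Chloroplast,Mitochondria,Unassigned' \
    --o-filtered-table filtered-table.qza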

Note the p__ in --p-include and the p__; in --p-exclude. The include term keeps only taxa with p__ in their label. However, we do not want empty phylum labels, i.e. p__; (note the semicolon). Combined, these two terms remove every taxon that lacks at least a phylum-level assignment. In this case, labels such as d__Bacteria; p__; ... or a bare d__Bacteria (and the equivalent Archaea labels) will be removed. Depending on the reference database you are using, you may need to change Unassigned to Unclassified, or simply add Unclassified to the exclude list.
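The include/exclude logic above can be sketched in plain bash to show which example labels survive. The keep_taxon function is hypothetical. It applies the include term first and then the exclude terms, using simple case-sensitive substring tests. This is not the q2-taxa implementation, whose matching details (such as case handling) may differ. The example labels are illustrative only.

```shell
#!/usr/bin/env bash
# keep_taxon LABEL: prints "keep" or "drop" for one taxonomy label.
# Hypothetical sketch of --p-mode contains with the include/exclude terms
# above; plain case-sensitive substring matching, not the q2-taxa code.
keep_taxon() {
  local label="$1" term
  # --p-include 'p__': the label must contain p__ somewhere
  if [[ "$label" != *"p__"* ]]; then
    echo drop
    return
  fi
  # --p-exclude: drop the label if it contains any of these terms
  for term in 'p__;' Eukaryota Chloroplast Mitochondria Unassigned; do
    if [[ "$label" == *"$term"* ]]; then
      echo drop
      return
    fi
  done
  echo keep
}

# Illustrative labels; prints drop, drop, keep, drop, drop (each followed by the label)
for label in \
  'd__Bacteria' \
  'd__Bacteria; p__; c__' \
  'd__Bacteria; p__Firmicutes; c__Bacilli' \
  'd__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__Rickettsiales; f__Mitochondria' \
  'Unassigned'; do
  printf '%s\t%s\n' "$(keep_taxon "$label")" "$label"
done
```

The bare d__Bacteria fails the include test, while d__Bacteria; p__; ... passes it but is then caught by the p__; exclude term.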

-Mike