How to check the number of different microbes from 16S data

Hey everyone, I'm using QIIME 2 for taxonomic analysis with the SILVA classifier on 16S data from paired-end Illumina reads, and I have a few questions about interpreting the results. After running taxonomic assignment on 30 samples with "classify-sklearn" and "taxa barplot", I get "metadata.tsv" (connecting each feature ID to a Taxon) and "level-7.tsv" (connecting each OTU found at species level, and its frequency, to a sample), respectively, for downstream analysis. My main aim is to find out how many different microbes are in my samples. When I check, e.g., the genus Rhodococcus in my taxa barplot, I see 700 assignments to it from one of my samples (many more assignments exist from other samples). However, in the "metadata.tsv" file (covering all samples) I only see 140 feature IDs assigned to it.

My questions are:

1- Does this mean that I have 700 microbes in one sample that could only be assigned to the genus Rhodococcus (not counting the species-level assignments within Rhodococcus), and that they could theoretically come from relatively diverse species? Or are they all the same species? If they come from diverse species, is there a way to assess that, e.g., by finding 16S sequences that are assigned to the same genus but are quite different from each other?

2- Or should I just check the metadata.tsv file with all taxonomic assignments and treat the number of features found there as the number of unique microbes (although I would then have to map each feature to its respective samples, which I guess should be doable from the tables created beforehand)?

Thanks for your time and the awesome pipeline!

Welcome to the forum!

As I understand it, you mean your taxonomy file, in which each ASV ID is assigned to a certain taxon.

And here you mean the file that you can download from the barplot visualization, with counts of all ASVs collapsed to a certain taxon at the desired level.

If you want to check it at level 7 (species), you can just count, for each sample, how many taxa in the corresponding CSV file have counts > 0 (the number of such taxa, not the sum of their counts!).
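A minimal sketch of that per-sample count with pandas. The inline table is a made-up stand-in for the level-7 file, and the layout (one row per sample, an "index" column with sample IDs, taxon columns starting with "d__" or "Unassigned", metadata columns after them) is an assumption, so check it against your own download first.

```python
import io
import pandas as pd

# Hypothetical stand-in for the level-7 CSV downloaded from the barplot.
# Assumed layout: one row per sample, one column per taxon, then metadata columns.
csv_text = """index,d__Bacteria;g__Rhodococcus;s__x,d__Bacteria;g__Bacillus;s__y,Unassigned;__,site
sample1,700,0,12,soil
sample2,0,35,0,water
sample3,15,20,0,soil
"""

def taxa_per_sample(table, taxon_prefixes=("d__", "Unassigned")):
    """Count, for each sample, how many taxon columns have a count > 0."""
    # Keep only taxon columns; metadata columns such as "site" are skipped.
    taxon_cols = [c for c in table.columns if c.startswith(taxon_prefixes)]
    # A boolean table (count > 0) summed across columns gives the number of
    # taxa present, not the sum of their counts.
    return (table[taxon_cols] > 0).sum(axis=1)

table = pd.read_csv(io.StringIO(csv_text), index_col="index")
observed_taxa = taxa_per_sample(table)
print(observed_taxa)
```

With a real file you would replace `io.StringIO(csv_text)` with the path to the downloaded CSV and adjust the prefixes if your taxonomy strings start differently.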

In the "level-7" file you have the count of each taxon in a sample (e.g. Taxa1, which may include several ASVs, with their frequencies summed).
In the taxonomy file you have ASV IDs, not their counts. These are different metrics and they are not comparable.

Yes, at genus level it will include all ASVs assigned to that genus, with 700 being the sum of all their frequencies. You can export the feature table to biom, convert it to a TSV table, and select (count) all ASVs assigned to this genus whose frequency in the given sample is not 0 (or NaN). The same can be done for species (collapse the feature table to species level).
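Here is a rough sketch of that selection with pandas. The two inline tables are invented stand-ins for the converted feature table and the exported taxonomy file. The column names ("#OTU ID", "Feature ID", "Taxon") and the leading comment line are assumptions about how those exports usually look, and the ASV IDs and counts are made up.

```python
import io
import pandas as pd

# Hypothetical feature table as TSV: one row per ASV, one column per sample.
feature_table_tsv = (
    "# Constructed from biom file\n"
    "#OTU ID\tsample1\tsample2\n"
    "asv1\t400\t0\n"
    "asv2\t300\t10\n"
    "asv3\t0\t25\n"
    "asv4\t50\t5\n"
)

# Hypothetical taxonomy export: one row per ASV with its taxon string.
taxonomy_tsv = (
    "Feature ID\tTaxon\tConfidence\n"
    "asv1\td__Bacteria; f__Nocardiaceae; g__Rhodococcus\t0.99\n"
    "asv2\td__Bacteria; f__Nocardiaceae; g__Rhodococcus\t0.98\n"
    "asv3\td__Bacteria; f__Nocardiaceae; g__Rhodococcus\t0.97\n"
    "asv4\td__Bacteria; f__Bacillaceae; g__Bacillus\t0.95\n"
)

def genus_summary(counts, taxonomy, genus, sample):
    """Return (number of ASVs present, summed frequency) for a genus in one sample."""
    label = f"g__{genus}"
    # Compare whole rank labels so that a longer genus name sharing the same
    # prefix is not matched by accident.
    is_genus = taxonomy["Taxon"].apply(
        lambda taxon: label in [rank.strip() for rank in taxon.split(";")]
    )
    genus_ids = taxonomy.index[is_genus]
    freqs = counts.loc[counts.index.intersection(genus_ids), sample]
    present = freqs[freqs > 0]  # drops zeros (and NaN, which never compares > 0)
    return len(present), int(present.sum())

counts = pd.read_csv(io.StringIO(feature_table_tsv), sep="\t", skiprows=1, index_col=0)
taxonomy = pd.read_csv(io.StringIO(taxonomy_tsv), sep="\t", index_col="Feature ID")

n_asvs, total = genus_summary(counts, taxonomy, "Rhodococcus", "sample1")
print(n_asvs, total)
```

In this toy data, the 700 reads in sample1 come from only two ASVs, which is the distinction between the barplot count and the number of feature IDs.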

The same taxon can be represented by different ASVs in different samples, so I would go for option 1 (above).

Thanks a lot for the fast reply!

Yes, at genus level it will include all ASVs assigned to that genus, with 700 being the sum of all their frequencies. You can export the feature table to biom, convert it to a TSV table, and select (count) all ASVs assigned to this genus whose frequency in the given sample is not 0 (or NaN). The same can be done for species (collapse the feature table to species level).

Just to make sure I got it right: I converted my feature table to biom and then to TSV and got the files mentioned. The two lines below, for example, tell me that two OTUs were found that are assigned to Rhodococcus. Even though they couldn't be assigned to a specific species, they come from the same microbe, is that correct? I guess the same logic applies to unassigned OTUs and to OTUs assigned only at higher ranks, e.g. phylum.

#OTUID    taxonomy    confidence

aee22922735eed61386099218ccc4fe8|d__Bacteria; p__Actinobacteriota; c__Actinobacteria; o__Corynebacteriales; f__Nocardiaceae; g__Rhodococcus    0.9998908663313103

a8c883a472651f9239cc0f62ce51a3dd|d__Bacteria; p__Actinobacteriota; c__Actinobacteria; o__Corynebacteriales; f__Nocardiaceae; g__Rhodococcus    0.999611058928425

That's right. The classifier is confident that those ASVs belong to a certain genus but cannot resolve the species, so only the genus is indicated. In other words, you can be sure that those ASVs are Rhodococcus, but the species is unknown for various reasons (low resolution of the ASV sequence, lack of references in the database, unknown species).

One final question: can those different OTU IDs belong to the same species/strain, or does the classifier know that they are different (hence the different OTU IDs) but is just not sure what they are?

Thanks for all the help!

Different ASVs are different even if they differ by only one nucleotide. So they are not necessarily different species. It can be a simple SNP.
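To get a feel for how different two ASVs assigned to the same genus really are, you could count the positions at which their sequences differ. A small sketch, assuming the two sequences have the same length and are already aligned position by position (ASVs of different lengths would need a proper alignment first); the sequences below are made up for illustration.

```python
def count_mismatches(seq_a, seq_b):
    """Count positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length; align them first")
    # Compare position by position, ignoring upper/lower case.
    return sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))

# Hypothetical ASV fragments that differ at a single position.
asv_a = "ACGTACGTAC"
asv_b = "ACGTACGAAC"

n_diff = count_mismatches(asv_a, asv_b)
print(n_diff)
```

A single mismatch like this one is enough to give the two sequences different ASV IDs, even if both come from the same species.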
