Percentage of reads classified at each taxon

Hey guys,

I would like to know the percentage of reads classified (and unclassified) at each taxonomic level (class, order, family, genus). I used DADA2. Can you help me?

Good afternoon,

Great start! For a point of comparison, you are here in the PD-Mice tutorial

DADA2 does denoising to make ASVs, and also makes a feature table of counts of those ASVs. But it does not assign taxonomy.

Taxonomic classification happens later, for example at this step in the PD-Mice tutorial.

Then you can make a taxonomy bar chart that includes the percentage of unclassified reads.

Keep going through that workflow and let us know if you have any questions along the way.

Hi @colinbrislawn ,

I have already processed my data (following the Moving Pictures tutorial), but I would still like to know whether there is a way to get the percentage of reads classified and unclassified for each taxon directly, other than reading it sample by sample from the taxonomy bar chart. (Do the percentages displayed on the chart refer to the number of reads?) Also, would it be possible to get something like what "taxonomy.qzv" displays, but showing the number of reads? (The feature IDs refer to ASVs, right? Is there a way to know how many reads were counted for each of them?)

OK, I'm glad you have made it to the end of Moving Pictures!

Like this?
E. coli - Pseudomonadota: 99%
E. coli - Enterobacterales: 99%
E. coli - Enterobacteriaceae: 95%
E. coli - Escherichia: 90%
E. coli - E. coli: 0%
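
If that is the kind of breakdown you mean, here is a rough Python sketch of how it could be computed from per-ASV read totals and taxonomy strings. Everything in it is made up for illustration: the ASV IDs, the counts, and the taxonomy strings are hypothetical, and `classified_ranks` and `percent_classified` are helper names I invented, not QIIME 2 functions. It assumes rank prefixes like `p__` and `g__`, and treats a bare prefix (for example `s__`) or a missing rank as unclassified.

```python
# Hypothetical data: ASV id -> (total reads, taxonomy string).
RANKS = ["d", "p", "c", "o", "f", "g", "s"]

asvs = {
    "asv1": (900, "d__Bacteria; p__Pseudomonadota; c__Gammaproteobacteria; "
                  "o__Enterobacterales; f__Enterobacteriaceae; g__Escherichia; s__"),
    "asv2": (50, "d__Bacteria; p__Pseudomonadota; c__Gammaproteobacteria; "
                 "o__Enterobacterales; f__Enterobacteriaceae"),
    "asv3": (40, "d__Bacteria; p__Pseudomonadota; c__Gammaproteobacteria; "
                 "o__Enterobacterales"),
    "asv4": (10, "Unassigned"),
}


def classified_ranks(taxonomy):
    """Return the rank letters that carry a real name in a taxonomy string."""
    found = set()
    for part in taxonomy.split(";"):
        prefix, sep, name = part.strip().partition("__")
        if sep and name:  # skips "Unassigned" and bare prefixes such as "s__"
            found.add(prefix)
    return found


def percent_classified(asvs):
    """Percent of all reads that have a name at each rank."""
    total = sum(count for count, _ in asvs.values())
    result = {}
    for rank in RANKS:
        reads = sum(count for count, tax in asvs.values()
                    if rank in classified_ranks(tax))
        result[rank] = 100.0 * reads / total
    return result


percent_by_rank = percent_classified(asvs)
for rank in RANKS:
    pct = percent_by_rank[rank]
    print(f"{rank}: {pct:.1f}% classified, {100 - pct:.1f}% unclassified")
```

The percentages are weighted by reads, not by the number of ASVs, so one abundant ASV that stops at genus pulls the species-level figure down much more than many rare ones would.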

Yes, each feature made by DADA2 is an ASV.

So instead of percentages in the chart, you would like raw read counts?

Hi @colinbrislawn ,

Yes, exactly like this! I don't need the raw read counts, only the percentage of reads. And, if possible, I would like to plot the total percentage of reads for each taxon (class, order, family, genus) in a chart.

I would like to know for example:

class:

the total percentage of classified and unclassified reads into each class classification: class %

c__Clostridia_258483 - x%
c__Bacteroidia - y%
c__Gammaproteobacteria - z%
.. etc

order:

the total percentage of classified and unclassified reads into each order classification: order %

o_Lachnospirales - x%
o__Bacteroidales - y%
..etc

family:

the total percentage of classified and unclassified reads into each family classification: family %

f_Veillonellaceae - x%
f__Ruminococcaceae - y%
f__Lachnospiraceae - z%

genus:

the total percentage of classified and unclassified reads into each genus classification: genus %

g_Faecalibacillus - a%
g_Bifidobacterium_388775 - b%
g_Porcincola - c%
g_Acinetobacter - d%
g_Anaerotignum_189125 - e%
g_Fusobacterium_A - f%
.. etc

I think I see where you are going...

Do you want that for each sample separately or for the whole run?

Related questions:
How many samples do you have?
Does each sample contain many microbes or just one (axenic / mono-culture)?

For the whole run.

For this dataset I have 950 samples (490 affected patients and 460 healthy controls).
I think each sample contains many microbes (since it is from human feces).

OK, try this:

First, group/merge all your samples into one big sample:
https://docs.qiime2.org/2023.9/plugins/available/feature-table/group/

Then collapse/merge all the taxonomy annotations into a single level:
https://docs.qiime2.org/2023.9/plugins/available/taxa/collapse/


Here's the idea behind this:
Your current problem is that you have lots of ASVs in lots of samples, but you don't want that much detail. Merging all samples into one sample and merging features by observed taxonomy matches what you have described to me...

You may be looking for something different. In that case, this super-simple table should serve as a counter-example to see what you need to break apart.
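
To make the two merges concrete, here is a minimal Python sketch of the idea on a toy table. It is not what the plugins run internally: the table, the taxonomy strings, and the function names `pool_samples` and `collapse_taxonomy` are all hypothetical. Padding short annotations with `__` is my assumption about how ASVs without a name at the chosen level end up grouped together.

```python
from collections import defaultdict

# Hypothetical feature table: ASV id -> {sample id -> read count}.
table = {
    "asv1": {"S1": 10, "S2": 30},
    "asv2": {"S1": 5, "S2": 15},
    "asv3": {"S1": 0, "S2": 40},
    "asv4": {"S1": 2, "S2": 8},
}

# Hypothetical taxonomy: ASV id -> semicolon-separated annotation.
taxonomy = {
    "asv1": "d__Bacteria; p__Bacillota; c__Clostridia",
    "asv2": "d__Bacteria; p__Bacillota; c__Clostridia",
    "asv3": "d__Bacteria; p__Bacteroidota; c__Bacteroidia",
    "asv4": "d__Bacteria; p__Bacillota",  # no class assigned
}


def pool_samples(table):
    """Merge all samples into one by summing each ASV's counts."""
    return {asv: sum(per_sample.values()) for asv, per_sample in table.items()}


def collapse_taxonomy(pooled, taxonomy, level):
    """Sum ASVs that share the same annotation down to `level` ranks."""
    collapsed = defaultdict(int)
    for asv, count in pooled.items():
        ranks = [part.strip() for part in taxonomy[asv].split(";")]
        # Assumption: pad missing ranks with "__" so that unannotated
        # ASVs are kept and grouped instead of being dropped.
        ranks += ["__"] * (level - len(ranks))
        collapsed["; ".join(ranks[:level])] += count
    return dict(collapsed)


pooled = pool_samples(table)
by_class = collapse_taxonomy(pooled, taxonomy, level=3)
for taxon, count in sorted(by_class.items()):
    print(f"{taxon}\t{count}")
```

The result has one row per taxon and a single column of counts, which is the shape of the super-simple table described above.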

Keep me posted!

Thank you so much!

Both of these steps worked, but how can I turn the result into a qzv file/chart?

Great!

To make a chart / CSV, rerun qiime taxa barplot with the new table.

Then open it with view.qiime2.org, set the level to the maximum, and click Download CSV.

Thank you again!

I ran into an error but found the solution in this other topic:

Then I ran qiime taxa barplot like this (I had to remove --i-taxonomy because it returned an error):

qiime taxa barplot \
  --i-table collapsed-table-l6.qza \
  --m-metadata-file sample-metadata.tsv \
  --o-visualization taxa-bar-plots.qzv

And I had to rerun taxa collapse for each level I wanted, and then generated the barplot for each of them and it worked.

And it appeared like this (this one at genus level):

Then, to get the percentage for each taxon, I ran qiime feature-table relative-frequency and converted the result to CSV with qiime metadata tabulate.
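
For anyone who wants to check the numbers by hand, the arithmetic behind that last step is just each taxon's count divided by the total. Below is a small Python sketch with made-up genus counts; `to_percent` is a hypothetical helper, not a QIIME 2 function. As far as I know, the relative-frequency table holds fractions that sum to 1 per sample, so multiply by 100 to read them as percentages.

```python
def to_percent(counts):
    """Convert taxon -> read count into taxon -> percent of all reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("table has no reads")
    return {taxon: 100.0 * count / total for taxon, count in counts.items()}


# Hypothetical collapsed, pooled counts at genus level.
# "__" stands for reads with no genus-level name.
genus_counts = {
    "g__Faecalibacillus": 250,
    "g__Porcincola": 125,
    "g__Acinetobacter": 75,
    "__": 50,
}

genus_percent = to_percent(genus_counts)
for taxon, pct in sorted(genus_percent.items(), key=lambda item: -item[1]):
    print(f"{taxon}: {pct:.1f}%")
```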