How to tabulate the number of sequences classified in each taxon

Hey there,
I am new to the QIIME 2 world, so sorry if this is a silly question.
I intend to run QIIME 2 for taxa identification and to get the relative abundance of each identified taxon. At the end of my analysis I would like to generate a table containing the taxa and the number of reads classified per taxon, but I am stuck on the output of the classification step.
The pipeline I followed was:
Input: paired-end demultiplexed Illumina sequences (~400000 seqs) ==> dada2 for denoising and selecting sequence variants (29 features and ~363000 seqs remained) ==> feature-classifier classify-sklearn (all 29 features were classified)
(I trained the classifier with the “feature-classifier fit-classifier-naive-bayes” command on a SILVA subset that had previously been trimmed with the primers for the V3-V4 region. This subset had 179665 sequences.)

The output looks like this:
Feature ID; taxon; confidence
052ba7abaeaa968c4f79e3f97d1f0a2f D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Pseudomonadales;D_4__Pseudomonadaceae;D_5__Pseudomonas 0.9999974971847285
42f42bd9c69b033046f35399a152812f D_0__Bacteria;D_1__Firmicutes;D_2__Bacilli;D_3__Bacillales;D_4__Bacillaceae;D_5__Bacillus 0.9999994587523952
and so on…

However, I have no idea how many sequences fell within each taxon, and that is what I would like to find out. The output of the classification is a FeatureData[Taxonomy] artifact in TSVTaxonomyDirectoryFormat.
Is there a way to convert it into a file that shows how many sequences belong to each classified feature?
In the end I would like to have something like this:
taxa; #sequences
D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Pseudomonadales;D_4__Pseudomonadaceae;D_5__Pseudomonas; 40000
D_0__Bacteria;D_1__Firmicutes;D_2__Bacilli;D_3__Bacillales;D_4__Bacillaceae;D_5__Bacillus; 50000
and so on…

Thanks in advance

Hello Leo,

I think I can help!

This sounds super similar to this answer. Do you think that method would work for you? I think the downloadable CSV file is precisely what you are looking for, or close enough that you could build the table you need in Excel.
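
If you would rather script it than use Excel, here is a minimal sketch in Python with pandas. It assumes you already have two tables: per-feature read totals (summed over samples) and the Feature ID to Taxon mapping from your classification output. The function name `reads_per_taxon`, the `reads` column, the third feature ID, and all the counts are made up for illustration; only the two feature IDs and taxon strings come from your post.

```python
import pandas as pd

# Taxon strings copied from the classification output in the question.
PSEUDOMONAS = ("D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;"
               "D_3__Pseudomonadales;D_4__Pseudomonadaceae;D_5__Pseudomonas")
BACILLUS = ("D_0__Bacteria;D_1__Firmicutes;D_2__Bacilli;"
            "D_3__Bacillales;D_4__Bacillaceae;D_5__Bacillus")

# Hypothetical per-feature read totals (illustrative numbers, not real results).
counts = pd.DataFrame({
    "Feature ID": [
        "052ba7abaeaa968c4f79e3f97d1f0a2f",
        "42f42bd9c69b033046f35399a152812f",
        "hypothetical_third_feature",  # invented to show two features sharing a taxon
    ],
    "reads": [40000, 50000, 1000],
})

# Feature ID -> Taxon mapping, as in the classification output.
taxonomy = pd.DataFrame({
    "Feature ID": counts["Feature ID"],
    "Taxon": [PSEUDOMONAS, BACILLUS, BACILLUS],
})


def reads_per_taxon(counts: pd.DataFrame, taxonomy: pd.DataFrame) -> pd.Series:
    """Sum read counts over all features that share the same taxon string."""
    merged = counts.merge(taxonomy, on="Feature ID", how="left")
    # Features missing from the taxonomy table are grouped under one label.
    merged["Taxon"] = merged["Taxon"].fillna("Unassigned")
    return merged.groupby("Taxon")["reads"].sum().sort_values(ascending=False)


table = reads_per_taxon(counts, taxonomy)
print(table.to_csv(sep="\t"))
```

The key step is the merge on Feature ID followed by a group-by on the taxon string, so features that were given the same classification are added together into a single row.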

Let me know what you think. I'm sure there are many ways to do this, and we're happy to help you find a method that works for you.

Colin

Dear Colin,
Many thanks for your attention! That is indeed what I want.
All the best
