Hey there,
I am new to the QIIME 2 world, so sorry if this is a silly question.
I intend to run QIIME 2 for taxonomic identification and to get the relative abundance of each identified taxon. At the end of my analysis I would like to generate a table containing the taxa and the number of reads classified per taxon, but I am stuck on the output of the classification step.
The pipeline I followed was:
Input: paired-end demultiplexed Illumina sequences (~400000 seqs) ==> dada2 for denoising and selecting sequence variants (29 features and ~363000 seqs remained) ==> feature-classifier classify-sklearn (all 29 features were classified)
(I trained the classifier with the “feature-classifier fit-classifier-naive-bayes” command, using a SILVA subset that had previously been primer-trimmed to the V3-V4 region. This subset had 179665 sequences.)
The output looks like this:
Feature ID; taxon; confidence
052ba7abaeaa968c4f79e3f97d1f0a2f D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Pseudomonadales;D_4__Pseudomonadaceae;D_5__Pseudomonas 0.9999974971847285
42f42bd9c69b033046f35399a152812f D_0__Bacteria;D_1__Firmicutes;D_2__Bacilli;D_3__Bacillales;D_4__Bacillaceae;D_5__Bacillus 0.9999994587523952
and so on…
However, I have no idea how many sequences fell within each taxon, and that is what I would like to find out. The output of the classification is a FeatureData[Taxonomy] artifact in TSVTaxonomyDirectoryFormat.
Is there a way to convert it into a file that shows how many sequences fall within each classified feature?
In the end, I would like to have something like this:
taxa; #sequences
D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Pseudomonadales;D_4__Pseudomonadaceae;D_5__Pseudomonas; 40000
D_0__Bacteria;D_1__Firmicutes;D_2__Bacilli;D_3__Bacillales;D_4__Bacillaceae;D_5__Bacillus; 50000
and so on…
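To make the goal concrete, here is a minimal Python sketch of the aggregation I have in mind. It assumes the per-feature read counts from the dada2 feature table could be obtained as a simple mapping of feature ID to reads; the function name `reads_per_taxon`, the third feature ID, and all the counts are made up for illustration and are not real results or part of QIIME 2.

```python
from collections import defaultdict


def reads_per_taxon(taxonomy, feature_counts):
    """Sum read counts over all features assigned to the same taxon.

    taxonomy:       dict of feature ID -> taxon string (as in the classifier output)
    feature_counts: dict of feature ID -> number of reads (assumed to come
                    from the dada2 feature table; hypothetical here)
    """
    totals = defaultdict(int)
    for feature_id, taxon in taxonomy.items():
        # A feature missing from the count mapping contributes 0 reads.
        totals[taxon] += feature_counts.get(feature_id, 0)
    return dict(totals)


PSEUDOMONAS = ("D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;"
               "D_3__Pseudomonadales;D_4__Pseudomonadaceae;D_5__Pseudomonas")
BACILLUS = ("D_0__Bacteria;D_1__Firmicutes;D_2__Bacilli;"
            "D_3__Bacillales;D_4__Bacillaceae;D_5__Bacillus")

# The first two feature IDs are from my classifier output; the third is a
# hypothetical placeholder to show two features sharing one taxon.
taxonomy = {
    "052ba7abaeaa968c4f79e3f97d1f0a2f": PSEUDOMONAS,
    "42f42bd9c69b033046f35399a152812f": BACILLUS,
    "hypothetical_feature_id": BACILLUS,
}

# Illustrative counts only, not my real data.
feature_counts = {
    "052ba7abaeaa968c4f79e3f97d1f0a2f": 40000,
    "42f42bd9c69b033046f35399a152812f": 30000,
    "hypothetical_feature_id": 20000,
}

totals = reads_per_taxon(taxonomy, feature_counts)
print("taxa; #sequences")
for taxon, n_reads in sorted(totals.items()):
    print(f"{taxon}; {n_reads}")
```

What I am missing is how to get from the FeatureData[Taxonomy] output and the dada2 table to the two inputs this sketch takes.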
Thanks in advance