OTU table with taxa and reads per sample, without clustering of taxa?


At the moment, I am doing an analysis in which I trimmed, merged and denoised my 16S reads. I used qiime vsearch cluster-features-de-novo to cluster my reads with a perc-identity of 0.97, and used qiime feature-classifier classify-sklearn for the taxonomic assignment of my reads (to the rep-seq-dn-97 file in this case) with the help of the silva-132-99-nb-classifier. After this I used qiime taxa collapse to combine my table-dn-97 file with my taxonomy-dn-97 (see the step before this). I am quite happy with the results so far, but I am wondering if it would be possible to get the following:

What I would like to have as an output file is an overview of the OTUs (OTU IDs), followed by the taxonomic assignment of each OTU, followed by my samples with the total number of reads in each OTU.

In my experience (correct me if I am wrong), the command qiime taxa collapse automatically merges all the OTUs with the same taxonomic outcome to one (new) OTU (ID). For my project it would be very helpful for me to actually see which sequence within a certain taxon contains most reads, if this makes sense to you. So, I would still get the same overview as qiime taxa collapse gives me, but now I will still deal with all the rep-seqs-dn-97 separately.

Can anybody tell me if this is possible with the help of QIIME?

Thank you very much in advance!


Interesting question, @Margreet. I believe your understanding of qiime taxa collapse is correct, but I don’t believe this functionality is currently available in QIIME 2.

Big-picture, I suspect what you’re trying to do is:

  • First, add taxonomic assignments to your original FeatureTable[Frequency].
  • Second, join that table with the Feature detail table of collapsed taxa, joining by taxon.

This would produce a long-form table with N ASVs for rows, and columns for feature count, taxonomy, taxa count, and maybe counts per sample.

If I’m understanding your desired outcome correctly, I think it will take a bit of data manipulation. There are a bunch of different ways to go about this, and a scripting language such as Python or R (whichever you prefer) is probably the best approach. You could use Excel, but I suspect it would be rather painful (and I love spreadsheets!).
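
As a rough illustration of the two steps above, here is a minimal Python sketch using only the standard library. The feature IDs, sample names, taxon strings and the function name build_long_table are all made up for the example; real data would first need to be exported from the QIIME 2 artifacts.

```python
from collections import defaultdict

# Hypothetical toy data standing in for a FeatureTable[Frequency]:
# feature ID -> reads per sample.
counts = {
    "otu1": {"S1": 10, "S2": 0},
    "otu2": {"S1": 3, "S2": 7},
    "otu3": {"S1": 1, "S2": 1},
}
# Hypothetical taxonomy: feature ID -> taxon string.
taxonomy = {
    "otu1": "g__Bacteroides",
    "otu2": "g__Bacteroides",
    "otu3": "g__Prevotella",
}


def build_long_table(counts, taxonomy):
    """Return one row per feature with its taxon, its total reads,
    the total reads of its taxon, and the per-sample counts."""
    # Step 1: attach the taxon and the feature total to each feature.
    rows = []
    for fid, per_sample in counts.items():
        rows.append({
            "feature_id": fid,
            "taxon": taxonomy[fid],
            "feature_count": sum(per_sample.values()),
            **per_sample,
        })
    # Step 2: compute per-taxon totals (what a collapsed table holds)
    # and join them back onto every feature row, keyed by taxon.
    taxon_count = defaultdict(int)
    for row in rows:
        taxon_count[row["taxon"]] += row["feature_count"]
    for row in rows:
        row["taxon_count"] = taxon_count[row["taxon"]]
    return rows


rows = build_long_table(counts, taxonomy)
for row in rows:
    print(row)
```

With pandas or R the same thing would be a merge/join on the taxon column; the plain-dictionary version is only meant to show the logic.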

Here are a couple of somewhat-related topics you might want to check out for bread crumbs: 1, 2.

Hope that helps,
Chris :elephant:

Hi @Margreet,

Could you please clarify a bit? Do you just want an OTU table with taxonomic annotations for each individual OTU included? Or are you trying to transform or summarize the contents of the table also?

You can use “qiime metadata tabulate” to view taxonomy and an OTU table simultaneously, if that is your goal. Something like:

qiime metadata tabulate \
    --m-input-file taxonomy.qza  \
    --m-input-file otu-table.qza \
    --o-visualization table-with-taxonomy.qzv

Let me know what you think!

Hi @Nicholas_Bokulich,

Thank you very much for your response.

Sorry for being a bit unclear. At this point, my output file (feature table) is as follows: OTU ID (which is taxonomic annotation) - #reads per sample.

What I would like to get as an output file is the following:
OTU ID (feature ID) - Taxonomic annotation - Confidence level - #reads per sample. This is the combination of the feature table.qza and taxonomy.qza. The only problem is that it seems like QIIME is merging the OTU IDs, since my feature table has a total of 4383 feature IDs, and in my final output file I end up with (only) 1250 OTU IDs. Because of this difference, I cannot just simply copy paste my feature IDs into my final output file.

Does this make more sense to you? What I would like to see is which sequence within a certain taxon (same species for multiple sequences for example) contains most reads.

Please let me know if you need more explanation.
Thanks again!


Sounds like you collapsed the feature table. You do not want to collapse prior to using the command I posted above.

It sounds like what I shared above would do this… you just need to merge the full table rather than a collapsed table.
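
If you would rather build that merged table outside of QIIME 2, here is a minimal Python sketch of the same join. It assumes the uncollapsed table and the taxonomy have already been exported to TSV (for example with qiime tools export followed by biom convert --to-tsv for the table); the two strings below are hypothetical stand-ins for those files, and the function names are made up for the example.

```python
import csv
import io

# Hypothetical stand-in for an exported taxonomy.tsv.
TAXONOMY_TSV = "\n".join([
    "Feature ID\tTaxon\tConfidence",
    "otu1\tg__Bacteroides\t0.99",
    "otu2\tg__Bacteroides\t0.87",
    "otu3\tg__Prevotella\t0.95",
])
# Hypothetical stand-in for an uncollapsed feature table converted to TSV.
TABLE_TSV = "\n".join([
    "# Constructed from biom file",
    "#OTU ID\tS1\tS2",
    "otu1\t10.0\t0.0",
    "otu2\t3.0\t7.0",
    "otu3\t1.0\t1.0",
])


def read_taxonomy(text):
    """Map feature ID -> (taxon, confidence)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {r["Feature ID"]: (r["Taxon"], r["Confidence"]) for r in reader}


def read_table(text):
    """Return (sample names, {feature ID: [counts per sample]})."""
    lines = text.splitlines()
    # Skip any comment lines before the "#OTU ID" header row.
    start = next(i for i, ln in enumerate(lines) if ln.startswith("#OTU ID"))
    reader = csv.reader(lines[start:], delimiter="\t")
    samples = next(reader)[1:]
    table = {row[0]: [float(v) for v in row[1:]] for row in reader if row}
    return samples, table


def merge(taxonomy, samples, table):
    """One row per feature: ID, taxon, confidence, then reads per sample."""
    header = ["Feature ID", "Taxon", "Confidence"] + samples
    rows = [[fid, *taxonomy[fid], *vals] for fid, vals in table.items()]
    return header, rows


samples, table = read_table(TABLE_TSV)
header, rows = merge(read_taxonomy(TAXONOMY_TSV), samples, table)
print("\t".join(header))
for row in rows:
    print("\t".join(str(v) for v in row))
```

Because the join is keyed on the feature ID and nothing is summed by taxon, the output keeps as many rows as the uncollapsed table has features.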

You could also use “qiime feature-table group” to sum the number of observations of each feature across all samples before tabulating, if that is more in line with what you want.
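
For the specific question of which sequence within a taxon holds the most reads, here is a small Python sketch of the idea: sum each feature across samples, then keep the largest feature per taxon. The data and the function name top_feature_per_taxon are hypothetical and only illustrate the logic; this is not a QIIME 2 command.

```python
# Hypothetical uncollapsed counts: feature ID -> reads per sample.
counts = {
    "otu1": {"S1": 10, "S2": 0},
    "otu2": {"S1": 3, "S2": 9},
    "otu3": {"S1": 1, "S2": 1},
}
# Hypothetical taxonomy: feature ID -> taxon string.
taxonomy = {
    "otu1": "g__Bacteroides",
    "otu2": "g__Bacteroides",
    "otu3": "g__Prevotella",
}


def top_feature_per_taxon(counts, taxonomy):
    """Return {taxon: (feature_id, total_reads)} for the feature with
    the most reads, summed over all samples, within each taxon."""
    best = {}
    for fid, per_sample in counts.items():
        total = sum(per_sample.values())
        taxon = taxonomy[fid]
        # Keep this feature if it is the first or the largest seen so far.
        if taxon not in best or total > best[taxon][1]:
            best[taxon] = (fid, total)
    return best


best = top_feature_per_taxon(counts, taxonomy)
for taxon, (fid, total) in best.items():
    print(taxon, fid, total)
```

Ties are resolved in favour of whichever feature is seen first, which may or may not be what you want.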


Thank you very much for all the help, effort and tips!
I am going to give it a try!


