What is the simplest way of working out how many representative sequences exist in each sample of a QIIME2 experiment?

danwiththeplan · March 11, 2019, 2:38am

I understand that I can obtain the total number of rep-seqs (or what used to be called OTUs) across the entire experiment, and the taxon (if any) that they are applied to, but for the life of me I can't work out a simple way of finding out how many total rep-seqs exist in each individual sample (since some rep-seqs occur in some samples but not others).

Thanks for your help.

danwiththeplan · March 11, 2019, 2:51am

Addendum: I understand that I can get this information from the CSV file that is derived from the barplots, but this only gives me the number of rep-seqs in each sample that have been assigned to a taxonomic group. All the "unassigned" rep-seqs are lumped into a single column, but within the "unassigned" group there will be multiple different rep-seqs that have not been assigned to a taxonomic group.

Mehrbod_Estaki · March 11, 2019, 6:48am

Hi @danwiththeplan,

Off the top of my head, I don't think there is an immediate one-step way to get this in qiime2, but there are simple enough ways. What you are describing is essentially the 'richness' of each sample.

If you are comfortable with R (or even Excel), you can simply export your feature-table there and count the non-zero entries for each sample, with the added flexibility of doing whatever else you want with the data.
If you'd rather stick with qiime2, you can also get these values by using the diversity alpha plugin with the metric set to observed_otus. That gives you the same thing, and you can export the data file from there to get a table of the per-sample 'rep-seqs' numbers.
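For the export route, a minimal sketch of the counting step in Python with pandas is below. It assumes the table was exported to a TSV in the layout that biom's TSV conversion typically writes (one comment line, then a header row starting with `#OTU ID`, with features as rows and samples as columns). The toy data and the names `exported_tsv` and `richness` are made up for illustration.

```python
import io
import pandas as pd

# Toy stand-in for an exported feature table (hypothetical data).
# Assumed layout: one comment line, then features as rows, samples as columns.
exported_tsv = io.StringIO(
    "# Constructed from biom file\n"
    "#OTU ID\tsampleA\tsampleB\tsampleC\n"
    "feat1\t10\t0\t3\n"
    "feat2\t0\t0\t7\n"
    "feat3\t5\t2\t0\n"
)

# Skip the comment line and use the feature IDs as the index.
table = pd.read_csv(exported_tsv, sep="\t", skiprows=1, index_col=0)

# A feature counts as present in a sample if its count is above zero;
# summing the booleans down each column gives the per-sample richness.
richness = (table > 0).sum(axis=0)
print(richness.to_string())
```

With a real export, the `io.StringIO` object would be replaced by the path to the TSV file.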
Hope that helps.

dwt · March 11, 2019, 2:35pm

Hi Dan,
You can use the Artifact API to do this fairly easily.
The Python script would look like:

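#!/usr/bin/env python
from qiime2 import Artifact
from pandas import DataFrame

# Load the feature table as a DataFrame (samples as rows, features as columns)
table = Artifact.load("table.qza").view(DataFrame)
# Count the features with a non-zero count in each sample
rep_seq_count = (table > 0).sum(axis=1)
print(rep_seq_count.to_csv())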

You can put that in a file and run it with python followed by the file name; it will print a CSV of the samples and the number of rep-seqs in each.
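To see why summing along axis 1 gives per-sample numbers, the sketch below repeats the counting step on a small hand-made DataFrame instead of a real artifact. It assumes the DataFrame view of a feature table has samples as rows and features as columns; the sample and feature names are invented.

```python
import pandas as pd

# Hypothetical table in the assumed orientation:
# samples as rows, features (rep-seqs) as columns.
table = pd.DataFrame(
    {"feat1": [10, 0, 3], "feat2": [0, 0, 7], "feat3": [5, 2, 0]},
    index=["sampleA", "sampleB", "sampleC"],
)

# True where a feature was observed at least once in a sample.
presence = table > 0

# Summing across the columns of each row counts the features per sample.
rep_seq_count = presence.sum(axis=1)
print(rep_seq_count.to_csv(header=False))
```

If the result is indexed by feature IDs rather than sample IDs, the table is transposed relative to this assumption and `axis=0` is needed instead.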
Devin

danwiththeplan · March 11, 2019, 7:44pm

Absolutely fantastic. Much appreciated.

danwiththeplan · March 11, 2019, 10:54pm

Thanks, this is a good alternative to try.
