What is the simplest way of working out how many representative sequences exist in each sample of a QIIME2 experiment?

otus
(Dan Jones) #1

What is the simplest way of working out how many representative sequences exist in each sample of a QIIME2 experiment?

I understand that I can obtain the total number of rep-seqs (or what used to be called OTUs) across the entire experiment, and the taxon (if any) that they are applied to, but for the life of me I can’t work out a simple way of finding out how many total rep-seqs exist in each individual sample (since some rep-seqs occur in some samples but not others).

Thanks for your help.

(Dan Jones) #2

Addendum: I understand that I can get this information from the CSV file that is derived from the barplots, but this only gives me the number of rep-seqs in each sample that have been assigned to a taxonomic group. All the “unassigned” rep-seqs are lumped into a single column, but within the “unassigned” group there will be multiple different rep-seqs that have not been assigned to a taxonomic group.

(Mehrbod Estaki) #3

Hi @danwiththeplan,

Off the top of my head, I don’t think there is an immediate, built-in way to get this in qiime2, but there are simple enough ways. What you are describing is essentially the ‘richness’ of each sample.

  1. If you are comfortable with R (or even Excel), you can simply export your feature-table there and count the non-zero entries for each sample, with the added flexibility of doing whatever else you want with the data.
  2. If you would rather stick within qiime2, you can also get these values by running diversity alpha with the metric set to observed_otus. That gives you the same thing, and you can export the resulting data file to get a table of per-sample ‘rep-seqs’ numbers.
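
For option 1, here is a minimal Python sketch of the same non-zero count, in case neither R nor Excel is handy. It assumes the table has already been exported to a tab-separated file (for example with biom convert --to-tsv) with rep-seqs as rows and samples as columns under a "#OTU ID" header. The sample and feature names are made up for illustration, and richness_per_sample is just a hypothetical helper name.

```python
import csv
import io

# Hypothetical feature table in the tab-separated layout assumed here:
# one row per feature (rep-seq), one column per sample.
EXAMPLE_TSV = """# Constructed from biom file
#OTU ID\tsample-A\tsample-B\tsample-C
feature-1\t12.0\t0.0\t3.0
feature-2\t0.0\t0.0\t7.0
feature-3\t5.0\t1.0\t0.0
"""


def richness_per_sample(handle):
    """Count the features with a non-zero frequency in each sample."""
    # Skip comment lines, but keep the '#OTU ID' header row.
    rows = (line for line in handle
            if not line.startswith("#") or line.startswith("#OTU ID"))
    reader = csv.reader(rows, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    counts = {sample: 0 for sample in samples}
    for row in reader:
        for sample, value in zip(samples, row[1:]):
            if float(value) > 0:
                counts[sample] += 1
    return counts


richness = richness_per_sample(io.StringIO(EXAMPLE_TSV))
for sample, n_features in richness.items():
    print(sample, n_features, sep="\t")
```

To use it on a real export, pass an open file handle instead of the io.StringIO example. Every rep-seq is counted whether or not it has a taxonomy assignment, so the "unassigned" ones are not lumped together.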
Hope that helps.
(Devin Thomas) #5

Hi Dan,
You can use the Artifact API to do this fairly easily.
The python script would look like:

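#!/usr/bin/env python
from qiime2 import Artifact
from pandas import DataFrame

# Viewed as a DataFrame, the feature table has samples as rows and rep-seqs as columns.
table = Artifact.load("table.qza").view(DataFrame)
# Count the rep-seqs with a non-zero frequency in each sample.
rep_seq_count = (table > 0).sum(axis=1)
print(rep_seq_count.to_csv())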

You can put that in a file and then run it with python filename; it will print a CSV of the samples and the number of rep-seqs in each.
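
To see what the counting step does without needing a real artifact, here is a small sketch on a made-up table. It assumes the same orientation as the DataFrame view of a feature table (samples as rows, rep-seqs as columns); the sample and sequence names are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for Artifact.load("table.qza").view(DataFrame):
# rows are samples, columns are rep-seqs (features), values are frequencies.
table = pd.DataFrame(
    {"seq-1": [12, 0, 3], "seq-2": [0, 0, 7], "seq-3": [5, 1, 0]},
    index=["sample-A", "sample-B", "sample-C"],
)

# (table > 0) marks each rep-seq as present or absent in each sample;
# summing along axis=1 adds up the columns within each row (sample).
rep_seq_count = (table > 0).sum(axis=1)
print(rep_seq_count.to_csv(header=False))
```

If your table were oriented the other way round (rep-seqs as rows), you would sum along axis=0 instead.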
Devin

(Dan Jones) #8

Absolutely fantastic. Much appreciated.

(Dan Jones) #9

Thanks, this is a good alternative to try.
