How to find OTUs per sample from DADA2 qzv files?

Hi Friends,

I have performed QIIME 2 DADA2 denoising and the resulting files are below. Could you please let me know how to see the OTUs per sample from these files? Thank you.
ww-DNA_1to11_paired-end-demux-trimmed-dada2-rep-seqs-table2.qzv (576.3 KB) ww-DNA_1to11_paired-end-demux-trimmed_dada2-rep-seqs-stats2.qzv (1.2 MB)

Hi there,
If I understand correctly, you are looking for how many counts each OTU has in each sample, i.e. the “OTU table”?
So, that is one of the DADA2 outputs:

$ qiime dada2 denoise-paired --help

 --o-table ARTIFACT PATH FeatureTable[Frequency]
                                  The resulting feature table.  [required if
                                  not passing --output-dir]

You can then convert the table into a .tsv file and open it in Excel, for example.

qiime tools export --input-path tableFromDada.qza --output-path table
# That produces a BIOM table. Then convert it to TSV:
biom convert --to-tsv -i table/feature-table.biom -o table/table.tsv
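
If you would rather not open the file in Excel, here is a minimal Python sketch for loading the converted table. It assumes the TSV starts with a comment line followed by a header line beginning with "#OTU ID", which is what I would expect from biom convert; the function name and the miniature table are made up for illustration.

```python
import csv
import io


def read_feature_table(handle):
    """Parse a tab-separated feature table into (sample_ids, {feature_id: [counts]})."""
    samples, table = [], {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        if row[0].startswith("#"):
            # Assumption: the header line is the one whose first cell is "#OTU ID";
            # any other line starting with "#" is treated as a comment and skipped.
            if row[0].lstrip("#").strip().upper() == "OTU ID":
                samples = row[1:]
            continue
        # Counts may be written as floats (e.g. "120.0"), so parse them as float.
        table[row[0]] = [float(x) for x in row[1:]]
    return samples, table


# Hypothetical miniature table standing in for table/table.tsv
demo = io.StringIO(
    "# Constructed from biom file\n"
    "#OTU ID\tsample_A\tsample_B\n"
    "asv_1\t120.0\t0.0\n"
    "asv_2\t5.0\t7.0\n"
)
samples, table = read_feature_table(demo)
print(samples)
print(table["asv_1"])
```

For the real file, you would pass an open file handle instead of the demo string, for example `open("table/table.tsv", newline="")`.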

Cheers

Apart from that, I see in your rep-seqs-stats file that you are losing a very large share of reads during DADA2 processing. For example, sample ww10_DNA: input 168252; merged 11396; output 8409 sequences. Are you sure that is OK? Is it possible that you are truncating too aggressively (e.g. with --p-trunc-len-f / --p-trunc-len-r)?
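
To put the ww10_DNA numbers in perspective, here is a quick sketch of the arithmetic using only the three counts quoted above (the helper name is made up). It works out to roughly 7% of input reads surviving merging and roughly 5% reaching the final output.

```python
# Read counts quoted above for sample ww10_DNA
reads = {"input": 168252, "merged": 11396, "output": 8409}


def percent_retained(step_count, input_count):
    """Percentage of the input reads still present at a given step."""
    return 100.0 * step_count / input_count


for step in ("merged", "output"):
    pct = percent_retained(reads[step], reads["input"])
    print(f"{step}: {pct:.1f}% of input reads")
```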

Thanks very much, @lca123! I followed the steps and got this TSV file:
table.tsv (285.0 KB)

So, if I am right, to get the number of OTUs per sample, I need to sum the numbers for each sample in the TSV file?

If you’re just summing the numbers in each column (sample), you are getting the total counts (reads) that were present in that sample and that were assigned to an ASV. But not all features in the table are present in all samples, so if you want to know how many OTUs (features) a sample has, you should count the features (rows) that have counts > 0 in that sample’s column, not add up the counts themselves.
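
To make the difference concrete, here is a small Python sketch on a made-up table (the sample names, feature names and counts are all hypothetical). For each sample it computes both the sum of the column (total reads) and the number of rows with a count > 0 (observed features).

```python
# Hypothetical miniature feature table: rows are features (ASVs), columns are samples.
samples = ["sample_A", "sample_B", "sample_C"]
table = {
    "asv_1": [120, 0, 3],
    "asv_2": [0, 0, 10],
    "asv_3": [5, 7, 0],
}


def per_sample_summary(samples, table):
    """Return total reads and number of features with count > 0 for each sample."""
    summary = {}
    for i, sample in enumerate(samples):
        column = [counts[i] for counts in table.values()]
        summary[sample] = {
            "total_reads": sum(column),                            # sum of counts
            "observed_features": sum(1 for c in column if c > 0),  # rows with count > 0
        }
    return summary


summary = per_sample_summary(samples, table)
for sample, stats in summary.items():
    print(sample, stats)
```

Here sample_A has 125 reads in total but only 2 features, because asv_2 has a count of 0 in that column.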

Thanks @lca123. Does this mean that I can say “Sample 8” has 4501 OTUs, based on the table .qza file from DADA2?

No. The whole table has 3967 OTUs (or features), which is the number of rows. Sample 8 has 339 features with counts > 0, i.e. 339 rows where the count in that sample’s column is > 0.
Having a count > 0 means that the feature, identified by a code such as b31a963a67991d1e7a2bb3a2ecebbe06, was present in that sample with that number of counts (120 for this feature). So, there were 120 reads in your fastq file which represented the same sequence. DADA2 identified it, gave the ASV this code, and recorded the number of counts, the 120.

Thanks @lca123. So, 339 is the number of OTUs for sample 8? And, if I am right, to get the number of features for a sample, I simply count all the rows where the count is > 0 for that sample, right?

Yes, then you’ll find that the number of features = number of OTUs. Because of how DADA2 works, they are no longer called “OTUs” but “ASVs” or features. I’m not sure why the table from DADA2 comes with “OTU ID” in the header. If you want to understand why DADA2 produces ASVs instead of OTUs, there are threads on this forum and elsewhere. I don’t remember which ones to point you to, but you can find them with the search tool.
