Low number of ASVs identified per sample

Hello,

I’ve pushed my FastQ files through the standard QIIME 2 pipeline, including generating taxa bar plots with a naive Bayes classifier trained on Greengenes 13_8 99% OTUs, specific to the V4 region. I then downloaded the level 7 abundance table from taxa-bar-plots.qzv in QIIME 2 View and counted, for each sample, the number of taxa with non-zero abundance. The resulting table is below, along with the raw read counts and the read counts after DADA2 denoising:

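| Sample ID | Number of Reads (Raw) | Number of Reads (Post-denoising) | Number of Present Taxa |
|---|---|---|---|
| 2 | 118804 | 58274 | 38 |
| 3 | 112727 | 81853 | 52 |
| 4 | 142620 | 58338 | 47 |
| 5 | 128072 | 52239 | 42 |
| 6 | 193453 | 86421 | 35 |
| 7 | 89085 | 39114 | 28 |
| 8 | 121904 | 62873 | 38 |
| 9 | 172140 | 73083 | 42 |
| 11 | 111996 | 55781 | 36 |
| 12 | 118305 | 54268 | 41 |
| 14 | 141056 | 57778 | 43 |
| 15 | 189805 | 80680 | 32 |
| 16 | 195722 | 70553 | 27 |
| 17 | 74397 | 29416 | 33 |
| 18 | 122435 | 50683 | 33 |
| 19 | 181045 | 84414 | 40 |
| 20 | 113045 | 54888 | 37 |
| 21 | 108239 | 47567 | 39 |
| 22 | 188242 | 74300 | 43 |
| 23 | 155599 | 66667 | 38 |
| 24 | 145141 | 76540 | 47 |
| 25 | 102303 | 44739 | 42 |
| 26 | 119093 | 56709 | 51 |
| 28 | 149389 | 59318 | 46 |
| 32 | 124498 | 50327 | 39 |
| 33 | 143925 | 57970 | 39 |
| 35 | 179358 | 70535 | 44 |
| 37 | 109177 | 46049 | 35 |
| 61 | 93691 | 41645 | 36 |
| 62 | 124207 | 55919 | 33 |
| 63 | 133185 | 51560 | 45 |
| 64 | 77994 | 34602 | 38 |
| 65 | 155243 | 61649 | 42 |
| 66 | 142843 | 58761 | 40 |
| 67 | 137037 | 55959 | 37 |
| 68 | 74932 | 39056 | 36 |
| 69 | 232934 | 90356 | 43 |
| 70 | 114607 | 53696 | 31 |
| 71 | 173364 | 75957 | 39 |
| 72 | 187139 | 70096 | 43 |
| 73 | 177668 | 67684 | 46 |
| 74 | 92141 | 36289 | 41 |
| 76 | 185198 | 46310 | 41 |
| 77 | 127603 | 53014 | 39 |
| 78 | 90606 | 52899 | 45 |
| 79 | 162333 | 74500 | 38 |
| 80 | 147334 | 70998 | 33 |
| 81 | 135201 | 58808 | 31 |
| 82 | 148715 | 59863 | 37 |
| 83 | 147705 | 55903 | 44 |
| 84 | 160177 | 66707 | 33 |
| 85 | 232705 | 86532 | 45 |
| 87 | 201699 | 113258 | 34 |
| 88 | 201511 | 80031 | 36 |
| 89 | 173151 | 77918 | 37 |
| 90 | 162203 | 66236 | 55 |
| 91 | 142455 | 57835 | 39 |
| 92 | 241390 | 87642 | 31 |
| 94 | 209580 | 98855 | 45 |
| 95 | 225306 | 80569 | 32 |
| 96 | 215141 | 82699 | 56 |
| 97 | 182283 | 80232 | 36 |
| 98 | 201126 | 87847 | 50 |
| 102 | 164319 | 69062 | 43 |
| 103 | 116668 | 55097 | 46 |
| 105 | 139821 | 54007 | 43 |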
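A minimal sketch of the counting step described above, assuming the CSV exported from the level 7 view of taxa-bar-plots.qzv has one row per sample, a sample-ID column named `index`, and taxon columns whose headers begin with `k__`. These column conventions are assumptions to check against your own export, since sample metadata columns may also be appended.

```python
import csv
import io


def count_present_taxa(csv_text, taxon_prefix="k__"):
    """Count, per sample, the taxa with non-zero abundance in a level-7 CSV.

    Assumes one row per sample, a sample-ID column named "index", and taxon
    columns whose headers start with `taxon_prefix`. Any other columns
    (for example appended sample metadata) are ignored.
    """
    present = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        present[row["index"]] = sum(
            1
            for column, value in row.items()
            if column.startswith(taxon_prefix) and float(value) > 0
        )
    return present


# Toy input with abbreviated taxon names and a hypothetical metadata column.
example = (
    "index,k__Bacteria;s__a,k__Bacteria;s__b,SampleType\n"
    "2,120.0,0.0,stool\n"
    "3,5.0,7.0,stool\n"
)
print(count_present_taxa(example))  # {'2': 1, '3': 2}
```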

An average of roughly 40 present taxa per sample seems quite low to me. Is there anything obvious that I’m doing wrong in my analysis?

For additional info: these reads were extracted from human fecal samples as paired end reads, which merged well during DADA2. The rarefaction depth was set to 10,000 reads, and the taxa were identified using the command qiime feature-classifier classify-sklearn, using the classifier “Greengenes 13_8 99% OTUs from 515F/806R region of sequences” retrieved from https://docs.qiime2.org/2020.8/data-resources/.
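Rarefying to a fixed depth means randomly subsampling each sample's reads without replacement. The plain-Python sketch below illustrates that idea only; it is not QIIME 2's implementation, and the feature names and counts are hypothetical.

```python
import random


def rarefy(counts, depth, seed=None):
    """Subsample one sample's feature counts to `depth` reads without replacement.

    `counts` maps feature ID -> read count. Samples with fewer than `depth`
    reads return None, i.e. they would be dropped from a rarefied table.
    """
    if sum(counts.values()) < depth:
        return None
    rng = random.Random(seed)
    # Expand the counts into one entry per read, then draw `depth` reads.
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    rarefied = {}
    for feature in rng.sample(pool, depth):
        rarefied[feature] = rarefied.get(feature, 0) + 1
    return rarefied


sample = {"asv1": 500, "asv2": 300, "asv3": 2}
shallow = rarefy(sample, depth=100, seed=0)
print(sum(shallow.values()))  # 100
```

Because a feature with very few reads (like `asv3` here) can easily draw zero reads at a shallow depth, a lower rarefaction depth tends to reduce the number of taxa observed per sample.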

Thank you in advance for your time.

If I understand your post correctly, what you are reporting in the table is not the number of ASVs but the number of observed species (assuming level 7 is species). That would not seem strange to me at all: I expect many of your ASVs are either not assigned a species or are assigned the same species, and so get aggregated. Have you done the same analysis on your ASV table?
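To illustrate why ASV counts and level 7 counts differ: when a table is collapsed to a taxonomic level (in QIIME 2, for example, with `qiime taxa collapse` at `--p-level 7`), every ASV sharing a label is summed into one row. The sketch below is a plain-Python illustration of that aggregation, not the plugin's code; the ASV IDs and abbreviated labels are hypothetical.

```python
from collections import defaultdict


def collapse_by_taxonomy(asv_counts, asv_taxonomy):
    """Sum the counts of ASVs that share an identical taxonomy string.

    `asv_counts` maps ASV ID -> count in one sample and `asv_taxonomy` maps
    ASV ID -> its assigned label. Distinct ASVs with the same label, including
    ones left unresolved at species ("s__"), merge into a single row.
    """
    collapsed = defaultdict(int)
    for asv, count in asv_counts.items():
        collapsed[asv_taxonomy[asv]] += count
    return dict(collapsed)


# Toy example with abbreviated, illustrative labels: four ASVs become two taxa.
counts = {"asv1": 10, "asv2": 5, "asv3": 3, "asv4": 2}
taxonomy = {
    "asv1": "k__Bacteria;g__Bacteroides;s__",
    "asv2": "k__Bacteria;g__Bacteroides;s__",
    "asv3": "k__Bacteria;g__Prevotella;s__copri",
    "asv4": "k__Bacteria;g__Prevotella;s__copri",
}
print(len(counts), "ASVs ->", len(collapse_by_taxonomy(counts, taxonomy)), "taxa")
```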
Two other notes: (i) why rarefy to 10,000 reads when every sample has at least ~29,000 reads? (ii) You lost a lot of reads between the raw and post-denoising counts; you may want to take a closer look at where they are being lost.
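Both notes can be checked directly from the table in the original post. A small sketch, assuming the columns are read into (sample ID, raw reads, post-denoising reads) tuples; the helper names are illustrative.

```python
def read_retention(rows):
    """Return (sample ID, fraction of raw reads kept after denoising), lowest first.

    `rows` holds (sample_id, raw_reads, denoised_reads) tuples, as in the
    columns of the table in the original post.
    """
    fractions = [(sample_id, denoised / raw) for sample_id, raw, denoised in rows]
    return sorted(fractions, key=lambda pair: pair[1])


def max_even_depth(rows):
    """Deepest rarefaction depth that keeps every sample: the smallest denoised count."""
    return min(denoised for _, _, denoised in rows)


# A few rows copied from the table above.
rows = [
    ("2", 118804, 58274),
    ("17", 74397, 29416),
    ("76", 185198, 46310),
    ("87", 201699, 113258),
]
print(max_even_depth(rows))  # 29416
for sample_id, kept in read_retention(rows):
    print(sample_id, f"{kept:.1%}")
```

Over all 66 rows of the table, the smallest post-denoising count is 29416 (sample 17), which is where the ~29,000 figure comes from.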


Hi @jbisanz, thanks for the reply!

First off, that was my mistake: we did in fact rarefy to 29,000, not 10,000 reads. And yes, sorry, these are observed species, not ASVs.

We classified down to level 7, which also includes taxa resolved only at higher levels (so it includes both taxa such as “k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Staphylococcaceae;g__Staphylococcus;s__equorum” and “k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__;g__;s__”). This means that we resolved up to the species level, not only at the species level.
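To see how many level 7 rows are actually resolved to species, one can find the deepest rank with a non-empty name in each label. A sketch, assuming Greengenes-style `x__Name` fields separated by semicolons and treating labels without that prefix (such as "Unassigned") as unresolved:

```python
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def deepest_resolved_rank(taxonomy):
    """Return the deepest rank with a non-empty name in a Greengenes-style string.

    Labels look like "k__Bacteria; p__Firmicutes; ...; s__"; a field that is
    empty after its "x__" prefix, or has no such prefix, is treated as unresolved.
    """
    deepest = None
    for rank, field in zip(RANKS, taxonomy.split(";")):
        _, sep, name = field.strip().partition("__")
        if not sep or not name:
            break
        deepest = rank
    return deepest


print(deepest_resolved_rank(
    "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;"
    "f__Staphylococcaceae;g__Staphylococcus;s__equorum"))  # species
print(deepest_resolved_rank(
    "k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__;g__;s__"))  # order
```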

The majority of reads were lost at the first filtering step, with an average of 60% of all reads passing it. I’m not sure which options to tweak to fix this. Perhaps max EE?
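For context on the max EE option: DADA2's filter truncates each read to the truncation length (dropping reads shorter than that), computes the read's expected number of errors from its Phred scores as EE = Σ 10^(−Q/10), and discards it if EE exceeds the threshold. In q2-dada2 these correspond to `--p-trunc-len-f`/`--p-trunc-len-r` and `--p-max-ee-f`/`--p-max-ee-r`. The sketch below illustrates that logic on a hypothetical read; it is not DADA2's code, and `max_ee=2.0` is just an example value.

```python
def expected_errors(quality_scores):
    """Expected number of errors in a read: sum of 10^(-Q/10) over its Phred scores."""
    return sum(10 ** (-q / 10) for q in quality_scores)


def passes_max_ee(quality_scores, trunc_len, max_ee=2.0):
    """Truncate a read to `trunc_len` bases, then apply a max EE filter.

    Reads shorter than `trunc_len` are discarded, as with DADA2's truncLen.
    """
    if len(quality_scores) < trunc_len:
        return False
    return expected_errors(quality_scores[:trunc_len]) <= max_ee


# Hypothetical read: 200 bases at Q35 followed by a 50-base tail at Q12.
read = [35] * 200 + [12] * 50
print(round(expected_errors(read), 2))       # 3.22
print(passes_max_ee(read, trunc_len=250))    # False
print(passes_max_ee(read, trunc_len=200))    # True
```

The example shows the trade-off: truncating before a low-quality tail lowers a read's expected errors, while raising max EE keeps more reads at the cost of admitting more errors.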

To find the ASV table, I converted table.qza from DADA2 into a TSV file, which contains 3382 ASVs. I assume most of these eventually group together, and all ASVs beyond the first ~1000 seem quite rare (low abundance) across the samples. I’ve attached the table here: table.tsv (999.3 KB)
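One way to quantify how rare the tail of the ASV table is: rank ASVs by total reads and find how many account for most of the reads. A sketch, assuming the TSV follows the usual BIOM text export layout (comment lines and a `#OTU ID` header starting with `#`, then one feature per line with tab-separated counts); the parser and helper names are illustrative.

```python
def load_feature_table_tsv(text):
    """Parse a BIOM-style TSV export into {feature ID: [per-sample counts]}.

    Assumes lines starting with "#" (a comment line and the "#OTU ID" header)
    carry no counts, and each remaining line is a feature ID followed by
    tab-separated counts.
    """
    table = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        feature_id, *counts = line.split("\t")
        table[feature_id] = [float(c) for c in counts]
    return table


def rank_abundance(table):
    """Rank features by total reads; return (ID, total, cumulative fraction of reads)."""
    totals = sorted(
        ((fid, sum(counts)) for fid, counts in table.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    grand_total = sum(total for _, total in totals)
    ranked, running = [], 0.0
    for fid, total in totals:
        running += total
        ranked.append((fid, total, running / grand_total))
    return ranked


def features_for_share(table, share):
    """Smallest number of top-ranked features that together hold `share` of all reads."""
    for n, (_, _, cumulative) in enumerate(rank_abundance(table), start=1):
        if cumulative >= share:
            return n
    return len(table)


# Toy export with illustrative IDs and counts for two samples.
toy = "# Constructed from biom file\n#OTU ID\tS1\tS2\nasvA\t60\t30\nasvB\t5\t4\nasvC\t1\t0\n"
table = load_feature_table_tsv(toy)
print(features_for_share(table, 0.95))  # 2
```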

Does this look like an issue, or is something like this expected from an ASV table?

Hi @michael-nakai, would it be possible for you to share your demux summarize and q2-dada2 denoising results with us?

Yep of course @thermokarst, here’s the demux (297.5 KB) and the denoising results (1.2 MB)

Hi @michael-nakai!

You probably shouldn't be using a rarefied table when generating your taxa bar plots (unless you have a really compelling reason to). The taxa bar plots normalize the table provided (by displaying each taxon's proportion of the per-sample total), so there is no need to rarefy.
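A minimal sketch of that per-sample normalization, assuming simple total-sum scaling (each count divided by the sample's total); this illustrates the idea rather than reproducing the visualizer's code, and the counts are hypothetical.

```python
def to_relative_abundance(sample_counts):
    """Scale one sample's taxon counts so they sum to 1 (total-sum scaling)."""
    total = sum(sample_counts.values())
    return {taxon: count / total for taxon, count in sample_counts.items()}


full = {"g__Bacteroides": 300, "g__Prevotella": 100}
subsampled = {"g__Bacteroides": 30, "g__Prevotella": 10}
print(to_relative_abundance(full))        # {'g__Bacteroides': 0.75, 'g__Prevotella': 0.25}
print(to_relative_abundance(subsampled))  # same proportions
```

Real rarefaction subsamples at random, so its proportions will differ slightly and very rare taxa can drop out; the point is only that the plot already divides by each sample's total.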

As @jbisanz pointed out above, you’re losing a lot of reads during your denoising step. I think you can relax your truncation parameters of 243/224 a bit, because there is a pretty big drop in read count during the merging step. Try experimenting with some larger values there. Also, if you have the computational resources, set the n_threads parameter to something higher (consult your sysadmin if you need guidance); this can significantly improve the denoising runtime.