I’ve pushed my FASTQ files through the standard QIIME 2 pipeline, including generating taxa bar plots with a Naive Bayes classifier trained on GreenGenes 13_8, 99% coverage, specific for the V4 region. I then opened taxa-bar-plots.qzv in QIIME 2 View, downloaded the level 7 abundance table, and counted the taxa in each sample with non-zero abundance. The resulting table is below, along with the raw read counts and the read counts after DADA2 denoising:
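| Sample ID | Number of Reads (Raw) | Number of Reads (Post-denoising) | Number of Present Taxa |
|---|---|---|---|
| 2 | 118804 | 58274 | 38 |
| 3 | 112727 | 81853 | 52 |
| 4 | 142620 | 58338 | 47 |
| 5 | 128072 | 52239 | 42 |
| 6 | 193453 | 86421 | 35 |
| 7 | 89085 | 39114 | 28 |
| 8 | 121904 | 62873 | 38 |
| 9 | 172140 | 73083 | 42 |
| 11 | 111996 | 55781 | 36 |
| 12 | 118305 | 54268 | 41 |
| 14 | 141056 | 57778 | 43 |
| 15 | 189805 | 80680 | 32 |
| 16 | 195722 | 70553 | 27 |
| 17 | 74397 | 29416 | 33 |
| 18 | 122435 | 50683 | 33 |
| 19 | 181045 | 84414 | 40 |
| 20 | 113045 | 54888 | 37 |
| 21 | 108239 | 47567 | 39 |
| 22 | 188242 | 74300 | 43 |
| 23 | 155599 | 66667 | 38 |
| 24 | 145141 | 76540 | 47 |
| 25 | 102303 | 44739 | 42 |
| 26 | 119093 | 56709 | 51 |
| 28 | 149389 | 59318 | 46 |
| 32 | 124498 | 50327 | 39 |
| 33 | 143925 | 57970 | 39 |
| 35 | 179358 | 70535 | 44 |
| 37 | 109177 | 46049 | 35 |
| 61 | 93691 | 41645 | 36 |
| 62 | 124207 | 55919 | 33 |
| 63 | 133185 | 51560 | 45 |
| 64 | 77994 | 34602 | 38 |
| 65 | 155243 | 61649 | 42 |
| 66 | 142843 | 58761 | 40 |
| 67 | 137037 | 55959 | 37 |
| 68 | 74932 | 39056 | 36 |
| 69 | 232934 | 90356 | 43 |
| 70 | 114607 | 53696 | 31 |
| 71 | 173364 | 75957 | 39 |
| 72 | 187139 | 70096 | 43 |
| 73 | 177668 | 67684 | 46 |
| 74 | 92141 | 36289 | 41 |
| 76 | 185198 | 46310 | 41 |
| 77 | 127603 | 53014 | 39 |
| 78 | 90606 | 52899 | 45 |
| 79 | 162333 | 74500 | 38 |
| 80 | 147334 | 70998 | 33 |
| 81 | 135201 | 58808 | 31 |
| 82 | 148715 | 59863 | 37 |
| 83 | 147705 | 55903 | 44 |
| 84 | 160177 | 66707 | 33 |
| 85 | 232705 | 86532 | 45 |
| 87 | 201699 | 113258 | 34 |
| 88 | 201511 | 80031 | 36 |
| 89 | 173151 | 77918 | 37 |
| 90 | 162203 | 66236 | 55 |
| 91 | 142455 | 57835 | 39 |
| 92 | 241390 | 87642 | 31 |
| 94 | 209580 | 98855 | 45 |
| 95 | 225306 | 80569 | 32 |
| 96 | 215141 | 82699 | 56 |
| 97 | 182283 | 80232 | 36 |
| 98 | 201126 | 87847 | 50 |
| 102 | 164319 | 69062 | 43 |
| 103 | 116668 | 55097 | 46 |
| 105 | 139821 | 54007 | 43 |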
An average of 40-ish present taxa per sample seems quite low to me. Is there anything obvious that I’m doing wrong in my analysis here?
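For anyone wanting to reproduce the "present taxa" count without doing it by hand, here is a minimal sketch. It assumes the downloaded CSV has one row per sample, a sample ID column named `index`, and taxon columns whose headers start with `k__`; the column name, the prefix, and the tiny example data are all assumptions for illustration, so check them against your own export. A header such as `Unassigned` would not match the prefix and would need to be handled separately.

```python
import csv
import io


def count_present_taxa(csv_text, id_column="index", taxon_prefix="k__"):
    """Return {sample_id: number of taxon columns with a count above zero}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {}
    for row in reader:
        present = 0
        for column, value in row.items():
            # Only columns whose header looks like a taxonomy string are
            # treated as taxa; the ID and metadata columns are skipped.
            if column.startswith(taxon_prefix) and float(value) > 0:
                present += 1
        counts[row[id_column]] = present
    return counts


# Tiny made-up export (not data from this thread).
example = (
    "index,k__Bacteria;p__Firmicutes,k__Bacteria;p__Proteobacteria,treatment\n"
    "S1,10,0,control\n"
    "S2,3,7,treated\n"
)
print(count_present_taxa(example))
```

To run it on a real file, read the file's text and pass it in place of `example`.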
If I understand your post correctly, the table is not reporting a low number of ASVs, but observed species (assuming level 7 is species). That would not surprise me at all: I assume many of your ASVs either get no species-level assignment or get the same species and are aggregated. Have you done the same analysis on your ASV table?
Two other notes: (i) why rarefy to 10,000 reads when every sample has at least ~29,000 reads? (ii) You lost a lot of reads between the raw and post-denoising counts, so you may want to take a closer look at where they are being lost.
First off, that was my mistake: we actually rarefied to 29,000 reads, not 10,000. And yes, sorry, it’s observed species, not ASVs.
We classified down to level 7, which includes taxa that resolved only at higher levels, so it includes both taxa such as "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Staphylococcaceae;g__Staphylococcus;s__equorum" and "k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__;g__;s__". In other words, the table counts everything resolved up to the species level, not only taxa resolved at the species level.
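To see how many of the level 7 labels are actually resolved to species, one option is to find the deepest rank with a non-empty name in each label. The sketch below assumes GreenGenes-style labels with seven `;`-separated fields of the form `x__Name`; the function name is hypothetical, and the two labels are the ones quoted above.

```python
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def deepest_resolved_rank(taxon):
    """Return the name of the deepest rank with a non-empty label, or None."""
    deepest = None
    for rank, field in zip(RANKS, taxon.split(";")):
        # Each field looks like "g__Staphylococcus"; an unresolved one is "g__".
        _, _, name = field.strip().partition("__")
        if name:
            deepest = rank
    return deepest


labels = [
    "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;"
    "f__Staphylococcaceae;g__Staphylococcus;s__equorum",
    "k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__;g__;s__",
]
for label in labels:
    print(deepest_resolved_rank(label))
```

The first label resolves to species and the second only to order, so counting labels by their deepest rank separates true species-level calls from higher-level ones.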
The majority of reads were lost at the first filtering step, with an average of 60% of all reads passing the step. I’m not sure what options to tweak here to fix this, perhaps max EE?
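As a quick check of overall loss, the fraction of raw reads that survive to the end of denoising can be computed per sample. This is only a sketch using three rows from the table in the first post; it shows total retention, not the per-step breakdown, which would have to come from the DADA2 denoising statistics.

```python
def retained_fraction(raw_reads, final_reads):
    """Fraction of raw reads still present after denoising."""
    if raw_reads <= 0:
        raise ValueError("raw_reads must be positive")
    return final_reads / raw_reads


# (raw reads, post-denoising reads) for three samples from the table above.
samples = {
    "2": (118804, 58274),
    "3": (112727, 81853),
    "76": (185198, 46310),
}

for sample_id, (raw, final) in samples.items():
    fraction = retained_fraction(raw, final)
    print(f"sample {sample_id}: {fraction:.1%} of raw reads retained")
```

Samples with unusually low retention are the ones worth inspecting step by step.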
To get the ASV table, I converted table.qza from DADA2 into a TSV file, which contains 3382 ASVs. I assume that most of these eventually group together, though, and all ASVs beyond ~1000 seem quite rare (low abundance) across the samples. I’ve attached the table here: table.tsv (999.3 KB)
Does this look like an issue, or is something like this expected from an ASV table?
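A rough way to put numbers on "how many ASVs, and how many are rare" is sketched below. It assumes the TSV has one row per ASV, one column per sample, and header or comment lines starting with `#`; that layout, the threshold of 10 total reads, and the example data are assumptions for illustration, not a recommended cutoff.

```python
import csv
import io


def read_feature_table(tsv_text):
    """Parse a feature-table TSV into {feature_id: [count per sample]}."""
    table = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        # Skip blank lines and comment/header lines, which start with "#".
        if not row or row[0].startswith("#"):
            continue
        table[row[0]] = [float(value) for value in row[1:]]
    return table


def summarize(table, rare_threshold=10):
    """Count all features and those whose total count is below the threshold."""
    totals = {fid: sum(counts) for fid, counts in table.items()}
    rare = [fid for fid, total in totals.items() if total < rare_threshold]
    return {"features": len(table), "rare": len(rare)}


# Tiny made-up table (not the attached table.tsv).
example = (
    "# Constructed from biom file\n"
    "#OTU ID\tS1\tS2\n"
    "asv_a\t120.0\t95.0\n"
    "asv_b\t0.0\t4.0\n"
    "asv_c\t2.0\t0.0\n"
)
print(summarize(read_feature_table(example)))
```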
You probably shouldn’t be using a rarefied table when generating your taxa bar plots (unless you have a really compelling reason to). The taxa bar plots already normalize the table you provide by displaying each taxon as a proportion of its sample, so there is no need to rarefy.
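To illustrate the normalization point, here is a small sketch of converting per-sample counts to proportions. The two samples are made up and differ tenfold in depth, yet they end up with identical proportions, which is why sequencing depth alone does not change what a relative-abundance bar plot shows.

```python
def relative_abundance(counts):
    """Convert {taxon: count} for one sample into {taxon: proportion}."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: count / total for taxon, count in counts.items()}


# Two made-up samples with the same composition at different depths.
deep = {"taxon_a": 6000, "taxon_b": 3000, "taxon_c": 1000}
shallow = {"taxon_a": 600, "taxon_b": 300, "taxon_c": 100}

print(relative_abundance(deep))
print(relative_abundance(shallow))
```

What rarefying does change is which low-abundance taxa are still detected, so it affects the count of present taxa more than the shape of the bars.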
As @jbisanz pointed out above, you’re losing a lot of reads during your denoising step. I think you can relax your trunc params of 243/224 a bit, because there is a pretty big drop in read count during the merging step. Try experimenting with some larger values there. Also, if you have the computational resources, set the n_threads parameter to something higher (consult your sysadmin if you need guidance); this can significantly improve the denoising runtime.