Low number of ASVs identified per sample

michael-nakai · November 18, 2020, 6:25am

Hello,

I’ve pushed my FastQ files through the standard Qiime2 pipeline, including generation of taxa-bar-plots using a naive bayes classifier trained on GreenGenes 13_8, 99% coverage specific for the V4 region. After doing so, I downloaded the level 7 abundance table from opening taxa-bar-plots.qzv in Qiime2 View and counted the number of taxa per sample that were not at 0 abundance. The resulting table is below, with some extra data in terms of raw reads and post-DADA2 denoising reads:

Sample ID	Number of Reads (Raw)	Number of Reads (Post-denoising)	Number of Present Taxa
2	118804	58274	38
3	112727	81853	52
4	142620	58338	47
5	128072	52239	42
6	193453	86421	35
7	89085	39114	28
8	121904	62873	38
9	172140	73083	42
11	111996	55781	36
12	118305	54268	41
14	141056	57778	43
15	189805	80680	32
16	195722	70553	27
17	74397	29416	33
18	122435	50683	33
19	181045	84414	40
20	113045	54888	37
21	108239	47567	39
22	188242	74300	43
23	155599	66667	38
24	145141	76540	47
25	102303	44739	42
26	119093	56709	51
28	149389	59318	46
32	124498	50327	39
33	143925	57970	39
35	179358	70535	44
37	109177	46049	35
61	93691	41645	36
62	124207	55919	33
63	133185	51560	45
64	77994	34602	38
65	155243	61649	42
66	142843	58761	40
67	137037	55959	37
68	74932	39056	36
69	232934	90356	43
70	114607	53696	31
71	173364	75957	39
72	187139	70096	43
73	177668	67684	46
74	92141	36289	41
76	185198	46310	41
77	127603	53014	39
78	90606	52899	45
79	162333	74500	38
80	147334	70998	33
81	135201	58808	31
82	148715	59863	37
83	147705	55903	44
84	160177	66707	33
85	232705	86532	45
87	201699	113258	34
88	201511	80031	36
89	173151	77918	37
90	162203	66236	55
91	142455	57835	39
92	241390	87642	31
94	209580	98855	45
95	225306	80569	32
96	215141	82699	56
97	182283	80232	36
98	201126	87847	50
102	164319	69062	43
103	116668	55097	46
105	139821	54007	43

An average of 40-ish present taxa per sample seems quite low to me. Is there anything obvious that I’m doing wrong in my analysis here?

For additional info: these reads were extracted from human fecal samples as paired end reads, which merged well during DADA2. The rarefaction depth was set to 10,000 reads, and the taxa were identified using the command qiime feature-classifier classify-sklearn, using the classifier “Greengenes 13_8 99% OTUs from 515F/806R region of sequences” retrieved from https://docs.qiime2.org/2020.8/data-resources/.

Thank you in advance for your time.

jbisanz · November 19, 2020, 6:23pm

If I correctly understand your post, it is not a low number of ASVs you are reporting in the table, but observed species (assuming level 7 is species). This would not be strange to me at all as I assume a lot of your ASVs would not be getting a species or are getting the same species and are being aggregated. Have you done the same analysis on your ASV table?
Two other notes: (i) why rarefy to 10,000 reads when you have at least ~29,000 reads in all samples? (ii) You lost a lot of reads between the raw reads and the post-denoising reads, so you may want to take a closer look at where they are being lost.
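To put a number on the read loss mentioned above, here is a minimal sketch that computes the fraction of raw reads surviving denoising. The three rows are copied from the table in the first post; the function and variable names are only illustrative.

```python
# Raw and post-denoising read counts copied from three rows of the table above
# (sample ID -> (raw reads, post-denoising reads)).
read_counts = {
    "2": (118804, 58274),
    "76": (185198, 46310),
    "87": (201699, 113258),
}


def fraction_retained(raw, denoised):
    """Return the fraction of raw reads that survive denoising."""
    if raw <= 0:
        raise ValueError("raw read count must be positive")
    return denoised / raw


retained = {
    sample_id: fraction_retained(raw, denoised)
    for sample_id, (raw, denoised) in read_counts.items()
}

# Print samples from worst to best retention.
for sample_id, frac in sorted(retained.items(), key=lambda kv: kv[1]):
    print(f"sample {sample_id}: {frac:.1%} of raw reads retained")
```

Running the same calculation over every row would show which samples lose the most reads, which may help when comparing against the per-step counts in the DADA2 denoising stats.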

michael-nakai · November 20, 2020, 8:47am

Hi @jbisanz, thanks for the reply!

First off, that was my mistake, we indeed did rarefy to 29000, not 10000 reads. And yes, sorry, it’s observed species, not ASVs.

We classified down to level 7, which includes taxa that resolved only at higher levels (so it includes both taxa such as "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Staphylococcaceae;g__Staphylococcus;s__equorum" and "k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__;g__;s__"). This means that we resolved up to the species level, not only at the species level.
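As a rough illustration of the difference between "up to" and "at" the species level, the sketch below reports the deepest rank that actually carries a label in a Greengenes-style taxonomy string. The two example strings are the ones quoted above; the helper name and the rank list are assumptions made for this example, not part of QIIME 2.

```python
# Rank names in the order they appear in a 7-level Greengenes-style string.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def deepest_resolved_rank(taxonomy):
    """Return the deepest rank with a non-empty label, or None if there is none."""
    deepest = None
    for rank, field in zip(RANKS, taxonomy.split(";")):
        # Each field looks like "g__Staphylococcus"; an unresolved rank is "g__".
        _, _, label = field.strip().partition("__")
        if label:
            deepest = rank
    return deepest


examples = [
    "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;"
    "f__Staphylococcaceae;g__Staphylococcus;s__equorum",
    "k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__;g__;s__",
]

for taxonomy in examples:
    print(deepest_resolved_rank(taxonomy), "<-", taxonomy)
```

Tallying this value over all level 7 labels would show how many of the counted taxa are true species-level assignments and how many only resolve to a higher rank.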

The majority of reads were lost at the first filtering step, with an average of 60% of all reads passing the step. I’m not sure what options to tweak here to fix this, perhaps max EE?

To find the ASV table, I converted table.qza from DADA2 into a tsv file, and found 3382 ASVs. I assume that most of these eventually group together though, and all ASVs beyond ~1000 seem quite rare (low abundance) between the samples. I’ve attached the table here: table.tsv (999.3 KB)

Does this look like an issue, or is something like this expected from an ASV table?
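For the per-sample comparison on the ASV table, a minimal sketch along these lines counts how many ASVs have a non-zero count in each sample. It assumes the TSV has the layout written by `biom convert --to-tsv` (a comment line, then a `#OTU ID` header row); the toy table and the function name are hypothetical and stand in for the real table.tsv.

```python
import csv
import io

# Hypothetical toy table in the layout `biom convert --to-tsv` writes;
# replace with open("table.tsv") to use a real export.
toy_tsv = (
    "# Constructed from biom file\n"
    "#OTU ID\tS1\tS2\tS3\n"
    "asv_a\t10.0\t0.0\t3.0\n"
    "asv_b\t0.0\t0.0\t7.0\n"
    "asv_c\t5.0\t2.0\t0.0\n"
)


def observed_features_per_sample(handle):
    """Count features with a non-zero count in each sample column."""
    header = None
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        if row[0].startswith("#"):
            # The "#OTU ID" row names the sample columns; other "#" rows are comments.
            if row[0] == "#OTU ID":
                header = row[1:]
                counts = {sample: 0 for sample in header}
            continue
        if header is None:
            raise ValueError("no '#OTU ID' header line found")
        for sample, value in zip(header, row[1:]):
            if float(value) > 0:
                counts[sample] += 1
    return counts


per_sample = observed_features_per_sample(io.StringIO(toy_tsv))
print(per_sample)
```

The resulting per-sample counts can then be set beside the "Number of Present Taxa" column to see how much the taxonomy collapse reduces the apparent richness.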

thermokarst · November 25, 2020, 8:26pm

Hi @michael-nakai, would it be possible for you to share your demux summarize and q2-dada2 denoising results with us?

michael-nakai · November 26, 2020, 2:58am

Yep of course @thermokarst, here's the demux (297.5 KB) and the denoising results (1.2 MB)

thermokarst · December 1, 2020, 3:46pm

Hi @michael-nakai!

You probably shouldn't be using a rarefied table when generating your taxa bar plots (unless you have a really compelling reason to). The taxa bar plots normalize the table provided (by displaying the ratio of per-sample taxa), so there is no need to rarefy.
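A small sketch of the per-sample normalization described here: converting counts to relative abundances gives the same proportions regardless of sequencing depth. The two samples and their counts are made up for illustration, and the function is not the q2-taxa implementation.

```python
def relative_abundance(counts):
    """Convert per-taxon counts for one sample into fractions of the sample total."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical samples with the same composition at different depths.
deep_sample = {"g__Bacteroides": 6000, "g__Faecalibacterium": 3000, "g__Blautia": 1000}
shallow_sample = {"g__Bacteroides": 600, "g__Faecalibacterium": 300, "g__Blautia": 100}

deep_rel = relative_abundance(deep_sample)
shallow_rel = relative_abundance(shallow_sample)

for taxon in deep_sample:
    print(f"{taxon}: deep={deep_rel[taxon]:.2f} shallow={shallow_rel[taxon]:.2f}")
```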

As @jbisanz pointed out above, you're losing a lot of reads during your denoising step. I think you can relax your trunc params of 243/224 a bit, because there is a pretty big drop in read count during the merging step. Try experimenting with some larger values there. Also, if you have the computational resources, set the n_threads parameter to something higher (consult your sysadmin if you need guidance); this can significantly improve the denoising runtime.
