Small number of features observed after DADA2

Hello, QIIME 2 team,

Recently, I analysed 16S and ITS2 amplicon sequencing data using QIIME 2, but I got far fewer features than the number of OTUs we obtained for our previous dataset using QIIME 1. The 16S data come from soil samples, so I expected at least tens of thousands of OTUs (features) in the final OTU table, but I only see around 1,000 features after running DADA2. I understand that DADA2 removes noisy OTUs, but in my case the number seems too small to be credible.

Details: after DADA2, I had at least 10,000 sequences per sample but ended up with only 955 features. The sequences are about 210 bp long.

Command used:

qiime dada2 denoise-single \
  --i-demultiplexed-seqs 16S_single-end-demux.qza \
  --p-trim-left 0 \
  --p-trunc-len 210 \
  --o-representative-sequences 16S_forward_with_primer-dada2.qza \
  --o-table 16S_table_dada2.qza \
  --p-n-threads 16

Is there any step that could have led to such a small number of features? I am eager to solve this issue.


Hi @hongwei2017,

I’m not that surprised by those numbers. In my experience, these new denoising algorithms produce a dramatic reduction in the number of features.
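To illustrate why simply counting distinct sequences overstates real diversity, here is a toy simulation. It is not DADA2 and does not model OTU clustering; the number of true variants, the 1% substitution rate, the read length and the function names are all assumptions chosen for illustration. It only shows that random sequencing errors alone can turn a handful of real sequences into hundreds of distinct ones, which is the kind of noise denoising methods try to collapse back into the true variants.

```python
import random

BASES = "ACGT"

def random_sequence(rng, length):
    """Draw a random DNA sequence of the given length."""
    return "".join(rng.choice(BASES) for _ in range(length))

def add_errors(rng, seq, error_rate):
    """Replace each base with a different base with probability error_rate."""
    out = []
    for base in seq:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def simulate(n_true=5, n_reads=1000, length=210, error_rate=0.01, seed=42):
    """Return (number of true variants, number of distinct noisy reads)."""
    rng = random.Random(seed)
    true_seqs = set()
    while len(true_seqs) < n_true:  # make sure the true variants are distinct
        true_seqs.add(random_sequence(rng, length))
    true_seqs = sorted(true_seqs)
    # Each read is a copy of one true variant with random substitution errors.
    reads = [add_errors(rng, rng.choice(true_seqs), error_rate) for _ in range(n_reads)]
    return len(true_seqs), len(set(reads))

if __name__ == "__main__":
    n_true, n_unique = simulate()
    print(f"true variants: {n_true}, distinct sequences in noisy reads: {n_unique}")
```

With a 1% per-base error rate, only about 0.99^210 ≈ 12% of 210 bp reads are error-free, so most reads show up as a new, unique sequence.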

Now, you mention 955 features: are these unique sOTUs or sequences per sample? I think it’s fine if you are getting 955 sOTUs; if it’s sequences per sample, we will obviously need to look into it more closely.
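The two numbers come from different axes of the feature table: the number of features is the number of rows (unique sOTUs), while sequences per sample are the column totals. A minimal sketch on a toy table stored as a plain Python dict of {feature ID: {sample ID: count}}; the layout, names and values are hypothetical and only meant to make the distinction concrete.

```python
def features_and_depths(table):
    """Return (number of features, reads per sample) for a toy feature table.

    table maps feature ID -> {sample ID: count}.
    """
    # Number of features = rows with at least one read anywhere.
    n_features = sum(1 for counts in table.values() if sum(counts.values()) > 0)
    # Sequences per sample = column sums across all features.
    per_sample = {}
    for counts in table.values():
        for sample, c in counts.items():
            per_sample[sample] = per_sample.get(sample, 0) + c
    return n_features, per_sample

toy_table = {
    "feat_A": {"S1": 10, "S2": 2},
    "feat_B": {"S1": 5, "S2": 0},
    "feat_C": {"S1": 0, "S2": 10},
}
print(features_and_depths(toy_table))  # (3, {'S1': 15, 'S2': 12})
```

Here the table has 3 features, yet each sample holds 12 to 15 sequences, so a high per-sample depth says nothing by itself about how many features exist.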


Hi @antgonza,

Thanks for your reply. 955 is the number of features in the feature table, which is the counterpart of the OTU table in QIIME 1. Each sample retains about 10,000 sequences after quality control with DADA2. I have no idea why there are only 955 features; given that we saw more than 10,000 OTUs before, this makes me not confident in the feature table I got. Any more thoughts?


I think that’s just fine.

Perhaps an “easy” way to ease your concerns is to go to Qiita and check the deblur and closed-reference artifacts from public studies, so you can see the differences. A quick example is the Moving Pictures of the Human Microbiome study.

Closed reference: Number of samples: 1967, Number of features: 22765, Minimum count: 131, Maximum count: 86128, Median count: 27883, Mean count: 24536

Same raw data, trimmed at 150 bp: Number of samples: 1967, Number of features: 7948, Minimum count: 43, Maximum count: 25196, Median count: 986, Mean count: 1629

See how the number of features came down from 22765 to 7948.
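If you want to compare your own table against summaries like these, the same fields are straightforward to compute. A minimal sketch, assuming the table has already been exported to a plain Python dict of {feature ID: {sample ID: count}} (the export step and these function names are hypothetical, not a QIIME 2 or Qiita API), and assuming “count” means the total number of reads per sample:

```python
import statistics

def table_summary(table):
    """Summarize a feature table given as {feature ID: {sample ID: count}}.

    'Count' is assumed to mean the total number of reads per sample.
    """
    per_sample = {}
    for counts in table.values():
        for sample, c in counts.items():
            per_sample[sample] = per_sample.get(sample, 0) + c
    totals = list(per_sample.values())
    return {
        "samples": len(per_sample),
        "features": sum(1 for counts in table.values() if sum(counts.values()) > 0),
        "min": min(totals),
        "max": max(totals),
        "median": statistics.median(totals),
        "mean": statistics.mean(totals),
    }

def percent_reduction(before, after):
    """Percentage drop from `before` to `after`."""
    return 100.0 * (before - after) / before

# Feature counts quoted above for Moving Pictures.
print(round(percent_reduction(22765, 7948), 1))  # 65.1
```

By this arithmetic, the closed-reference to 150 bp comparison above corresponds to roughly a 65% drop in the number of features on the same raw data.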

Hope this helps.
