Low feature frequency

Hi guys,

I'm working with paired-end data (from human stool samples) and I'm not sure whether I can proceed to downstream analyses.
After the quality filtering steps, I ran qiime feature-table summarize, and I think the frequency per sample and the frequency per feature are too low.
Is there a minimum number of sequences per sample?
Also, which value should I choose for --p-sampling-depth?

I'm attaching the table.qzv file: table.qzv (347.4 KB)

Thank you in advance!

Wow, fewer than 15% of samples have more than 100 reads! That is really low, and I think the issue is probably with quality filtering. You should carefully examine your denoising stats to see whether you are losing reads at the filtering stage (in which case, trim more) or at the merge stage (in which case, trim less).
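If it helps to make that check concrete, here is a rough Python sketch that reports which denoising step loses the largest share of reads for one sample. It assumes the stats have been exported to rows with `input`, `filtered`, `denoised`, `merged`, and `non-chimeric` counts; those column names and the example numbers are assumptions for illustration, not values taken from the attached file.

```python
# Stage names are assumed to match the count columns of the denoising stats.
STAGES = ["input", "filtered", "denoised", "merged", "non-chimeric"]


def stage_retention(row):
    """Return {stage: fraction of reads kept relative to the previous stage}."""
    retention = {}
    for prev, curr in zip(STAGES, STAGES[1:]):
        before = row[prev]
        # Guard against division by zero when a sample has no reads left.
        retention[curr] = row[curr] / before if before else 0.0
    return retention


def worst_stage(row):
    """Name the stage with the lowest step-wise retention."""
    retention = stage_retention(row)
    return min(retention, key=retention.get)


# Hypothetical sample: most reads pass filtering, but very few merge.
example = {"input": 50000, "filtered": 45000, "denoised": 44000,
           "merged": 900, "non-chimeric": 850}
print(worst_stage(example))  # merged
```

A low retention at `filtered` points toward trimming more, while a low retention at `merged` points toward trimming less.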

This really depends on how you use the data and how much diversity is present… as a rule of thumb, you should really aim for 1000+ sequences per sample but the more the merrier. You can use alpha rarefaction to see what a reasonable sampling depth is for capturing the observed alpha diversity.
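On the --p-sampling-depth question, one way to think about the trade-off is to count how many samples survive at each candidate depth, since samples with fewer reads than the chosen depth are dropped when rarefying. The Python sketch below does only that bookkeeping; the per-sample frequencies are made-up numbers, and it does not replace looking at the alpha rarefaction curves.

```python
def samples_retained(frequencies, depth):
    """Count samples with at least `depth` reads (the rest would be dropped)."""
    return sum(1 for count in frequencies.values() if count >= depth)


def retention_table(frequencies, depths):
    """Return (depth, samples kept, total reads kept) for each candidate depth."""
    rows = []
    for depth in depths:
        kept = samples_retained(frequencies, depth)
        # Every retained sample contributes exactly `depth` reads after rarefying.
        rows.append((depth, kept, kept * depth))
    return rows


# Hypothetical per-sample frequencies, not taken from the attached table.qzv.
freqs = {"S1": 40, "S2": 95, "S3": 1200, "S4": 3100, "S5": 15}
for depth, kept, reads in retention_table(freqs, [100, 1000, 3000]):
    print(depth, kept, reads)
```

A higher depth keeps more reads per sample but fewer samples, so the usual compromise is the deepest value that still retains the samples you care about.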

So I recommend going back to the denoising step and finding parameter settings that achieve better sequence yields.

Good luck!

Thank you for the reply.

I checked the denoising stats and I think I'm losing reads at the merging stage. Is that right?
stats-dada2.qzv (1.2 MB)

So, as you said, I should trim less, but I have no idea which value I should choose.
Could you please look at my data?
I'm attaching the demultiplexed sequence counts summary file and the command I used for DADA2.
demux.qzv (295.0 KB)

qiime dada2 denoise-paired --i-demultiplexed-seqs demux-paired-end.qza --o-table table.qza --o-representative-sequences rep-seqs.qza --p-trim-left-f 0 --p-trim-left-r 1 --p-trunc-len-f 230 --p-trunc-len-r 200 --o-denoising-stats denoising-stats.qza --verbose

Thank you for your help :)

Correct, you have a large number of reads, but very few pass merging.

I do not know — it all depends on your amplicon length, which I do not know. Your quality looks good, though, so use as much sequence as you need to achieve merging. There are many other forum posts about merging, so look around for tips from other users…

To start, I recommend just setting your trim lengths to the total sequence lengths and seeing if you get a reasonable yield. If your sequences are not long enough to achieve merging, you should just use the forward reads and discard the reverse.
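The merging arithmetic can be sketched in Python as below. The amplicon length of 460 nt is purely hypothetical (the real amplicon length is not known in this thread), and the 12 nt minimum overlap is my understanding of the DADA2 default, so treat both as assumptions. Trimming from the left is ignored here because it removes bases from the outer ends of the amplicon rather than from the overlapping region.

```python
MIN_OVERLAP = 12  # assumed default minimum overlap required for merging


def expected_overlap(amplicon_len, trunc_len_f, trunc_len_r):
    """Bases shared by the truncated forward and reverse reads.

    A negative value means the reads leave a gap and cannot overlap at all.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


def can_merge(amplicon_len, trunc_len_f, trunc_len_r, min_overlap=MIN_OVERLAP):
    """True if the truncated reads overlap by at least `min_overlap` bases."""
    return expected_overlap(amplicon_len, trunc_len_f, trunc_len_r) >= min_overlap


# Hypothetical 460 nt amplicon with the two truncation settings discussed here.
print(expected_overlap(460, 230, 200))  # -30
print(expected_overlap(460, 250, 250))  # 40
print(can_merge(460, 230, 200), can_merge(460, 250, 250))  # False True
```

Real amplicons vary in length, so it is safer to leave some margin above the minimum overlap rather than aiming for it exactly.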

To start, I recommend just setting your trim lengths to the total sequence lengths and seeing if you get a reasonable yield.

So is it okay to trim at 250 in both the forward and reverse reads?
Regarding this, what's the minimum quality score that is acceptable for the analysis?

Yes, give it a try and see if it works!

I aim to truncate where median quality drops below 20, as a rule of thumb. But dada2 will just toss any reads that have > 2 probable erroneous base calls (by default), so you can also just try it and see how many reads you lose at the filtering step.
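Both rules of thumb can be sketched in Python. The first function finds where the median quality first drops below a threshold; the second computes a read's expected number of errors from its Phred scores, which is my reading of what the "probable erroneous base calls" filter measures (an assumption on my part). The quality values are made up for illustration.

```python
from statistics import median


def truncation_position(quality_by_position, threshold=20):
    """Return the first 0-based position whose median quality is below threshold.

    `quality_by_position` holds one list of Phred scores per read position.
    If the median never drops below the threshold, the full length is returned.
    """
    for pos, scores in enumerate(quality_by_position):
        if median(scores) < threshold:
            return pos
    return len(quality_by_position)


def expected_errors(phred_scores):
    """Sum of per-base error probabilities, where P(error) = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def passes_filter(phred_scores, max_ee=2.0):
    """True if the read's expected errors do not exceed `max_ee`."""
    return expected_errors(phred_scores) <= max_ee


# Hypothetical quality scores: three reads sampled at four positions.
qualities = [[35, 36, 34], [30, 31, 29], [18, 22, 19], [12, 15, 10]]
print(truncation_position(qualities))  # 2

# A read of 100 bases at Q30 passes; a read of 30 bases at Q10 does not.
print(passes_filter([30] * 100), passes_filter([10] * 30))  # True False
```

Because the error probabilities add up along the read, truncating a low-quality tail lowers the expected errors and lets more reads through the filter.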
