I'm working with paired-end data (from human stool samples) and I'm not sure whether I can proceed to downstream analyses.
After the quality filtering steps, I ran qiime feature-table summarize, and I think the frequency per sample and the frequency per feature are too low.
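For reference, the summarize command I ran looked roughly like this (the table and metadata file names here are just placeholders for my own files):

```
qiime feature-table summarize \
  --i-table table.qza \
  --m-sample-metadata-file sample-metadata.tsv \
  --o-visualization table.qzv
```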
Is there a minimum number of sequences per sample?
Also, which value should I choose for --p-sampling-depth?
Wow, less than 15% of samples have more than 100 reads! That is really low, and I think the issue is probably with quality filtering: carefully examine your denoising stats to see whether you are losing reads at the filtering stage (in which case, trim more) or at the merging stage (in which case, trim less).
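If you have not already tabulated them, the DADA2 stats artifact can be turned into a viewable table with something like this (assuming your stats output is named stats.qza):

```
qiime metadata tabulate \
  --m-input-file stats.qza \
  --o-visualization stats-dada2.qzv
```

The "percentage of input passed filter" and "percentage of input merged" columns will tell you which stage is eating your reads.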
This really depends on how you use the data and how much diversity is present. As a rule of thumb, you should aim for 1000+ sequences per sample, but the more the merrier. You can use alpha rarefaction to see what a reasonable sampling depth is for capturing the observed alpha diversity.
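For example, a minimal alpha-rarefaction run looks roughly like this (file names are placeholders, and --p-max-depth should be set near the higher per-sample frequencies in your feature table summary):

```
qiime diversity alpha-rarefaction \
  --i-table table.qza \
  --p-max-depth 4000 \
  --m-metadata-file sample-metadata.tsv \
  --o-visualization alpha-rarefaction.qzv
```

Where the curves level off is roughly the depth at which you stop gaining observed diversity.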
So I recommend going back to the denoising step and finding parameter settings that achieve better sequence yields.
I checked the denoising stats and I think I'm losing reads at the merging stage. Is that right? stats-dada2.qzv (1.2 MB)
So, as you said, I should trim less, but I have no idea which values to choose.
Could you please look at my data?
I'm attaching the demultiplexed sequence counts summary file and the command I used for DADA2. demux.qzv (295.0 KB)
Correct, you have a large number of reads, but very few pass merging.
It all depends on your amplicon length, which I do not know. Your quality looks good, though, so keep as much sequence as you need to achieve merging. There are many other forum posts about merging, so look around for tips from other users.
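To give a rough worked example (the numbers are hypothetical, since I do not know your primers): with 2 x 300 bp reads over a ~450 bp amplicon, the forward and reverse truncation lengths need to sum to at least ~462 nt, i.e. the amplicon length plus the ~12 nt of overlap DADA2 needs to merge, so truncating both reads to, say, 220 nt would make merging impossible.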
To start, I recommend just setting your truncation lengths to the full read lengths and seeing whether you get a reasonable yield; a sketch is below. If your reads are not long enough to achieve merging, you should just use the forward reads and discard the reverse.
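As a sketch, running with no truncation at all (in denoise-paired, a --p-trunc-len of 0 means no truncation) would look like this; input and output names are placeholders:

```
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza
```

If you do end up falling back to forward reads only, qiime dada2 denoise-single will accept the same demux.qza and use just the forward reads, with a single --p-trunc-len.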
As a rule of thumb, I aim to truncate where the median quality drops below 20. But DADA2 will toss any read with more than 2 expected errors by default, so you can also just try it and see how many reads you lose at the filtering step.
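If you want to adjust the filtering stringency itself, the relevant knobs in denoise-paired are the expected-error and quality-truncation parameters, which you can add alongside the truncation parameters in the command above (values shown are the defaults; in older QIIME 2 releases there was a single --p-max-ee instead of the -f/-r pair):

```
  --p-max-ee-f 2.0 \
  --p-max-ee-r 2.0 \
  --p-trunc-q 2
```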