the command I use for DADA2 are same with this tutorial with parameters: --p-trim-left-f 17 --p-trim-right-f 21 --p-trunc-len-f 250 --p-trunc-len-r 205 --p-n-threads 0 --verbose
the results that I get from that analysis are:
Then I compare my analysis using Deblur following this tutorial with parameter attached join-deblur.txt (900 Bytes)
and the results are (I think the number of sequences survived from both analysis are quite similar):
What primers/marker gene are you using? What is your expected amplicon length?
It looks like you are losing a massive amount of sequences, and the yields are (I would say) unacceptably low. From the dada2 log, it looks like the merging step is where this is failing. In deblur, it looks like maybe the trim length is the problem; you use 400, in which case any reads < 400 nt will be dropped. Is that intended, based on your amplicon length?
I would recommend adjusting your parameters to ensure that more reads are merging and passing the length filter. What happens if you process just the forward reads as single-end reads? (that could help triangulate this issue and tell us whether there is something wrong with these reads, e.g., lots of artifact/non-target DNA, or if this really is a merging/length issue)
I disagree, it looks like deblur recovers approximately 3X as many sequences as dada2, accounting for the Archaea that are observed with deblur but not dada2. With such low sequence counts, though, I am worried about the results from both.
That makes sense. Just making sure! Still, would be good to know how this performs when running the single-end data. For some reason merging reads appears to be failing in dada2. What primers are you using?
It looks like your parameter settings should leave enough overlap to permit merging. The sequences are being dropped at the merge step, not denoising, so your truncation parameters do not appear to be the issue. @benjjneb do you have any thoughts on what’s happening here?
Yes. As a matter of fact, this may be beneficial because the quality-filter step before deblur is going to trim sequences at different sites, wherever the quality drops below a certain threshold. So deblur’s trimming step is going to do two things: drop any sequences shorter than that length (unless if you turn off trimming) and trim sequences longer than that length. Hence, depending on how much your sequences are being pre-trimmed in quality-filter you may want to set a lower deblur trimming length than what you are using in dada2.
Oops, looks like I misread what you wrote above. Thanks for clarifying!
Still, the different taxonomic results here do not concern me — because it seems that these data are being over-filtered already, so the results from either analysis is not representative of the whole. If you get wildly different results after improving sequence yields, then I think we have something to worry about!
It’s 460 rather than 420 because this library prep includes the primers on the reads. So, there is 420nts of biological sequence, but 460nts of sequenced amplicon. Right now, everything except a crazy short archaea 16S is being lost in merging due to lack of overlap.
You need to increase your trunc-len to keep at least 460+20+BIOLOGICAL_LENGTH_VARIATION total bases. 485 or up would be a good target.
I have another question, why the sequence improving very much? Previously, I set my trunc-len based on quality of forward and reverse reads, so I set my --p-trunc-len-r 205 (demux.qzv) and got very low sequences.
I already re-analyze my sequence with deblur using --p-trim-length 380, but there is no improving in sequences recovered. I also used default parameter for quality-filter q-score-joined and the result is not improving too (deblur stats). I think my parameter with vsearch join-pairs--p-maxdiffs 2 --p-minovlen 15 --p-minmergelen 300 are not too strict because I still got >18000 raw reads before deblur analysis. Do you have any idea? or my trim-length still too high?