Hi @ksn, hopefully I can help a bit. Though I’d need more information to help answer your questions.
What primer-pair are you using?
What is the expected amplicon size (in base pairs)? I assume ~488 bp? If so, there will be a limited window of overlap given the specified trim settings, which may be a problem given the quality scores.
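As a quick back-of-envelope check (the ~488 bp amplicon and the 250/250 truncation lengths below are assumptions for illustration, not recommendations), the expected overlap after truncation is just:

```shell
# Rough overlap estimate: overlap = trunc_len_f + trunc_len_r - amplicon_length
# The 488 bp amplicon and 250/250 truncation lengths are hypothetical.
amplicon=488
trunc_f=250
trunc_r=250
overlap=$(( trunc_f + trunc_r - amplicon ))
echo "Expected overlap: ${overlap} bp"
```

With those numbers you'd only have ~12 bp of overlap, which leaves essentially no margin, since DADA2's read merging requires a minimum overlap of 12 bp by default.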
As you pointed out, the quality at the end of the reads can also limit your ability to merge. So there is a balance between length trimming, the quality of the bases near the ends of the reads (which affects merging), and the number of reads you would like to retain post-merging.
I’ve processed quite a few data sets like this, and most often encounter these issues with 2 x 300 kits. In those cases I’ve continued on, processing only the forward reads. You can feed the output from your demux step directly to denoise-single (it will know to use only the forward reads).
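For example, a minimal sketch of that forward-reads-only route (the artifact name and trim/trunc values are placeholders, not recommendations):

```shell
# Denoise only the forward reads from a demultiplexed artifact.
# demux.qza and the trim/trunc values are hypothetical.
qiime dada2 denoise-single \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left 0 \
  --p-trunc-len 250 \
  --output-dir dada2-single-out
```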
Given the balance described above, I try to sanity-check how much data I may be losing or gaining by running the downstream analysis on both the data generated from the forward reads alone and the merged reads.
The choice of which chimera removal method to use largely depends upon the biological question you are trying to address, especially if your question depends upon analyzing low-diversity samples. I found this post about sample diversity on the DADA2 GitHub help page. There is also a discussion about sample “error history”; the following quote is taken from that discussion:
First, it is not advised to pool samples that don’t share an “error history”, in particular samples that come from different sequencing runs or different PCR protocols. Samples from different runs should typically be run through the dada(…) function separately, so that the correct run-specific error rates can be learned.
As you know, DADA2 will remove chimeras for you using a de novo approach.
Be sure to check the output of uchime-ref. There can be issues with the false-positive detection of chimeras when using usearch / vsearch with default settings. The reference sequence database being used and the parameter settings can affect how well uchime-ref can detect chimeras. There have been a few cases in which some of the most abundant OTUs were removed from the data even though they were clearly not chimeric based on follow-up checks.
To combat this, I’ve found that increasing --p-minh to somewhere around 1.0 to 2.0 works well for 16S (this decreases the sensitivity of chimera detection). But you should play with the parameters and sanity-check that the sequences flagged as chimeric are reasonable.
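For concreteness, a hedged sketch of what that might look like (the input artifact names are placeholders, and --p-minh 1.5 is just one value in the range mentioned above):

```shell
# Reference-based chimera checking with reduced sensitivity via --p-minh.
# rep-seqs.qza, table.qza, and ref-seqs.qza are hypothetical inputs.
qiime vsearch uchime-ref \
  --i-sequences rep-seqs.qza \
  --i-table table.qza \
  --i-reference-sequences ref-seqs.qza \
  --p-minh 1.5 \
  --output-dir uchime-ref-out
```

Then inspect the stats and the flagged chimeras in uchime-ref-out before deciding what to drop.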
One additional question: I have a technical replicate for one of the samples. What is the process for handling it during downstream analyses, such as comparing differential abundance?
As for your replicate sample, I’d consider the advice here.
De novo chimera removal occurs by default (--p-chimera-method consensus) when using DADA2 via QIIME 2, unless you tell it to use the pooled approach (--p-chimera-method pooled) or to skip chimera removal entirely (--p-chimera-method none). I’ve not had any issues using the default. Again, you need to consider whether the pooled or consensus approach we discussed earlier is appropriate for your data.
I did not mean to deter you from using uchime-denovo. In fact, many (myself included) occasionally make use of both de novo and reference-based chimera removal. I only meant to point out the considerations at each step.
Pairs that failed merging due to various reasons:
25 too few kmers found on same diagonal
1 potential tandem repeat
2624 too many differences
2336 alignment score too low, or score drop too high
40 staggered read pairs
Statistics of all reads:
301.00 Mean read length
Statistics of merged reads:
512.85 Mean fragment length
15.16 Standard deviation of fragment length
0.61 Mean expected error in forward sequences
3.78 Mean expected error in reverse sequences
0.83 Mean expected error in merged sequences
0.40 Mean observed errors in merged region of forward sequences
3.90 Mean observed errors in merged region of reverse sequences
4.30 Mean observed errors in merged region
vsearch v2.7.0_linux_x86_64, 126.0GB RAM, 24 cores (https://github.com/torognes/vsearch)
I am planning to follow the Deblur method this time. Is there anything I would need to consider, especially on trimming? When we take the joined reads into account, the quality is low in the middle of the sequence.
Alternatively, even if I continue to use DADA2, do you think that trimming off around 50 bases from the 5' end (--p-trim-left-r 50) instead of truncating the reads at around 250 bases (--p-trunc-len-r 250) will help the reads join properly?
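For reference, the two options being compared would sit in a denoise-paired call like this (a sketch only; the artifact name and all parameter values are illustrative, not advice):

```shell
# Hypothetical denoise-paired call showing where the two knobs live:
# --p-trim-left-r removes bases from the 5' end of the reverse reads,
# --p-trunc-len-r cuts the reverse reads at the 3' end.
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 50 \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 250 \
  --output-dir dada2-paired-out
```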
I am currently trying multiple options, and I am sorry to ask about every issue.
You mentioned that denoise-single will only use the forward reads. But do you know of any method to use only the reverse reads? My data have high-quality reverse reads but low-quality forward reads. Thanks in advance!
I suspect that you should be able to trick QIIME 2 into importing the reverse reads as forward reads by using the manifest format approach. That is, set the direction column to forward for all your R2 reads, and do not include your R1 reads.
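A minimal sketch of what that might look like, using the CSV manifest format with a direction column (all paths and sample IDs below are made up):

```shell
# Hypothetical manifest listing only the R2 files, labeled as "forward".
cat > manifest.csv <<'EOF'
sample-id,absolute-filepath,direction
sample-1,/path/to/sample-1_R2.fastq.gz,forward
sample-2,/path/to/sample-2_R2.fastq.gz,forward
EOF

# Then import as single-end data:
qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-format SingleEndFastqManifestPhred33 \
  --input-path manifest.csv \
  --output-path demux-reverse-as-forward.qza
```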
I cannot speak to any sequence data validation steps that occur via this or other import approaches. Perhaps @ebolyen, @thermokarst, or someone else can provide some insight on whether this will work, or if there is a more appropriate option.
Hi @wym199633, @SoilRotifer is correct — you can trick QIIME2 into using the reverse reads as forward reads by using the manifest file format. With the exception of the CASAVA1.8 paired-end format, which expects specific filename patterns to differentiate the forward and reverse reads, all other formats in QIIME2 are direction-agnostic.