I have 16S V3–V4 amplicon sequences generated with the 341F–805R primers. The samples were extracted from soil. The raw data still contain barcodes and primers, so I used cutadapt to remove them.
Here is the data after barcode and primer removal:
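For reference, a primer-removal step like mine can be sketched with the q2-cutadapt plugin. The file names are placeholders, and the primer sequences below are the commonly published 341F/805R sequences, not something confirmed from my run — check them against your own primer order.

```shell
# Sketch: remove 341F/805R primers from demultiplexed paired-end reads.
# demux.qza / trimmed.qza are placeholder names; the primer sequences are
# the commonly published ones and are an assumption here.
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux.qza \
  --p-front-f CCTACGGGNGGCWGCAG \
  --p-front-r GACTACHVGGGTATCTAATCC \
  --p-discard-untrimmed \
  --o-trimmed-sequences trimmed.qza
```

`--p-discard-untrimmed` drops read pairs where the primer was not found, which also removes stray non-target reads.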
I can see the read quality is pretty good for both the forward and the reverse reads, so I didn't truncate too much, also to make sure the reads can still overlap. But the biggest problem is that I lose too many reads in the filtering step. I searched some forum posts, but their problems, like low read quality or a strict --p-trunc-q value, don't match my situation.
I have no idea what causes the problem, any help will be much appreciated!
Hello!
Could you try setting those parameters to lower values?
As you can see from the sequence stats, 98% of the subsampled reads are shorter than 244 bp. All of these reads are discarded in your case. Setting the truncation length to a lower value will allow the reads to pass the filter after truncation, or you can pass 0 to disable truncation.
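As a sketch, a rerun with lowered truncation values would look like this (the file names and the 240/240 figures are placeholders — pick values from your own quality plot):

```shell
# Sketch of denoising with lowered truncation values; trimmed.qza and
# the 240/240 values are placeholders, not recommendations.
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs trimmed.qza \
  --p-trunc-len-f 240 \
  --p-trunc-len-r 240 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats stats.qza

# To disable truncation entirely, use:
#   --p-trunc-len-f 0 --p-trunc-len-r 0
```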
Hi!
I would try running it again, this time with several sets of lower truncation values, compare the results with the one you already have, and decide how to proceed based on that. As you can see, you are also losing reads at the merging step, and truncating the bad-quality ends of the reads may improve it.
Thanks for your suggestions! I have two questions now after I read through some other posts.
So based on the post below, --p-trunc-len discards sequences greater than the value given, which in my case would mean it discards sequences longer than 244 instead of sequences shorter than 244 — do I understand correctly? I got confused because that post and your reply give me opposite answers.
You're exactly correct, good catch! I'll make sure we get that documentation updated.
I actually ran it again with lower truncation values, like
--p-trunc-len-f 230 \
--p-trunc-len-r 240
and
--p-trunc-len-f 240 \
--p-trunc-len-r 240
Both gave worse results, and with the lower value for the forward reads I got less than 10% of reads shown as non-chimeric.
What other parameters can I set to get rid of low-quality reads? Also, what causes the worse results after I lower the truncation values, since results should improve after I truncate the bad-quality ends?
Nope, it will discard reads shorter than the given value and truncate longer reads to that value.
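The behaviour can be illustrated with a toy example (the read lengths here are made up, not from your data):

```shell
# Toy illustration of how --p-trunc-len treats individual reads:
# reads shorter than the value are discarded, longer (or equal) reads
# are cut down to exactly that length.
trunc_len=244
for len in 230 240 244 250; do
  if [ "$len" -lt "$trunc_len" ]; then
    echo "${len}bp read: discarded"
  else
    echo "${len}bp read: kept, truncated to ${trunc_len}bp"
  fi
done
```

So with trunc-len 244 and 98% of reads shorter than 244 bp, almost everything falls into the "discarded" branch.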
Could you try running it with 230 for both the forward and reverse reads? If you are worried about the overlap region, in the latest versions of QIIME 2 you can set it to a value lower than 12.
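Something like the following sketch, combining 230/230 truncation with a smaller overlap requirement (file names and the value 8 are placeholders; --p-min-overlap is available in QIIME 2 2021.11 and later, with a default of 12):

```shell
# Sketch: 230/230 truncation plus a relaxed minimum overlap.
# trimmed.qza and the min-overlap value of 8 are placeholders.
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs trimmed.qza \
  --p-trunc-len-f 230 \
  --p-trunc-len-r 230 \
  --p-min-overlap 8 \
  --o-table table-230.qza \
  --o-representative-sequences rep-seqs-230.qza \
  --o-denoising-stats stats-230.qza
```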
You lost a lot of reads when you applied 240/240 because a lot of your forward reads are shorter than 240 and were discarded.
230/240 looks similar to the stats with no trimming.
I would suggest trying 230/230 to see if it improves the stats, then choose the run with the best output.
I tried the minimum overlap parameter, setting --p-min-overlap 4, but the result doesn't change much; it is basically the same as the no-truncation result. It seems there is nothing else I can do to improve my results? Do you have any suggestions? Thanks!
Based on your results, you are still losing reads at the merging step after filtering. On the one hand, you have enough reads to proceed; on the other, your results may be biased towards bacteria with shorter amplicons.
So option 1 is simply to proceed with truncation disabled in DADA2, at the risk of biased data.
Option 2 is to use only the forward reads. This way you will keep most of the reads, but taxonomy annotation will be performed with shorter sequences.
Option 3 is to merge your reads with the vsearch plugin, where you can not only decrease the overlap region but also allow some mismatches, and then denoise with Deblur to see if it improves the output.
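Options 2 and 3 could look roughly like this. All file names and numeric values here are placeholders, and recent QIIME 2 releases also require an `--o-unmerged-sequences` output for merge-pairs — adjust for your version.

```shell
# Option 2 (sketch): denoise forward reads only.
# demux-fwd.qza and the trunc-len of 230 are placeholders.
qiime dada2 denoise-single \
  --i-demultiplexed-seqs demux-fwd.qza \
  --p-trunc-len 230 \
  --o-table table-fwd.qza \
  --o-representative-sequences rep-seqs-fwd.qza \
  --o-denoising-stats stats-fwd.qza

# Option 3 (sketch): merge with vsearch (smaller overlap, a few
# mismatches allowed), quality-filter, then denoise with Deblur.
qiime vsearch merge-pairs \
  --i-demultiplexed-seqs trimmed.qza \
  --p-minovlen 8 \
  --p-maxdiffs 5 \
  --o-merged-sequences merged.qza \
  --o-unmerged-sequences unmerged.qza

qiime quality-filter q-score \
  --i-demux merged.qza \
  --o-filtered-sequences merged-filtered.qza \
  --o-filter-stats filter-stats.qza

# Deblur needs a fixed trim length; 400 is a placeholder for a
# V3-V4 merged-read length chosen from your own length distribution.
qiime deblur denoise-16S \
  --i-demultiplexed-seqs merged-filtered.qza \
  --p-trim-length 400 \
  --p-sample-stats \
  --o-table table-deblur.qza \
  --o-representative-sequences rep-seqs-deblur.qza \
  --o-stats deblur-stats.qza
```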
Then I ran it in DADA2, also setting maxExpectedError to 2 and 3, and here is what I got:
No reads passed the filter. trunc_len_f (238) or trunc_len_r (249) may be individually longer than read lengths, or trunc_len_f + trunc_len_r may be shorter than the length of the amplicon + 12 nucleotides (the length of the overlap). Alternatively, other arguments (such as max_ee or trunc_q) may be preventing reads from passing the filter.
Then I realized maybe it's because I forgot to set the primer length in figaro, since I ran it on the original data without trimming. But it still recommends the same truncation positions.
Figaro seems like a great tool for choosing truncation values, but I'm confused by the result I get when running DADA2.
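For what it's worth, a figaro run that accounts for primer length would look roughly like this. The flags follow the figaro README; the 465 bp amplicon length and the 17/21 nt primer lengths are assumptions based on the standard 341F/805R sequences, so verify them against your own primers before trusting the recommendation.

```shell
# Sketch of a figaro run on untrimmed reads, telling it the primer
# lengths so they are excluded from the quality model.
# Paths, the amplicon length (465), and the primer lengths (17 fwd,
# 21 rev) are assumptions for the standard 341F/805R primers.
python figaro.py \
  -i /path/to/raw_fastq \
  -o /path/to/figaro_out \
  -a 465 \
  -f 17 \
  -r 21
```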
Hi, I've had a similar problem with some soil DNA. What I found worked best was to use only the forward reads and drop --p-min-overlap to 4. Then I was able to keep more than 70% of reads after DADA2, whereas before adjusting I was only keeping 30–40%. Interestingly, this only happens with certain soil samples; other soil samples I collected in a different location work much better. Just out of curiosity, what kind of soil/DNA extraction method are you using?
Thanks for your suggestion. I'm using V3–V4, the 341F–805R region, and the forward/reverse reads are around 245 bp. So is it the primer set or the soil DNA that made forward-only reads the best case for you? I used the PowerSoil kit (Qiagen) to extract my DNA.
I have used both the PowerSoil kit and a phenol:chloroform extraction method; the PowerSoil does seem to give me better-quality DNA, especially after I add some powdered skim milk to the bead-beating step. I'm sequencing on a MiSeq with the v3 600-cycle kit, also with the 341F–805R primer pair. The extractions I performed with the PowerSoil kit don't need --p-min-overlap dropped to 4 to keep over 70% of reads after DADA2. But I do use just the forward reads; the reverse reads I get are usually just not good quality.