On the filtering, the truncation filter excludes any sequences shorter then 380, so you may check how many there are in your initial sequences.
On the number of chimeric sequences you may try to change the chimera detection method, to see if it gives better results.
That said, I would like to highlight the fact that dada2 it is not suitable for pre-merged sequences because the quality scores are defined by pear in your case and dada2 requires the original quality scores. You can pre-merge your sequences if you like, but in this case you should use deblur plug in for the denoising step. Or just use the dada2 pair command and the cutadapt output.
The sequences excluded are not so much, but i was expecting to lose some of them (just not this much).
I'm curious on the fact you mentioned about "original quality scores" and I'd like to know more about it. What you mean by that and how the "trimmed" reads may influence dada2?
Hi @Phoe
Sorry I was not so clear. The trimming does not alter the quality score, neither the cutadapt adapter trimming nor the trimming setting within dada2.
However, the merging step does alter the quality score. The quality score for the sequence merged by pear is derived by aggregating the quality score for read 1 or read 2 (it may be just the sum of them but I don't recall the exact way pear works at the moment), but for sure the quality score for the overlapping sequences is different from the quality score for this region in the separate R1 and R2. This is enough for invalidate dada2 expectations!
Would you please guide me to where should i trim my reads in order to get the overlap? Could --p-trunc-len-f 257 --p-trunc-len-r 218 work or am i being too soft?
Before answering, can I ask you if you can open a new thread? Please because this question is now out of topic!
Now, to answer your question. A topic of interest may be the following:
So you can have the general idea on what to do. I do think that the trimming option you propose are very permissive, and I usually go stricter (such as -p-trunc-len-f 245; -p-trunc-len-r 180). However, given that the quality of your reverse read drop so early, you would loose the overlapping region completely (please note dada2 requires at least 12 bp of overlap). So you may start with your parameters and see how many sequence you get at the end of the dada2 process. Then try more stringent settings. This try and repeat is very normal with this type of analysis so I would not expect to get it right at first try!
In worst case, you may savage your data by using forward read only (and dada2 single as well!)