Hello,
I have a filtering issue with dada2 on paired data on a small target of ~135 bp.
Example of data:
In bold, the two primers:
R1:
GTACACACCGCCCGTCGCTCCTACCGATACCGGGTGATCCGGTGAACCTTTTGGACCGTTTTTCGGAAAAATAAGTAAACCATATCACCTAGAGGAAGGAGAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCAGAAGGATCAAGACCAAGTCTCTGCTACCGTACGTCTTCTTAATCTCGTATGCCGTCTTCTGCTTGAAAATTGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
R2:
TGATCCTTCTGCAGGTTCACCTAC GGAAACCTTGTTACGACTTCTCCTTCCTCTAGGTGATATGGTTTACTTATTTTTCCGAAAAACGGTCCAAAAGGTTCACCGGATCACCCGGTATCGGTAGGAGCGACGGGCGGTGTGTAC TGTAGAACCATGTCGTCAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAACGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
The primers on R1 and R2 are correctly removed:
R1:
GCTCCTACCGATACCGGGTGATCCGGTGAACCTTTTGGACCGTTTTTCGGAAAAATAAGTAAACCATATCACCTAGAGGAAGGAGAAGTCGTAACAAGGTTTCC
R2:
GGAAACCTTGTTACGACTTCTCCTTCCTCTAGGTGATATGGTTTACTTATTTTTCCGAAAAACGGTCCAAAAGGTTCACCGGATCACCCGGTATCGGTAGGAGC
The target size is ~135 bp. When I look at the .qzv after trimming the primers, I see a huge drop in quality after position 134. The same thing happens with the R2 file.
[quality plot after primer removal: small_target_quality_no_primers]
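(In case it matters, that quality plot was generated with something like the following sketch; the artifact name is the one from my dada2 command below:)
qiime demux summarize \
--i-data trimmed_remove_primers_wild_2.qza \
--o-visualization trimmed_remove_primers_wild_2.qzv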
So I want to truncate at position 134 with dada2:
qiime dada2 denoise-paired \
--i-demultiplexed-seqs trimmed_remove_primers_wild_2.qza \
--p-trim-left-f 0 \
--p-trim-left-r 0 \
--p-trunc-len-f 134 \
--p-trunc-len-r 134 \
--o-table table.qza \
--o-representative-sequences rep-seqs.qza \
--o-denoising-stats denoising-stats.qza
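(The per-sample numbers below come from denoising-stats.qza, which I view with something like this sketch:)
qiime metadata tabulate \
--m-input-file denoising-stats.qza \
--o-visualization denoising-stats.qzv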
Unfortunately, with this truncation most reads are lost (at the filtering step, before merging):
input: 82880, filtered: 2098
If I don't truncate (value 0 for --p-trunc-len-f and --p-trunc-len-r), I don't have this filtering problem:
input: 82880, filtered: 81166
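(For completeness, that run is the same command with both truncation lengths set to 0; the output names here are just placeholders:)
qiime dada2 denoise-paired \
--i-demultiplexed-seqs trimmed_remove_primers_wild_2.qza \
--p-trim-left-f 0 \
--p-trim-left-r 0 \
--p-trunc-len-f 0 \
--p-trunc-len-r 0 \
--o-table table_no_trunc.qza \
--o-representative-sequences rep-seqs_no_trunc.qza \
--o-denoising-stats denoising-stats_no_trunc.qza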
I think the problem is that the target is fully captured by both R1 and R2 (in fact the overlap is the whole sequence). Maybe I should just use the R1 files (see the sketch below)? R1 has the better quality scores (as always).
Or maybe I can just run dada2 with no truncation at all, since that seems to work.
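(If using only R1 is the right approach, I imagine it would look something like this sketch; I'm assuming denoise-single will accept my paired-end artifact and only use the forward reads, and the output names are placeholders:)
qiime dada2 denoise-single \
--i-demultiplexed-seqs trimmed_remove_primers_wild_2.qza \
--p-trim-left 0 \
--p-trunc-len 134 \
--o-table table_R1_only.qza \
--o-representative-sequences rep-seqs_R1_only.qza \
--o-denoising-stats denoising-stats_R1_only.qza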
Thank you for any advice, and have a nice day!
Jérémy Tournayre