DADA2 | truncation causes huge read loss at filtering | paired data where R1 and R2 overlap over the whole sequence

Hello,

I have a filtering issue with DADA2 on paired-end data from a small target of ~135 bp.

Example of data, with the two primers (forward GTACACACCGCCCGTC, reverse TGATCCTTCTGCAGGTTCACCTAC) at the start of R1 and R2:
R1:

GTACACACCGCCCGTCGCTCCTACCGATACCGGGTGATCCGGTGAACCTTTTGGACCGTTTTTCGGAAAAATAAGTAAACCATATCACCTAGAGGAAGGAGAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCAGAAGGATCAAGACCAAGTCTCTGCTACCGTACGTCTTCTTAATCTCGTATGCCGTCTTCTGCTTGAAAATTGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

R2:
TGATCCTTCTGCAGGTTCACCTAC GGAAACCTTGTTACGACTTCTCCTTCCTCTAGGTGATATGGTTTACTTATTTTTCCGAAAAACGGTCCAAAAGGTTCACCGGATCACCCGGTATCGGTAGGAGCGACGGGCGGTGTGTAC TGTAGAACCATGTCGTCAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAACGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

The primers are correctly removed from R1 and R2:
R1:

GCTCCTACCGATACCGGGTGATCCGGTGAACCTTTTGGACCGTTTTTCGGAAAAATAAGTAAACCATATCACCTAGAGGAAGGAGAAGTCGTAACAAGGTTTCC

R2:

GGAAACCTTGTTACGACTTCTCCTTCCTCTAGGTGATATGGTTTACTTATTTTTCCGAAAAACGGTCCAAAAGGTTCACCGGATCACCCGGTATCGGTAGGAGC
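To check how long the reads actually are once the primers are gone, here is a minimal Python sketch that looks for the forward primer at the start of each example read and for the reverse complement of the opposite primer where the read runs through the whole amplicon. It assumes exact primer matches (real trimmers such as cutadapt tolerate mismatches), takes the primer sequences from the example reads above, and cuts each raw read 10 bases after the read-through primer, leaving out the poly-G tail. The helper names are hypothetical.

```python
from typing import Optional

# Primer sequences as they appear at the start of the example reads (assumption).
FWD_PRIMER = "GTACACACCGCCCGTC"          # start of R1
REV_PRIMER = "TGATCCTTCTGCAGGTTCACCTAC"  # start of R2

# Example reads from the post, cut 10 bases after the read-through primer.
R1_RAW = (
    "GTACACACCGCCCGTCGCTCCTACCGATACCGGGTGATCCGGTGAACCTT"
    "TTGGACCGTTTTTCGGAAAAATAAGTAAACCATATCACCTAGAGGAAGGA"
    "GAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCAGAAGGATCAAGACCA"
    "AGTC"
)
R2_RAW = (
    "TGATCCTTCTGCAGGTTCACCTAC"
    "GGAAACCTTGTTACGACTTCTCCTTCCTCTAGGTGATATGGTTTACTTAT"
    "TTTTCCGAAAAACGGTCCAAAAGGTTCACCGGATCACCCGGTATCGGTAG"
    "GAGCGACGGGCGGTGTGTACTGTAGAACCA"
)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def insert_between(read: str, primer_5p: str, primer_3p: str) -> Optional[str]:
    """Return the bases between primer_5p (expected at the read start) and the
    reverse complement of primer_3p (where the read runs into the other primer).
    Returns None if either primer is not found as an exact match."""
    if not read.startswith(primer_5p):
        return None
    end = read.find(revcomp(primer_3p), len(primer_5p))
    if end == -1:
        return None
    return read[len(primer_5p):end]


r1_insert = insert_between(R1_RAW, FWD_PRIMER, REV_PRIMER)
r2_insert = insert_between(R2_RAW, REV_PRIMER, FWD_PRIMER)
print(len(r1_insert), len(r2_insert))  # 104 104
```

Both inserts are 104 nt, the same as the primer-trimmed reads shown above, so after primer removal these reads are well short of position 134.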

The target size is ~135 bp. When I look at the quality plot in the .qzv after primer trimming, I see a huge drop in quality after position 134. The same happens with the R2 file.
[Figure: small_target_quality_no_primers]

So I want to truncate the reads at position 134 with DADA2:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs trimmed_remove_primers_wild_2.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 134 \
  --p-trunc-len-r 134 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza

Unfortunately, with this truncation almost all reads are lost at the filtering step:
input: 82880, filtered: 2098

If I don't truncate (setting --p-trunc-len-f and --p-trunc-len-r to 0), I don't have this filtering problem:
input: 82880, filtered: 81166
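Expressed as fractions of the input, the two runs differ by more than an order of magnitude. A small Python calculation using only the counts quoted above:

```python
# Read counts from the DADA2 denoising stats quoted above.
RUNS = {
    "--p-trunc-len-f/-r 134": (82880, 2098),
    "--p-trunc-len-f/-r 0": (82880, 81166),
}


def percent_passing(input_reads: int, filtered_reads: int) -> float:
    """Percentage of input reads that pass the filtering step."""
    return 100 * filtered_reads / input_reads


for label, (n_in, n_pass) in RUNS.items():
    print(f"{label}: {percent_passing(n_in, n_pass):.1f}% of reads kept")
```

This prints 2.5% for the run truncated at 134 and 97.9% for the run without truncation.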

I think the problem is that the target is fully covered by both R1 and R2 (in fact, the overlap is the whole sequence). Maybe I should just use the R1 files? R1 has the best quality scores (as always).
Or maybe I can just use DADA2 without any truncation, since that seems to work.
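The "overlap is the whole sequence" observation can be checked on the primer-trimmed example reads: if R2 is the exact reverse complement of R1, the two reads cover the same bases end to end. A minimal Python sketch using exact string matching, which is a simplification (this is not how DADA2 merges reads; it aligns them); the helper name `exact_overlap` is hypothetical:

```python
# Primer-trimmed example reads from the post.
R1_TRIMMED = (
    "GCTCCTACCGATACCGGGTGATCCGGTGAACCTTTTGGACCGTTTTTCGG"
    "AAAAATAAGTAAACCATATCACCTAGAGGAAGGAGAAGTCGTAACAAGGT"
    "TTCC"
)
R2_TRIMMED = (
    "GGAAACCTTGTTACGACTTCTCCTTCCTCTAGGTGATATGGTTTACTTAT"
    "TTTTCCGAAAAACGGTCCAAAAGGTTCACCGGATCACCCGGTATCGGTAG"
    "GAGC"
)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def exact_overlap(r1: str, r2: str) -> int:
    """Length of the longest suffix of r1 that exactly equals a prefix of
    revcomp(r2), i.e. the stretch both reads cover in a naive merge."""
    r2_rc = revcomp(r2)
    for k in range(min(len(r1), len(r2_rc)), 0, -1):
        if r1[-k:] == r2_rc[:k]:
            return k
    return 0


print(len(R1_TRIMMED), exact_overlap(R1_TRIMMED, R2_TRIMMED))  # 104 104
```

The overlap equals the full read length, so for these reads R2 adds no bases that R1 does not already cover; it only provides a second observation of the same positions.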

Thank you for any advice and have a nice day :slight_smile:
Jérémy Tournayre

Hello!
Thank you for the very detailed description, that helps a lot.
I think you lost most of the reads at the filtering step because you set the truncation length to 134: DADA2 truncates every read at position 134 and discards all reads that are shorter than that. After primer removal, your example reads are only 104 nt long, so they fall below that cutoff. My recommendation is to set lower truncation values so that you keep more reads. You can also see this in the run where truncation is turned off.
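The truncation rule can be sketched as follows. This models only the length step, based on the documented behaviour of the truncation length (truncate at position n, discard shorter reads, 0 means no truncation or length filtering); the quality-based truncation and expected-error filtering that DADA2 also applies are ignored. The function name and the value 100 are hypothetical, chosen only for illustration.

```python
from typing import Optional


def truncate_or_discard(read: str, trunc_len: int) -> Optional[str]:
    """Simplified truncation-length rule: 0 keeps the read unchanged,
    reads shorter than trunc_len are discarded (None), and longer reads
    are cut to exactly trunc_len bases."""
    if trunc_len == 0:
        return read
    if len(read) < trunc_len:
        return None
    return read[:trunc_len]


# Stand-in for a 104-nt primer-trimmed read like the examples in the post.
read = "ACGT" * 26

for trunc_len in (134, 100, 0):
    result = truncate_or_discard(read, trunc_len)
    status = "discarded" if result is None else f"kept, {len(result)} nt"
    print(trunc_len, status)
```

With 134 the 104-nt read is dropped, with 100 it is kept and shortened, and with 0 it is kept at full length.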

Since R1 and R2 overlap over the whole amplicon, you could also use only the forward reads; you are not gaining much from merging them.
