Very low filtering and merged score after using DADA2

RachelPaes · May 8, 2023, 5:08pm

Hello everyone,

I have a question about DADA2. I have seen similar questions in older posts on the forum, but I didn't find a solution to my problem, and I would really appreciate your help. I'm having problems with filtering and merging: very few reads pass either step.

I am analyzing 16S data from 96 samples.
Region: V4
Amplicon size: 291 bp
Primers: 515F and 806R
Read length: 151 bp
Overlap: 11 bp

My overlap is 11 bp, which is quite short, and I think that is why merging is failing. I know that DADA2's default minimum overlap is 12 bp. Is there a way to change it? How can I get my reads to merge?

The parameters I've used:

qiime dada2 denoise-paired \
--i-demultiplexed-seqs demux-primer-trimmed.qza \
--p-trim-left-f 0 \
--p-trim-left-r 3 \
--p-trunc-len-f 150 \
--p-trunc-len-r 150 \
--o-table table.qza \
--o-representative-sequences rep-seqs.qza \
--o-denoising-stats denoising-stats.qza

Regarding filtering, I don't know what to do to improve it.

Thank you so much for your help.

You can find the files below:

denoising-stats.qzv (1.2 MB)
rep-seqs.qzv (463.1 KB)
table.qzv (485.2 KB)

colinbrislawn · May 8, 2023, 5:12pm

Hello Rachel,

Welcome to the forums!

This is a great first post. Lots of great details here.

You are correct, and you can change it with --p-min-overlap.
All dada2 settings here: denoise-paired: Denoise and dereplicate paired-end sequences — QIIME 2 2023.2.0 documentation

However, I think there's a bigger issue because most of your input reads do not pass the filter.

Would you be willing to post the quality score graphs of your reads before running dada2?

Thanks!

RachelPaes · May 8, 2023, 6:43pm

Hi Colin. Thank you for answering my question.

OK, I will take a look at the documentation. Thank you!

Of course. I'm also sharing my primer-trimming command so you can check whether it is OK.

qiime cutadapt trim-paired \
--i-demultiplexed-sequences demux-paired-end.qza \
--p-cores 16 \
--p-front-f GTGYCAGCMGCCGCGGTAA \
--p-front-r GGACTACNVGGGTWTCTAAT \
--o-trimmed-sequences demux-primer-trimmed.qza \
--verbose \
&> primer_trimming.log

Here is the quality score plot after primer trimming:
demux-primer-trimmed.qzv (321.7 KB)

And here is the quality score plot before primer trimming:
demux-paired-end.qza.qzv (316.1 KB)

I just ran DADA2 on the data before primer trimming, and filtering was OK. Please have a look.
But merging looks the same.

denoising-stats2.qzv (1.2 MB)

Do you think the sequences arrived already trimmed, and that is why so few reads passed the first filtering step?

colinbrislawn · May 8, 2023, 7:11pm

Thank you for sharing all this context. I think I now have a complete picture of what happened here.

The unsolved problem is read joining, so let's start with the overlap calculation:

I agree with your math:
151*2 - (806 - 515) = total read length - amplicon length
302 - 291 = 11 bp of overlap
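The arithmetic above can be sketched as a small bash function that also accounts for truncation and trimming. This is a rough estimate under two assumptions: the 291 bp amplicon length includes both primers, and each read keeps (trunc-len minus trim-left) bases after DADA2 filtering, as the DADA2 filterAndTrim documentation describes. The function name `overlap` is hypothetical, chosen only for this example.

```shell
#!/usr/bin/env bash
# Rough overlap estimate for paired-end reads that start at the primers.
# Assumptions: the amplicon length includes both primers, and each read
# keeps (trunc_len - trim_left) bases after filtering.
# "overlap" is a hypothetical helper name for this example.
overlap() {
  local amplicon=$1 len_f=$2 len_r=$3
  local trim_left_f=${4:-0} trim_left_r=${5:-0}
  echo $(( (len_f - trim_left_f) + (len_r - trim_left_r) - amplicon ))
}

# Full 151 bp reads, no truncation or trimming.
overlap 291 151 151
# Settings from the first post: --p-trunc-len-f/-r 150, --p-trim-left-r 3.
overlap 291 150 150 0 3
```

The first call prints 11 and the second prints 6: with the settings from the first post, the expected overlap is about 6 bp, well below the default minimum of 12, which would be consistent with the merging failures.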

This is very close, and the exact positions of the primers will make or break joining. For example, the primers used during sequencing (which may be different from those used during amplification) determine where each read starts, and so whether the reads can overlap.

515 |--------------------------------| 806
f   |-->                          <--| r primers
    |---------------><---------------| sequencing from start of primers
       |-----------<-->-----------| sequencing from end of primers

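Because cutadapt trim-paired found and removed primers, we know the primers were in the reads. Without primers, the reads are 132 bp (forward) and 131 bp (reverse). Removing the primers also shortens the region between them, so the best case is still about 11 bp of overlap, just under the default --p-min-overlap of 12, and any truncation or trimming makes it smaller. Those lengths are also shorter than the --p-trunc-len of 150 in your first run, and DADA2 discards reads shorter than the truncation length, which is why so few reads passed the filter.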
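The 132/131 bp figures can be checked from the primer sequences in the cutadapt command. This bash sketch assumes every read begins with its full primer, so trimming removes exactly the primer length; the variable names are illustrative. It also compares the trimmed lengths with the --p-trunc-len value of 150 from the first run, because DADA2 discards reads shorter than the truncation length.

```shell
#!/usr/bin/env bash
# Primer sequences from the cutadapt command above (515F and 806R).
fwd_primer=GTGYCAGCMGCCGCGGTAA
rev_primer=GGACTACNVGGGTWTCTAAT
read_len=151
amplicon=291    # amplicon length including both primers
trunc_len=150   # value used for --p-trunc-len-f and --p-trunc-len-r

# Assumption: every read begins with its full primer, so cutadapt
# removes exactly the primer length from the 5' end.
fwd_len=$(( read_len - ${#fwd_primer} ))
rev_len=$(( read_len - ${#rev_primer} ))
# Region left between the primers once both are removed.
insert=$(( amplicon - ${#fwd_primer} - ${#rev_primer} ))

echo "forward after trimming: ${fwd_len} bp"
echo "reverse after trimming: ${rev_len} bp"
echo "overlap without primers: $(( fwd_len + rev_len - insert )) bp"

# DADA2 discards reads shorter than the truncation length.
for len in "$fwd_len" "$rev_len"; do
  if (( len < trunc_len )); then
    echo "${len} bp is shorter than trunc-len ${trunc_len}: read discarded"
  fi
done
```

This prints 132 bp and 131 bp for the trimmed reads, an overlap of 11 bp (unchanged by primer removal), and flags both lengths as shorter than 150.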

You can't join these reads.

(If the sequencing core says they should join, ask how they do it. Maybe their math is different.)

The good news is that the forward read quality looks great, and you should be able to analyze your samples using just the forward read. Let us know if you have any questions about doing this!
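For the forward-read route, a command along these lines might work. It is a sketch, not a tested recipe: it assumes qiime dada2 denoise-single accepts the paired-end artifact and uses only its forward reads, and the --p-trunc-len value of 130 and the *-forward output names are placeholders. Pick the truncation length from the forward quality plot, keeping it at or below the length of the primer-trimmed forward reads (about 132 bp), since shorter reads are discarded.

```shell
# Denoise forward reads only; reverse reads in the artifact are ignored.
# --p-trunc-len 130 is a placeholder: keep it at or below the
# primer-trimmed forward read length.
qiime dada2 denoise-single \
  --i-demultiplexed-seqs demux-primer-trimmed.qza \
  --p-trim-left 0 \
  --p-trunc-len 130 \
  --o-table table-forward.qza \
  --o-representative-sequences rep-seqs-forward.qza \
  --o-denoising-stats denoising-stats-forward.qza
```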

RachelPaes · May 9, 2023, 1:05pm

Hi Colin,

Thank you! But I have another question. I've read on Initial QIIME Processing : earthmicrobiome that I don't need to trim the primers, just the barcodes. Since the sequencing data came to me already demultiplexed, in theory I can go straight to DADA2, because there are no barcodes attached, right?
If no trimming is needed, will my reads be 150 bp? Then, if I use --p-min-overlap of 6 bp, would I be able to merge them?

Another question: if I use just the forward reads, am I going to lose a lot of information?

Thank you so much for your help, patience, and guidance. I'm new at this and I have a lot of questions, hehe.

colinbrislawn · May 9, 2023, 3:44pm

Good morning Rachel,

We are making good progress!

Great question! The EMP uses a special sequencing method, so there are no primers in the reads. (This is the method I mentioned above.)

515 |--------------------------------| 806
f   |-->                          <--| r primers
    |---------------><---------------| Normal Illumina sequencing
       |-----------<-->-----------|    EMP method (no primers!)

Your reads do have primers in them, as you found by running cutadapt.

Try it and see!

The taxonomic resolution may be reduced because the ASVs will be shorter. But you get to keep most of your reads and avoid any bias due to joining, so that's very good!

RachelPaes · May 9, 2023, 7:09pm

Hi Colin,

Thank you so much! I´m going to try it.

system · June 10, 2023, 1:09am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.