Low feature counts after DADA2

Joselyn_Chicas · August 27, 2020, 6:04am

Hello,
I have a very similar issue. I have been reading several posts on the QIIME 2 forum about removing primers and using DADA2. I'm currently working with paired-end, demultiplexed MiSeq data amplified with primers 341F & 785R (V3-V4 region of 16S), using DADA2. I am working with only one sample at first, because I want to find the best parameters before running all my samples (almost 100).
I started by using DADA2 itself to remove my primers (forward length 17 and reverse length 21).

With the primers I am working with, the total number of bases I truncate should not exceed 116 bp (amplicon: 785 - 341 = 444 bp; 2 x 300 bp reads = 600 bp; overlap: 600 - 444 = 156 bp; 156 bp - 40 bp (20 bp minimum overlap required + 20 bp natural variation) = 116 bp), as suggested in another post.
So, I decided to do the following:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired-end.qza \
  --p-trim-left-f 17 \
  --p-trim-left-r 21 \
  --p-trunc-len-f 283 \
  --p-trunc-len-r 210 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza
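
For reference, the truncation budget worked out above can be written as a few lines of Python. This is only a sketch of the arithmetic in this post: the function name and the two margin parameters are illustrative and are not part of QIIME 2 or DADA2.

```python
def truncation_budget(fwd_pos, rev_pos, read_len, min_overlap=20, variation=20):
    """Return (amplicon_len, overlap, max_total_truncation), all in bp.

    fwd_pos / rev_pos are the primer positions used in the post (341, 785).
    min_overlap and variation are the 20 bp + 20 bp margin quoted in the post.
    """
    amplicon_len = rev_pos - fwd_pos               # 785 - 341 = 444
    overlap = 2 * read_len - amplicon_len          # 600 - 444 = 156
    budget = overlap - (min_overlap + variation)   # 156 - 40 = 116
    return amplicon_len, overlap, budget


amplicon_len, overlap, budget = truncation_budget(341, 785, 300)
print(amplicon_len, overlap, budget)   # 444 156 116

# Bases removed from the 3' ends by --p-trunc-len-f 283 / --p-trunc-len-r 210
removed = (300 - 283) + (300 - 210)
print(removed, removed <= budget)      # 107 True
```

By this estimate, truncating at 283 and 210 removes 107 bp in total, which stays inside the 116 bp budget.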

However, I was getting a very low percentage of input non-chimeric (~20%).
So, after reading more posts I decided to use cutadapt to remove my primers, so I did:
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux-paired-end.qza \
  --p-front-f CCTACGGGNGGCWGCAG \
  --p-front-r GACTACHVGGGTATCTAATCC \
  --o-trimmed-sequences trim-paired-demux.qza \
  --verbose
And I got these results:

After this result, I decided to run DADA2. I tried 7 different combinations of parameters, and the best percentage I got was with the following parameters:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs trim-paired-demux.qza \
  --p-trim-left-f 60 \
  --p-trim-left-r 35 \
  --p-trunc-len-f 275 \
  --p-trunc-len-r 200 \
  --p-max-ee-f 5 \
  --p-max-ee-r 5 \
  --o-table table6-maxee.qza \
  --o-representative-sequences rep-seqs6-maxee.qza \
  --o-denoising-stats denoising-stats6-maxee.qza \
  --verbose

However, I still have some questions:

If I removed my primers using cutadapt, why do I still get some empty space to trim on the sequences? Is that normal? Do you think the parameters I used for DADA2 after cutadapt are OK (--p-trim-left-f 60 --p-trim-left-r 35)?
Why do I still get such a low percentage (around 30%)? Is there a specific percentage of input non-chimeric that is considered good or bad?
With my last parameters, I truncated at 275 forward and 200 reverse, which means I exceeded 116 bp (I cut 125 bp). Does that mean I truncated too much?
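
To make the last question concrete, here is a small Python sketch that turns a pair of truncation lengths into the overlap they leave. It uses only the figures from this post (2x300 reads, a 444 bp amplicon estimate, a 40 bp margin); the constant and function names are illustrative. One assumption is stated in the comments: trim-left removes bases from the 5' ends, which lie at the outer ends of the amplicon, so it should not change the overlap.

```python
READ_LEN = 300        # 2x300 MiSeq reads, from the post
AMPLICON_LEN = 444    # 785 - 341, the poster's own estimate
SAFETY_MARGIN = 40    # 20 bp minimum overlap + 20 bp length variation, from the post


def check_truncation(trunc_f, trunc_r):
    """Return (bases_removed, expected_overlap, meets_margin) for a truncation pair.

    Assumption: only the truncation lengths matter here, because the reads
    overlap at their 3' ends and trim-left only removes 5' bases.
    """
    removed = (READ_LEN - trunc_f) + (READ_LEN - trunc_r)
    overlap = trunc_f + trunc_r - AMPLICON_LEN
    return removed, overlap, overlap >= SAFETY_MARGIN


removed, overlap, ok = check_truncation(275, 200)
print(removed, overlap, ok)   # 125 31 False
```

So 275/200 removes 125 bp and leaves an expected overlap of about 31 bp. That is below the 40 bp rule of thumb, although, to the best of my knowledge, DADA2's merging step itself only requires 12 bp by default, so merging can still succeed for amplicons near the expected length.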

Thanks so much,
Any suggestions will be appreciated.

llenzi · August 27, 2020, 8:24am

Hi @Joselyn_Chicas,

thanks for your extensive description!
Before going to your questions, can I ask a little more about the library preparation? How many PCR cycles did you use in the initial amplification step? What was the input material like (good/high quality, high/low quantity)?
That may be reflected in the percentage of chimeric reads!

On your questions now, I'll try to help you as much as I can!

After the cutadapt step, you still have sequences 300 bp long! That suggests to me that not all of your sequences include the primer you are looking for. Can you share the cutadapt log? You could also try the '--p-discard-untrimmed' option to exclude sequences that do not contain the primer as expected (I strongly suggest this option, which should get rid of a lot of sequencing noise!)
That may be related to the lab point I asked about earlier (but I am not really a lab person!). I agree that 30% of sequences left is a bit on the low side, but it may be what it is, and you may have enough sequences to go on, depending on the samples and what you want to see.
Looking at the denoising stats you showed, it seems your greatest loss is at chimera detection: >=70% of reads are left after the merging step, dropping to 30% after chimera detection. So you have room for improvement at the merging step if you like, but to me you are good to go with these parameters.
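
As a rough illustration of how to read the denoising stats this way, the Python sketch below computes how much of the input is lost at each step. The read counts are hypothetical, chosen only to mimic the pattern described (about 70% left after merging, 30% after chimera removal); they are not the poster's data.

```python
# Hypothetical read counts for one sample, in the order DADA2 applies the steps.
counts = {
    "input": 100_000,
    "filtered": 85_000,
    "denoised": 83_000,
    "merged": 72_000,
    "non-chimeric": 30_000,
}


def step_losses(counts):
    """Return {step: percent of the input reads lost at that step}."""
    steps = list(counts.items())
    total = steps[0][1]
    losses = {}
    for (_, prev_n), (name, n) in zip(steps, steps[1:]):
        losses[name] = 100.0 * (prev_n - n) / total
    return losses


losses = step_losses(counts)
worst = max(losses, key=losses.get)
for name, pct in losses.items():
    print(f"{name:>12}: {pct:5.1f}% of input lost")
print("largest loss at:", worst)   # non-chimeric
```

With numbers like these, tuning truncation can only recover the reads lost at filtering and merging; the loss at the chimera step has to be addressed separately.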

As a final note, how representative is the sample you are working on? Did you test a different sample? In general I don't like to work with a single sample, because it may be an unlucky sampling choice!

Hope it helps.
Luca

Joselyn_Chicas · August 30, 2020, 3:12am

Hi @llenzi
Thanks so much for your reply.
We sent our samples to a third-party lab, and they reported that sample quality was good and that they used 30-35 PCR cycles.
I followed your suggestion and tried the '--p-discard-untrimmed' option.
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux-paired-end.qza \
  --p-front-f CCTACGGGNGGCWGCAG \
  --p-front-r GACTACHVGGGTATCTAATCC \
  --p-discard-untrimmed \
  --o-trimmed-sequences trim-paired-demux.qza \
  --verbose

Now all of my sequences are about 280 bp long. After that I tried different parameters, but I was only able to raise my percentage of input non-chimeric to ~31%.
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs trim-paired-demux.qza \
  --p-trim-left-f 60 \
  --p-trim-left-r 37 \
  --p-trunc-len-f 278 \
  --p-trunc-len-r 204 \
  --p-max-ee-f 5 \
  --p-max-ee-r 5 \
  --o-table table-maxee.qza \
  --o-representative-sequences rep-seqs-maxee.qza \
  --o-denoising-stats denoising-stats-maxee.qza \
  --verbose

However, I have a few questions:

After the cutadapt step, my sequences are 280 bp long. Does this mean I have to calculate a different truncation limit than the one I used before (total truncation should not exceed 116 bp)?
Do you have any suggestions about truncation and trimming parameters?

As for the sample, I am not sure whether it is the most representative one. However, I have a limited number of service units to run my data, so unfortunately I have no choice other than to try several times with only a few samples before running my whole data set.

Thanks so much,

llenzi · September 1, 2020, 10:24am

Hi @Joselyn_Chicas,

sorry for the late reply!
For the trimming length after cutadapt: yes, I would adjust it to account for the primer-trimmed sequences.
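
One way to think about that adjustment is sketched in Python below, under two assumptions that are not confirmed in this thread: the primers sit at the very start of each read, and the 444 bp amplicon estimate spans primer to primer. Under those assumptions, removing a primer shortens the read and the amplicon by the same amount, so the overlap is unchanged and only the truncation positions shift. The function name is illustrative.

```python
FWD_PRIMER_LEN = 17   # from the first post
REV_PRIMER_LEN = 21   # from the first post
AMPLICON_LEN = 444    # poster's estimate; ASSUMED here to include both primers


def shift_trunc_len(raw_trunc_len, primer_len):
    """Truncation length on a primer-trimmed read that cuts at the same base
    as raw_trunc_len did on the untrimmed read (assumes a 5' primer)."""
    return raw_trunc_len - primer_len


# The first command in this thread truncated raw reads at 283 / 210.
new_f = shift_trunc_len(283, FWD_PRIMER_LEN)   # 266
new_r = shift_trunc_len(210, REV_PRIMER_LEN)   # 189

overlap_raw = 283 + 210 - AMPLICON_LEN
overlap_trimmed = new_f + new_r - (AMPLICON_LEN - FWD_PRIMER_LEN - REV_PRIMER_LEN)
print(new_f, new_r, overlap_raw, overlap_trimmed)   # 266 189 49 49
```

Read this way, the 116 bp budget counts bases removed from the 3' ends and stays the same; what changes is that the truncation lengths are now measured on reads of about 280 bp rather than 300 bp.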
Still, I'd like to work first on the loss-by-chimera issue, which is still your main problem!

Another admin (thanks @Nicholas_Bokulich) pointed out your '--p-max-ee-f' and '--p-max-ee-r' settings! They look rather permissive (they allow too much error in the reads) and may let chimeric sequences slip through. Is there any specific reason why you changed the default?
At the same time, you are trimming a lot from the beginning of the sequences (60 from f and 37 from r). The quality looks good and, after the cutadapt step, there should not be any adapter left, so can I ask you why?
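
On the max-ee point, it may help to see what "expected errors" means. The usual definition is the sum of the per-base error probabilities implied by the Phred scores, and a read is kept when that sum does not exceed the max-ee value. The Python sketch below uses a hypothetical quality profile (not taken from this dataset) and assumes the QIIME 2 default of 2.0 for both max-ee parameters.

```python
def expected_errors(quals):
    """Sum of per-base error probabilities for a list of Phred scores."""
    return sum(10 ** (-q / 10) for q in quals)


# Hypothetical read: 180 bases at Q38, 60 at Q20, 40 at Q12 (a noisy tail).
read_quals = [38] * 180 + [20] * 60 + [12] * 40
ee = expected_errors(read_quals)
print(round(ee, 2))   # 3.15

for max_ee in (2.0, 5.0):   # 2.0 is assumed to be the default, 5 was used above
    verdict = "kept" if ee <= max_ee else "discarded"
    print(f"max-ee {max_ee}: read is {verdict}")
```

A read like this one would be discarded at 2.0 but kept at 5, which is how a permissive setting lets noisier reads into the denoising and chimera-checking steps.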

PS. In your very first message you refer to 'some empty space to trim', I am not sure I am following you on this, could you clarify, please?

Cheers,
Luca

Joselyn_Chicas · September 2, 2020, 7:10am

Hello @llenzi,

Thanks for your reply. I was using '--p-max-ee-f 5' and '--p-max-ee-r 5' because it was suggested in another post (about low percentages after DADA2), so I decided to try it on my data.
For the quality plots, I thought I had to trim until the black boxes started to show in the plot (that's why I was trimming around 60 from f and 37 from r). Does that make sense, or is it wrong? (It's really my first time working with QIIME 2, so I guess I'm still not completely clear on those details.)
I am trying to work with more samples now, so I tried using the default values for '--p-max-ee-f' and '--p-max-ee-r' and no trimming.
These are the parameters I used:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs trimpaired-demux-3.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 278 \
  --p-trunc-len-r 204 \
  --o-table table-3.qza \
  --o-representative-sequences rep-seqs-3.qza \
  --o-denoising-stats denoising-stats-3.qza \
  --verbose
These are the quality plots after cutadapt:

However, the results are similar (low percentages of input non-chimeric), and this time the percentage of input merged also decreased.

Kind regards,
Joselyn

llenzi · September 2, 2020, 9:16am

Hi @Joselyn_Chicas,
thanks for clarifying! The quality plots show the distribution of quality scores at each position (shown on the x-axis) as a boxplot.
At the beginning of the sequences the quality values are so tight and close to each other that the plot does not show the percentile rectangle! But this simply means that the quality is really good and there is no need to discard those bases (as long as you know that they are not adapter or PCR primer sequences!), so setting both '--p-trim-left' parameters to 0 makes sense to me after the cutadapt step! (This is in contrast to the tail of the sequences, where the quality is much more variable and the percentile rectangle is clearly visible.)
This dataset looks a bit more representative of all your samples. But you are losing some read overlap, so you have fewer merged sequences. It seems to me that you are much stricter in choosing the truncation length for the right (reverse) reads than for the left (forward) reads (204 vs 278). If you are happy to have the black rectangle above Q20 for the left reads, you can keep the same criterion for the right reads too; I would try 278 and 220 to keep more overlapping sequences.
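
The "rectangle above Q20" rule of thumb can be sketched in Python as follows. The per-position values are hypothetical and stand in for the lower edge of the box in the quality plot (assumed here to be the 25th percentile); the function name and the Q20 threshold are illustrative, and real profiles can dip briefly and recover, which this simple scan ignores.

```python
def suggest_trunc_len(lower_quartiles, min_q=20):
    """Number of leading positions whose lower-quartile quality is >= min_q.

    Stops at the first position that drops below the threshold.
    """
    for pos, q in enumerate(lower_quartiles):
        if q < min_q:
            return pos
    return len(lower_quartiles)


# Hypothetical reverse-read profile: good quality that decays towards the 3' end.
reverse_q25 = [36] * 200 + [30] * 20 + [24] * 15 + [18] * 45
print(suggest_trunc_len(reverse_q25))   # 235
```

Whatever position a scan like this suggests still has to be checked against the overlap needed for merging, since a longer truncation length keeps more overlap but also more low-quality bases.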

On the non-chimeric sequence percentage, it may be related to the lab preparation (but I am not really a lab person), or it may just be part of what your samples are. However, once you are happy with your trimming settings, it may be worth going on with the analysis to see whether there are enough sequences to achieve your aim despite the high percentage of chimeras. It is probably better to discard them as soon as possible rather than try to work with potentially wrong sequences.
As alternative approaches, you could try using the left (forward) reads alone, or denoising with deblur, which may return different percentages.

Hope it makes sense
