dada2-reverse reads with poor quality

Hi all,
I am at the QC step of my 16S data analysis. My amplicon is the V4 region (about 300 bp), sequenced 2x250 bp on the MiSeq platform. I am not sure whether I am setting the right values for these commands. Would anyone take a quick look at them? Thank you. Here is my interactive quality plot for both F and R reads:

Below is what I wanted to trim and truncate. Do you think this is a good enough setting for this data? Should I be concerned about the positions with lower whiskers, such as 48, 78, 106, and 140-250? How are these poor-quality reads going to be processed in the dada2 denoising step? Are the entire reads removed, or only the low-score bases? Thank you very much!

```
qiime dada2 denoise-paired \
  --p-trim-left-f 0 \
  --p-trim-left-r 1 \
  --p-trunc-len-f 251 \
  --p-trunc-len-r 251 \
  --p-max-ee 2.0 \
  --p-chimera-method consensus \
  --i-demultiplexed-seqs pilot_trimmed.qza \
  --o-representative-sequences pilot_repseqs.qza \
  --o-table pilot_table.qza \
  --o-denoising-stats pilot_stats.qza
```

Hi @arlandan,
Choosing dada2 parameters has been discussed extensively on this forum; I would suggest searching for the topic and reading through a few of those threads.
Briefly though, I think setting your truncation values that high is not a good idea, especially when you look at the reverse reads. You will lose many reads during the initial filtering step that you might otherwise be able to save. I would truncate as much of the reverse reads as possible while still allowing enough overlap between your reads (min 20 bp overlap). Truncating the forward reads to position 220-230 would probably help as well.
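To make the overlap constraint concrete, here is a minimal sketch of the arithmetic. It assumes each read starts at one end of the amplicon and uses the approximate 300 bp amplicon length from the original post; the function names and the candidate truncation pairs are hypothetical, chosen only for illustration.

```python
def overlap_after_truncation(trunc_len_f, trunc_len_r, amplicon_len):
    """Bases shared by the forward and reverse reads after truncation.

    Assumes each read starts at one end of the amplicon, so the two
    reads together cover trunc_len_f + trunc_len_r bases of it.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


def min_reverse_trunc(trunc_len_f, amplicon_len, min_overlap=20):
    """Shortest reverse truncation length that keeps min_overlap bases."""
    return amplicon_len + min_overlap - trunc_len_f


amplicon_len = 300  # approximate value from the original post
min_overlap = 20    # overlap suggested in the reply above

# Hypothetical candidate settings: (forward, reverse) truncation lengths
for trunc_f, trunc_r in [(251, 251), (230, 200), (220, 100)]:
    overlap = overlap_after_truncation(trunc_f, trunc_r, amplicon_len)
    status = "ok" if overlap >= min_overlap else "too short"
    print(f"f={trunc_f} r={trunc_r} overlap={overlap} ({status})")
```

If the true amplicon is longer than the estimate, the real overlap is smaller, so it is safer to leave some margin above the minimum.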
Good luck!

Hi @Mehrbod_Estaki
Thank you for your reply. Could you please explain a bit more regarding:

What is the initial filtering process, and why will I lose many reads during it if I set the truncation lengths to 251?

Why are the boxes missing for positions 190-220 in the forward reads?

My biggest concern is that I don't know whether the quality filtering and processing are working well. I don't know how to evaluate the results or how to make a decision based on them. As a beginner, this really bothers me. I would appreciate any suggestions. Thank you very much.

Hi @arlandan,

Before DADA2 builds an error model and denoises your reads, it goes through some initial filtering steps to discard bad-quality reads and to trim/truncate your reads at the positions you specify. You can check out those parameters in the dada2 --help text. These steps remove bad-quality sequences, which tend to negatively affect the denoising step.

You lose reads because sequences from Illumina machines tend to lose quality with length, especially the reverse reads, so the longer the sequences, the more errors they contain. The expected errors of each read are calculated, and reads above the maximum allowed value (maxEE, your --p-max-ee setting) are tossed completely by DADA2, even if most of the errors occurred only at the tail end of the read. That means you lose the whole read, whereas if you had just truncated the poor-quality bases at the end you could have saved the rest of it. With paired-end reads especially, since the two reads share information in the overlap region, you can essentially trim all your poor-quality bases without losing any resolution.
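As a rough illustration of why truncation rescues reads, the sketch below computes expected errors the way the DADA2 documentation defines them, as the sum of 10^(-Q/10) over the bases in a read. The quality profile is hypothetical (a clean read with a poor tail), not taken from the data in this thread, and the helper name is an assumption for illustration.

```python
def expected_errors(quals):
    """Sum of per-base error probabilities, where P(error) = 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in quals)


max_ee = 2.0  # same threshold as --p-max-ee in the command above

# Hypothetical read: 200 bases at Q35 followed by a 50-base tail at Q12
quals = [35] * 200 + [12] * 50

ee_full = expected_errors(quals)             # whole 250-base read
ee_truncated = expected_errors(quals[:200])  # truncated before the bad tail

for label, ee in [("full", ee_full), ("truncated", ee_truncated)]:
    verdict = "discarded" if ee > max_ee else "kept"
    print(f"{label}: EE = {ee:.2f} -> {verdict}")
```

In this made-up case nearly all of the expected errors come from the tail, so the full-length read is discarded while the truncated read passes the filter.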

They are not missing; the boxes are just so small that they look like they are not there. If you zoom in on that region, you can see that they are indeed just small.

Upon completion, DADA2 gives a summary output of all its various steps, including filtering, denoising, merging, chimera removal, etc. See an example of this in the 'Moving Pictures' tutorial.

Hi @Mehrbod_Estaki
Thank you so much for your detailed answers. I greatly appreciate them.

Best!

Hi @Mehrbod_Estaki,

Do you think it's necessary to remove the relatively low-quality bases at positions 165-170 in the reverse reads? Does dada2 have a function that cuts low-score bases from the middle of a read instead of from the ends? Thank you.

Hi @arlandan,
There is no function to do that in dada2 (or really in any other tool that I know of), mainly because it wouldn’t make sense to have gaps in your reads, and deleting fragments would cause all sorts of other problems. Your reads are actually in pretty good shape. We generally consider positions with a median q score (the black line in the dark boxes) above 20 to be in good enough shape, so even though that region looks low compared to the rest, it is in fact still decent. I would not worry about it.
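For anyone who wants to apply that rule of thumb to numbers rather than to the plot, here is a minimal sketch. The per-position quality scores are hypothetical and the function name is an assumption for illustration; it simply flags positions whose median is not above 20.

```python
from statistics import median


def positions_at_or_below(quals_by_position, threshold=20):
    """Return 1-based positions whose median quality is not above threshold."""
    return [
        pos
        for pos, quals in enumerate(quals_by_position, start=1)
        if median(quals) <= threshold
    ]


# Hypothetical scores: each inner list holds the q scores seen at one position
quals_by_position = [
    [38, 37, 38, 36, 12],  # one low outlier (long lower whisker), high median
    [30, 28, 25, 22, 14],  # lower overall, median still above 20
    [18, 15, 22, 12, 10],  # median below 20
]

print(positions_at_or_below(quals_by_position))
```

Note that the first position has a long lower whisker but is not flagged, because a few low-scoring reads do not move the median.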

Hi @Mehrbod_Estaki
Many thanks for your explanation! Much appreciated.

Happy Friday!
