How to deal with demultiplexed and quality-controlled data

Dear all,

I have a set of paired-end data that has been demultiplexed, with low-quality sequences removed and adapters, barcodes and primers trimmed, but the reads have not been joined yet.

As far as I know (I am not sure, actually), DADA2 has its own algorithm to remove low-quality sequences and to join read 1 and read 2, generating representative sequences for downstream analysis. So I think the best input for DADA2 may be demultiplexed data with only the barcodes and primers trimmed. But what can I do with my data, which has already been quality controlled?

I did try to import the data and run DADA2, but the results were unsatisfactory because almost 30% of the sequences were removed:

Is there any other way I can obtain the representative sequences and the feature table from my data?

Thanks for all the ideas and suggestions!!

Yes, dada2 denoises the sequences, i.e., it builds an error model of the sequences to predict and correct the errors that it finds. Sounds like the best place to start is reading the original publication for dada2.

Sounds like your data has most likely just been filtered based on Q scores. It is probably still okay to run this through dada2.

That is actually very normal for dada2. 30% sounds high, but I have seen higher and this should not be unsatisfying! Let’s pick this apart:

  1. ~5% is being dropped at the “filtered” stage — dada2 does an initial pass to filter low quality sequences based on Q score.
  2. Looks like none are being dropped during denoising.
  3. 5-10% are being dropped during merging. Check around on this forum — there are lots of posts for troubleshooting merging issues — but make sure you are trimming at the right spots and you have ample sequence left over for 20+ nt of overlap between forward and reverse reads. Sounds like you may be using qiime1-style quality filtering, which will also be trimming reads where quality drops off. This can be problematic for attempting to merge sequences if they are being truncated too much!
  4. Another ~5% are being identified as chimera.
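
If it helps to pick apart your own stats the same way, here is a minimal Python sketch that turns per-stage read counts into the percent of input reads lost at each step. The stage names are my assumption of how the dada2 denoising stats are labelled, and the counts are made-up numbers for illustration, not taken from your run.

```python
# Stage names are assumed labels for the dada2 denoising stats columns.
STAGES = ["input", "filtered", "denoised", "merged", "non-chimeric"]


def stage_losses(counts):
    """Return the percent of input reads lost at each step."""
    total = counts["input"]
    losses = {}
    # Compare each stage with the one before it.
    for prev, cur in zip(STAGES, STAGES[1:]):
        losses[cur] = 100.0 * (counts[prev] - counts[cur]) / total
    return losses


# Hypothetical read counts for one sample (illustration only).
example = {
    "input": 10000,
    "filtered": 9500,
    "denoised": 9500,
    "merged": 8600,
    "non-chimeric": 8100,
}

losses = stage_losses(example)
for stage, pct in losses.items():
    print(f"{stage}: {pct:.1f}% of input reads lost")
print(f"total: {sum(losses.values()):.1f}% of input reads lost")
```

Looking at the per-stage numbers rather than only the total makes it easier to see whether filtering, merging or chimera removal is where most of the reads are going.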

So all in all this looks like a reasonable filtering profile, given that errors and chimeras can be rife. If I were you, I would try to fix the merging issue to increase yields, but I might not try too hard. If you are using qiime1-style quality filtering, the reads that fail to merge are probably just sequences whose quality drops off early in the read, and dada2 may filter them anyway if you truncate them less before running dada2.
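
For the merging check, a rough back-of-the-envelope calculation is usually enough. The sketch below is only that: the function names and the read and amplicon lengths are hypothetical, and the 20 nt threshold is just the rule of thumb mentioned above. It assumes the forward and reverse reads together span the whole amplicon, so the overlap is whatever is left once the amplicon length is subtracted from the two read lengths.

```python
def expected_overlap(fwd_len, rev_len, amplicon_len):
    """Overlap (nt) left between forward and reverse reads after trimming."""
    return fwd_len + rev_len - amplicon_len


def enough_overlap(fwd_len, rev_len, amplicon_len, min_overlap=20):
    """True if the reads should still overlap by at least min_overlap nt."""
    return expected_overlap(fwd_len, rev_len, amplicon_len) >= min_overlap


# Hypothetical lengths (illustration only): a 300 nt amplicon.
print(expected_overlap(200, 150, 300))  # reads kept at 200 nt and 150 nt
print(enough_overlap(200, 150, 300))
print(enough_overlap(150, 150, 290))    # reads trimmed too short to merge
```

Since quality-filtered reads can end up with different lengths, it is worth running this with the shorter read lengths in your data, not just the typical ones.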

Good luck!

Thank you very much for the suggestions!!
