Very low recovered sequences (approx. 10%) after denoising using DADA2

Hello,

I used the following primers to analyze V3-V4 from mice fecal samples:

V3V4_357F_0_Forward CCTACGGGNGGCWGCAG
V3V4_IllR_0_Reverse GACTACHVGGGTATCTAATCC

Unfortunately, there was an issue and we ended up sequencing PE150. For analysis using QIIME2, I took two approaches: 1) treat the reads as single-end, and 2) merge the pairs using DADA2's justConcatenate option. After denoising, I lose a lot of sequences (on average, only 10% of the sequences are recovered).

My questions are:

  1. Is it okay to assume that such a low recovery rate after denoising is due to short sequencing reads? (After trimming, I get 110 bp reads.)

  2. My sequencing depth was 1M per sample, so even with only 10% recovery, I figured I have enough reads to proceed with the analysis (my rarefaction curves also looked fine). Would it be okay to infer information from these data?

Thank you very much for your help!


Hello @mhk

Welcome to the forums! :qiime2:

That’s exactly what I would do! :+1:

Maybe… you could try trimming even shorter. But if the quality is bad because the run almost failed, I’m not sure there’s much else you can do.

Yep. Quality over quantity. Heck, my most cited paper uses “a modest sequencing depth of 5,000 observations per sample”, lol
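To put rough numbers on it, here's the back-of-the-envelope check I'd run (the 1M depth and 10% recovery are your figures; the 5,000 rarefaction depth is just my example above, not a recommendation):

```python
# Back-of-the-envelope: is 10% recovery from 1M raw reads still enough?
raw_depth = 1_000_000   # reads per sample before denoising (your stated depth)
recovery = 0.10         # fraction of reads surviving DADA2 (your observed rate)
retained = int(raw_depth * recovery)

# Illustrative rarefaction depth (the "modest" 5,000 mentioned above)
rarefaction_depth = 5_000

print(retained)                        # 100000 reads left per sample
print(retained >= rarefaction_depth)   # True -- well above the rarefaction depth
```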

You are good to go!

Colin


Nick mentioned that this is really low.

Can you post the quality stats so we can review where you are losing all your reads? We might have some more options!


Thank you very much for your reply.

If I understand correctly, DADA2's justConcatenate option only works through the R package (not within QIIME2). After merging and preprocessing, I get two files: an ASV table as a .txt file and a feature sequence file as .fna. What is the best way to import the ASV table into QIIME2? I tried using the biomformat R package, but keep getting errors related to the number of columns.
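One workaround I'm considering is writing the BIOM v1.0 (JSON) file myself from the .txt table and then importing it with `qiime tools import`. A minimal sketch, assuming my table's first column holds the feature IDs and the header row holds the sample IDs:

```python
import csv, io, json
from datetime import datetime

def tsv_to_biom_json(tsv_text):
    """Convert a tab-delimited ASV table (first column = feature IDs,
    header row = sample IDs) into a minimal BIOM v1.0 JSON string.
    The resulting file should be importable with:
      qiime tools import --type 'FeatureTable[Frequency]' \
        --input-format BIOMV100Format \
        --input-path table.biom --output-path table.qza
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    samples = header[1:]
    rows, data = [], []
    for i, rec in enumerate(reader):
        rows.append({"id": rec[0], "metadata": None})
        for j, v in enumerate(rec[1:]):
            if float(v):                  # sparse matrix: keep non-zero counts only
                data.append([i, j, float(v)])
    return json.dumps({
        "id": None,
        "format": "Biological Observation Matrix 1.0.0",
        "format_url": "http://biom-format.org",
        "type": "OTU table",
        "generated_by": "tsv_to_biom_json sketch",
        "date": datetime.now().isoformat(),
        "matrix_type": "sparse",
        "matrix_element_type": "float",
        "shape": [len(rows), len(samples)],
        "rows": rows,
        "columns": [{"id": s, "metadata": None} for s in samples],
        "data": data,
    })
```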

Again, thank you for your help!


I hope this is what you meant by quality stats. I am very new to this, so if there is any particular file that might help diagnose this, let me know!!

[Screenshot attached: DADA2 denoising stats table]


Hello again @mhk,

Yes, that’s exactly the table I was looking for. And it confirms that only ~10% of your reads are being kept, and also shows us which step is removing them.
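If you want to quantify it per step, you can also parse the stats file directly. A quick sketch, assuming the usual DADA2 stats columns (sample-id, input, filtered, denoised, non-chimeric):

```python
import csv, io

def retention_per_step(stats_tsv):
    """Given DADA2 denoising stats as TSV text, return the fraction of
    input reads surviving each step, per sample.  Assumed columns:
    sample-id, input, filtered, denoised, non-chimeric."""
    rows = csv.DictReader(io.StringIO(stats_tsv), delimiter="\t")
    out = {}
    for r in rows:
        total = float(r["input"])
        out[r["sample-id"]] = {
            step: float(r[step]) / total
            for step in ("filtered", "denoised", "non-chimeric")
        }
    return out

# Illustrative numbers only, not your data
stats = ("sample-id\tinput\tfiltered\tdenoised\tnon-chimeric\n"
         "S1\t1000000\t200000\t150000\t100000\n")
print(retention_per_step(stats)["S1"]["non-chimeric"])  # 0.1
```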

And yet, quality remains low… How frustrating! :scream_cat:

Have you tried trimming to shorter reads, say 100 bp or 90 bp? Are you trimming at all at the start of the read?

Colin


Hello,

I am trimming adaptors using cutadapt:

qiime cutadapt trim-single \
  --i-demultiplexed-sequences 16S_SE_R01.qza \
  --p-cores 4 \
  --p-front CCTACGGGNGGCWGCAG \
  --o-trimmed-sequences primer-trimmed_16S_SE_R01.qza \
  --verbose \
  &> primer_trimming_R01.log

Then, I truncate according to the demux summarize data:

qiime dada2 denoise-single \
  --p-n-threads 4 \
  --i-demultiplexed-seqs primer-trimmed_16S_SE_R01.qza \
  --p-trim-left 0 \
  --p-trunc-len 112 \
  --output-dir DADA2_denoising_output \
  --verbose \
  &> DADA2_denoising.log

When you say trimming, you mean trimming from the 5’-end, correct?


I guess that’s confusing…
In DADA2, “trim left” trims from the start of the read, and “truncate length” cuts from the end.

So I should have said “Have you tried truncating for shorter reads?”

Can you post the quality plots that show average q-score? We can use those to select a good location for both trimming and truncating.
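In the meantime, here’s the rule of thumb I use when picking a truncation point: keep the read up to the last position where the mean quality is still above a threshold. A toy sketch (the quality values below are made up, not your data):

```python
def pick_trunc_len(mean_q, min_q=30):
    """Return the last read position (1-based) before the mean quality
    drops below min_q -- a simple heuristic for choosing --p-trunc-len."""
    trunc = 0
    for pos, q in enumerate(mean_q, start=1):
        if q < min_q:
            break
        trunc = pos
    return trunc

# Toy per-position mean q-scores: good for 100 bp, okay for 10, then poor
mean_q = [38] * 100 + [35] * 10 + [24] * 10
print(pick_trunc_len(mean_q))  # 110
```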

Colin

[Screenshots attached: interactive quality plots (average q-score per position)]

I’ve also attached a “zoomed-in” version of the plot. I used this plot as a reference when setting my value for --p-trunc-len to 112.

Again, thanks!

OK perfect! Base 112 looks like a good choice based on those plots.

DADA2 can be picky about quality. Have you tried truncating at 110, just to make sure you don’t include any poor base pairs? Truncating at 100 could keep your quality near Q39, which is great, and should also keep more reads.

Let me know if these other settings work well for you!

Colin

Hi!

Thank you for the comment and I will definitely try that. But one thing that concerns me is that because my reads are too short, if my calculation is correct, my reads only cover around 50bp of the V3 region. Do you think such short coverage can provide useful information about the data?

Thanks!

Sure! Of course longer reads and more coverage would be better, but I think your ~110 bp reads are usable.

My teams have usually used the EMP V4 primers, so if you were planning to resequence, you could try using those. https://www.zymoresearch.com/pages/ngs16

Let me know what you discover!

Colin
