Few sequences left after merge

hesongbing · October 28, 2019, 11:01am

Dear All,

Can anyone advise me about this issue? After running the command below

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs trimmed_seqs_bacteria.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 228 \
  --p-trunc-len-r 206 \
  --o-table table_bacteria_dada206.qza \
  --o-representative-sequences rep_seqs_bacteria_dada206.qza \
  --o-denoising-stats denoising_stats_bacteria_dada206.qza

I got only a few merged sequences. Would it be possible to adjust the command to get more sequences? The amplicon region is V4V5, and I ran 2x250 bp on the MiSeq platform.

According to trimmed_seqs_bacteria.qza, the quality of the reverse reads is not so good. Should I try just the forward reads with the qiime dada2 denoise-single command? Would it have any influence on the results?

Thanks!

denoising_stats_bacteria_dada206.qzv (1.2 MB) table_bacteria_dada206.qzv (667.2 KB) trimmed_seqs_bacteria.qzv (296.3 KB)

Mehrbod_Estaki · October 28, 2019, 11:59am

Hi @hesongbing,

The quality of your reads is actually quite good. Those big dips at the tails of your reads probably represent reads that didn't have the primer sequences in them when you used cutadapt to remove them. Thus those few hundred reads are longer and have low-quality tails.

Try running again with a forward truncation of 231 and a reverse truncation of 218.
This is the maximum read length you can afford. If merging is still unsatisfactory after this, then you are right to drop your reverse reads and use the forward reads only, which will certainly increase your read count significantly. However, I should mention that your current numbers are not too bad to begin with, so you can probably just stick with whatever you get out of this new run.
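
To make the truncation trade-off concrete, here is a rough sketch of the overlap arithmetic. DADA2 needs the truncated forward and reverse reads to overlap (its default minimum is 12 bp) in order to merge them. The amplicon length of 373 bp below is an assumed, illustrative value for a primer-trimmed V4V5 amplicon, not a number from this thread, and real amplicons vary in length, so treat the result as an estimate only.

```shell
#!/usr/bin/env bash
# Estimated overlap (bp) between truncated forward and reverse reads:
#   overlap = trunc_len_f + trunc_len_r - amplicon_len
# amplicon_len is the primer-trimmed amplicon length. The 373 used below is
# an ASSUMED value for illustration; substitute the length for your primers.
overlap() {
  local trunc_f=$1 trunc_r=$2 amplicon_len=$3
  echo $(( trunc_f + trunc_r - amplicon_len ))
}

overlap 228 206 373   # truncation lengths from the original command
overlap 231 218 373   # truncation lengths suggested above
```

If the estimate falls close to or below 12 bp for the longer amplicons in a sample, merging will fail for those reads regardless of their quality.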

hesongbing · October 28, 2019, 12:23pm

Thanks for your suggestion! I will reset the parameters and run it again.

Actually, I have already tried running with a forward truncation of 230 and a reverse truncation of 217, but I lost lots of reads after filtering. Here is part of the stats file.

If I drop the reverse reads and use the forward reads only, would it have any influence on my results?

Mehrbod_Estaki · October 28, 2019, 12:27pm

Hi @hesongbing,
If you have already tried that combination then there is no point in trying mine.
At this point, like I said, you could just move forward with what you have, since your lowest sample still has over 7k reads, which is fairly good for most data types.
If you do run only your forward reads (and set a safe truncating value like 200) you should see a significant increase in # of retained reads. The downside is that you lose some resolution as far as taxonomy goes but those differences are rarely ever detrimental to an experiment imo. The overall patterns of your results should stay the same. So you would be pretty safe with either approach.
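
A minimal sketch of what the forward-only run could look like, assuming the same input artifact as the original command (denoise-single accepts paired-end input and uses only the forward reads). The output file names are hypothetical placeholders, and the script only prints the command unless RUN=1 is set.

```shell
#!/usr/bin/env bash
# Dry-run sketch of a forward-reads-only DADA2 run.
# Prints the command by default; set RUN=1 to actually execute it.
# The three output file names are hypothetical placeholders.
cmd=(qiime dada2 denoise-single
  --i-demultiplexed-seqs trimmed_seqs_bacteria.qza
  --p-trim-left 0
  --p-trunc-len 200
  --o-table table_bacteria_single200.qza
  --o-representative-sequences rep_seqs_bacteria_single200.qza
  --o-denoising-stats denoising_stats_bacteria_single200.qza)

if [ "${RUN:-0}" = "1" ]; then
  "${cmd[@]}"
else
  printf '%s ' "${cmd[@]}"
  printf '\n'
fi
```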

hesongbing · October 28, 2019, 12:50pm

Hi @Mehrbod_Estaki

Thanks for your patience in answering my questions!

The minimum read count is only 7k, but the maximum reaches 37k. I am just afraid the results will be inaccurate if I resample at the minimum read count.
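
One way to check this concern is an alpha rarefaction curve: if diversity has levelled off by about 7k reads, resampling at that depth loses little. The sketch below is a dry run under assumptions: the 7000 depth comes from the minimum read count mentioned in this thread, the output file name is a hypothetical placeholder, and the command is only printed unless RUN=1 is set.

```shell
#!/usr/bin/env bash
# Dry-run sketch: alpha rarefaction up to the lowest sample depth (~7k reads).
# Prints the command by default; set RUN=1 to actually execute it.
# The output visualization name is a hypothetical placeholder.
cmd=(qiime diversity alpha-rarefaction
  --i-table table_bacteria_dada206.qza
  --p-max-depth 7000
  --o-visualization alpha_rarefaction_bacteria.qzv)

if [ "${RUN:-0}" = "1" ]; then
  "${cmd[@]}"
else
  printf '%s ' "${cmd[@]}"
  printf '\n'
fi
```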

ben · October 28, 2019, 1:23pm

I think your first run looks pretty good. Yes, you're losing 66% of your reads, and you may be able to recover some more, but you are likely getting rid of bad-quality sequences that you don't want in your samples anyway. Can I ask what kinds of samples these are? Are they samples with heavy eukaryotic contamination?
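
For reference, a loss figure like this can be computed per sample from the "input" and "non-chimeric" columns of the denoising stats. A small sketch; the counts below are hypothetical example values, not numbers from the attached stats file.

```shell
#!/usr/bin/env bash
# Percentage of reads lost through DADA2 for one sample, from the
# "input" and "non-chimeric" columns of the denoising stats.
# Uses integer arithmetic, so the result is rounded down.
pct_lost() {
  local input=$1 retained=$2
  echo $(( (input - retained) * 100 / input ))
}

# Hypothetical example counts, for illustration only.
pct_lost 30000 10200
```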

hesongbing · October 28, 2019, 1:30pm

Hi @ben
These are 16S of soil samples and there is almost no eukaryotic contamination.

ben · October 28, 2019, 1:32pm

Hm, thank you for the clarification. Did you run a positive control with your runs? You can check whether something went wrong with the run, just in case (e.g., a failure of the positive control would suggest that something was wrong with the number of sequences you recovered). I am unfamiliar with the V4V5 region, so I am not sure whether this recovery is typical for that region. Ben

hesongbing · October 28, 2019, 1:40pm

@ben
Sorry, what do you mean by positive control? I only trimmed primers before dada2.

ben · October 28, 2019, 1:42pm

Oh, sorry I meant a positive sample where you know the exact composition (such as a culture). We run positive controls of a mock group of bacteria where we can confirm there was nothing that went wrong with our run (upstream of the QIIME2 analysis).

This is just in case there was a problem with our run (failure of primers/polymerases, etc.). It's OK if you did not; I would just run the rest of the pipeline with your DADA2 results to see what you get.

Ben

hesongbing · October 28, 2019, 2:05pm

@ben
Thank you for your patience in answering my doubts!

I didn't run positive controls. I dealt with ITS of fungi in the same way, and the results are pretty good with minimum reads of 20k.

ben · October 28, 2019, 2:07pm

Great, I don't think you have much to worry about with the reads being filtered out by DADA2. I have seen a similar loss in a "good" run of the 16S V3V4 region (we lost 50-60% of the sequences at the DADA2 denoise step), and the data at the end looked great (supporting our hypothesis). Ben

hesongbing · October 28, 2019, 2:12pm

@ben
Wow, how wonderful it is!
I will move forward with the results of my first run. I hope I am as lucky as you.
