DADA2 denoise paired result - 90% loss in reads

codea · September 26, 2018, 12:24pm

Hi all.

I ran dada2 denoise-paired and found that ~90% of my reads were lost. I'm using the V1V3 region of the 16S gene (primers 27F, 519R).

The denoising stats file shows the following read counts:

sample-id           input    filtered  denoised  merged   non-chimeric
#q2:types           numeric  numeric   numeric   numeric  numeric
cootharaba          298145   219309    219309    36137    26068
elandapoint         475098   350153    350153    73133    47619
marycairncross      325176   246113    246113    38025    23889
palmercoolumresort  438216   318817    318817    107108   75149
twinwaters          172007   114811    114811    20359    13407
usc-BPHLF           254296   185189    185189    22404    18021
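
To see where the reads are going, the counts above can be turned into per-step percentages. This is only a rough sketch: the helper name `retention` is made up for illustration, and the numbers are copied by hand from the stats table rather than read from the artifact.

```shell
# Per-step retention from the denoising stats posted above.
# Columns: sample-id input filtered denoised merged non-chimeric
stats='cootharaba 298145 219309 219309 36137 26068
elandapoint 475098 350153 350153 73133 47619
marycairncross 325176 246113 246113 38025 23889
palmercoolumresort 438216 318817 318817 107108 75149
twinwaters 172007 114811 114811 20359 13407
usc-BPHLF 254296 185189 185189 22404 18021'

# Hypothetical helper: prints, per sample,
#   filtered/input, merged/denoised and non-chimeric/input as percentages.
retention() {
  printf '%s\n' "$stats" | awk '{
    printf "%-20s filtered %5.1f%%  merged %5.1f%%  non-chimeric %5.1f%%\n",
      $1, 100*$3/$2, 100*$5/$4, 100*$6/$2
  }'
}

retention
```

For cootharaba this works out to roughly 74% of reads surviving filtering, but only about 16% of the denoised reads merging, so the merge step accounts for most of the loss.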

The command I used:

$ qiime dada2 denoise-paired \
  --i-demultiplexed-seqs manifestout.qza \
  --o-table dada2-table.qzv \
  --o-representative-sequences dada2.qza \
  --o-denoising-stats stats.log \
  --p-trim-left-f 20 \
  --p-trim-left-r 20 \
  --p-trunc-len-f 280 \
  --p-trunc-len-r 260 \
  --p-n-threads 0 \
  --verbose

Our worry is that this loss of reads will bias the community results. I have been told this is common for the V1V3 region.
Would the forum recommend optimising the parameters further, or analysing the forward reads only?

My end goal for this project is to compare three variable regions.

Thanks in advance for your help.

ebolyen · September 27, 2018, 10:52pm

Hey @codea,

It looks like most of your reads are dropping out because they fail to merge. Your trunc lengths are already pretty generous, so I really doubt you will be able to merge any better by increasing those numbers.
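
As a back-of-the-envelope check, the overlap left for merging is roughly the forward truncation length plus the reverse truncation length minus the amplicon length. The sketch below assumes a few things that are not from this thread: the amplicon lengths are hypothetical examples, the reads are assumed to start at the very ends of the amplicon, and 12 nt (the default `minOverlap` of DADA2's `mergePairs`) is assumed to be the minimum overlap in use here.

```shell
# Rough overlap estimate for paired-end merging.
# ASSUMPTIONS: amplicon lengths are hypothetical; reads start at the amplicon
# ends; 12 nt minimum overlap is assumed to apply.
trunc_f=280      # --p-trunc-len-f from the command in the first post
trunc_r=260      # --p-trunc-len-r from the command in the first post
min_overlap=12   # assumed minimum overlap needed to merge

# trim-left removes bases from the 5' ends, so it is left out of this estimate.
overlap() {
  # overlap = forward length + reverse length - amplicon length
  echo $(( trunc_f + trunc_r - $1 ))
}

for amplicon in 480 500 520 540; do
  ov=$(overlap "$amplicon")
  if [ "$ov" -ge "$min_overlap" ]; then
    verdict="enough overlap to merge"
  else
    verdict="not enough overlap to merge"
  fi
  echo "amplicon ${amplicon} nt: overlap ${ov} nt, ${verdict}"
done
```

Under these assumptions, any amplicon longer than about 528 nt cannot merge at these truncation lengths, and pushing the truncation further into the low-quality read ends tends to cost reads at the filtering step instead.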

I assume this is an Illumina 2x300 run?

It is completely fine to analyze only the forward reads and in this case, that’s likely your best option.
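
A forward-only run might look something like the sketch below. `qiime dada2 denoise-single` accepts the same paired-end artifact and uses only the forward reads. The output file names are hypothetical, and the trim and truncation values are simply carried over from the forward settings of the paired run, so treat them as a starting point rather than a recommendation. The small `run` wrapper is also just for illustration: it prints the command by default so it can be checked before anything is executed.

```shell
# Sketch only: output names are hypothetical; trim/trunc values are carried
# over from the forward-read settings of the paired run.
# DRY_RUN=1 (the default here) prints the command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

run qiime dada2 denoise-single \
  --i-demultiplexed-seqs manifestout.qza \
  --p-trim-left 20 \
  --p-trunc-len 280 \
  --p-n-threads 0 \
  --o-table dada2-single-table.qza \
  --o-representative-sequences dada2-single-rep-seqs.qza \
  --o-denoising-stats dada2-single-stats.qza \
  --verbose
```

Setting `DRY_RUN=0` before the call runs the command for real.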

codea · September 28, 2018, 3:44am

Thank you, I will try analysing the forward reads only, using qiime dada2 denoise-single, to see if I can get a better result.

I will update on this progress.

Thanks again.

codea · September 28, 2018, 4:04am

On this point, do you know of any research papers that describe this approach? I'm having a hard time finding any.

Thanks in advance.

ebolyen · October 5, 2018, 6:35pm

I’m afraid not, but maybe others do? Conceptually it’s the same as using 2x150 vs 2x300 sequencing: you just have less or more information about the amplicon. The results aren’t invalid, just perhaps more limited than in an ideal circumstance.

codea · October 7, 2018, 1:01am

Thanks for your help. I ran these separately and recovered over 70% of the reads.
