Regarding sequence read and DADA2

silee0621 · May 8, 2018, 11:56pm

Hi, all.
Recently, I ran into some problems during MiSeq run and I was hoping to get some input from more experienced people to save my data.

Below is visualization file after the data import. I used,

qiime tools import
--type 'SampleData[PairedEndSequencesWithQuality]'
--input-path Name of folder
--source-format CasavaOneEightSingleLanePerSampleDirFmt
--output-path demux.qza

and

qiime demux summarize
--i-data demux.qza
--o-visualization demux.qzv

which gave me

The MiSeq was ran with demultiplexing option so I skipped the demultiplex step in the tutorial.
My question here is, what 'trunc-len value' would you use in DADA2 step when R2 read is all over the place like this?

My second question is in a similar context of saving these files even R2 read was bad.
I understand that I can check out raw sequences of R1 and R2 by using 'more xxx.fastq' prompt in the terminal but where can I find the sequences that is joined?
My thought was that knowing my index, adapter sequences, if I can compare sequence read in R1, R2 with joined sequence, I can kind of get an idea how much sequences were lost.

In Qiime1 there were join pair step which allowed me to investigate the joined sequences but I can't seems to find one in qiime2. Perhaps the R1 and R2 are joined during import?

Lastly, I have total reads, PF reads and demultiplexed sequence counts of 43,659,742, 13,505,280 and 5,434,962 respectively. Do you think its the bad R2 read that caused significant number drop?

39%20PM1642×234 18.5 KB

Please help!

Nicholas_Bokulich · May 10, 2018, 5:15pm

I would personally scrap the reverse reads and proceed with only the forward reads as though it were single-end data.

post-denoising I am not sure if there's an easy way to match these up. You could join paired ends with qiime vsearch join-pairs and then look at reads — but I'm not sure that does what you want.

qiime vsearch join-pairs. Use that prior to deblur or OTU picking, but NOT with dada2 (which joins pairs after denoising).

Probably. I would just scrap the reverse reads... that quality profile does not look good and I would be really suspicious of using those data.

I hope that helps!

silee0621 · May 11, 2018, 11:57pm

Thank you so much for your reply!

I tried the join-pairs command and it seems quite straight forward. quality score up until 250bp was good but from 250 to 400bp was all over the place.
After that, I ran DADA2 with trunc-len of 240 and 400 just to compare each other and as expected, DADA2 with trunc-len value of 400 wasn't even able to run properly (error:'No reads passed the filter').
I will just use the first 250bp for the analysis as you suggested.
By the way, do you know what might caused the bad R2 read? I was thinking maybe the initial concentration of library was too high?

I really appreciate for your help and hope this thread help other too.

Thank you!

Nicholas_Bokulich · May 14, 2018, 2:26pm

I am not really sure. I'd recommend getting in touch with your sequencing center — they may have advice, or might even offer to re-run those samples.

Others on this forum may also have some idea what could cause such high error rates — those reverse reads are exceptionally noisy!