NovaSeq data - demux plot

roberto1 · April 15, 2020, 2:18pm

Hello, I'm new to qiime2 so bear with me!

My data is 2 x 150 bp demultiplexed paired-end NovaSeq data that has already been filtered by the sequencing centre: reads containing PhiX control signals were removed, and reads containing (partial) adapters were clipped (down to a minimum read length of 50 bp).

As a result of the pre-filtering, approximately 25% of the reads are now between 50 bp and 135 bp long.

Additionally, after searching the forum, I found that the demux plot of my data is similar to others with NovaSeq data (see "Odd display of demux plot" and "Interpretation of demux.qzv"), where the quality scores have been binned at 2, 11, 25 and 37.
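For anyone who wants to check whether their own reads show the same binning, here is a minimal sketch that tallies the distinct Phred scores in a FASTQ file. It assumes standard four-line FASTQ records and Phred+33 encoding; the function name `phred_score_counts` and the `max_reads` cap are illustrative choices, not part of QIIME 2.

```python
import gzip
from collections import Counter


def phred_score_counts(fastq_path, offset=33, max_reads=10000):
    """Count how often each Phred score occurs in the first max_reads reads.

    Assumes four-line FASTQ records and Phred+33 encoding (an assumption;
    adjust `offset` if your files are encoded differently).
    """
    # Open gzip-compressed files transparently, plain text otherwise.
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number // 4 >= max_reads:
                break
            # Every fourth line of a record holds the quality string.
            if line_number % 4 == 3:
                counts.update(ord(ch) - offset for ch in line.rstrip("\n"))
    return counts
```

If the scores are binned as described above, `sorted(phred_score_counts(path))` should contain only a handful of values rather than the full range of scores.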

The issue has also been raised here, with benjjneb recommending enforced monotonicity (if using DADA2 denoising). I'm also aware that DADA2 doesn't have an "official 'production' solution for NovaSeq data yet". Thus, I am wondering what the best approach is for denoising.

Regarding the variation in read length, what's the best approach? Do I set the forward/reverse truncation parameters to 0 (because the overall quality is good)? But then DADA2 requires pretty much uniform reads, right? What about the approach in "Filtering out ASVs from DADA2 based on length - #4 by thermokarst"?
Regarding the quality filter, is the best approach to enforce monotonicity in R and then import the DADA2 data into QIIME 2, as described in "Importing dada2 and Phyloseq objects to QIIME 2"?
Would Deblur be best suited for my data? (Sorry, you must always get asked this question!)
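To make the monotonicity idea concrete, here is a rough sketch of how I understand it: with only a few binned quality scores, the fitted error rates can end up higher at a better quality score than at a worse one, and enforcing monotonicity means forcing the rates to never increase as quality increases. The sketch below is plain Python for illustration only. It is not DADA2's code, and the function name and input layout are my own assumptions.

```python
def enforce_monotonic_error_rates(rates_by_quality):
    """Return a copy in which error rates never increase as quality increases.

    rates_by_quality[i] is the estimated error rate at the i-th quality
    score, ordered from lowest to highest quality (hypothetical layout).
    """
    adjusted = list(rates_by_quality)
    # Walk from the second-highest quality score downwards, so each lower
    # score ends up with at least the error rate of every higher score.
    for i in range(len(adjusted) - 2, -1, -1):
        adjusted[i] = max(adjusted[i], adjusted[i + 1])
    return adjusted
```

For example, made-up rates of `[0.1, 0.001, 0.01, 0.0001]` would become `[0.1, 0.01, 0.01, 0.0001]`: the dip at the second score is raised to match the rate at the next higher score.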

Thank you in advance.

Robert

jwdebelius · April 15, 2020, 3:51pm

Hi @roberto1,

Looking at your demultiplexed data, it appears to me that you may have already had some quality filtering applied. (I have worked less with NovaSeq, but this doesn't look the way I'd normally expect an Illumina quality profile to look.) Based on that assumption, I would recommend using Deblur. I'm not sure it's been benchmarked for NovaSeq either, which is a challenge.

Best,
Justine

jwdebelius · April 15, 2020, 10:45pm

Okay, so I just got an update from the brilliant @Nicholas_Bokulich. NovaSeq has changed its error modeling/Phred scoring relative to the MiSeq/HiSeq, compressing that error space. I still think Deblur is probably an easier solution than shoe-horning DADA2, but it looks like you've done a fair bit of background reading already if you want to do it in R and import into QIIME.

Sorry I don't have better advice.

Best,
Justine

roberto1 · April 16, 2020, 2:45pm

Hi @jwdebelius. Ok fantastic! Thanks for your help. I will push ahead with deblur and maybe try the DADA2 approach as well.

Any suggestions on how to deal with the variation in read length in terms of truncation parameters? Would it be best to set the truncation length (p-trunc-len-f and p-trunc-len-r) to 0 for both forward and reverse reads, given the good overall quality, in order to include the shorter reads that have had adapters/PhiX clipped?

jwdebelius · April 16, 2020, 3:19pm

Hi @roberto1,

I think that's reasonable to try for DADA2. Run it that way and see how the data look. If you lose a lot of reads in quality filtering/denoising, consider truncating to a shorter length.

For Deblur, you need to set a truncation length because it's part of the algorithm.
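Since reads shorter than the chosen length get dropped when everything is trimmed to a fixed length, it can help to see how many reads each candidate length would keep before picking one. Here is a minimal sketch in plain Python, assuming you already have a list of read lengths; the function names are hypothetical and not part of Deblur or QIIME 2.

```python
def fraction_retained(read_lengths, trim_length):
    """Fraction of reads that are at least trim_length bases long."""
    if not read_lengths:
        return 0.0
    kept = sum(1 for length in read_lengths if length >= trim_length)
    return kept / len(read_lengths)


def retention_table(read_lengths, candidate_lengths):
    """Map each candidate trim length to the fraction of reads it keeps."""
    return {t: fraction_retained(read_lengths, t) for t in candidate_lengths}


# Toy data for illustration only (made-up numbers, not from this thread).
toy_lengths = [150] * 75 + [100] * 25
for trim, kept in retention_table(toy_lengths, [100, 120, 150]).items():
    print(f"trim length {trim}: {kept:.0%} of reads retained")
```

The trade-off is that a shorter length keeps more reads but leaves less sequence per read, so it is worth comparing a couple of values.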

Best,
Justine

roberto1 · April 16, 2020, 4:07pm

Amazing. Thanks again for your help! Fingers crossed it works out.

I will post the results if they're somewhat conclusive.
