demux.qzv interpretation for dada2 and/or deblur

ET1335 · October 9, 2019, 11:48pm

Hi,

I've looked around on the forum for help with interpreting an interactive quality plot from demux.qzv that is close to mine, but I haven't found any similar situations.
I'm using QIIME2 version 2019.1 through VirtualBox on a Mac.

After running the following:
qiime demux summarize --i-data et-demux-paired-end.qza --o-visualization et-demux.qzv
and looking at my demux.qzv, I found this:

(Hopefully the image upload works. If not, it's basically a bunch of dashed lines across the entire plot at a quality score of ~37, with only one boxplot at ~2 sequence bases.)

In the table below the quality plots, all of the box plot features and percentiles have a blank instead of a quality score:

The same thing appears for my reverse reads. The sequences appear to have the same quality regardless of the number of sequence bases, so does that mean I don't have to trim or truncate anything to run dada2 and deblur?

I've already tried both analyses:
dada2: qiime dada2 denoise-single --i-demultiplexed-seqs et-demux-paired-end.qza --p-trim-left 0 --p-trunc-len 0 --o-representative-sequences et-rep-seqs-dada2.qza --o-table et-table-dada2.qza --o-denoising-stats et-stats-dada2.qza

deblur: qiime quality-filter q-score --i-demux demux-single-end.qza --o-filtered-sequences et-demux-filtered.qza --o-filter-stats et-demux-filter-stats.qza
qiime deblur denoise-16S --i-demultiplexed-seqs et-demux-filtered.qza --p-trim-length 0 --o-representative-sequences et-rep-seqs-deblur.qza --o-table et-table-deblur.qza --p-sample-stats --o-stats et-deblur-stats.qza

dada2 seemed to work fine, but with deblur I got this error:
Plugin error from deblur:
No sequences passed the filter. It is possible the trim_length (%d) may exceed the longest sequence, that all of the sequences are artifacts like PhiX or adapter, or that the positive reference used is not representative of the data being denoised.
Debug info has been saved to /tmp/qiime2-q2cli-err-gnwzrmxj.log

head /tmp/qiime2-q2cli-err-gnwzrmxj.log gives me:
/home/qiime2/miniconda/envs/qiime2-2019.1/lib/python3.6/site-packages/deblur/workflow.py:851: UserWarning: Problem removing artifacts from file (file name)
for several of my files. These specific files involved in the error have no commonalities in terms of number of sequences (they range from 30,982 to 188,975 sequences).

Here is my frequency plot in case that helps:

Does this have something to do with my quality plot, or the fact that I'm not trimming anything? If my quality plot is correct, where should I trim it (if at all)? Thanks for your help.

Nicholas_Bokulich · October 11, 2019, 2:16pm

Hi @ET1335,
Could you please share:

the demux quality QZV
the dada2 stats QZV (use metadata tabulate to make the QZV)
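
A minimal sketch of the tabulate step, assuming the dada2 stats artifact is named et-stats-dada2.qza as in the commands earlier in the thread. The output name is derived from it and is only an example. The guard just avoids running anything when QIIME 2 or the file is not available:

```shell
# Assumption: the QIIME 2 environment is active and the stats artifact
# from the dada2 run is in the current directory.
stats_qza="et-stats-dada2.qza"
stats_qzv="${stats_qza%.qza}.qzv"   # same base name with a .qzv extension

if command -v qiime >/dev/null 2>&1 && [ -f "$stats_qza" ]; then
  # Turn the denoising stats into a viewable table.
  qiime metadata tabulate \
    --m-input-file "$stats_qza" \
    --o-visualization "$stats_qzv"
else
  echo "qiime or $stats_qza not found; would have written $stats_qzv"
fi
```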

Your demux quality scores look totally bizarre, and I am guessing either (a) the file is almost empty or (b) some sort of quality filtering has already been applied or (c) the Q scores are totally artificial. Any ideas?
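
One way to check guesses (a) to (c) outside QIIME 2 is to summarize one of the FASTQ files directly. The sketch below is not part of QIIME 2: `fastq_summary` is a hypothetical helper name, and it assumes plain FASTQ with exactly four lines per record. It counts reads, tallies read lengths, and tallies each quality character. A file dominated by a single character (for example `F`, which is Q37 under Phred+33) would point to binned or artificial scores, and a narrow range of short lengths would point to prior trimming.

```shell
# Hypothetical helper for illustration; not a QIIME 2 command.
# Reads FASTQ from the named file, or from stdin when no file is given.
fastq_summary() {
  awk '
    NR % 4 == 2 { n++; len[length($0)]++ }   # sequence line: count reads, tally lengths
    NR % 4 == 0 {                            # quality line: tally every quality character
      for (i = 1; i <= length($0); i++) q[substr($0, i, 1)]++
    }
    END {
      printf "reads\t%d\n", n
      for (l in len) printf "length\t%d\t%d\n", l, len[l]   # read length, number of reads
      for (c in q)   printf "qchar\t%s\t%d\n", c, q[c]      # quality character, count
    }
  ' "$@"
}
```

For gzipped input, something like `gunzip -c sample_R1.fastq.gz | fastq_summary` should work (the file name here is made up). The order of the `length` and `qchar` rows is not guaranteed, so pipe through `sort` if needed.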

ET1335 · October 15, 2019, 7:46pm

Hi Nicholas,

I spoke to my professor about this issue, and we've concluded that the qiime demux summarize command doesn't need to be run at all since these data are already demultiplexed. Does this sound right?

If that's the case, I'm not sure how to find an appropriate place to trim the sequences for dada2/deblur analysis. The Moving Pictures Tutorial used the demux.qzv as justification for where they trimmed, but since I may not get a demux.qzv with these data, I'm confused about what to do.

Thank you!

Nicholas_Bokulich · October 15, 2019, 10:13pm

No. qiime demux summarize is meant to be run on demultiplexed data as a way to visualize per-base quality before downstream processing.

It is okay that your samples are already demultiplexed — you can still follow the moving pictures or other tutorials, just skip any demultiplexing steps that are shown. Start with the demux summarize step!

You did the right thing. When I asked for this file in my response to you I meant that I wanted you to upload the actual QZV instead of the images that you posted in your initial post.

ET1335 · October 17, 2019, 12:44am

Hi Nicholas,

Thank you for your response.

Here are the files you asked for (I hope they upload correctly)

et-stats-dada2.qzv (1.2 MB)
et-demux (copy).qzv (286.5 KB)

Thanks for your patience.

Nicholas_Bokulich · October 17, 2019, 1:48am

Hi @ET1335,
Thanks for sharing those files. This confirms that your sequences have been quality-filtered in some way already (prior to importing), which I would discourage if possible. Using the raw reads (prior to filtering) would be better, and could potentially lead to longer reads (your reads are quite short, probably due to the filtering).

However, if you cannot obtain the unfiltered FASTQ data that is fine — it looks like dada2 was able to work with these data anyway, and is yielding high numbers of reads. The reads are just very short!

The table below the quality plots is blank because you need to hover the cursor over one of the boxes to see the percentiles. The boxes are just lines because almost all nucleotides have the same quality score at each position, so there is no distribution. This is because the reads were already filtered somehow.
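
To see why the boxes collapse into lines, here is a small sketch that computes a few percentiles of the quality scores at one position. The `percentiles` function is a hypothetical helper, and the nearest-rank method it uses is an assumption; QIIME 2 may calculate its percentiles differently. When every score is the same, every percentile is the same value, so the box has no height.

```shell
# Hypothetical helper: nearest-rank 9th/25th/50th/75th/91st percentiles
# of one number per line read from stdin.
percentiles() {
  sort -n | awk '
    { v[NR] = $1 }                          # store the sorted values
    END {
      if (NR == 0) exit 1                   # no input, nothing to report
      split("9 25 50 75 91", p, " ")
      for (i = 1; i <= 5; i++) {
        r = int((p[i] * NR + 99) / 100)     # ceil(p/100 * NR) using integer arithmetic
        if (r < 1) r = 1
        printf "%s%s", v[r], (i < 5 ? " " : "\n")
      }
    }'
}

# Five reads that all have Q37 at this position: no spread at all.
printf '37\n37\n37\n37\n37\n' | percentiles   # prints: 37 37 37 37 37

# Five reads with varied scores at this position: a visible box.
printf '38\n2\n35\n30\n37\n' | percentiles    # prints: 2 30 35 37 38
```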

Yes, the way you've done it appears fine, and no trimming is needed (since the reads have already been trimmed so much!), but I would personally seek out the raw FASTQ data.

Good luck!

ET1335 · October 17, 2019, 2:09am

Thank you so much Nicholas! I'll do my best!
