Interactive plot wrong after qiime vsearch join-pairs

pbmail01 · August 24, 2018, 5:42pm

The interactive plots after qiime tools import and qiime demux summarize look normal. The interactive plot after qiime vsearch join-pairs looks wrong.

It seems like the process only used the forward reads. What caused this, and how can I fix it?
I used QIIME 2 2017.12.

This happens with the mockrobiota mock-12 dataset (data/mock-12 in the caporaso-lab/mockrobiota repository on GitHub).

The scripts I used and the plots they created are in the attached Word document:
Problem after Vsearch join-pairs.zip (305.1 KB)

Nicholas_Bokulich · August 27, 2018, 1:24pm

Hi @pbmail01,
Is the issue that the quality score plot looks funny and the q-scores are unreasonably high? This is actually normal. The explanation is:

In the overlap region the q-scores will increase considerably due to the increased certainty of the base call from two reads. The scores are calculated as in USEARCH and described in Edgar & Flyvbjerg (2015) doi:10.1093/bioinformatics/btv401.
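To make that jump concrete, here is a minimal Python sketch of the case where the forward and reverse base calls agree. It uses the posterior error formula as I understand it from Edgar & Flyvbjerg (2015), so please check it against the paper. The function names are made up for illustration, and this is not the vsearch code itself.

```python
import math

def q_to_p(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)

def p_to_q(p):
    """Convert an error probability back to a Phred quality score."""
    return -10 * math.log10(p)

def merged_q_agree(q_fwd, q_rev):
    """Posterior Q-score when both reads call the same base.

    Assumed formula (agreeing bases, Edgar & Flyvbjerg 2015):
        p = (px * py / 3) / (1 - px - py + 4 * px * py / 3)
    """
    px, py = q_to_p(q_fwd), q_to_p(q_rev)
    p = (px * py / 3) / (1 - px - py + 4 * px * py / 3)
    return p_to_q(p)

# Two agreeing Q30 calls give a merged score far above either input.
print(round(merged_q_agree(30, 30), 1))
```

Under this assumption, two agreeing Q30 calls merge to a score well above 60, which is why the overlap region of a joined-read quality plot can look implausibly good.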

What makes you think that only the forward reads were used? These are V4 reads, so the length looks about right (though the actual length distribution is not shown in the screenshot you shared).

Your version is nearly a year old! Please install the latest version and use that. If this is in fact a bug, we will need to start from the latest version in order to provide support. Thanks!

Let me know if that clears things up for you!

pbmail01 · August 28, 2018, 12:52pm

Thank you for your reply.
As you can see, from about position 245 onward the plot says in red, for every position, that it was generated using a random sampling of 1 out of many more sequences. So when selecting the trim length for Deblur for these sequences, what would you recommend I pick?

pbmail01 · August 28, 2018, 10:44pm

Here is the result with QIIME 2 2018.6. I have looked everywhere and cannot figure out from this plot what trim length to use when denoising with Deblur. I would be very happy for an answer to: what length do I use to trim with Deblur, and why?

Nicholas_Bokulich · September 5, 2018, 3:54pm

Hi @pbmail01,
Sorry for the delayed reply.

It looks like the reads are probably joining correctly, but that most joined reads are ~245 nt long. Those other sequences appear to be (one or more) outliers that have longer joined lengths. So I still don't think the quality plot is wrong, it just looks bizarre.

I would recommend truncating at 245, because that is the joined length of all but a handful of sequences. Longer would cause the 245-nt-long seqs to be dropped, and shorter would reduce the amount of information in your sequences.
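As a rough illustration of that trade-off, here is a small Python sketch that counts how many reads would survive a given trim length, assuming reads shorter than the trim length are dropped and longer ones are cut down to it. The read lengths below are invented to mimic the situation described (almost everything at 245 nt plus a few longer outliers), and `retained_fraction` is a hypothetical helper, not part of QIIME 2 or Deblur.

```python
def retained_fraction(lengths, trim_length):
    """Fraction of reads that are at least trim_length long.

    Assumption: reads shorter than trim_length are discarded,
    and longer reads are truncated to trim_length.
    """
    if not lengths:
        return 0.0
    kept = sum(1 for n in lengths if n >= trim_length)
    return kept / len(lengths)

# Hypothetical joined-read lengths: mostly 245 nt, with a handful of outliers.
lengths = [245] * 995 + [251, 253, 260, 270, 290]

for trim_length in (240, 245, 246, 260):
    print(trim_length, retained_fraction(lengths, trim_length))
```

With these made-up lengths, every read is kept at 240 or 245, while going even one position past 245 discards nearly all of them. Picking 245 rather than 240 keeps the extra five bases of information at no cost in read count.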

I hope that helps!

pbmail01 · September 27, 2018, 11:00pm

Thank you! That answers all my questions. And sorry for the late reply.