demux view generates a weird-looking paired-end graph

afinaa · April 7, 2021, 9:15am

Hi all,

We run amplicon analyses quite frequently, but one dataset produces a graph that looks weird (to us) when we run demux view, shown below.

rawdata.qzv (313.7 KB)

We noticed that for the reverse reads, the graph looks abnormal from 50 to 110 bp.
Does this mean the quality is actually poor?

Before importing the raw data into QIIME 2, we did primer and quality trimming using bbduk. Our previous analyses using the same workflow did not have a problem like this, so I doubt it was caused by the bbduk trimming.

I'd appreciate any thoughts about this.
Thank you.

colinbrislawn · April 8, 2021, 2:28am

Hello @afinaa

Thanks for posting that graph!

Yeah, read 2 looks pretty strange!

Thanks for mentioning your preprocessing steps. I don't think bbduk would change quality scores like that, but you could import the raw data just to double-check that the strange quality scores are already present in read 2 and were not created by bbduk.
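To check outside QIIME 2 whether the dip at 50-110 bp is already present in the untouched read 2 file, a short script can compute the median quality score at each read position, similar in spirit to the per-position box plots in the demux visualization. This is a minimal sketch assuming plain or gzipped 4-line FASTQ records with Phred+33 quality encoding; the function names and the Q20 cut-off are illustrative choices, not QIIME 2 settings.

```python
"""Per-position median Phred quality for a FASTQ file (sketch).

Assumes standard 4-line FASTQ records with Phred+33 encoded quality strings.
Every score is kept in memory, so subsample very large files first.
"""
import gzip
import statistics
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List


def quality_strings(lines: Iterable[str]) -> Iterator[str]:
    """Yield the quality line (4th line) of every FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 3:
            yield line.rstrip("\n")


def median_quality_by_position(quals: Iterable[str], offset: int = 33) -> List[float]:
    """Return the median Phred score per position (index 0 = position 1)."""
    by_pos: Dict[int, List[int]] = defaultdict(list)
    for q in quals:
        for pos, ch in enumerate(q):
            by_pos[pos].append(ord(ch) - offset)  # decode Phred+33 character
    # Keys are contiguous (0..longest read - 1), so range(len(...)) covers all.
    return [statistics.median(by_pos[p]) for p in range(len(by_pos))]


def low_quality_positions(medians: List[float], threshold: float = 20) -> List[int]:
    """1-based positions whose median quality falls below `threshold`."""
    return [p + 1 for p, m in enumerate(medians) if m < threshold]


def summarize_fastq(path: str, threshold: float = 20) -> List[int]:
    """Open a (possibly gzipped) FASTQ file and list its low-quality positions."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        medians = median_quality_by_position(quality_strings(fh))
    return low_quality_positions(medians, threshold)
```

For example, `summarize_fastq("sample_R2.fastq.gz")` (a hypothetical file name) would list the positions whose median falls below Q20. If those positions already cover roughly 50-110 in the raw file, the problem predates any trimming.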

If bbduk is causing problems, don't worry! There are QIIME 2 plugins like cutadapt that can remove primers and do quality trimming, all from within the QIIME 2 ecosystem.

Let us know what you try next!
Colin

afinaa · April 8, 2021, 9:27am

Hi @colinbrislawn ,

Thank you for the reply.

I imported the raw data without any preprocessing, and the graph for read 2 still looks abnormal.

rawdata1.qzv (313.6 KB)

We also ran cutadapt on the raw data, but the read 2 graph still looks strange.

cutadapt-trim.qzv (319.1 KB)

Any idea what might have caused this? Could it be from the sequencing run?

colinbrislawn · April 8, 2021, 1:00pm

Hello again,

If it's in the raw data, then it must be from the sequencing run itself.

You could try processing this with dada2 and see whether the denoising process can correct for these low-quality regions, but if that does not work, it might be best to use just the forward reads, as their quality is higher.

Colin

afinaa · April 13, 2021, 6:07am

I ran DADA2 on the samples regardless of the quality, and from the stats it looks like many samples are filtered out and fewer reads are classified as non-chimeric. Is this right?

dada2-stats.qzv (1.2 MB)

We are still considering whether to proceed with only the forward reads.
But what would the implications of this be? We know the V3-V4 region is around 460 bp, so with only the forward read, we would cover only about half of it (230 bp)?

I'd appreciate it if you could share your thoughts on this.

colinbrislawn · April 13, 2021, 3:53pm

Yeah, only 2 samples made it through processing, and in those, most reads failed to pass filter or join.
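DADA2's quality filter discards reads whose expected number of errors exceeds a cut-off (exposed in q2-dada2 as --p-max-ee-f / --p-max-ee-r, default 2.0), and it computes this after truncation. Expected errors are the sum of per-base error probabilities, 10^(-Q/10), so a long low-quality stretch like the one in read 2 can sink a read on its own. The sketch below reproduces only that arithmetic; it is not DADA2's implementation, and the quality profiles in the usage note are made up.

```python
"""Expected-error arithmetic behind a maxEE-style read filter (sketch)."""
from typing import List


def phred_to_error_prob(q: int) -> float:
    """Phred score Q -> probability that the base call is wrong: 10^(-Q/10)."""
    return 10 ** (-q / 10)


def expected_errors(quals: List[int]) -> float:
    """Sum of per-base error probabilities = expected number of errors in the read."""
    return sum(phred_to_error_prob(q) for q in quals)


def passes_max_ee(quals: List[int], max_ee: float = 2.0) -> bool:
    """Keep the read only if its expected errors do not exceed max_ee."""
    return expected_errors(quals) <= max_ee
```

A hypothetical 250-nt read at Q35 throughout has about 0.08 expected errors; setting positions 50-110 to Q10 raises that to about 6.2, well over the cut-off of 2, so the read would be discarded.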

Good idea!

That's right. And it could be shorter if you trim off the low quality end of read 1.
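Whether paired reads can be merged comes down to simple arithmetic: after truncation, the forward and reverse reads together must span the amplicon plus some overlap (DADA2's mergePairs defaults to a minimum overlap of 12 nt). With only read 1, the sequence length is simply the forward truncation length. This is a sketch assuming the ~460 bp figure from this thread is the length the reads have to span; the truncation lengths in the usage note are hypothetical.

```python
"""Back-of-the-envelope overlap check for a paired-end amplicon (sketch)."""


def overlap_length(amplicon_len: int, trunc_len_f: int, trunc_len_r: int) -> int:
    """Bases shared by the truncated forward and reverse reads.

    A negative value means there is a gap between the reads, so they cannot merge.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


def can_merge(amplicon_len: int, trunc_len_f: int, trunc_len_r: int,
              min_overlap: int = 12) -> bool:
    """True if the overlap reaches min_overlap (12 = DADA2 mergePairs default)."""
    return overlap_length(amplicon_len, trunc_len_f, trunc_len_r) >= min_overlap
```

For instance, `overlap_length(460, 250, 250)` is 40 nt, enough to merge, but cutting read 2 before its dip at position 50 gives `overlap_length(460, 250, 50) == -160`, so no pairs could merge. A forward-only analysis avoids this at the cost of a shorter sequence (250 nt or less here).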

The largest implication I can think of is taxonomic classification. Longer reads carry more information and can distinguish closely related taxa. Shorter reads might not be able to resolve down to the family, genus, or species level.

On the other hand, taxonomy would also suffer with low-quality long reads, and a larger number of high-quality reads will give you deeper coverage of your samples!

Quality over quantity!

afinaa · April 14, 2021, 1:10am

Thank you so much @colinbrislawn for this insight! I will take this into consideration. It would be better to proceed with only the forward reads if that gives better taxonomic classification for our samples.

Btw, I just noticed I made a mistake here.

We actually only did primer trimming, because we read on the forum that DADA2 performs error correction and already includes quality trimming by default (--p-trunc-q 2). Is this correct?

In this case, should we do quality trimming before DADA2, using either bbduk or cutadapt?
Could you advise?

Thanks again!

colinbrislawn · April 14, 2021, 1:17pm

Yes... and q2-dada2 does other things too as part of its pipeline! Take a look at all the options for the plugin.
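The q2-dada2 help describes --p-trunc-q as: truncate each read at the first base with a quality score less than or equal to this value, then discard the read if it is now shorter than the trunc-len for that direction (a trunc-len of 0 means no truncation or length filtering). If the dip in read 2 contains bases at Q2 or lower, many reverse reads would be cut somewhere around 50-110 and then thrown out, which would fit the DADA2 stats above. The sketch below mimics only that rule on a list of Phred scores; it is not q2-dada2 code, and the example read is invented.

```python
"""Sketch of the trunc-q rule followed by the trunc-len length check."""
from typing import List, Optional


def apply_trunc_q(quals: List[int], trunc_q: int = 2) -> List[int]:
    """Cut the read at the first base with quality <= trunc_q (that base is dropped too)."""
    for i, q in enumerate(quals):
        if q <= trunc_q:
            return quals[:i]
    return list(quals)


def trunc_q_then_len(quals: List[int], trunc_len: int,
                     trunc_q: int = 2) -> Optional[List[int]]:
    """Apply trunc-q, then enforce trunc_len. Returns None for a discarded read."""
    kept = apply_trunc_q(quals, trunc_q)
    if trunc_len == 0:          # 0 = no truncation or length filtering
        return kept
    if len(kept) < trunc_len:   # too short after trunc-q: read is discarded
        return None
    return kept[:trunc_len]     # otherwise truncate to exactly trunc_len
```

For example, a 250-nt read with a single Q2 base at position 61 is cut to 60 nt by trunc-q, so with a reverse trunc-len of 200 it is discarded entirely.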

It depends. I would suggest doing the quality trimming within the dada2 plugin, but that might not work for all datasets. The goal is to remove primers and barcodes that could mess up denoising, and there are a lot of ways to do that. With some sequencing methods, the reads don't contain adapters at all, so this step is not needed.
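As a simplified illustration of primer removal, the sketch below checks for a degenerate primer at the very start of a read, expanding IUPAC codes, and strips it. It is not cutadapt's algorithm (no mismatches, no indels, anchored at the first base only), and the primer string in the usage note is only an example: it is the 341F primer often used for V3-V4, so substitute the one from your own library prep.

```python
"""Minimal anchored 5' primer trimming with IUPAC degenerate bases (sketch)."""
from typing import Optional

# Each IUPAC code maps to the set of bases it stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_matches_at_start(read: str, primer: str) -> bool:
    """Exact match of a degenerate primer at position 1 (an 'N' in the read never matches)."""
    if len(read) < len(primer):
        return False
    return all(base in IUPAC[p] for base, p in zip(read, primer))


def trim_primer(read: str, primer: str) -> Optional[str]:
    """Return the read without the primer, or None if the primer is absent."""
    if primer_matches_at_start(read.upper(), primer.upper()):
        return read[len(primer):]
    return None
```

For instance, `trim_primer("CCTACGGGAGGCAGCAGTGGGGAATATTG", "CCTACGGGNGGCWGCAG")` returns `"TGGGGAATATTG"`. Returning None for reads without the primer is similar in spirit to q2-cutadapt's --p-discard-untrimmed option, which drops such reads instead of passing them on to DADA2.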

Try it and see!

Colin

system · May 15, 2021, 7:18pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.