Strange quality patterns from Seqmatic data that do not match FastQC quality scores

Hello, two of my students have data sets that they are re-analyzing in QIIME 2. These 16S rRNA data sets (515f/806r) were generated by the company Seqmatic. In FastQC, the quality appears fine for each sample. However, when they import the data and generate the demux.qzv, they see poor quality scores throughout the forward read (I almost wondered if the sequences were backwards). I've attached the .qzv files.

Any thoughts?

Note: if we go forward with the analyses in QIIME 2, the taxa look reasonable and match what we previously had using QIIME 1.

demux_view_deworm.qzv (279.8 KB)

demux_view_gi.qzv (280.0 KB)

Hi @jessicalmetcalf, Thanks for sharing these. I can see why you thought the reads might be backwards - have you ruled that out? Note that we do typically see lower quality at the beginnings and ends of reads, but in my experience it is usually not quite this extreme at the beginning.

The plots generated by qiime demux summarize are based on a random sample of 10,000 sequences from the input. It’s possible, though unlikely, that the plots would look different if they were based on a larger sample of sequences. You could have your students try raising the value of --p-n passed to qiime demux summarize above its default, possibly to 100,000. If the resulting plots do look considerably different, please let us know.
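
As an independent cross-check of that subsampling, here is a minimal Python sketch that computes the mean quality score at each read position over a random subsample of reads from a FASTQ file. It is not how qiime demux summarize is implemented. The function names, the `n`, `seed`, and `offset` parameters, and the Phred+33 encoding are all assumptions made for illustration.

```python
import gzip
import random


def read_quality_strings(path):
    """Yield the quality line of each 4-line FASTQ record (plain or gzipped)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # In a 4-line FASTQ record, the fourth line holds the qualities.
            if line_number % 4 == 3:
                yield line.rstrip("\n")


def mean_quality_by_position(quality_strings, n=10000, seed=0, offset=33):
    """Mean Phred score per read position over a random subsample of n reads.

    Assumes Phred+33 quality encoding (offset=33); adjust if the data differ.
    """
    reads = list(quality_strings)
    if len(reads) > n:
        # Subsample without replacement; the seed keeps runs reproducible.
        reads = random.Random(seed).sample(reads, n)
    totals, counts = [], []
    for quality in reads:
        for position, char in enumerate(quality):
            if position == len(totals):
                totals.append(0)
                counts.append(0)
            totals[position] += ord(char) - offset
            counts[position] += 1
    return [total / count for total, count in zip(totals, counts)]
```

Running this on one sample's forward-read FASTQ with `n=10000` and again with `n=100000` should show whether the per-position profile depends on the subsample size. The same profile can also be compared against the FastQC per-base quality plot for that file.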

Could you post screenshots or zip files of the quality score distributions by sequence position that you’re generating with FastQC? That would help me to figure out what might be different.
