Illumina data in DADA2: sequencing errors

Cybele_C · August 27, 2018, 3:43pm

Hello - I'd appreciate help from anyone who has done a similar project!

My Illumina data is underclustered, according to the sequencing facility. PF values were very low, and I'm still trying to understand the reference sequence data.

In the rep-seqs.qzv, everything was filtered out except two reads, both E. coli.

This obviously could be an issue with my products, but the technician who runs the sequencing facility told me after the fact that she doesn't trust the primers I used, even though they are the EMP standard universal primers (based, of course, on E. coli originally).

She experimented with my sample by underloading the amount, resulting in very low PF values, despite a QC run. I need to convince her to re-do this, somehow (I'm doing a master's and all the grant money went into this!)

In any case, I feel that I don't have a grasp of what the ref-seqs mean for my data. My left/right trim parameters were 13, and the p-trunc values were 250.

Thanks in advance for any ideas!

lca123 · August 28, 2018, 6:07pm

Just to give you an idea, this is a QS plot from some samples I have (with no QC filtering). It is not that good and not that bad =) but the quality of the forward (FW) reads is much better than that of the reverse (RW) reads, as it often is for 16S PE Illumina sequencing.

lca123 · August 28, 2018, 6:08pm

That is indeed a low quality run in terms of Reads Passing filter, density's standard deviation and QC/base.
(Assuming they are 16S reads) I would filter them by average quality per read (instead of per base) for both the RW and FW reads. Next, I would merge them and get this output. Probably not so many reads... Then, import them into QIIME 2 and pick the sequence variants, followed by taxonomic classification. But I don't think you will retrieve many reads, based on the QS graphs you've posted.
Cheers
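
The per-read average-quality filter suggested above could look roughly like the sketch below. It assumes Phred+33 quality encoding (the usual encoding for current Illumina FASTQ files) and uses a plain arithmetic mean of the Phred scores; the function names, the threshold of 20, and the two example reads are made up for illustration and are not from the original data.

```python
from statistics import mean


def mean_phred(quality_line, offset=33):
    """Arithmetic mean Phred score of one FASTQ quality string (Phred+33 assumed)."""
    return mean(ord(ch) - offset for ch in quality_line)


def filter_by_mean_quality(records, min_mean_q=20):
    """Keep (header, sequence, quality) records whose mean Phred score is >= min_mean_q."""
    return [rec for rec in records if rec[2] and mean_phred(rec[2]) >= min_mean_q]


# Two made-up reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
example = [
    ("@read1", "ACGTACGT", "IIIIIIII"),
    ("@read2", "ACGTACGT", "II######"),
]

# Only the first read has a mean quality of at least 20.
kept = filter_by_mean_quality(example, min_mean_q=20)
```

Averaging Phred scores directly is the simplest option; a stricter alternative would be to average the per-base error probabilities instead, which penalizes a few very bad bases more heavily.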

Mehrbod_Estaki · August 28, 2018, 9:06pm

Hi @Cybele_C,
That's too bad about the run! The EMP primers are widely used without this type of issue, so I would be curious as to why the technician thinks the primers are to blame for this. To me it just seems like a case of improper library normalization/loading.
Besides, from everything I've glimpsed through in this Illumina technical document, it looks to be an issue of over-clustering and not under...but perhaps I haven't read this properly enough. Technicians tend to know these details pretty well, so I don't mean to discredit them either! The quality plots you've provided certainly look troubling and, as @lca123's plot shows, they are not what we usually see. (Thanks for the suggestions btw @lca123!). That being said though, I don't think they are unusable either. It looks as though most of them still have median quality scores above 20, so we should hopefully be able to retain enough reads to analyse your data anyway, though we might have to fine-tune your filtering and truncating parameters. If I'm interpreting the basespace tables correctly you have 2.7 Mil reads for 6 samples, hopefully we can rescue some of those! But just to confirm, the demux output you showed is of your raw reads before any other QC/trimming, right?

I believe you're referring to rep-seqs.qza, right? This is a list of the unique features that were detected from all of your reads after denoising/clustering. It holds no information regarding the frequency of these features, only their identity, and is used downstream to create phylogenetic trees and assign taxonomy.

Given your situation, I propose starting from the bottom, using very inclusive parameters to retain the maximum number of reads, and moving up. For example, re-run your denoising step on your forward reads only and set your truncating parameter to 100bp. Check to see how many reads/features you were able to retain there. If it's good, you can try to increase that by, say, another 25-50bp. If you're still able to retain a good chunk of your reads/features, then you can try using both forward and reverse reads and merging them (but still truncate as much of the 3' tail as you can).
What are the samples here?
And in the future, if possible, please upload the actual qiime artifacts instead of images as those give us much more information to help with the troubleshooting.
Let us know how this goes.
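
The stepwise approach described above (forward reads only, truncate at 100bp, then extend by 25bp at a time) can be sketched as a small script that builds one `qiime dada2 denoise-single` command per truncation length. This is only a sketch under assumptions: `demux.qza` and the output file names are hypothetical, the trim value of 13 is taken from the original post, and the `--o-denoising-stats` output may not be available in older QIIME 2 releases. The script only prints the commands; it does not run them.

```python
import shlex


def denoise_single_command(demux_qza, trunc_len, trim_left=13):
    """Build a `qiime dada2 denoise-single` command for one truncation length."""
    tag = f"trunc{trunc_len}"  # hypothetical naming scheme for the outputs
    return [
        "qiime", "dada2", "denoise-single",
        "--i-demultiplexed-seqs", demux_qza,
        "--p-trim-left", str(trim_left),
        "--p-trunc-len", str(trunc_len),
        "--o-table", f"table-{tag}.qza",
        "--o-representative-sequences", f"rep-seqs-{tag}.qza",
        "--o-denoising-stats", f"stats-{tag}.qza",
    ]


# Start very inclusive at 100bp, then step up by 25bp.
commands = [denoise_single_command("demux.qza", n) for n in (100, 125, 150)]

for cmd in commands:
    print(shlex.join(cmd))
```

After each run, comparing the number of reads and features retained (for example from the denoising stats or the feature table summary) shows whether a longer truncation length is still worth trying.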
