I am currently analyzing oldish Illumina data of ITS2 sequences prepared via a two-step protocol with ITS3 and ITS4 primers. I've noticed a peculiar case of 'missing primers in sequencing data' and am wondering if anyone has come across a similar issue. I've posted this in General Discussion because I'm not having any trouble using QIIME or understanding QIIME outputs; the data itself is what's troubling.
Here is the situation:
- My sequencing facility does not mix forward and reverse fragments when sequencing; i.e., forward fragments are in R1, reverse in R2.
- The 2 x 250 bp MiSeq run seemed completely fine from a sequencing-quality standpoint. I ran the ITS samples alongside complementary 16S samples.
- I've run q2-cutadapt a few times, mostly adjusting --p-overlap. The parameter critical to this situation is --p-discard-untrimmed. Long story short, removing primers from my 16S sequences proceeded normally, and I retained most of my reads for all my samples. For my ITS sequences, I retained anywhere from 10-90% of sequences regardless of the cutadapt settings; i.e., the fraction retained varied enormously across samples no matter which parameters I used.
- On further investigation, I found that a number of sequences in my forward ITS samples did not contain the primer; interestingly, the reverse sequences did. To clarify, there were no partial matches, substitutions, insertions, or deletions suggesting that the bases at the beginning of the affected reads were even remotely related to the primer.
- The plot thickens. The primerless sequences contained the reverse complement of my reverse primer, covered part of the ITS region I was sequencing, and matched expected taxa at 98-100% identity. Thus, I have no reason to believe these sequences are contaminants.
- Below is an alignment of the situation:
Essentially, the primerless fragments I am getting begin downstream of my actual primer, and as previously mentioned, end at my reverse primer. I will also note that 'my example sequence' is 250 bp, as expected; if I had longer reads, I would expect the fragments to contain readthrough of my reverse primer, indices, and adapter sequences.
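The primer check described above (does each R1 read begin with the forward primer, and does it carry the reverse complement of the reverse primer?) can be reproduced outside cutadapt with a short script, which helps confirm that the primerless reads are not just an artifact of the trimming settings. This is a minimal sketch, not what cutadapt does internally. It assumes the standard published ITS3 and ITS4 sequences (replace them with your exact oligos, including any heterogeneity spacers), assumes the forward primer sits at the very start of R1, and allows substitutions only, no indels. The two-mismatch default is a hypothetical choice, roughly comparable to cutadapt's default 10% error rate on a 20-nt primer.

```python
import gzip
from collections import Counter

# Assumed primers: the standard published ITS3 and ITS4 sequences (5'->3').
# Replace with the exact oligos used in the library prep.
ITS3 = "GCATCGATGAAGAACGCAGC"
ITS4 = "TCCTCCGCTTATTGATATGC"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def starts_with_primer(read: str, primer: str, max_mismatches: int = 2) -> bool:
    """Anchored 5' check: compare only the first len(primer) bases of the read."""
    head = read[: len(primer)]
    return len(head) == len(primer) and mismatches(head, primer) <= max_mismatches


def contains_primer(read: str, primer: str, max_mismatches: int = 2) -> bool:
    """Ungapped sliding-window search for the primer anywhere in the read."""
    k = len(primer)
    return any(
        mismatches(read[i : i + k], primer) <= max_mismatches
        for i in range(len(read) - k + 1)
    )


def classify_forward_reads(reads, fwd=ITS3, rev=ITS4, max_mismatches=2):
    """Tally R1 reads by (starts with fwd primer, contains revcomp of rev primer)."""
    rev_rc = revcomp(rev)
    counts = Counter()
    for read in reads:
        read = read.upper()
        has_fwd = starts_with_primer(read, fwd, max_mismatches)
        has_rev_rc = contains_primer(read, rev_rc, max_mismatches)
        counts[(has_fwd, has_rev_rc)] += 1
    return counts


def read_fastq_sequences(path):
    """Yield the sequence line of each 4-line FASTQ record (plain or gzipped)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:
                yield line.strip()
```

For example, classify_forward_reads(read_fastq_sequences("sampleA_R1.fastq.gz")) (hypothetical file name) returns a Counter keyed by (has forward primer, has reverse-primer complement). Reads in the (False, True) bin correspond to the primerless fragments described here, while (True, False) is what most R1 reads should look like when the amplicon is longer than the read.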
- They may be PCR artifacts, but no portion of the forward primer is present, and the distribution of my fragment lengths was as expected, with no large proportion of sequences that were too long or too short.
- They may be chimeras, but the sequence corresponds to an organism I expect to be there and is extremely similar to the reference sequence.
- They may be an artifact of bridge amplification gone wrong, but the overall quality of the run was fine.
- They may be true fragments, and the read 1 sequencing primer misprimed? That seems extremely unlikely but conceivable. Is this even possible? Has this happened to anyone else?
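One way to weigh the mispriming hypothesis against the others is to check whether the primerless reads all start at the same position. The sketch below assumes you have a reference ITS2 sequence for one of the matching taxa, oriented the same way as the forward read (the reference string is a hypothetical input), and finds where each read's first k bases land on it by exact string match. A tight cluster of start offsets would be consistent with a specific alternative priming site, whose upstream bases could then be compared against the read 1 sequencing primer and the forward primer; widely scattered offsets would argue against a single mispriming site. Either result is suggestive rather than conclusive.

```python
from collections import Counter
from typing import Optional


def start_offset(read: str, reference: str, k: int = 20) -> Optional[int]:
    """0-based position of the read's first k bases in the reference (exact match).

    Returns None if the read is shorter than k or the k-mer is not found;
    a sequencing error within those first k bases will also give None.
    """
    if len(read) < k:
        return None
    pos = reference.find(read[:k])
    return pos if pos >= 0 else None


def offset_histogram(reads, reference: str, k: int = 20) -> Counter:
    """Count how many reads start at each offset on the reference (None = not placed)."""
    ref = reference.upper()
    return Counter(start_offset(r.upper(), ref, k) for r in reads)


def upstream_context(reference: str, offset: int, width: int = 25) -> str:
    """Reference bases just 5' of a read start: a candidate alternative priming site."""
    return reference[max(0, offset - width) : offset]
```

Exact matching keeps the sketch simple but will miss reads with an error in their first k bases; a proper aligner would be more tolerant if the unplaced (None) bin turns out to be large.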
Any thoughts on this are more than welcome.