Used degenerate primers for sequencing, concerned that my reads are being confused as chimeras

SoilRotifer · September 1, 2021, 9:49pm

Given the extreme degeneracy of the primer sequence, I suspect it amplifies an awful lot of other things, which would make denoising quite difficult. That is, the generated sequence data may not all be orthologous, i.e. the reads may be a mix of different genes, and that may negatively impact the denoising process.

Even running BLAST on these primer sequences returns no significant hits. If you remember, I referred you to the following article in this post and noted how much work the authors did to QA/QC the sequence data for nifH. That is, they used a comprehensive pipeline, including HMMs, etc. I did not read through it thoroughly, but it seemed like an onerous process.

I'd suggest that you do a little experiment and forgo denoising for now. That is, use the traditional OTU dereplication/clustering approach to generate your representative sequences, then tabulate the sequences like so:

qiime feature-table tabulate-seqs \
  --i-data rep-seqs.qza \
  --o-visualization rep-seqs.qzv
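
For the dereplication/clustering step that comes before the tabulate command, something along these lines might work. This is only a rough sketch using the q2-vsearch plugin: the wrapper function `cluster_otus`, the input artifact name, the intermediate file names, and the 97% identity default are all placeholders I made up for illustration, so adjust them to your data.

```shell
# Hypothetical wrapper (not part of QIIME 2): dereplicate, then cluster de novo.
# $1 = quality-filtered (and, if paired-end, joined) sequences artifact (assumed name)
# $2 = percent identity for clustering (assumed default: 0.97)
cluster_otus() {
  seqs="$1"
  identity="${2:-0.97}"

  # Collapse identical reads into unique sequences plus a feature table.
  qiime vsearch dereplicate-sequences \
    --i-sequences "$seqs" \
    --o-dereplicated-table derep-table.qza \
    --o-dereplicated-sequences derep-seqs.qza &&

  # Cluster the unique sequences into OTUs at the chosen identity.
  qiime vsearch cluster-features-de-novo \
    --i-table derep-table.qza \
    --i-sequences derep-seqs.qza \
    --p-perc-identity "$identity" \
    --o-clustered-table table.qza \
    --o-clustered-sequences rep-seqs.qza
}

# Example call (file name is a placeholder):
# cluster_otus seqs.qza 0.97
```

With rep-seqs.qza produced this way, the tabulate command above gives you rep-seqs.qzv.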

Then click on the sequences within the resulting qzv file. This will take you to a BLAST form to perform a query search. I'd suggest randomly picking a few sequences and running BLAST on them to see how consistent, or not, the BLAST results are (i.e. are they the same gene).
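
If you would rather pick the random sequences on the command line than by clicking around, a minimal sketch could look like the following. The helper name `sample_fasta` is made up for this post, and it assumes `awk` and GNU `shuf` are installed and that the FASTA headers contain no tab characters. You would first export the representative sequences with `qiime tools export`, which should give you a `dna-sequences.fasta` file in the output directory.

```shell
# Hypothetical helper: print N randomly chosen records from a FASTA file.
# $1 = FASTA file, $2 = number of records to pick
sample_fasta() {
  # Put each record on one line (header<TAB>sequence), joining wrapped
  # sequence lines, then shuffle, keep N, and split back into two lines.
  awk '/^>/ { if (hdr != "") print hdr "\t" seq; hdr = $0; seq = ""; next }
       { seq = seq $0 }
       END { if (hdr != "") print hdr "\t" seq }' "$1" \
    | shuf -n "$2" \
    | tr '\t' '\n'
}

# Example usage (paths are placeholders):
# qiime tools export --input-path rep-seqs.qza --output-path exported-rep-seqs
# sample_fasta exported-rep-seqs/dna-sequences.fasta 5
```

The sampled records can then be pasted into the BLAST web form to check whether the hits are consistently the same gene.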