I have been working with a set of environmental samples that we submitted to a sequencing facility for Illumina MiSeq sequencing. We sequenced the 16S rRNA V4 region using the 515F and 806R primers. Since receiving the data, we have run into several oddities:
- The forward and reverse primers are present in only about 72% of sequences (searched for with Biostrings), which forces us to remove a large portion of the sequences.
- Several of the R2 reads have an ambiguous base call (N) at around position 22, right after the reverse primer at positions 1-20. These same reads also seem to have long homopolymer runs. When I work through the mothur workflow and screen sequences with the screen.seqs command, it removes about half of my sequences with maxambig=0 and maxhomop=8.
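To make the two screening criteria concrete, here is a minimal Python sketch of what they measure. It is not mothur's implementation; the function names are hypothetical, and it assumes a sequence fails when it has more ambiguous bases than `max_ambig` or a homopolymer run longer than `max_homop`. The last helper tallies where N calls fall along the reads, which is how a pile-up around position 22 would show.

```python
from collections import Counter
from itertools import groupby


def count_ambiguous(seq):
    """Number of bases that are not A, C, G or T."""
    return sum(1 for base in seq.upper() if base not in "ACGT")


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base."""
    return max((len(list(run)) for _, run in groupby(seq.upper())), default=0)


def passes_screen(seq, max_ambig=0, max_homop=8):
    """Assumed screening rule: keep a sequence only if both limits are met."""
    return (count_ambiguous(seq) <= max_ambig
            and longest_homopolymer(seq) <= max_homop)


def ambiguous_positions(reads):
    """Count N calls per 1-based read position across a collection of reads."""
    counts = Counter()
    for read in reads:
        for position, base in enumerate(read.upper(), start=1):
            if base == "N":
                counts[position] += 1
    return counts
```

Running `ambiguous_positions` on the raw R2 reads (before merging) and looking at the most common positions would show whether the ambiguous calls really cluster at one cycle or are spread out.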
We confirmed with the sequencing facility that the forward and reverse primers should still be present in the reads, but we are unsure how to troubleshoot why only 72% of sequences contain them. Additionally, it is unclear which step in the process caused several R2 reads to have this ambiguous base. We suspect it has something to do with library prep, but I have not been able to find any information online that explains what exactly causes this.
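For anyone who wants to reproduce the primer check outside R, below is a rough Python sketch of the kind of search involved. It is not the Biostrings code we ran. The primer sequences are the commonly cited original 515F/806R sequences and are an assumption, since the exact variants are not given above; the helper names and the `window` parameter are hypothetical. Degenerate IUPAC bases in the primer are expanded, and the primer may start anywhere in the first few bases of the read, because an exact match at a fixed position would miss primers that are shifted by a few bases.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGTN]",
}

# Assumed primer sequences; replace with the ones used in the library prep.
PRIMER_515F = "GTGCCAGCMGCCGCGGTAA"
PRIMER_806R = "GGACTACHVGGGTWTCTAAT"


def primer_regex(primer):
    """Compile a primer with degenerate bases into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def primer_start(read, primer, window=10):
    """0-based start of the primer if it begins within the first `window`
    offsets of the read, otherwise None."""
    match = primer_regex(primer).search(read.upper(), 0, window + len(primer))
    return match.start() if match else None


def primer_fraction(reads, primer, window=10):
    """Fraction of reads in which the primer is found near the 5' end."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(primer_start(r, primer, window) is not None for r in reads)
    return hits / len(reads)
```

Tabulating the values returned by `primer_start` for R1 and R2 separately would show whether the primers sit at a consistent offset. This sketch does not tolerate mismatches, so reads with a sequencing error inside the primer still count as missing.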