I keep running into a problem with bad taxonomic assignments when using a primer pair designed to target fish COI: a large number of sequences are being classified as ‘cattle’ DNA. Given the environment I sampled, I should not be getting any sequences with that taxonomy, which makes me think something is wrong either in how I processed the sequences in QIIME 2 or in the laboratory. As I understand it, incorrect taxonomic assignments usually come from one of two causes: 1) improper use of the classifier for the query sequences (Example), or 2) non-biological sequences (artifacts). However, I wonder whether primer design plays a part, or whether I ran my samples for too many PCR cycles (I used 35).
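One rough way to test the primer-design question is to count mismatches between each primer and the corresponding binding site in a cattle COI reference, since few mismatches (especially near the primer's 3' end) make co-amplification of mammalian DNA more plausible. The sketch below is a minimal, ungapped comparison that assumes the binding site has already been extracted and aligned to the primer in the same 5'-3' orientation (for a reverse primer, compare against the reverse complement of the reference region). The primer and site strings in the usage example are hypothetical toy sequences, not the primers used here.

```python
# IUPAC nucleotide codes and the bases each one matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_mismatches(primer, site):
    """Count positions where the template base is not allowed by the (possibly degenerate) primer.

    Assumes `site` is the primer-binding region already aligned to the primer,
    in the same orientation, with no gaps.
    """
    if len(primer) != len(site):
        raise ValueError("primer and binding site must have the same length")
    return sum(
        1 for p, s in zip(primer.upper(), site.upper()) if s not in IUPAC.get(p, "")
    )


def three_prime_mismatches(primer, site, window=5):
    """Mismatches within the last `window` bases of the primer (its 3' end)."""
    return primer_mismatches(primer[-window:], site[-window:])


# Hypothetical toy sequences for illustration only.
primer = "ACGTRYAC"
fish_site = "ACGTACAC"
cattle_site = "ACGTGTAA"
print(primer_mismatches(primer, fish_site))  # 0
print(primer_mismatches(primer, cattle_site), three_prime_mismatches(primer, cattle_site, window=3))  # 1 1
```

Running this against the real primers and real fish and cattle reference regions would show whether the cattle template is a realistic off-target match.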
What can I do to improve the classification in my case? Are there any steps where I could improve quality filtering of the sequences?
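Because COI is protein-coding, one filter sometimes applied to COI amplicons is to flag ASVs that contain a stop codon in every forward reading frame, since these are more likely to be pseudogenes (e.g. NUMTs) or PCR/sequencing artifacts than genuine mitochondrial COI. This sketch assumes the ASVs are in the coding orientation and uses the vertebrate mitochondrial code (NCBI translation table 2), which applies to both fish and cattle. The ASV IDs and sequences are toy placeholders; real sequences would first need to be exported from the representative-sequences artifact (e.g. with `qiime tools export`) as FASTA.

```python
# Stop codons in the vertebrate mitochondrial code (NCBI translation table 2):
# TGA codes for Trp here, while AGA and AGG act as stops.
VERT_MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def has_open_frame(seq, stops=VERT_MITO_STOPS):
    """True if at least one of the three forward reading frames contains no stop codon."""
    seq = seq.upper()
    for frame in range(3):
        # Only complete codons are checked; a trailing partial codon is ignored.
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        if not any(codon in stops for codon in codons):
            return True
    return False


def split_by_reading_frame(asvs):
    """Split {asv_id: sequence} into ASVs to keep and IDs flagged as likely artifacts."""
    kept, flagged = {}, []
    for asv_id, seq in asvs.items():
        if has_open_frame(seq):
            kept[asv_id] = seq
        else:
            flagged.append(asv_id)
    return kept, flagged


# Toy sequences for illustration only (not real ASVs).
toy = {"asv_ok": "ATGAAACCC", "asv_stops": "TAAGTAGTAGGG"}
kept, flagged = split_by_reading_frame(toy)
print(sorted(kept), flagged)  # ['asv_ok'] ['asv_stops']
```

This is a crude check: it does not verify that the open frame is the correct one relative to a reference, so it is best treated as a way to flag suspicious ASVs for inspection rather than as a final filter.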
- I am running the latest version of QIIME 2, installed through conda.
- I first classified the sequences against a custom COI reference database I built for the fish I am interested in. I then filtered out the sequences classified as ‘Unassigned’ and BLASTed them against a larger COI database (> 1 million sequences).
- I ran the following commands:
qiime tools import
qiime cutadapt trim-single
--p-front <forward primer sequence, 5'-3'>
qiime dada2 denoise-single
--p-trunc-len 234 (chosen from the results above so that the median Phred score stays at or above 25)
qiime feature-classifier classify-consensus-blast
--p-perc-identity 0.95 (I also tried 0.99 and 1.00)
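The truncation rule used for the dada2 step (keep positions while the median Phred score is at least 25) can be made explicit with a small helper. This sketch assumes Phred+33-encoded quality strings from a subsample of reads; it computes the per-position median and returns the first position where that median drops below the cutoff. The quality strings in the usage example are toy values.

```python
from statistics import median


def per_position_median(quals, offset=33):
    """Median Phred score at each position across reads (Phred+33 quality strings)."""
    length = max(len(q) for q in quals)
    medians = []
    for pos in range(length):
        scores = [ord(q[pos]) - offset for q in quals if len(q) > pos]
        medians.append(median(scores))
    return medians


def choose_trunc_len(medians, min_q=25):
    """Number of leading positions whose median quality is >= min_q."""
    for pos, q in enumerate(medians):
        if q < min_q:
            return pos
    return len(medians)


# Toy quality strings: 'I' = Q40, '5' = Q20, '#' = Q2 in Phred+33.
toy_quals = ["IIII#", "IIII#", "II5##"]
medians = per_position_median(toy_quals)
print(medians, choose_trunc_len(medians))  # [40, 40, 40, 40, 2] 4
```

Stopping at the first dip is the conservative choice; a single low position mid-read would truncate early, so it is worth comparing the result against the quality plot. Besides `--p-trunc-len`, `qiime dada2 denoise-single` also exposes `--p-trunc-q` and `--p-max-ee`, which filter on per-read quality and may be worth tuning as well.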
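For intuition on what the `--p-perc-identity` values tolerate, the arithmetic below converts an identity threshold into the largest number of mismatches an alignment of a given length can contain and still pass. It ignores gaps and assumes the alignment spans the full 234-bp ASV; in `classify-consensus-blast` the separate `--p-query-cov` setting controls how much of the query must be aligned, so shorter alignments can also pass.

```python
import math


def max_mismatches(aln_len, perc_identity):
    """Largest mismatch count with (aln_len - mismatches) / aln_len >= perc_identity (gaps ignored)."""
    # The small epsilon guards against floating-point rounding just below an integer.
    return math.floor(aln_len * (1 - perc_identity) + 1e-9)


for pid in (0.95, 0.99, 1.00):
    print(pid, max_mismatches(234, pid))
# 0.95 11
# 0.99 2
# 1.0 0
```

So at 0.95 a 234-bp hit can differ at up to 11 positions, while 1.00 requires an exact match over the aligned region.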
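The two-step approach (custom fish database first, then a larger database only for features left ‘Unassigned’) amounts to merging two classification tables. Recording which step produced each label can help show whether the cattle assignments all come from the second, larger database. This sketch assumes both classifications have already been loaded as feature-ID-to-taxon dictionaries (for example from the `taxonomy.tsv` produced by `qiime tools export`); the feature IDs and taxon labels are made up.

```python
def merge_classifications(primary, secondary, unassigned="Unassigned"):
    """Use the secondary label only where the primary classifier returned `unassigned`.

    Returns {feature_id: (taxon, source)}, where source is "primary", "secondary" or "none".
    """
    merged = {}
    for fid, taxon in primary.items():
        if taxon != unassigned:
            merged[fid] = (taxon, "primary")
        elif secondary.get(fid, unassigned) != unassigned:
            merged[fid] = (secondary[fid], "secondary")
        else:
            merged[fid] = (unassigned, "none")
    return merged


# Hypothetical feature IDs and labels for illustration only.
primary = {"asv1": "fish_taxon_1", "asv2": "Unassigned", "asv3": "Unassigned"}
secondary = {"asv2": "mammal_taxon_1", "asv3": "Unassigned"}
merged = merge_classifications(primary, secondary)
print(merged)
```

Tabulating the `source` field against the taxon labels then shows at a glance which database step is responsible for the unexpected assignments.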