I keep running into a taxonomic-assignment problem with a primer pair designed to target fish COI: a large number of sequences are being classified as cattle DNA. Given the environment I sampled, I should not be getting any sequences with that taxonomy. This makes me think something is going wrong either in how I processed the sequences in QIIME 2 or in the laboratory. As I understand it, the causes of improperly assigned taxonomy fall into two groups: 1) improper use of the classifier for the query sequences (Example), or 2) non-biological sequences (artifacts). However, I wonder whether primer design plays a part, or whether I ran too many PCR cycles (I used 35).
What can I do to improve the classification in my case? Are there any steps where I could improve sequence quality filtering?
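One quality-filtering knob worth checking is DADA2's expected-errors filter. In `qiime dada2 denoise-single`, reads shorter than `--p-trunc-len` are discarded, and the truncated reads are then rejected if their expected errors exceed `--p-max-ee` (default 2.0). Expected errors are the sum of per-base error probabilities, 10^(-Q/10). Below is a minimal Python sketch of that arithmetic, assuming Phred+33 quality strings. The function names are hypothetical, and it ignores other DADA2 filters such as `--p-trunc-q`; it is not DADA2's implementation.

```python
def expected_errors(quality: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities 10 ** (-Q / 10) for a Phred+offset string."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def passes_max_ee(quality: str, trunc_len: int, max_ee: float = 2.0) -> bool:
    """Hypothetical helper: reject reads shorter than trunc_len, otherwise truncate
    to trunc_len and keep the read only if its expected errors are <= max_ee."""
    if len(quality) < trunc_len:
        return False
    return expected_errors(quality[:trunc_len]) <= max_ee


# A 234-base read at Q40 ('I') has ~0.023 expected errors and passes;
# the same read at Q2 ('#') has ~147 expected errors and is rejected.
print(passes_max_ee("I" * 234, 234), passes_max_ee("#" * 234, 234))
```

Lowering `--p-max-ee` is stricter than raising `--p-trunc-len` alone, because a read full of middling bases can still accumulate many expected errors even when its median quality looks acceptable.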
Details:
- I am running the latest version of QIIME 2, installed with conda.
- I first classified the sequences against a reference COI database I built for the organisms I am interested in (fish sequences only). I then extracted the sequences classified as 'Unassigned' and BLASTed them against a larger COI database (> 1 million sequences).
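The second step of this two-step workflow depends on pulling out exactly the features labelled 'Unassigned'. Inside QIIME 2 this can be done with `qiime taxa filter-seqs` (for example with `--p-include Unassigned --p-mode exact`). The Python below is only a sketch of the equivalent logic on exported files, assuming the usual QIIME 2 taxonomy export layout (a tab-separated file with a `Feature ID` / `Taxon` header) and a FASTA of representative sequences whose headers are feature IDs. The function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, Optional, Set


def unassigned_ids(taxonomy_lines: Iterable[str]) -> Set[str]:
    """Return feature IDs whose Taxon column is exactly 'Unassigned'.

    Assumes a tab-separated export with a 'Feature ID' and 'Taxon' header
    (the third column may be 'Confidence' or 'Consensus'; it is ignored).
    """
    reader = csv.DictReader(taxonomy_lines, delimiter="\t")
    return {row["Feature ID"] for row in reader if row["Taxon"].strip() == "Unassigned"}


def subset_fasta(fasta_lines: Iterable[str], keep: Set[str]) -> Dict[str, str]:
    """Collect sequences whose ID (first token after '>') is in `keep`.

    Multi-line FASTA records are joined into a single sequence string.
    """
    records: Dict[str, str] = {}
    current: Optional[str] = None
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            if current in keep:
                records[current] = ""
        elif current is not None and current in keep:
            records[current] += line
    return records
```

Checking the size of this subset against the number of 'cattle' features from the BLAST step can show whether the unexpected assignments come only from sequences the fish-only database could not place.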
- I ran the following commands:
qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path \
  --output-path \
  --input-format SingleEndFastqManifestPhred33

qiime cutadapt trim-single \
  --i-demultiplexed-sequences \
  --p-front (5' forward primer sequence) \
  --p-match-read-wildcards \
  --p-match-adapter-wildcards \
  --p-discard-untrimmed \
  --o-trimmed-sequences

qiime dada2 denoise-single \
  --i-demultiplexed-seqs \
  --p-trim-left 0 \
  --p-trunc-len 234 \
  --o-table \
  --o-representative-sequences \
  --o-denoising-stats

qiime feature-classifier classify-consensus-blast \
  --i-query \
  --i-reference-reads \
  --i-reference-taxonomy \
  --p-perc-identity 0.95 \
  --o-classification

Notes: I chose --p-trunc-len 234 based on the read-quality results, to keep the median Phred score at 25. For --p-perc-identity I used 0.95, but I also tried 0.99 and 1.00.
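The `--p-perc-identity` experiments are easier to interpret with the consensus step in mind. As I understand it, `classify-consensus-blast` keeps up to `--p-maxaccepts` hits (default 10) that pass `--p-perc-identity` and `--p-query-cov`. It then reports the lineage shared by at least `--p-min-consensus` (default 0.51) of those hits, or 'Unassigned' if no hit qualifies. The Python below is a simplified sketch of that majority-consensus idea, not QIIME 2's implementation. It ignores query coverage, ranks hits by identity rather than by BLAST's own ordering, and uses made-up hit tuples and taxonomy strings.

```python
from collections import Counter
from typing import List, Tuple

# Each hypothetical hit: (percent identity as a fraction, "Rank1; Rank2; ...").
Hit = Tuple[float, str]


def consensus_taxonomy(hits: List[Hit], perc_identity: float = 0.95,
                       maxaccepts: int = 10, min_consensus: float = 0.51) -> str:
    """Simplified majority-consensus assignment over BLAST-style hits."""
    # Keep only hits at or above the identity threshold, best first, capped at maxaccepts.
    accepted = sorted((h for h in hits if h[0] >= perc_identity),
                      key=lambda h: h[0], reverse=True)[:maxaccepts]
    if not accepted:
        return "Unassigned"
    lineages = [[rank.strip() for rank in taxon.split(";")] for _, taxon in accepted]
    consensus: List[str] = []
    for i in range(max(len(lineage) for lineage in lineages)):
        labels = [lineage[i] for lineage in lineages if i < len(lineage)]
        label, count = Counter(labels).most_common(1)[0]
        # Stop at the first rank where the majority label is not supported strongly enough;
        # hits that lack this rank count as disagreeing.
        if count / len(accepted) < min_consensus:
            break
        consensus.append(label)
    return "; ".join(consensus) if consensus else "Unassigned"


hits = [
    (0.99, "Animalia; Chordata; Mammalia; Bos taurus"),
    (0.97, "Animalia; Chordata; Mammalia; Bos taurus"),
    (0.96, "Animalia; Chordata; Actinopteri; Salmo trutta"),
]
print(consensus_taxonomy(hits, perc_identity=0.95))
print(consensus_taxonomy(hits, perc_identity=1.00))
```

With these made-up hits the result stays at the cattle lineage for 0.95 and 0.99 and only becomes 'Unassigned' at 1.00. If raising the threshold in the real runs does not remove the cattle calls, that suggests the query sequences genuinely match cattle COI closely. The likelier explanation would then be the sample or lab workflow rather than the classifier settings.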