I keep running into poor taxonomic assignment with a primer pair designed to target COI sequences in fish: a large number of sequences are being classified as 'cattle' DNA. Given the environment I sampled, I should not be getting any sequences with that taxonomy, which leads me to believe there is a problem either in how I process my sequences in QIIME 2 or in the laboratory. As I understand it, the causes of improperly assigned taxonomy fall into two groups: 1) improper use of the classifier for the query sequences (Example), or 2) non-biological sequences (artifacts). However, I also wonder whether primer design plays a part, or whether I ran too many PCR cycles (I used n = 35 cycles).
What can I do to improve the classification in my case? Are there any steps where I could improve sequence quality filtering?
Details:
- I am running the latest version of QIIME 2 in a conda environment.
- I first classified the sequences against a reference COI database that I built for the organisms I am interested in (fish sequences only). I then extracted the sequences with 'Unassigned' taxonomy and BLASTed them against a larger COI database (> 1 million sequences).
- I ran the following commands:
qiime tools import
--type 'SampleData[SequencesWithQuality]'
--input-path
--output-path
--input-format SingleEndFastqManifestPhred33

qiime cutadapt trim-single
--i-demultiplexed-sequences
--p-front primer sequence (5' forward)
--p-match-read-wildcards
--p-match-adapter-wildcards
--p-discard-untrimmed
--o-trimmed-sequences

qiime dada2 denoise-single
--i-demultiplexed-seqs
--p-trim-left 0
--p-trunc-len 234 (based on results above, maintaining median of 25 Phred score)
--o-table
--o-representative-sequences
--o-denoising-stats

qiime feature-classifier classify-consensus-blast
--i-query
--i-reference-reads
--i-reference-taxonomy
--p-perc-identity 0.95 (but also tried 0.99 and 1.00)
--o-classification
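
For reference, the second-pass step described in the details (pulling out the 'Unassigned' features before searching the larger database) could be sketched roughly as below. This is only an illustration under assumptions, not a record of my actual run: the function name `unassigned_ids` is hypothetical, and it assumes the classification has been exported to a tab-separated table with a header row, the feature ID in column 1, and the taxon string in column 2.

```shell
# Hypothetical helper: print the feature IDs whose taxon is exactly "Unassigned".
# Assumes a tab-separated taxonomy table with one header row,
# feature ID in column 1 and taxon string in column 2.
unassigned_ids() {
    awk -F'\t' 'NR > 1 && $2 == "Unassigned" { print $1 }' "$1"
}
```

Calling `unassigned_ids taxonomy.tsv > unassigned_ids.txt` (file names are placeholders) would give a list of IDs that can be used to subset the representative sequences for the second search.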