Hi again,
I am new to QIIME 2 and am trying to match my OTUs to my reference sequences. I am working with 12S sequence data from fish, and as far as I know there is no ready-made database like Greengenes for that marker, so I created my own custom reference sequences and reference taxonomy.
Importing the reference database and taxonomy went fine. Here are the commands I used:
qiime tools import --input-path nrdatabase_20180109_qiime2.fasta --output-path referenceseqs.qza --type 'FeatureData[Sequence]'
qiime tools import --type 'FeatureData[Taxonomy]' --source-format TSVTaxonomyFormat --input-path qiime2taxonomy2.txt --output-path referencetaxonomy.qza
I then tried to match my OTUs to my reference sequences using this command:
qiime feature-classifier classify-consensus-blast --i-query Teleo_OTUs_97_sequence.qza --i-reference-reads referenceseqs.qza --i-reference-taxonomy referencetaxonomy.qza --p-maxaccepts 10 --p-perc-identity 0.9 --o-classification Teleo_OTUs_97_classifications --verbose
And this is the output I got:
Command: blastn -query /tmp/qiime2-archive-f7b9g7dh/9190d430-896d-4c8a-b296-3101a3bdf254/data/dna-sequences.fasta -evalue 0.001 -strand both -outfmt 7 -subject /tmp/qiime2-archive-kygwgb7f/c79a9c41-5b87-46ec-93bc-189fe66065e2/data/dna-sequences.fasta -perc_identity 90.0 -max_target_seqs 10 -out /tmp/tmptqv7y665
Plugin error from feature-classifier:
'Identifier NC_020760 was reported in taxonomic search results, but was not present in the reference taxonomy.'
Because this is a custom database containing only our species of interest, the files are quite small, so I was able to look through them manually and confirm that this identifier is present in both the reference sequences and the reference taxonomy. The labels match exactly (NC_020760 Coregonus_nasus). Other sequences with underscores and spaces in their names don’t seem to cause problems, so I am not sure what I’ve done wrong.
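In case it helps anyone reproduce the check, here is a minimal sketch of how the manual comparison could be scripted instead of done by eye. It is not part of QIIME 2, and the function names (fasta_ids, taxonomy_ids, compare_ids) are made up for illustration. It assumes that the sequence ID is the first whitespace-delimited token of each FASTA header, which is how BLAST reports hits, and that the taxonomy file is tab-separated with the ID in the first column. IDs are printed with repr() so that stray spaces or other hidden characters become visible.

```python
import sys


def fasta_ids(handle):
    """Collect sequence IDs: the first whitespace-delimited token of each '>' header."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def taxonomy_ids(handle):
    """Collect taxonomy IDs: everything before the first tab on each non-empty line.

    Assumption: the file is tab-separated. If a line uses a space instead of a
    tab, the whole line (or everything up to a later tab) ends up as the 'ID'.
    """
    ids = set()
    for line in handle:
        line = line.rstrip("\r\n")
        if line:
            ids.add(line.split("\t")[0])
    return ids


def compare_ids(fasta_handle, taxonomy_handle):
    """Return (IDs only in the FASTA, IDs only in the taxonomy), each sorted."""
    seq = fasta_ids(fasta_handle)
    tax = taxonomy_ids(taxonomy_handle)
    return sorted(seq - tax), sorted(tax - seq)


# Only runs when two file paths are given on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fasta, open(sys.argv[2]) as taxonomy:
        only_fasta, only_taxonomy = compare_ids(fasta, taxonomy)
    for seq_id in only_fasta:
        print("in FASTA only:   ", repr(seq_id))
    for seq_id in only_taxonomy:
        print("in taxonomy only:", repr(seq_id))
```

Saved under a hypothetical name such as compare_ids.py, it could be run on the two files I imported: python compare_ids.py nrdatabase_20180109_qiime2.fasta qiime2taxonomy2.txt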
Any ideas how I can make this work?
Thanks!
Erin