I'm combining reference sequences generated with Sanger sequencing with those downloaded with RESCRIPt, and I get an error saying that one of the sequences present in the reference sequences is not present in the taxonomy.
Error:
'Identifier FISH013_12S_Luxilus_coccogenis_R was reported in taxonomic search results, but was not present in the reference taxonomy.'
When running this command:
qiime feature-classifier classify-consensus-blast \
--i-query r1-results/r1-rep-seqs-dada.qza \
--i-reference-taxonomy 12s-fish-refseq-sanger-filtered-merged-taxonomy.qza \
--i-reference-reads 12s-fish-refseq-sanger-filtered-merged-seqs.qza \
--o-classification r1-results/12s-r1-blasted-taxonomy.qza
12s-fish-refseq-sanger-filtered-merged-taxonomy.qza is the result of merging a headerless taxonomy (attached) with the taxonomy downloaded from NCBI using rescript.
12s-fish-refseq-sanger-filtered-merged-seqs.qza is the result of merging the Sanger fasta (attached, yes I know it's poor quality sequencing...) with the reference sequences downloaded from NCBI using rescript.
grep reveals that FISH013_12S_Luxilus_coccogenis_R is present in both the taxonomy and the sequences:
unzip -c 12s-fish-refseq-sanger-filtered-merged-taxonomy.qza | grep 'FISH013_12S_Luxilus_coccogenis_R'
>FISH013_12S_Luxilus_coccogenis_R k__Animalia;p__Chordata;c__Actinopterygii;o__Cypriniformes;f__Cyprinidae;g__Luxilus;s__coccogenis;
unzip -c 12s-fish-refseq-sanger-filtered-merged-seqs.qza | grep -A 1 'FISH013_12S_Luxilus_coccogenis_R'
>FISH013_12S_Luxilus_coccogenis_R
TAGGTAACTTTATTACATTTCGACAGGGGAGAGTGACGGGCGGTGTGTACGCGCCTCAGAGCCGGGTTCAAAAGGACACGCTGTTTCCTTTTTACTACTAAATCCTCCTTCAAGCACTATTTCATGTTGCATATCCGTAGTGTTCTATAATAGAAAATGTAGCCCATTTCTTCCCGCTCCGTACGCTACACCTCGACCTGACGTTCTGGGCTGTGCCCATTTTGCTTACTCTTATTACCTTCACAGGGTAAGCTGACGACGGCGGNATATAGGCAN
Comparing the sequence names in the taxonomy file to the headers in the fasta file using setdiff in R shows no differences either. Both files import to QIIME without a problem.
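For reference, here is a minimal shell sketch of the same comparison done byte-for-byte, so that nothing gets trimmed or normalized along the way. It assumes the taxonomy is a headerless, tab-separated file with the ID in the first column, and that the fasta ID is everything between `>` and the first whitespace. The function names and the demo files are made up for illustration. One demo taxonomy row deliberately starts with `>`, mirroring the taxonomy line in the grep output above; whether that character is really stored as part of the ID in my files is exactly the kind of thing this check is meant to reveal.

```shell
#!/bin/sh
# Sketch only: compare taxonomy IDs with FASTA header IDs, byte-for-byte.
# Function names and demo data are hypothetical.

# IDs from a headerless, tab-separated taxonomy file (first column, unmodified).
taxonomy_ids() {
    cut -f1 "$1" | LC_ALL=C sort -u
}

# IDs from FASTA headers: drop the leading '>' and anything after the first whitespace.
fasta_ids() {
    grep '^>' "$1" | sed -e 's/^>//' -e 's/[[:space:]].*$//' | LC_ALL=C sort -u
}

# Print IDs that occur in the FASTA file but not in the taxonomy file.
ids_missing_from_taxonomy() {
    tmp_tax=$(mktemp)
    tmp_fa=$(mktemp)
    taxonomy_ids "$1" > "$tmp_tax"
    fasta_ids "$2" > "$tmp_fa"
    # comm -13 keeps only the lines unique to the second (FASTA) list.
    LC_ALL=C comm -13 "$tmp_tax" "$tmp_fa"
    rm -f "$tmp_tax" "$tmp_fa"
}

# Made-up demo data: the first taxonomy ID carries a stray leading '>'.
demo_dir=$(mktemp -d)
printf '>seqA\tk__Animalia;p__Chordata\nseqB\tk__Animalia;p__Chordata\n' > "$demo_dir/taxonomy.tsv"
printf '>seqA\nACGT\n>seqB\nTTGA\n' > "$demo_dir/seqs.fasta"

ids_missing_from_taxonomy "$demo_dir/taxonomy.tsv" "$demo_dir/seqs.fasta"
```

On the attached (pre-merge) files this would be called as `ids_missing_from_taxonomy 12s_sanger_taxonomy.txt 12s_sanger_reference_seqs.txt`. An empty result would mean every fasta ID has an exactly matching taxonomy ID.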
Per other searches, I've tried dos2unix, which did not help. The taxonomy and fasta files were made on a Mac. I assume it's a formatting problem with tabs or returns, but I'm not sure where to start. FISH013_12S_Luxilus_coccogenis_R is in the middle of the files, so some sequences seem to match just fine...
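As a starting point for the tabs-or-returns theory, a rough sketch like the following could flag suspicious rows in the taxonomy file. It assumes a headerless file with exactly two tab-separated fields per row (ID, then taxonomy string). The function name, the checks chosen, and the demo rows are all my own illustration, not anything QIIME 2 provides.

```shell
#!/bin/sh
# Sketch only: flag taxonomy rows whose formatting could stop an ID from matching.
# flag_suspect_ids is a hypothetical helper; the demo rows are made up.

flag_suspect_ids() {
    awk -F '\t' '
        $1 ~ /^>/         { print NR ": ID starts with >" }
        $1 ~ /^[ ]|[ ]$/  { print NR ": space before or after ID" }
        $0 ~ /\r/         { print NR ": carriage return in line" }
        NF != 2           { print NR ": expected 2 tab-separated fields, found " NF }
    ' "$1"
}

# Demo rows: 1 is clean, 2 has a leading ">", 3 ends in CR+LF,
# and 4 uses a space where the tab should be.
demo_file=$(mktemp)
printf 'seqA\tk__A;p__B\n>seqB\tk__A;p__B\nseqC\tk__A;p__B\r\nseqD k__A;p__B\n' > "$demo_file"

flag_suspect_ids "$demo_file"
```

Each flagged row is reported by its line number, so the output can be checked directly against the file in a text editor.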
I'm running QIIME 2 2022.2 on Ubuntu 2024.4. (I'm using an older version of QIIME 2 because it was the last one I could get running on my older chipset, an Intel Xeon E7.)
12s_sanger_taxonomy.txt (13.5 KB)
12s_sanger_reference_seqs.txt (31.5 KB)