Hello,
Long-time user, first-time poster.
I'm working with a non-16S dataset for which I have developed a reference database and the associated taxonomic information. Both of these have been imported as artifact files using the --type 'FeatureData[Sequence]' and 'FeatureData[Taxonomy]' options, and have been used successfully in training a naive-Bayes classifier.
However, I'm also interested in using the vsearch method in order to reduce the stringency of my taxonomic assignment and to compare the classification methods.
I have used the following command:
qiime feature-classifier classify-consensus-vsearch \
  --i-query cpn60_rep_seqs.qza \
  --i-reference-reads cpn60_refseqs_final.qza \
  --i-reference-taxonomy cpn60_taxonomy_final.qza \
  --p-maxaccepts 5 \
  --p-perc-identity 0.97 \
  --p-top-hits-only TRUE \
  --p-threads 2 \
  --o-classification cpn60_vsearch_taxonomy.qza \
  --verbose
and am receiving the following error:
"Command: vsearch --usearch_global /tmp/qiime2-archive-p854sa1k/f9468e64-1197-4218-b50e-dd9b7bbc0e28/data/dna-sequences.fasta --id 0.97 --query_cov 0.8 --strand both --maxaccepts 5 --maxrejects 0 --output_no_hits --db /tmp/qiime2-archive-ej5n_loa/0b738b91-7d24-4e29-903c-c8c793595dff/data/dna-sequences.fasta --threads 2 --top_hits_only --blast6out /tmp/tmp4kjexb30
vsearch v2.7.0_linux_x86_64, 125.9GB RAM, 16 cores
https://github.com/torognes/vsearch
Reading file /tmp/qiime2-archive-ej5n_loa/0b738b91-7d24-4e29-903c-c8c793595dff/data/dna-sequences.fasta 0%
Fatal error: Invalid FASTA - header must be terminated with newline"
I have validated the 'FeatureData[Sequence]' artifact using the qiime tools validate command and have double-checked the formatting. The same artifact worked previously when training the naive-Bayes classifier, so I'm not sure why it's being rejected now.
The FASTA file used to generate the FeatureData[Sequence] artifact contains sequences in the following format (with a > in front of each seq_id):
seq_id
seq_info
seq_id2
seq_info
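In case it helps with diagnosis, here is a minimal sketch of how the source FASTA could be checked for carriage returns or a missing final newline. That this is what vsearch is objecting to is only my guess; I have not confirmed it. The function name check_fasta_newlines is hypothetical and is not part of QIIME 2 or vsearch.

```shell
# Hypothetical helper (not part of QIIME 2 or vsearch):
# report line-ending details for the FASTA file given as the first argument.
check_fasta_newlines() {
    fasta="$1"
    # Count carriage-return bytes (found in Windows CRLF or classic Mac CR files).
    cr_count=$(tr -cd '\r' < "$fasta" | wc -c | tr -d '[:space:]')
    # Count newline (LF) bytes.
    lf_count=$(tr -cd '\n' < "$fasta" | wc -c | tr -d '[:space:]')
    # Octal value of the last byte; 012 means the file ends with a newline.
    last_byte=$(tail -c 1 "$fasta" | od -An -to1 | tr -d '[:space:]')
    echo "carriage_returns=$cr_count"
    echo "newlines=$lf_count"
    if [ "$last_byte" = "012" ]; then
        echo "final_newline=yes"
    else
        echo "final_newline=no"
    fi
}
```

It would be run on the original FASTA file (the one imported as the reference artifact), passing its path as the only argument. A non-zero carriage_returns count, or final_newline=no, would at least point toward a line-ending problem.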
Any recommendations would be appreciated! Thank you.