Error message: Unable to read from file

Hi,

I am running qiime2 version 2021.2 in a conda environment.

I am trying to run feature-classifier for taxonomic assignments of ASVs created with DADA2 in qiime2. The reference data files imported nicely.

This is the command with the error message:

(qiime2-2021.2) Ingas-iMac:DADA2 inga$ qiime feature-classifier classify-consensus-vsearch --i-query denoiseSeqs180.qza --i-reference-reads Refseq.qza --i-reference-taxonomy Reftax.qza --o-classification Class-DADA2-180-98.qza --verbose
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: vsearch --usearch_global /var/folders/y5/2wrthn7s40l3q1f43wfbtz100000gn/T/qiime2-archive-5z_u0uu4/e1b2b1aa-ee63-4dc2-86fc-1873aa55fb2d/data/dna-sequences.fasta --id 0.8 --query_cov 0.8 --strand both --maxaccepts 10 --maxrejects 0 --db /var/folders/y5/2wrthn7s40l3q1f43wfbtz100000gn/T/qiime2-archive-r3mibele/7794c726-a931-4ff8-8756-b8909fcbde17/data/dna-sequences.fasta --threads 1 --output_no_hits --blast6out /var/folders/y5/2wrthn7s40l3q1f43wfbtz100000gn/T/tmpib716o0u

vsearch v2.7.0_macos_x86_64, 8.0GB RAM, 4 cores

Unable to read from file (/var/folders/y5/2wrthn7s40l3q1f43wfbtz100000gn/T/qiime2-archive-r3mibele/7794c726-a931-4ff8-8756-b8909fcbde17/data/dna-sequences.fasta)
Traceback (most recent call last):
  File "/Users/inga/miniconda3/envs/qiime2-2021.2/lib/python3.6/site-packages/q2cli/commands.py", line 329, in __call__
    results = action(**arguments)
  File "", line 2, in classify_consensus_vsearch
  File "/Users/inga/miniconda3/envs/qiime2-2021.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
    output_types, provenance)
  File "/Users/inga/miniconda3/envs/qiime2-2021.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 390, in callable_executor
    output_views = self._callable(**view_args)
  File "/Users/inga/miniconda3/envs/qiime2-2021.2/lib/python3.6/site-packages/q2_feature_classifier/_vsearch.py", line 64, in classify_consensus_vsearch
    unassignable_label=unassignable_label)
  File "/Users/inga/miniconda3/envs/qiime2-2021.2/lib/python3.6/site-packages/q2_feature_classifier/_consensus_assignment.py", line 28, in _consensus_assignments
    _run_command(cmd)
  File "/Users/inga/miniconda3/envs/qiime2-2021.2/lib/python3.6/site-packages/q2_feature_classifier/_consensus_assignment.py", line 74, in _run_command
    subprocess.run(cmd, check=True)
  File "/Users/inga/miniconda3/envs/qiime2-2021.2/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['vsearch', '--usearch_global', '/var/folders/y5/2wrthn7s40l3q1f43wfbtz100000gn/T/qiime2-archive-5z_u0uu4/e1b2b1aa-ee63-4dc2-86fc-1873aa55fb2d/data/dna-sequences.fasta', '--id', '0.8', '--query_cov', '0.8', '--strand', 'both', '--maxaccepts', '10', '--maxrejects', '0', '--db', '/var/folders/y5/2wrthn7s40l3q1f43wfbtz100000gn/T/qiime2-archive-r3mibele/7794c726-a931-4ff8-8756-b8909fcbde17/data/dna-sequences.fasta', '--threads', '1', '--output_no_hits', '--blast6out', '/var/folders/y5/2wrthn7s40l3q1f43wfbtz100000gn/T/tmpib716o0u']' returned non-zero exit status 1.

Plugin error from feature-classifier:

Command '['vsearch', '--usearch_global', '/var/folders/y5/2wrthn7s40l3q1f43wfbtz100000gn/T/qiime2-archive-5z_u0uu4/e1b2b1aa-ee63-4dc2-86fc-1873aa55fb2d/data/dna-sequences.fasta', '--id', '0.8', '--query_cov', '0.8', '--strand', 'both', '--maxaccepts', '10', '--maxrejects', '0', '--db', '/var/folders/y5/2wrthn7s40l3q1f43wfbtz100000gn/T/qiime2-archive-r3mibele/7794c726-a931-4ff8-8756-b8909fcbde17/data/dna-sequences.fasta', '--threads', '1', '--output_no_hits', '--blast6out', '/var/folders/y5/2wrthn7s40l3q1f43wfbtz100000gn/T/tmpib716o0u']' returned non-zero exit status 1.

See above for debug info.

The exported dna-sequences.fasta from the --i-query file denoiseSeqs180.qza looks the way you would expect a fasta file to look:

85ecdf4a3e2813dc5239fdf9d759e5aa
CGCAGCCTGCTAAATAATCACAACAATGATTTTTCATTGCTGATGGTTTCTTAGAGGGACATGTAGTATAAAACTACAGGAAGATTGCGGCAATAACAGGTCTGTGATGCCCTTAGATGTTCTGAACCGCACGCGTGTTACACTGACGCAATCAACGAGCATATAACCTTAGCCGAGAGGCTTGGGCAATCTTGTTAACCTGCGTCGTGATAGGGATAGATTATTGCAATTATTAATCTT
0e058910a64cf7d1b679b035e1e67b1c
TCCAACCTACTAACTAGTGGGCGAATCTTTCTGTTCGCGACACTTCTTAGAGGGATAGGTGACTTTTAGTCACATGAGAAGGAGCAATAACAGGTCTGTGATGCCCTTAGATGTTCGGGGCTGCACGCGCGCTACACTGAAAGAATCAGCGTGCCAGAAAACCTTGCTTGACATGGCTAGGTAACCCGTTGAAAATCTTTCGTGATTGGGATCGGGACTTGCAAATGTGTCCCTT

...

It feels silly but I do not even have any idea where to start looking for a solution. Why can't the file be read?

Hi, @inga,

It looks like this might be a permissions issue related to your temporary directory or your input files. Can you share the output of the following commands?

  • qiime tools validate on your .qza input files
  • ls -lah on those same files

You can also run env | grep TMP to see what your temporary directory is set to, and share the output of ls -lah on that directory as well.
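
For anyone who prefers to script these checks, here is a minimal Python sketch of the same idea: confirm that an input file exists, is non-empty and is readable, and see which temporary directory is in use. The function names are made up for illustration, and it assumes QIIME 2 resolves its temporary directory the same way Python's tempfile module does, which I have not verified for every version.

```python
import os
import tempfile


def describe_input(path):
    """Collect basic facts that often explain an 'unable to read' error."""
    exists = os.path.isfile(path)
    return {
        "path": path,
        "exists": exists,
        # A size of 0 bytes points to an empty file rather than a permissions problem.
        "size_bytes": os.path.getsize(path) if exists else None,
        "readable": os.access(path, os.R_OK),
    }


def describe_tmpdir():
    """Report the temporary directory Python resolves and whether it is writable."""
    resolved = tempfile.gettempdir()
    return {
        "TMPDIR": os.environ.get("TMPDIR"),  # None if the variable is not set
        "resolved": resolved,
        "writable": os.access(resolved, os.W_OK),
    }
```

Calling describe_input("Refseq.qza") and describe_tmpdir() from a Python session inside the conda environment should give roughly the same information as the ls -lah and env | grep TMP commands above.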

Also, regarding this comment:

I'm no expert on file formats, but is that actually what we would expect?

Nowadays, modern bioinformatic programs that rely on the FASTA format expect the sequence headers to be preceded by ">"

source

Just want to double check on that!
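
If it helps, here is a rough way to check this programmatically. It is only a heuristic sketch, not how QIIME 2 or vsearch validate FASTA: it flags every line that does not start with ">" and also contains characters outside the IUPAC nucleotide codes, which is what a header that lost its ">" usually looks like. The function name is hypothetical.

```python
# IUPAC nucleotide codes plus the two common gap characters.
IUPAC_DNA = set("ACGTURYSWKMBDHVN-.")


def suspicious_fasta_lines(text):
    """Return (line_number, line) pairs for non-header lines that do not look like DNA."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # blank lines and proper headers are fine
        if not set(line.upper()) <= IUPAC_DNA:
            problems.append((number, line))
    return problems
```

A header made up only of letters that happen to be valid nucleotide codes would slip through, so an empty result does not prove the file is fine.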

Thanks for the comprehensive and quick answer!

First of all: yes, of course, fasta format requires the ">" at the beginning of the header line and it is there. It somehow got lost when I copy/pasted to the forum (I checked the preview more carefully this time and then deleted this part). There is no problem there!

When I ran the commands you suggested I found that the .qza of my --i-reference-reads was empty. I can only imagine it was a mistake on my side when renaming files (I built this file and therefore had to check the format repeatedly; renaming was the last thing I did). A very silly mistake not to check again.
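
In case it is useful to others: a .qza can be opened as a zip archive, and the paths in the error message above suggest the layout is a UUID directory containing data/dna-sequences.fasta. Based on that assumption, a small Python sketch like the following lists the size of every file under data/, so an empty reference shows up as 0 bytes. The function name is my own and not part of QIIME 2.

```python
import zipfile


def qza_data_sizes(qza_path):
    """Map each file under the archive's data/ directory to its uncompressed size in bytes."""
    sizes = {}
    with zipfile.ZipFile(qza_path) as archive:
        for info in archive.infolist():
            parts = info.filename.split("/")
            # Assumed layout, taken from the paths in the error message: <uuid>/data/<file>
            if len(parts) >= 3 and parts[1] == "data" and not info.is_dir():
                sizes[info.filename] = info.file_size
    return sizes
```

For example, qza_data_sizes("Refseq.qza") would be expected to return a single entry ending in data/dna-sequences.fasta; a value of 0 there would match what I found.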

Anyway, the results of the commands you suggested are (only examples are shown; the output is the same for all files):

(qiime2-2021.2) Ingas-iMac:DADA2 inga$ qiime tools validate RefseqB.qza
Result RefseqB.qza appears to be valid at level=max.

(qiime2-2021.2) Ingas-iMac:DADA2 inga$ ls -lah *qza
-rw-r--r--@ 1 inga staff 3.9M Apr 19 09:34 Reftax.qza
-rw-r--r--@ 1 inga staff 3.9M Apr 26 10:51 ReftaxA.qza

I assume that the permissions are fine?

I made the taxonomic files again, made sure that the size is ok, but still ran into trouble when running the command:

(qiime2-2021.2) Ingas-iMac:DADA2 inga$ qiime feature-classifier classify-consensus-vsearch --i-query denoiseSeqs180.qza --i-reference-reads RefseqB.qza --i-reference-taxonomy ReftaxB.qza --o-classification Class-DADA2-180-98.qza

Plugin error from feature-classifier:

'Identifier KY852270.1.1712_U was reported in taxonomic search results, but was not present in the reference taxonomy.'

Debug info has been saved to /var/folders/y5/2wrthn7s40l3q1f43wfbtz100000gn/T/qiime2-q2cli-err-2mxnp_mg.log

The log file is here:
feature-classifier_error-log.txt (3.9 KB)

I checked the files (incl. invisibles) and even the .qza (I could open it in BBEdit) but could not find anything obvious that distinguished this Identifier from the one above or below (invisibles). So I decided to remove this identifier from both files, taxonomy and sequence data. However, the same error with a different Identifier occurred. Both identifiers are at the same position in both sequence and taxonomy file, the two (so far) highlighted identifiers are in the middle of the files (as opposed to the first or the last position), and not next to each other.

Here are examples of my reference taxonomy and sequence files around the Identifier causing the error, also with line numbers:

Example_Refseqs_linenumbers.txt (14.6 KB) Example_Reftax.txt (3.0 KB) Example_Reftax_linenumbers.txt (3.1 KB) Example_Refseqs.txt (14.4 KB)

Is there a mistake in my files? I do not know what to do next to make this work.

Cheers and thanks,

Inga

@inga,

To be clear, it sounds like you are no longer having the problem which you were originally asking about. Is that right? If that's the case, I think your second question is a brand new, unrelated question, in which case we should start a new topic (I'm happy to take care of that for you this time if that works for you).

In the meantime, I'll take a look at your files as well as the previous discussion about this error, which you can do as well by searching the forum for "not present in the reference taxonomy." Here is a link for that search:

https://forum.qiime2.org/search?q=not%20present%20in%20the%20reference%20taxonomy

Dear @Andrew,
yes, the original issue was sorted and then I also managed to sort the next. After following the advice from previous posts (as you so rightly suggested) and growing ever more frustrated, I had a late-night "epiphany" and realised that my taxonomy file still had the ">" before the accession number :roll_eyes:. I didn't get it because the highlighted Identifiers are randomly distributed and the error came only after a few minutes of running. I feared it was some silly formatting thing but didn't see it.
Thanks for the help!
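
For future readers, this kind of mismatch can be caught before running the classifier. Below is a minimal Python sketch that compares the identifiers in a reference FASTA with those in a taxonomy table. It assumes the taxonomy is a headerless, tab-separated file with the identifier in the first column; the function names are hypothetical. My guess at why the reported identifiers looked random is that the error is only raised for a reference sequence that actually gets a hit, but I have not confirmed that in the plugin code.

```python
def fasta_ids(fasta_text):
    """Collect sequence identifiers (first word after '>') from FASTA text."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def taxonomy_ids(tsv_text):
    """Collect identifiers from the first column of a headerless TSV (assumed format)."""
    ids = set()
    for line in tsv_text.splitlines():
        if line.strip():
            ids.add(line.split("\t", 1)[0].strip())
    return ids


def compare_ids(fasta_text, tsv_text):
    """Report sequence IDs lacking a taxonomy entry and taxonomy IDs carrying a stray '>'."""
    seq = fasta_ids(fasta_text)
    tax = taxonomy_ids(tsv_text)
    return {
        "missing_from_taxonomy": sorted(seq - tax),
        "taxonomy_ids_with_gt": sorted(i for i in tax if i.startswith(">")),
    }
```

With a taxonomy file like mine, every sequence identifier would land in missing_from_taxonomy and every taxonomy identifier in taxonomy_ids_with_gt, which makes the stray ">" obvious.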

