Dada2 "No such file or directory" and "Error in open.connection" message

Hi,

The QIIME 2 tutorials ran flawlessly on my Mac. I’m trying to run an analysis on a large dataset (imported from Casava 1.8 demultiplexed files) and am getting the following:

(qiime2-2017.2) dhcp80ffdd4e:A05_Microbiome apzlo$ qiime dada2 denoise-paired  --i-demultiplexed-seqs /Users/apzlo/Qiime2analyses/A05_Microbiome/demux-paired-end.qza  --o-table table  --o-representative-sequences rep-seqs  --p-trim-left-f 10  --p-trim-left-r 10  --p-trunc-len-f 230  --p-trunc-len-r 230  --verbose
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_paired.R /var/folders/qr/vd5b6lqx18z8zzhc612cwk9r0000gn/T/tmphkndm05o/forward /var/folders/qr/vd5b6lqx18z8zzhc612cwk9r0000gn/T/tmphkndm05o/reverse /var/folders/qr/vd5b6lqx18z8zzhc612cwk9r0000gn/T/tmphkndm05o/output.tsv.biom /var/folders/qr/vd5b6lqx18z8zzhc612cwk9r0000gn/T/tmphkndm05o/filt_f /var/folders/qr/vd5b6lqx18z8zzhc612cwk9r0000gn/T/tmphkndm05o/filt_r 230 230 10 10 2.0 2 1 1000000

R version 3.3.2 (2016-10-31) 
Loading required package: Rcpp
DADA2 R package version: 1.2.1 
1) Filtering .............
2) Learning Error Rates
2a) Forward Reads
Initial error matrix unspecified. Error rates will be initialized to the maximum possible estimate from this data.
Initializing error rates to maximum possible estimate.
Sample 1 - 346911 reads in 55363 unique sequences.
Sample 2 - 205941 reads in 64178 unique sequences.
Sample 3 - 346990 reads in 83357 unique sequences.
Sample 4 - 212410 reads in 55848 unique sequences.
   selfConsist step 2 
   selfConsist step 3 
   selfConsist step 4 


Convergence after  4  rounds.
2b) Reverse Reads
Initial error matrix unspecified. Error rates will be initialized to the maximum possible estimate from this data.
Initializing error rates to maximum possible estimate.
Sample 1 - 346911 reads in 24488 unique sequences.
Sample 2 - 205941 reads in 53880 unique sequences.
Sample 3 - 346990 reads in 59239 unique sequences.
Sample 4 - 212410 reads in 38235 unique sequences.
   selfConsist step 2 
   selfConsist step 3 
   selfConsist step 4 
   selfConsist step 5 
   selfConsist step 6 


Convergence after  6  rounds.

3) Denoise remaining samples Error in open.connection(con, "rb") : cannot open the connection
Calls: derepFastq ... FastqStreamer -> FastqStreamer -> open -> open.connection
In addition: Warning message:
In open.connection(con, "rb") :
  cannot open file '/var/folders/qr/vd5b6lqx18z8zzhc612cwk9r0000gn/T/tmphkndm05o/filt_f/16_S11_L001_R1_001.fastq.gz': No such file or directory
Execution halted
Traceback (most recent call last):
  File "/Users/apzlo/miniconda3/envs/qiime2-2017.2/lib/python3.5/site-packages/q2_dada2-2017.2.0-py3.5.egg/q2_dada2/_denoise.py", line 154, in denoise_paired
    run_commands([cmd])
  File "/Users/apzlo/miniconda3/envs/qiime2-2017.2/lib/python3.5/site-packages/q2_dada2-2017.2.0-py3.5.egg/q2_dada2/_plot.py", line 26, in run_commands
    subprocess.run(cmd, check=True)
  File "/Users/apzlo/miniconda3/envs/qiime2-2017.2/lib/python3.5/subprocess.py", line 708, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['run_dada_paired.R', '/var/folders/qr/vd5b6lqx18z8zzhc612cwk9r0000gn/T/tmphkndm05o/forward', '/var/folders/qr/vd5b6lqx18z8zzhc612cwk9r0000gn/T/tmphkndm05o/reverse', '/var/folders/qr/vd5b6lqx18z8zzhc612cwk9r0000gn/T/tmphkndm05o/output.tsv.biom', '/var/folders/qr/vd5b6lqx18z8zzhc612cwk9r0000gn/T/tmphkndm05o/filt_f', '/var/folders/qr/vd5b6lqx18z8zzhc612cwk9r0000gn/T/tmphkndm05o/filt_r', '230', '230', '10', '10', '2.0', '2', '1', '1000000']' returned non-zero exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/apzlo/miniconda3/envs/qiime2-2017.2/lib/python3.5/site-packages/q2cli-2017.2.0-py3.5.egg/q2cli/commands.py", line 217, in __call__
    results = action(**arguments)
  File "<decorator-gen-133>", line 2, in denoise_paired
  File "/Users/apzlo/miniconda3/envs/qiime2-2017.2/lib/python3.5/site-packages/qiime2-2017.2.0-py3.5.egg/qiime2/sdk/action.py", line 171, in callable_wrapper
    output_types, provenance)
  File "/Users/apzlo/miniconda3/envs/qiime2-2017.2/lib/python3.5/site-packages/qiime2-2017.2.0-py3.5.egg/qiime2/sdk/action.py", line 248, in _callable_executor_
    output_views = callable(**view_args)
  File "/Users/apzlo/miniconda3/envs/qiime2-2017.2/lib/python3.5/site-packages/q2_dada2-2017.2.0-py3.5.egg/q2_dada2/_denoise.py", line 165, in denoise_paired
    return _denoise_helper(biom_fp, hashed_feature_ids)
  File "/Users/apzlo/miniconda3/envs/qiime2-2017.2/lib/python3.5/tempfile.py", line 808, in __exit__
    self.cleanup()
  File "/Users/apzlo/miniconda3/envs/qiime2-2017.2/lib/python3.5/tempfile.py", line 812, in cleanup
    _shutil.rmtree(self.name)
  File "/Users/apzlo/miniconda3/envs/qiime2-2017.2/lib/python3.5/shutil.py", line 488, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "/Users/apzlo/miniconda3/envs/qiime2-2017.2/lib/python3.5/shutil.py", line 370, in _rmtree_unsafe
    onerror(os.listdir, path, sys.exc_info())
  File "/Users/apzlo/miniconda3/envs/qiime2-2017.2/lib/python3.5/shutil.py", line 368, in _rmtree_unsafe
    names = os.listdir(path)
FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/qr/vd5b6lqx18z8zzhc612cwk9r0000gn/T/tmphkndm05o'

Plugin error from dada2:

  [Errno 2] No such file or directory:
  '/var/folders/qr/vd5b6lqx18z8zzhc612cwk9r0000gn/T/tmphkndm05o'

See above for debug info.

I have no prior experience with dada2, and I’m having trouble figuring out where to start to solve this. I’d be happy to provide more info if needed.

Thank you very much,

Alejandro Pezzulo

Hey @apzlo,

It looks like dada2 wasn’t able to find one of your forward reads (16_S11_L001_R1_001.fastq.gz). Another user is also having a similar issue (but with a missing reverse read file).

Right now we aren’t quite sure what is going on, so we’re interested in understanding what is inside of the artifact demux-paired-end.qza. Would you be able to do the following:

qiime tools export (path to demux-paired-end.qza) --output-path debug-demux/

and give us the results of:

ls debug-demux/

and we can get started from there, thanks!

Dear @ebolyen,

Thanks for your response. Here it is:

(qiime2-2017.2) dhcp80ffdd4e:A05_Microbiome apzlo$ ls debug-demux/
10_S7_L001_R1_001.fastq.gz	18_S12_L001_R1_001.fastq.gz	7_S4_L001_R1_001.fastq.gz
10_S7_L001_R2_001.fastq.gz	18_S12_L001_R2_001.fastq.gz	7_S4_L001_R2_001.fastq.gz
11_S8_L001_R1_001.fastq.gz	19_S13_L001_R1_001.fastq.gz	8_S5_L001_R1_001.fastq.gz
11_S8_L001_R2_001.fastq.gz	19_S13_L001_R2_001.fastq.gz	8_S5_L001_R2_001.fastq.gz
12_S9_L001_R1_001.fastq.gz	2_S1_L001_R1_001.fastq.gz	9_S6_L001_R1_001.fastq.gz
12_S9_L001_R2_001.fastq.gz	2_S1_L001_R2_001.fastq.gz	9_S6_L001_R2_001.fastq.gz
13_S10_L001_R1_001.fastq.gz	5_S2_L001_R1_001.fastq.gz	MANIFEST
13_S10_L001_R2_001.fastq.gz	5_S2_L001_R2_001.fastq.gz	metadata.yml
16_S11_L001_R1_001.fastq.gz	6_S3_L001_R1_001.fastq.gz
16_S11_L001_R2_001.fastq.gz	6_S3_L001_R2_001.fastq.gz 

Thank you,

Alejandro
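
For anyone who wants to double-check a listing like the one above without reading it by eye, here is a minimal Python sketch that reports samples missing a forward (R1) or reverse (R2) file. The regular expression is an assumption based on the Casava 1.8 style names seen in this thread, and `find_unpaired` is a hypothetical helper, not part of QIIME 2.

```python
import re
from collections import defaultdict

# Assumed Casava 1.8 style name: <sample>_S<n>_L<lane>_R<1|2>_001.fastq.gz
CASAVA_NAME = re.compile(
    r"^(?P<sample>.+)_S\d+_L\d{3}_R(?P<read>[12])_001\.fastq\.gz$"
)


def find_unpaired(filenames):
    """Return {sample: [missing read numbers]} for samples lacking R1 or R2."""
    reads_seen = defaultdict(set)
    for name in filenames:
        match = CASAVA_NAME.match(name)
        if match:  # MANIFEST, metadata.yml, and other names are skipped
            reads_seen[match.group("sample")].add(match.group("read"))
    return {
        sample: sorted({"1", "2"} - reads)
        for sample, reads in reads_seen.items()
        if reads != {"1", "2"}
    }


if __name__ == "__main__":
    # A few names copied from the listing above, plus the non-FASTQ entries.
    listing = [
        "16_S11_L001_R1_001.fastq.gz",
        "16_S11_L001_R2_001.fastq.gz",
        "2_S1_L001_R1_001.fastq.gz",
        "2_S1_L001_R2_001.fastq.gz",
        "MANIFEST",
        "metadata.yml",
    ]
    print(find_unpaired(listing))
```

An empty result means every sample that matched the pattern has both reads, which is consistent with the listing above looking complete.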

Thanks @apzlo, looks like everything is there that should be. I’m going to get in contact with the plugin developer to find out more about what is happening. At this point we may need the full data, would it be possible to provide the .qza file? That may be useful for debugging things on our end.

I’d be happy to. It’s a 3.8 GB file; what would be a good way of getting it to you?
Thanks again
Alejandro

I ran denoise-paired on a small sample of my demultiplexed paired-end dataset and it worked. Then I ran it on the whole dataset and got the plugin error below:

(qiime2-2017.2) Jinbings-iMac:qiime2 jinbingbai$ qiime dada2 denoise-paired \
> --i-demultiplexed-seqs demux-paired-end.qza \
> --o-table table.qza \
> --o-representative-sequences rep-seqs.qza \
> --p-trim-left-f 0 \
> --p-trim-left-r 0 \
> --p-trunc-len-f 200 \
> --p-trunc-len-r 200

Plugin error from dada2:

  Command '['run_dada_paired.R', '/var/folders/sd/vlrr9d916gg5qj3yzzf3sr
  fw0000gn/T/tmppy7c0qis/forward', '/var/folders/sd/vlrr9d916gg5qj3yzzf3
  srfw0000gn/T/tmppy7c0qis/reverse', '/var/folders/sd/vlrr9d916gg5qj3yzz
  f3srfw0000gn/T/tmppy7c0qis/output.tsv.biom',
  '/var/folders/sd/vlrr9d916gg5qj3yzzf3srfw0000gn/T/tmppy7c0qis/filt_f',
  '/var/folders/sd/vlrr9d916gg5qj3yzzf3srfw0000gn/T/tmppy7c0qis/filt_r',
  '200', '200', '0', '0', '2.0', '2', '1', '1000000']' returned non-zero
  exit status 1

Re-run with --verbose to see debug info.

Could you help me figure out what is wrong?

Thanks, Bing

Could you re-run with --verbose added to your command? It will give us more information about what went wrong inside of dada2.
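
For background on why the short message only says "returned non-zero exit status 1": the tracebacks in this thread show the plugin calling `subprocess.run(cmd, check=True)`, which raises `CalledProcessError` whenever the external script exits with a failure code. The real explanation is whatever the script printed to stderr. The sketch below illustrates that mechanism in isolation; `run_and_report` is a hypothetical helper and the failing command is a stand-in, not the real run_dada_paired.R.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and return (exit_status, stderr_text) instead of raising."""
    try:
        completed = subprocess.run(cmd, check=True, stderr=subprocess.PIPE)
        return completed.returncode, completed.stderr.decode()
    except subprocess.CalledProcessError as err:
        # err.stderr holds what the child process wrote before it failed.
        return err.returncode, err.stderr.decode()


if __name__ == "__main__":
    # Stand-in for an external script that fails (assumption for illustration).
    failing = [
        sys.executable,
        "-c",
        "import sys; sys.stderr.write('cannot open file\\n'); sys.exit(1)",
    ]
    status, stderr_text = run_and_report(failing)
    print("exit status:", status)
    print("stderr:", stderr_text.strip())
```

The exit status alone says only that something failed; the stderr text is where the useful detail lives, which is what --verbose surfaces.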

@ebolyen Here is the re-run; I got the same error:

(qiime2-2017.2) Jinbings-iMac:qiime2 jinbingbai$ qiime dada2 denoise-paired --i-demultiplexed-seqs demux-paired-end.qza --p-trim-left-f 10 --p-trunc-len-f 200 --p-trim-left-r 0 --p-trunc-len-r 200 --o-table table.qza --o-representative-sequences rep-seqs.qza --verbose

Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_paired.R /var/folders/sd/vlrr9d916gg5qj3yzzf3srfw0000gn/T/tmpjofs4kas/forward /var/folders/sd/vlrr9d916gg5qj3yzzf3srfw0000gn/T/tmpjofs4kas/reverse /var/folders/sd/vlrr9d916gg5qj3yzzf3srfw0000gn/T/tmpjofs4kas/output.tsv.biom /var/folders/sd/vlrr9d916gg5qj3yzzf3srfw0000gn/T/tmpjofs4kas/filt_f /var/folders/sd/vlrr9d916gg5qj3yzzf3srfw0000gn/T/tmpjofs4kas/filt_r 200 200 10 0 2.0 2 1 1000000

R version 3.3.2 (2016-10-31) 
Loading required package: Rcpp
DADA2 R package version: 1.2.1 
1) Filtering .

...................................................
2) Learning Error Rates
2a) Forward Reads
Initial error matrix unspecified. Error rates will be initialized to the maximum possible estimate from this data.
Initializing error rates to maximum possible estimate.
Sample 1 - 228784 reads in 51536 unique sequences.
Sample 2 - 443168 reads in 53721 unique sequences.
Sample 3 - 184825 reads in 29457 unique sequences.
Sample 4 - 262860 reads in 28629 unique sequences.
   selfConsist step 2 
   selfConsist step 3 
   selfConsist step 4 
   selfConsist step 5 
   selfConsist step 6 
   selfConsist step 7 
   selfConsist step 8 
   selfConsist step 9 
   selfConsist step 10 

Warning message:
In dada(drpsF, err = NULL, selfConsist = TRUE, multithread = multithread) :
  Self-consistency loop terminated before convergence.
2b) Reverse Reads
Initial error matrix unspecified. Error rates will be initialized to the maximum possible estimate from this data.
Initializing error rates to maximum possible estimate.
Sample 1 - 228784 reads in 113398 unique sequences.
Sample 2 - 443168 reads in 108942 unique sequences.
Sample 3 - 184825 reads in 65152 unique sequences.
Sample 4 - 262860 reads in 66738 unique sequences.
   selfConsist step 2 
   selfConsist step 3 
   selfConsist step 4 
   selfConsist step 5 


Convergence after  5  rounds.

3) Denoise remaining samples ...................................Error in open.connection(con, "rb") : cannot open the connection
Calls: derepFastq ... FastqStreamer -> FastqStreamer -> open -> open.connection
In addition: Warning message:
In open.connection(con, "rb") :
  cannot open file '/var/folders/sd/vlrr9d916gg5qj3yzzf3srfw0000gn/T/tmpjofs4kas/filt_r/GYN6014v2_S12_L001_R2_001.fastq.gz': No such file or directory
Execution halted
Traceback (most recent call last):
  File "/Users/jinbingbai/miniconda3/envs/qiime2-2017.2/lib/python3.5/site-packages/q2cli-2017.2.0-py3.5.egg/q2cli/commands.py", line 217, in __call__
    results = action(**arguments)
  File "<decorator-gen-133>", line 2, in denoise_paired
  File "/Users/jinbingbai/miniconda3/envs/qiime2-2017.2/lib/python3.5/site-packages/qiime2-2017.2.0-py3.5.egg/qiime2/sdk/action.py", line 171, in callable_wrapper
    output_types, provenance)
  File "/Users/jinbingbai/miniconda3/envs/qiime2-2017.2/lib/python3.5/site-packages/qiime2-2017.2.0-py3.5.egg/qiime2/sdk/action.py", line 248, in _callable_executor_
    output_views = callable(**view_args)
  File "/Users/jinbingbai/miniconda3/envs/qiime2-2017.2/lib/python3.5/site-packages/q2_dada2-2017.2.0-py3.5.egg/q2_dada2/_denoise.py", line 154, in denoise_paired
    run_commands([cmd])
  File "/Users/jinbingbai/miniconda3/envs/qiime2-2017.2/lib/python3.5/site-packages/q2_dada2-2017.2.0-py3.5.egg/q2_dada2/_plot.py", line 26, in run_commands
    subprocess.run(cmd, check=True)
  File "/Users/jinbingbai/miniconda3/envs/qiime2-2017.2/lib/python3.5/subprocess.py", line 708, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['run_dada_paired.R', '/var/folders/sd/vlrr9d916gg5qj3yzzf3srfw0000gn/T/tmpjofs4kas/forward', '/var/folders/sd/vlrr9d916gg5qj3yzzf3srfw0000gn/T/tmpjofs4kas/reverse', '/var/folders/sd/vlrr9d916gg5qj3yzzf3srfw0000gn/T/tmpjofs4kas/output.tsv.biom', '/var/folders/sd/vlrr9d916gg5qj3yzzf3srfw0000gn/T/tmpjofs4kas/filt_f', '/var/folders/sd/vlrr9d916gg5qj3yzzf3srfw0000gn/T/tmpjofs4kas/filt_r', '200', '200', '10', '0', '2.0', '2', '1', '1000000']' returned non-zero exit status 1

Plugin error from dada2:

  Command '['run_dada_paired.R', '/var/folders/sd/vlrr9d916gg5qj3yzzf3sr
  fw0000gn/T/tmpjofs4kas/forward', '/var/folders/sd/vlrr9d916gg5qj3yzzf3
  srfw0000gn/T/tmpjofs4kas/reverse', '/var/folders/sd/vlrr9d916gg5qj3yzz
  f3srfw0000gn/T/tmpjofs4kas/output.tsv.biom',
  '/var/folders/sd/vlrr9d916gg5qj3yzzf3srfw0000gn/T/tmpjofs4kas/filt_f',
  '/var/folders/sd/vlrr9d916gg5qj3yzzf3srfw0000gn/T/tmpjofs4kas/filt_r',
  '200', '200', '10', '0', '2.0', '2', '1', '1000000']' returned non-
  zero exit status 1

See above for debug info.
(qiime2-2017.2) Jinbings-iMac:qiime2 jinbingbai$ 

I checked the folder, and the file GYN6014v2_S12_L001_R2_001.fastq.gz is there. Please help me with this!

Thanks, Bing

Thanks for the details @Bing!

Would you be able to extract/export demux-paired-end.qza? You can do this with:

qiime tools export (path to demux-paired-end.qza) --output-dir debug-demux/

and then if you could share the output of:

ls debug-demux/

Since you say GYN6014v2_S12_L001_R2_001.fastq.gz was present in your data, I’m curious if something went wrong on import and if the file maybe is missing in demux-paired-end.qza (.qza files are just zipped directories).

It looks like you are having the same issue as another user. So running the above steps probably isn’t needed now. If possible, could you send us your demux-paired-end.qza? We need to do some manual debugging on our side. Thanks!
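
Because .qza files are just zipped directories, the same check can also be done without exporting anything. Below is a minimal Python sketch using the standard `zipfile` module. The helper names are hypothetical, and the `some-uuid/data/` layout in the demo is only an assumption for illustration; the check compares base names so it does not depend on the exact internal layout.

```python
import os
import tempfile
import zipfile


def fastq_members(qza_path):
    """List the .fastq.gz entries stored inside a .qza (zip) archive."""
    with zipfile.ZipFile(qza_path) as archive:
        return sorted(
            name for name in archive.namelist() if name.endswith(".fastq.gz")
        )


def contains_file(qza_path, filename):
    """Return True if any archive entry's base name equals filename."""
    return any(
        member.rsplit("/", 1)[-1] == filename
        for member in fastq_members(qza_path)
    )


if __name__ == "__main__":
    # Build a tiny stand-in archive so the sketch runs without real data.
    with tempfile.TemporaryDirectory() as workdir:
        toy_qza = os.path.join(workdir, "toy.qza")
        with zipfile.ZipFile(toy_qza, "w") as archive:
            archive.writestr(
                "some-uuid/data/GYN6014v2_S12_L001_R2_001.fastq.gz", b""
            )
        print(contains_file(toy_qza, "GYN6014v2_S12_L001_R2_001.fastq.gz"))
```

With a real artifact you would pass the path to demux-paired-end.qza instead of the toy archive.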

@ebolyen I checked the output of ls debug-demux/. Everything looks good, and GYN6014v2_S12_L001_R2_001.fastq.gz is there!

The demux-paired-end.qza is about 10.09 GB. Please let me know how to send it to you!

Best, Bing

@Bing it looks like @apzlo’s data-set also causes this issue and is smaller, so we’ll start debugging there. I’m going to merge these threads to keep things together (and so that you can track progress in the same thread). Thanks for your help with all this!

Thanks @apzlo, if you have a Dropbox account that would work (I'll only download once). Alternatively, @thermokarst recently discovered this service: https://nofile.io/ which seems to support uploads of up to 10 GB without registering for an account. I have no idea what they do with the data, however. We can also do a multi-part zip over email (that will take a while, but you can send me a direct message for my email address).

@ebolyen I tried running it with just two samples in the demultiplexed file and it ran ok. If it’s ok I will try the same, two samples only, but including the one that gave the error message.

@ebolyen
I am looking forward to hearing the new update about the dada2 debugging!

Thanks

Good idea! If we're lucky that might give us a small test set to debug against.

@ebolyen Well, that was weird: when I ran it with only those samples, it also went well… Could it have anything to do with my machine having only 4 GB of RAM?
I will try running it again with all samples tonight and will report back tomorrow once it’s done.

@apzlo I was afraid of something like that. Double checking that this is a reproducible problem is a good idea. Assuming it is reproducible (fingers crossed), we’re thinking of setting up an FTP server here that you can send the files to (we’ll send you and @Bing the details of that later).

Thanks everyone for being so helpful and awesome!

@ebolyen and @Bing so I ran it again with the complete dataset and it looks like it worked.
The only differences were:

  1. Before, the demux-paired-end.qza file location / working directory was different than the fastq files, this time I had all files in the same folder/same as working directory
  2. I tried to free up as much memory as possible

I don’t have a clear answer as to why it worked this time, but for now I will keep everything in the same directory, including the fastq files. I had thought that once the demux.qza file was ready I would not need the fastq files around…

Thanks

Thanks for the data @apzlo and @Bing! I'm going to run your commands on our local dev cluster today to see if I am able to reproduce.

The file location shouldn't matter in principle. The .qza/.qzv files are just .zip files. What QIIME 2 does is unzip them to a temporary directory, then it operates on that unzipped data without touching the .qza again.
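
To make that concrete, here is a rough Python sketch of the unzip-to-a-temporary-directory pattern. It is an illustration under stated assumptions, not QIIME 2's actual code: `process_archive` is a hypothetical function and the "work" is just listing the extracted files. It also shows why the temporary paths printed in the logs no longer exist once a command has finished.

```python
import os
import tempfile
import zipfile


def process_archive(archive_path):
    """Unzip into a temp dir, work on the copy, return (temp_path, names)."""
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(tmp)
        # Work happens on the extracted copy; the archive is not read again.
        names = sorted(
            os.path.relpath(os.path.join(root, fname), tmp)
            for root, _dirs, files in os.walk(tmp)
            for fname in files
        )
    # Leaving the with-block removed the temporary directory.
    return tmp, names


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        toy = os.path.join(workdir, "toy.qza")  # stand-in archive
        with zipfile.ZipFile(toy, "w") as archive:
            archive.writestr("data/sample_R1.fastq.gz", b"")
        tmp_path, names = process_archive(toy)
        print(names)
        print("temp dir still exists:", os.path.exists(tmp_path))
```

Since the working copy lives under the system temp location, where the original .qza or fastq files sit should not affect the run.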

Freeing up memory may be the reason it worked, since your dataset is right at the boundary of available memory.
@thermokarst is going to run a couple tests on smaller AWS instances to see if we can replicate that way.

I'm kind of guessing I won't be able to reproduce the issues (time will tell), but at this point I'm suspicious of hardware failure and memory. @Bing was able to re-run with --verbose so whatever constraint it was, was reliable enough for that. My bet is on memory at the moment because it is hard to get hardware failure to happen twice the same way.

@Bing how much memory did you have when you ran your commands?