I'm posting this in hopes of getting more help running DADA2. I solved my memory issue by gaining access to a local server with more computing power than my current PC, but now I'm encountering a new issue. After running for about a week and a half, this error presented itself:
Denoise remaining samples Error in open.connection(con, "rb") : cannot open the connection
Calls: derepFastq ... FastqStreamer -> FastqStreamer -> open -> open.connection
In addition: Warning message:
In open.connection(con, "rb") :
cannot open file '/tmp/tmpdftfsbgd/Sample67_22_L001_R1_001.fastq.gz': No such file or directory
Execution halted
Plugin error from dada2:
[Errno 2] No such file or directory: '/tmp/tmpdftfsbgd'
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.
After this, my command terminated and no files were created. I have a hunch that this may be related to the issues a few other users have had with machines wiping temporary directories, but I'm not completely sure, nor do I know of a possible fix. The machine runs Python and has 32 GB of RAM that I can access by SSHing into it with MobaXterm, and it has qiime2-2018.2 installed. Due to the long running time, I've been backgrounding the process. I'll include my line of code and the entire --verbose output below.
--verbose output:
R version 3.4.1 (2017-06-30)
Loading required package: Rcpp
DADA2 R package version: 1.6.0
Filtering …
Learning Error Rates
Not all sequences were the same length.
Not all sequences were the same length.
Not all sequences were the same length.
Not all sequences were the same length.
Not all sequences were the same length.
Not all sequences were the same length.
Not all sequences were the same length.
Not all sequences were the same length.
Not all sequences were the same length.
Not all sequences were the same length.
Not all sequences were the same length.
Not all sequences were the same length.
Not all sequences were the same length.
Not all sequences were the same length.
Initializing error rates to maximum possible estimate.
Sample 1 - 38074 reads in 23364 unique sequences.
Sample 2 - 91605 reads in 56180 unique sequences.
Sample 3 - 14646 reads in 10376 unique sequences.
Sample 4 - 59667 reads in 36929 unique sequences.
Sample 5 - 42 reads in 42 unique sequences.
Sample 6 - 23491 reads in 16622 unique sequences.
Sample 7 - 15774 reads in 9666 unique sequences.
Sample 8 - 10384 reads in 6765 unique sequences.
Sample 9 - 14541 reads in 8029 unique sequences.
Sample 10 - 27921 reads in 15609 unique sequences.
Sample 11 - 33786 reads in 22936 unique sequences.
Sample 12 - 30504 reads in 20602 unique sequences.
Sample 13 - 36670 reads in 22510 unique sequences.
Sample 14 - 1044903 reads in 594457 unique sequences.
selfConsist step 2
selfConsist step 3
selfConsist step 4
selfConsist step 5
selfConsist step 6
selfConsist step 7
selfConsist step 8
selfConsist step 9
Convergence after 9 rounds.
Denoise remaining samples Error in open.connection(con, "rb") : cannot open the connection
Calls: derepFastq ... FastqStreamer -> FastqStreamer -> open -> open.connection
In addition: Warning message:
In open.connection(con, "rb") :
cannot open file '/tmp/tmpdftfsbgd/Sample67_22_L001_R1_001.fastq.gz': No such file or directory
Execution halted
Plugin error from dada2:
[Errno 2] No such file or directory: '/tmp/tmpdftfsbgd'
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.
This looks like your host OS might've cleaned up your temp dir before the job had finished processing.
First off, it looks like you aren't trimming or truncating your reads at all. This can have a huge impact on DADA2 runtime, which is a function of read quality; by trimming low-quality positions, you reduce the burden on the algorithm. If you would like us to weigh in on some suggestions for those values, please run demux summarize and post the results here.
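Generating that summary looks something like the following (filenames here are placeholders, and the truncation value in the second command is just a stand-in you'd pick from your own quality plots, not a recommendation):

```bash
# Summarize read counts and per-position quality scores
qiime demux summarize \
  --i-data demux.qza \
  --o-visualization demux-summary.qzv

# Example of passing trim/truncation values to DADA2;
# 240 is a placeholder to be read off the quality plot
qiime dada2 denoise-single \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left 0 \
  --p-trunc-len 240 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --verbose
```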
Backgrounding the process can also potentially contribute to problems with the OS and the tempdir cleanup; it might be worth exploring some other options, like screen or tmux.
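For example, with tmux (the session name is arbitrary), the job stays attached to a terminal that survives SSH disconnects:

```bash
# start a named session and run the qiime command inside it;
# detach with Ctrl-b d and the job keeps running
tmux new -s dada2-run

# later, reattach from a new SSH connection to check progress
tmux attach -t dada2-run
```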
You can change the tempdir to a new location that the OS doesn't clean up automatically, see this post for more details:
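In essence, that approach looks like the following, assuming a directory under your home folder (the path is a placeholder):

```bash
# create a temp dir on storage the OS won't periodically purge
mkdir -p ~/qiime2-tmp

# QIIME 2 (via Python's tempfile module) honors TMPDIR, so exporting
# it before the run redirects working files away from /tmp
export TMPDIR=~/qiime2-tmp
```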
> First off, it looks like you aren't trimming or truncating your reads at all. This can have a huge impact on DADA2 runtime, which is a function of read quality; by trimming low-quality positions, you reduce the burden on the algorithm. If you would like us to weigh in on some suggestions for those values, please run demux summarize and post the results here.
Here's the visualization file for my reads. I'd love some advice on how to decide when and what to truncate. The reason I didn't trim or truncate any reads here is that they all appeared to be of equal quality to me. Brightbeard Sequence Quality File.qzv (284.2 KB)
> You can change the tempdir to a new location that the OS doesn't clean up automatically, see this post for more details:
I'll give this a shot! I also switched to a node on the server with twice as much RAM, so hopefully I'll have something to follow up with soon. Thanks!
These reads appear to be pre-joined. Can you confirm that? It is worth noting that DADA2 expects to operate on unjoined reads, because the algorithm uses the quality scores: when you join forward and reverse reads, how do you handle the quality score? It seems like many packages that offer this make some arbitrary judgment. If you want to use DADA2, I would highly recommend working with the reads prior to joining (DADA2 will join the reads for you, by the way).
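If you could import the unjoined forward/reverse reads, the paired-end denoiser handles merging internally; a minimal sketch, with placeholder filenames and truncation values you'd choose from your quality plots:

```bash
# DADA2 denoises forward and reverse reads separately, then merges them
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired.qza \
  --p-trunc-len-f 240 \
  --p-trunc-len-r 200 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --verbose
```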
The quality score histograms look pretty artificial to me - what kind of cleanup has happened to these reads? Can you provide some details about sequencing technology and protocols used? If there has been QC applied to these reads, then again, DADA2 probably won't be your best bet here, since you generally want to provide the "rawest" data possible!
> These reads appear to be pre-joined. Can you confirm that?
They are. My lab is currently transitioning from our old 454 pyrosequencing primers to Illumina primers, as we are waiting on grant funding to purchase the new primers. This means I have to do some work in QIIME 1 first to orient the reads correctly according to our labeled barcodes, because our forward and reverse reads do not correspond to the Illumina run's forward and reverse reads. Once I've joined the forward and reverse reads from the Illumina run, I can reorient those sequences and extract my barcodes, generating the barcodes file and sequences file I use to demultiplex my sequences. Essentially, the workflow I am trying looks something like this:
Obtain reads (forward and reverse .fastq files) -> join paired ends using the SeqPrep method in QIIME 1 -> reorient these reads according to my labeled forward and reverse primers and extract the barcodes in QIIME 1 -> bring these files into the QIIME 2 pipeline using the import tool.
I'd be interested in using DADA2 to clean up my reads earlier, but I haven't thought of a way to get my data into the .qza format without taking the steps above to get my actual barcodes from the .fastq files. If there is some workaround in QIIME 2 that I'm missing, I'd love to learn about it.
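One possible import route, assuming the joined, reoriented per-sample fastq.gz files can be listed in a fastq manifest (the manifest and output names are placeholders; note that in releases around 2018.2 the flag was --source-format, which later releases renamed to --input-format):

```bash
# import pre-joined reads as a joined-sequence artifact via a manifest
qiime tools import \
  --type 'SampleData[JoinedSequencesWithQuality]' \
  --input-path manifest.csv \
  --source-format SingleEndFastqManifestPhred33 \
  --output-path demux-joined.qza
```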
On a side note, I may not be the only person in this situation trying to use 454 pyrosequencing primers with Illumina sequencing (or maybe I am). I can write up a more detailed explanation of my workflow in a different post if you think others would find it helpful.
> The quality score histograms look pretty artificial to me - what kind of cleanup has happened to these reads? Can you provide some details about sequencing technology and protocols used?
After amplification of our target gene region (here it's the ITS region for fungi), we clean our cloned product with a magnetic bead purification kit. Then we send our samples off for Illumina sequencing. On the analysis end, I don't do any direct QC until this DADA2 step, which I'm beginning to suspect isn't something I really need to do anymore. If I don't use DADA2, am I able to create a summary file directly from my demultiplexed file?
Thanks for the info, @Brightbeard. Unfortunately, as I understand it, you probably shouldn't send these reads through DADA2, since they violate a few of its assumptions (note, maybe in the future we will get a q2 plugin for unifying read orientation).
Check out the second half of this tutorial for a deblur-based workflow that would be appropriate for your pre-joined sequences. Hope that helps!
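Roughly, that workflow quality-filters the joined reads and then runs Deblur; since your data is ITS rather than 16S, the denoise-other action with a user-supplied positive reference would apply. A sketch, where the filenames, trim length, and reference artifact are all placeholders (exact action names can vary a bit across releases):

```bash
# quality-filter the pre-joined reads first (Deblur expects this)
qiime quality-filter q-score-joined \
  --i-demux demux-joined.qza \
  --o-filtered-sequences demux-joined-filtered.qza \
  --o-filter-stats demux-joined-filter-stats.qza

# denoise-other needs positive reference sequences for non-16S data,
# e.g. UNITE ITS sequences imported as FeatureData[Sequence]
qiime deblur denoise-other \
  --i-demultiplexed-seqs demux-joined-filtered.qza \
  --i-reference-seqs unite-its-seqs.qza \
  --p-trim-length 250 \
  --p-sample-stats \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --o-stats deblur-stats.qza
```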