Hi!
I would like to run QIIME 2 on some data I found online. I got the data from MG-RAST.
It looks to me like they did PE250 MiSeq sequencing, but when I download the data, I get either already-joined reads or forward reads only. The explanation in the article is not very clear either: "We then combined the file with merged reads and only the read 1 file; we chose to use just the “not combined” read 1 file because read 1 tends to be of higher quality than read 2 and by not using both “not combined” files we minimize inaccurately estimating abundance."
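One quick way to tell which kind of file a download contains is to tabulate its read lengths: an unmerged read 1 cannot be longer than the sequencing read length (about 250 nt here), while merged pairs from an overlapping amplicon can be. A minimal bash/awk sketch; the function name `read_length_histogram` is my own, and it assumes plain four-line FASTQ records, optionally gzipped.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "length<TAB>count" for every read length in a FASTQ file.
# Assumes standard 4-line FASTQ records; .gz files are decompressed on the fly.
read_length_histogram() {
  case "$1" in
    *.gz) gzip -dc -- "$1" ;;
    *)    cat -- "$1" ;;
  esac |
    awk 'NR % 4 == 2 { count[length($0)]++ }   # line 2 of each record is the sequence
         END { for (len in count) print len "\t" count[len] }' |
    sort -n
}
```

Usage, with a hypothetical file name: `read_length_histogram some_sample.fastq.gz`. A file holding only unmerged read 1 should show nothing longer than the read length.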
So I make a manifest file and mark all files as "forward" (I rename the sample IDs rather than keeping the file names, which was fine for some other datasets), then import:
qiime tools import \
--type 'SampleData[SequencesWithQuality]' \
--input-format SingleEndFastqManifestPhred33 \
--input-path ../CG_manifest.csv \
--output-path ../seq_CG.qza
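For reference, a manifest like the one described above can be generated with a small script. This is a sketch that assumes the CSV manifest layout (`sample-id,absolute-filepath,direction`) read by `SingleEndFastqManifestPhred33`, FASTQ files ending in `.fastq.gz` in a single directory, and sample IDs taken from the file names. The hypothetical `make_manifest` function would need adjusting to apply renamed sample IDs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a CSV manifest in which every FASTQ file is "forward".
# Assumptions: all files are in one directory and end in .fastq.gz;
# the sample ID is the file name without that suffix.
make_manifest() {
  local dir=$1 out=$2 abs f id
  abs=$(cd "$dir" && pwd) || return 1   # manifest paths must be absolute
  printf 'sample-id,absolute-filepath,direction\n' > "$out"
  for f in "$abs"/*.fastq.gz; do
    [ -e "$f" ] || continue             # no match: the glob stays literal, so skip it
    id=$(basename "$f" .fastq.gz)       # sample ID = file name without suffix
    printf '%s,%s,forward\n' "$id" "$f" >> "$out"   # every file treated as read 1
  done
}
```

Usage, with a hypothetical input directory: `make_manifest raw_fastq ../CG_manifest.csv`.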
Then I cut away the primers with cutadapt:
qiime cutadapt trim-single \
--i-demultiplexed-sequences seq_CG.qza \
--p-front GTGCCAGCMGCCGCGGTAA \
--p-adapter ATTAGAWACCCBDGTAGTCC \
--o-trimmed-sequences cut_seq_CG.qza \
--p-cores 60 \
--verbose
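As a sanity check on the two primer sequences: for single-end forward reads, the `--p-adapter` sequence should be the reverse complement of the reverse primer, as it would appear at the 3' end of a read that runs through the whole amplicon. The sketch below reverse-complements an IUPAC sequence with `rev` and `tr` (the function name is my own). Applied to the adapter above, it gives GGACTACHVGGGTWTCTAAT, which I believe is a commonly used 806R primer variant; I have not confirmed this against the article.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reverse-complement a DNA sequence containing IUPAC codes.
# A<->T, C<->G, M<->K, R<->Y, B<->V, D<->H; W, S and N are their own complements.
revcomp_iupac() {
  printf '%s\n' "$1" | rev | tr 'ACGTMKRYBVDHWSN' 'TGCAKMYRVBHDWSN'
}

revcomp_iupac ATTAGAWACCCBDGTAGTCC   # prints GGACTACHVGGGTWTCTAAT
```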
Then I denoise with DADA2, but this produces several error messages like this one:
[Parent][DispatchAsyncMessage] Error: PClientHandle::Msg_PClientHandleOpConstructor Route error: message sent to unknown actor ID
qiime dada2 denoise-single \
--i-demultiplexed-seqs cut_seq_CG.qza \
--p-trunc-len 250 \
--p-trim-left 3 \
--p-n-threads 40 \
--verbose \
--output-dir dada2out_cut_CG
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.
Command: run_dada_single.R /tmp/qiime2-archive-ocn_lshq/e874d691-ca35-45f6-98f5-4963426bdca9/data /tmp/tmp4wlrjxgm/output.tsv.biom /tmp/tmp4wlrjxgm/track.tsv /tmp/tmp4wlrjxgm 250 3 2.0 2 Inf consensus 1.0 40 1000000 NULL 16
R version 3.4.1 (2017-06-30)
Loading required package: Rcpp
DADA2 R package version: 1.6.0
- Filtering ....................
- Learning Error Rates
Initializing error rates to maximum possible estimate.
Sample 1 - 194076 reads in 97220 unique sequences.
Sample 2 - 122022 reads in 87339 unique sequences.
Sample 3 - 126001 reads in 79465 unique sequences.
###!!! [Parent][DispatchAsyncMessage] Error: PClientHandle::Msg_PClientHandleOpConstructor Route error: message sent to unknown actor ID
Sample 4 - 76273 reads in 54062 unique sequences.
Sample 5 - 113627 reads in 88027 unique sequences.
Sample 6 - 25210 reads in 21905 unique sequences.
Sample 7 - 314921 reads in 172930 unique sequences.
###!!! [Parent][DispatchAsyncMessage] Error: PClientHandle::Msg_PClientHandleOpConstructor Route error: message sent to unknown actor ID
Sample 8 - 114129 reads in 85025 unique sequences.
selfConsist step 2
selfConsist step 3
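To understand the "- Filtering" step and the numbers in the logged command (`250 3 2.0 2`, which I read as trunc-len, trim-left, max-ee and trunc-q), here is a simplified model of DADA2-style read filtering in awk. It is not DADA2's code. It follows the documented behaviour: cut at the first base with quality <= truncQ, discard reads shorter than truncLen, keep truncLen - trimLeft bases, and discard reads with more than maxEE expected errors, where EE is the sum of 10^(-Q/10). The exact order of the steps is my assumption. One consequence worth checking: if a forward-only read is about 250 nt before trimming, removing the 19-nt forward primer leaves it shorter than 250, so under this model it would be dropped.

```bash
#!/usr/bin/env bash
# Simplified, illustrative model of DADA2-style filtering (NOT DADA2's implementation).
# Arguments (defaults mirror the logged command): truncLen trimLeft maxEE truncQ.
# Reads Phred+33 4-line FASTQ on stdin and writes the kept reads to stdout.
filter_like_dada2() {
  awk -v trunc_len="${1:-250}" -v trim_left="${2:-3}" \
      -v max_ee="${3:-2.0}" -v trunc_q="${4:-2}" '
    BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }
    NR % 4 == 1 { hdr = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 0 {
      qual = $0
      # 1. truncQ: cut at the first base whose quality is <= trunc_q
      n = length(qual)
      for (i = 1; i <= n; i++) {
        if (ord[substr(qual, i, 1)] - 33 <= trunc_q) { n = i - 1; break }
      }
      # 2. truncLen: drop reads that are now shorter than trunc_len
      if (n < trunc_len) next
      seq = substr(seq, 1, trunc_len)
      qual = substr(qual, 1, trunc_len)
      # 3. trimLeft: remove the first trim_left bases (kept length = trunc_len - trim_left)
      seq = substr(seq, trim_left + 1)
      qual = substr(qual, trim_left + 1)
      # 4. maxEE: expected errors = sum of 10^(-Q/10) over the kept bases
      ee = 0
      for (i = 1; i <= length(qual); i++)
        ee += 10 ^ (-(ord[substr(qual, i, 1)] - 33) / 10)
      if (ee > max_ee) next
      print hdr; print seq; print "+"; print qual
    }'
}
```

Usage, with a hypothetical file name, to count how many reads of one sample would survive: `gzip -dc sample.fastq.gz | filter_like_dada2 250 3 2.0 2 | awk 'END { print NR / 4 }'`.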
The process is still running (it has been going on for quite a while), but even if it finishes, I am not sure I understand what happened and why, or whether I can trust the resulting file.
I couldn't find anything about this "Route error: message sent to unknown actor ID" message in connection with QIIME 2 or DADA2 when googling.
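To see why error learning stopped after sample 8, the "Sample N - X reads in Y unique sequences." lines can be totalled. A sketch, assuming the log was saved to a file and that the 1000000 in the logged command is q2-dada2's n-reads-learn value (as I understand it, samples are added until that many reads are reached). By my arithmetic, the running total is 972,130 after sample 7 and 1,086,259 after sample 8. The function name is my own; all other lines, including the ###!!! ones, are ignored.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarise the error-learning progress lines of a saved DADA2 log.
# Arguments: log file, optional read target (default 1000000).
summarise_learning() {
  awk -v target="${2:-1000000}" '
    # Only lines like "Sample N - X reads in Y unique sequences." are used
    /^Sample [0-9]+ - [0-9]+ reads in [0-9]+ unique sequences/ {
      reads = $4; uniq = $7; total += reads
      printf "sample %s\t%d reads\t%.1f%% unique\tcumulative %d%s\n",
        $2, reads, 100 * uniq / reads, total,
        (total >= target ? "\t(>= target)" : "")
    }' "$1"
}
```

Usage, with a hypothetical log file name: `summarise_learning dada2.log`.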
Thank you in advance!