Name of fastq file: barcode identifier

Hi,

I am trying to analyze over 1000 samples together with qiime2-2018.6. Because the samples were run on different plates, the barcode identifier may be the same for two different samples. For example, one file is named AB921_15_L001_R1_001.fastq.gz, where 15 is the barcode identifier, and a different sample may be named AB922_15_L001_R1_001.fastq.gz, where 15 is again the barcode identifier. When I run it like this, I get the following error message:

not a(n) CasavaOneEightSingleLanePerSampleDirFmt:

Duplicate samples in forward reads: {‘MC’}

Is it OK to rename the barcode to, for example, 16 for the second sample file? Is it OK to use 4-digit barcodes, since I have over 1000 samples? The barcode should be unique for each of the 1000 files, am I right?

Many thanks!

Hi @xjyang69!

You are correct that the barcode segment is supposed to be unique (at least up to a sample), but I wouldn't jump to that being the issue right away.

A bigger question I have is what caused all of these barcode segments to be the same? Are you merging multiple runs together into the same import step?

The error "Duplicate samples in forward reads" is likely referring to the fact that you have a duplicated sample ID segment (rather than a barcode, which I don't think we pay very much attention to).
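
A quick way to see which sample IDs collide before importing is to strip the trailing fields from each forward-read file name and look for repeats. The sketch below assumes Casava 1.8 style names (sample ID, barcode ID, lane, read, set number, separated by underscores) and that the sample ID is everything before the last four fields. The function name `duplicate_sample_ids` and the `reads/` directory are hypothetical.

```shell
# Hypothetical helper: print every sample ID that occurs more than once
# among the forward reads in a directory.
# Assumed name layout: <sample-id>_<barcode-id>_L<lane>_R1_001.fastq.gz
duplicate_sample_ids() {
    local dir="$1" path name
    for path in "$dir"/*_R1_001.fastq.gz; do
        [ -e "$path" ] || continue
        name="$(basename "$path")"
        # Drop the last four underscore-separated fields
        # (barcode, lane, read, set number plus extension).
        echo "$name" | sed -E 's/(_[^_]+){4}$//'
    done | sort | uniq -d
}

# Usage: duplicate_sample_ids reads/
```

Any ID this prints is shared by at least two forward-read files, which is the situation the error message complains about. Repeated barcode fields alone do not show up here.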

Hi Evan,

Thanks for your reply. The barcode segments are duplicated because I have over 1000 samples that were run on about 10 plates, and the barcode IDs on each plate are named the same way (e.g. S1 to S98 for each plate). These files are already demultiplexed, and I want to analyze them in one batch. I would like to know whether duplicated barcodes in the file names have any effect on data processing.

Thanks!

Hey @xjyang69,

No, the repeated barcode IDs in the file name should have no impact so long as the sample names are unique.

That being said, if you are going to use DADA2 it is usually best to denoise each run independently and then merge the resulting tables and rep-seqs. If you aren't using DADA2 then it won't matter.

Hope that helps!


Hi Evan,

Thanks for your reply. If I use DADA2 to denoise the whole batch in one run, what problems would that cause? Could you please show how to merge the tables and rep-seqs if I denoise each run independently?

Many thanks!

Hi Evan,

Just following up on the question I sent about a day ago. After running the DADA2 denoising step for 5 days, I got the following error message.

Plugin error from dada2:

An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

Debug info has been saved to /var/folders/7s/tyj1qvdx3vnbdnfgmgl6c7zc0000gn/T/qiime2-q2cli-err-1zhdr67i.log

I would appreciate any guidance on this.

Thanks!

Hi @xjyang69, could you please post the contents of the log file listed above? @ebolyen will need that to tell you what type of error you encountered.

Hi Nicholas,

I tried to view the log file but was unable to locate it, so I could not view its contents. I am not sure whether it was deleted. Would you mind telling me how to view the file? The error message appeared after I had been running DADA2 for 5 days on over 1000 files in one batch.

Thanks!

Those log files are temporary. Run your commands with the --verbose flag added to the end of the command to have the error printed to the terminal.
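
Since the terminal scrollback can also be lost over a multi-day run, one option is to keep a copy of the verbose output in a file you control. This is a minimal sketch using only standard shell tools; `run_logged` and the log file name are hypothetical.

```shell
# Hypothetical wrapper: run any command, show its output in the terminal,
# and save a copy to a log file at the same time.
run_logged() {
    local log="$1"
    shift
    # Merge stderr into stdout so R and Python errors land in the same file.
    "$@" 2>&1 | tee "$log"
}

# Usage: run_logged dada2.log <your qiime command> --verbose
```

Note that the exit status reported is that of `tee`, not of the wrapped command, unless the shell's `pipefail` option is enabled.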

You can try running this on just a subset of samples — you may not get the same error if there is a specific sample causing problems, but you might be able to replicate the error in less time so it is worth a try.

Please give that a try and let us know what you see!
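
One way to build such a subset is to copy a handful of forward/reverse pairs into a separate directory and import that directory instead. The sketch below assumes paired files that differ only in `_R1_001.fastq.gz` versus `_R2_001.fastq.gz`; the function name `copy_subset` and the directory names are hypothetical.

```shell
# Hypothetical helper: copy the first N complete read pairs from SRC to DEST.
copy_subset() {
    local src="$1" dest="$2" n="$3" count=0 fwd rev
    mkdir -p "$dest"
    for fwd in "$src"/*_R1_001.fastq.gz; do
        [ "$count" -lt "$n" ] || break
        rev="${fwd%_R1_001.fastq.gz}_R2_001.fastq.gz"
        # Skip samples whose reverse read is missing so the subset stays paired.
        if [ -e "$fwd" ] && [ -e "$rev" ]; then
            cp "$fwd" "$rev" "$dest"/
            count=$((count + 1))
        fi
    done
}

# Usage: copy_subset all-reads/ subset-reads/ 20
```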

Hi Nicholas,

Thanks for your help. As I mentioned previously, I have over 1000 files with names like WT_170_4_S92_L001_R1_001.fastq.gz and T_170_4_S92_L001_R2_001.fastq.gz. There are also repeated barcode IDs in the file names (e.g. several files may have the same barcode ID, such as S92). All the files are already demultiplexed.

Evan Bolyen suggested the repeated barcode IDs should have no impact so long as the sample names are unique. But he said if using DADA2 it is usually best to denoise each run independently and then merge the resulting tables and rep-seqs. If you aren’t using DADA2 then it won’t matter.

Could you please show how to merge the tables and rep-seqs if I denoise each run independently?

I would prefer to denoise these files in one batch, and I am wondering what problems would arise if I use DADA2 to denoise the whole batch in one run.

Based on your suggestion, I re-ran the denoising process with the --verbose flag added to the end of the command as shown below.

qiime dada2 denoise-paired --i-demultiplexed-seqs demux-paired-end.qza --p-trim-left-f 10 --p-trim-left-r 10 --p-trunc-len-f 240 --p-trunc-len-r 240 --o-table table.qza --o-representative-sequences rep-seqs.qza --o-denoising-stats denoising-stats.qza --verbose

All the output including error message in the terminal is shown below.

Running external command line application(s). This may print messages to stdout and/or stderr.

The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_paired.R /var/folders/7s/tyj1qvdx3vnbdnfgmgl6c7zc0000gn/T/tmpd6pa_e7s/forward /var/folders/7s/tyj1qvdx3vnbdnfgmgl6c7zc0000gn/T/tmpd6pa_e7s/reverse /var/folders/7s/tyj1qvdx3vnbdnfgmgl6c7zc0000gn/T/tmpd6pa_e7s/output.tsv.biom /var/folders/7s/tyj1qvdx3vnbdnfgmgl6c7zc0000gn/T/tmpd6pa_e7s/track.tsv /var/folders/7s/tyj1qvdx3vnbdnfgmgl6c7zc0000gn/T/tmpd6pa_e7s/filt_f /var/folders/7s/tyj1qvdx3vnbdnfgmgl6c7zc0000gn/T/tmpd6pa_e7s/filt_r 240 240 10 10 2.0 2 consensus 1.0 1 1000000

R version 3.4.1 (2017-06-30)

Loading required package: Rcpp

DADA2 R package version: 1.6.0

1) Filtering …

2) Learning Error Rates

2a) Forward Reads

Initializing error rates to maximum possible estimate.

Sample 1 - 33515 reads in 22012 unique sequences.

Sample 2 - 54479 reads in 29234 unique sequences.

Sample 3 - 117720 reads in 42687 unique sequences.

Sample 4 - 86901 reads in 45368 unique sequences.

Sample 5 - 72684 reads in 52417 unique sequences.

Sample 6 - 43365 reads in 22416 unique sequences.

Sample 7 - 66016 reads in 32621 unique sequences.

Sample 8 - 54396 reads in 28094 unique sequences.

Sample 9 - 92065 reads in 55621 unique sequences.

Sample 10 - 135515 reads in 42066 unique sequences.

Sample 11 - 117354 reads in 54093 unique sequences.

Sample 12 - 92504 reads in 36531 unique sequences.

Sample 13 - 83886 reads in 53143 unique sequences.

selfConsist step 2

selfConsist step 3

selfConsist step 4

selfConsist step 5

selfConsist step 6

selfConsist step 7

selfConsist step 8

selfConsist step 9

selfConsist step 10

Self-consistency loop terminated before convergence.

2b) Reverse Reads

Initializing error rates to maximum possible estimate.

Sample 1 - 33515 reads in 26068 unique sequences.

Sample 2 - 54479 reads in 40150 unique sequences.

Sample 3 - 117720 reads in 57240 unique sequences.

Sample 4 - 86901 reads in 56945 unique sequences.

Sample 5 - 72684 reads in 61212 unique sequences.

Sample 6 - 43365 reads in 29865 unique sequences.

Sample 7 - 66016 reads in 40721 unique sequences.

Sample 8 - 54396 reads in 38366 unique sequences.

Sample 9 - 92065 reads in 66063 unique sequences.

Sample 10 - 135515 reads in 65523 unique sequences.

Sample 11 - 117354 reads in 65084 unique sequences.

Sample 12 - 92504 reads in 51016 unique sequences.

Sample 13 - 83886 reads in 64734 unique sequences.

selfConsist step 2

selfConsist step 3

selfConsist step 4

selfConsist step 5

selfConsist step 6

selfConsist step 7

selfConsist step 8

selfConsist step 9

Convergence after 9 rounds.

3) Denoise remaining samples …

Error in open.connection(con, "rb") : cannot open the connection

Calls: derepFastq ... FastqStreamer -> FastqStreamer -> open -> open.connection

In addition: Warning message:

In open.connection(con, "rb") :

cannot open file '/var/folders/7s/tyj1qvdx3vnbdnfgmgl6c7zc0000gn/T/tmpd6pa_e7s/filt_r/RN_9_S93_L001_R2_001.fastq.gz': No such file or directory

Execution halted

Traceback (most recent call last):

File "/Volumes/4tb2/charris/QIIME/miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/q2_dada2/denoise.py", line 229, in denoise_paired

run_commands([cmd])

File "/Volumes/4tb2/charris/QIIME/miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/q2_dada2/denoise.py", line 36, in run_commands

subprocess.run(cmd, check=True)

File "/Volumes/4tb2/charris/QIIME/miniconda3/envs/qiime2-2018.6/lib/python3.5/subprocess.py", line 398, in run

output=stdout, stderr=stderr)

subprocess.CalledProcessError: Command '['run_dada_paired.R', '/var/folders/7s/tyj1qvdx3vnbdnfgmgl6c7zc0000gn/T/tmpd6pa_e7s/forward', '/var/folders/7s/tyj1qvdx3vnbdnfgmgl6c7zc0000gn/T/tmpd6pa_e7s/reverse', '/var/folders/7s/tyj1qvdx3vnbdnfgmgl6c7zc0000gn/T/tmpd6pa_e7s/output.tsv.biom', '/var/folders/7s/tyj1qvdx3vnbdnfgmgl6c7zc0000gn/T/tmpd6pa_e7s/track.tsv', '/var/folders/7s/tyj1qvdx3vnbdnfgmgl6c7zc0000gn/T/tmpd6pa_e7s/filt_f', '/var/folders/7s/tyj1qvdx3vnbdnfgmgl6c7zc0000gn/T/tmpd6pa_e7s/filt_r', '240', '240', '10', '10', '2.0', '2', 'consensus', '1.0', '1', '1000000']' returned non-zero exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File "/Volumes/4tb2/charris/QIIME/miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__

results = action(**arguments)

File "<decorator-gen-380>", line 2, in denoise_paired

File "/Volumes/4tb2/charris/QIIME/miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/qiime2/sdk/action.py", line 232, in bound_callable

output_types, provenance)

File "/Volumes/4tb2/charris/QIIME/miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/qiime2/sdk/action.py", line 367, in callable_executor

output_views = self._callable(**view_args)

File "/Volumes/4tb2/charris/QIIME/miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/q2_dada2/denoise.py", line 244, in denoise_paired

" and stderr to learn more." % e.returncode)

Exception: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

Plugin error from dada2:

An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

See above for debug info.

Many thanks in advance and look forward to your guidance!

James

Hey @xjyang69,

Sorry for disappearing there and thanks @Nicholas_Bokulich for taking over!

The FMT tutorial demonstrates how to merge tables/rep-seqs.
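
As a rough sketch of what that merge can look like with many runs, the helper below only prints the two merge commands so they can be reviewed before anything is executed. It assumes a hypothetical layout with one directory per run, each holding `table.qza` and `rep-seqs.qza`, with no spaces in the directory names. It also assumes that the `qiime feature-table merge` and `merge-seqs` options (`--i-tables`, `--i-data`) match your installed release, so check `--help` first. The function name `build_merge_commands` is made up for this example.

```shell
# Hypothetical helper: print the merge commands for a list of run directories.
# Assumed layout: <run-dir>/table.qza and <run-dir>/rep-seqs.qza for each run.
build_merge_commands() {
    local tables="" seqs="" run
    for run in "$@"; do
        tables="$tables --i-tables $run/table.qza"
        seqs="$seqs --i-data $run/rep-seqs.qza"
    done
    echo "qiime feature-table merge$tables --o-merged-table table.qza"
    echo "qiime feature-table merge-seqs$seqs --o-merged-data rep-seqs.qza"
}

# Usage: build_merge_commands run-1 run-2
```

For the merged features to be comparable, each run should be denoised with the same trim and truncation settings.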


As for the DADA2 error: I've seen this happen before when a run takes longer than 3 days on OS X. OS X deletes temporary files after 3 days, no matter what, which is why the filtered read file named in your log could no longer be found.

The solution is to set TMPDIR to another location and try again (or use Linux):

mkdir ~/q2-tmp
export TMPDIR=~/q2-tmp
<run your command>
rm -r ~/q2-tmp
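
Before committing to another multi-day run, it may be worth confirming that the new location is actually picked up. The sketch below assumes `python3` is on the PATH of the active environment and relies on Python's standard `tempfile` module, which honours TMPDIR. `use_tmpdir` is a hypothetical helper name.

```shell
# Hypothetical helper: create a directory, point TMPDIR at it,
# and print the temporary directory Python will use.
use_tmpdir() {
    mkdir -p "$1" || return 1
    export TMPDIR="$1"
    # This should print the directory passed in; if it prints something else,
    # the directory is probably not writable.
    python3 -c 'import tempfile; print(tempfile.gettempdir())'
}

# Usage: use_tmpdir "$HOME/q2-tmp"
```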

Sorry you ran into that!
