Qiime demux summarize error with NextSeq data

ylor · March 7, 2018, 3:03pm

Hello,

I am new to qiime2 and am hoping to gain more insight into how it works. I have Illumina NextSeq data whose file names I edited to match the Casava 1.8 format. I was able to import the files into a .qza artifact, but I cannot get qiime demux summarize to work.

-Data import step that was successful:
qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path /data/ylor/qiime2/paired-end-demux/ --source-format CasavaOneEightLanelessPerSampleDirFmt --output-path /data/ylor/qiime2/paired-end-demux/demux-paired-end.qza

-Error message for qiime demux summarize:
qiime demux summarize --i-data /data/ylor/qiime2_test/qiime2/paired-end-demux/demux-paired-end.qza --o-visualization demux-paired-end.qzv
Plugin error from demux:

[Errno 2] No such file or directory: '/tmp/qiime2-archive-87olkysj/47081ec5-d972-4b07-b9f3-81f17b8c3899/data/GAIM01_S35_R1_001.fastq.gz'

Debug info has been saved to /tmp/qiime2-q2cli-err-e0jg1yrp.log

-Did I use the wrong --source-format when importing my data to get this error message? I previously tried --source-format CasavaOneEightSingleLanePerSampleDirFmt, but it was not successful:

qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path /data/ylor/qiime2_test/qiime2/paired-end-demux/ --source-format CasavaOneEightSingleLanePerSampleDirFmt --output-path /data/ylor/qiime2_test/qiime2/paired-end-demux/demux-paired.qza
There was a problem importing /data/ylor/qiime2_test/qiime2/paired-end-demux/:

Missing one or more files for CasavaOneEightSingleLanePerSampleDirFmt: '.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz'

-Can someone clarify the difference between SingleLane and LaneLess for CasavaOneEight data import format?
-Should I use a completely different data import format?
-What am I doing incorrectly? Any help/suggestions are welcome.

Thanks!

thermokarst · March 7, 2018, 9:33pm

Hi @ylor!

Congrats, you found a bug! I opened an issue for tracking a fix for this.

The good news is that you have some other options available to you!

The CASAVA 1.8 format expects demuxed reads to have filenames that look like this:

L2S357_15_L001_R1_001.fastq.gz

The underscore-separated fields in this file name are the sample identifier, the barcode sequence or a barcode identifier, the lane number, the read number, and the set number.

The CasavaOneEightLanelessPerSampleDirFmt is a variant of that format that we have seen floating around. It is identical to CASAVA 1.8, except that the lane is missing from the filename:

L2S357_15_R1_001.fastq.gz
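To make the difference between the two naming schemes concrete, here is a minimal Python sketch that classifies a filename as single-lane or laneless. The single-lane pattern is adapted from the importer's error message quoted earlier in the thread; the laneless pattern and the `classify` helper are assumptions made for illustration, not QIIME 2's own validation code.

```python
import re

# Single-lane CASAVA 1.8 names: sample_barcode_L###_R#_001.fastq.gz
# (adapted from the pattern shown in the import error message).
SINGLE_LANE = re.compile(r".+_.+_L[0-9]{3}_R[12]_001\.fastq\.gz")

# Laneless variant: the same name without the _L### field.
# This pattern is an assumption based on the description above.
LANELESS = re.compile(r".+_.+_R[12]_001\.fastq\.gz")


def classify(filename):
    """Return 'single-lane', 'laneless', or 'unrecognized' for a file name."""
    # Check the stricter single-lane pattern first, because a single-lane
    # name would also satisfy the looser laneless pattern.
    if SINGLE_LANE.fullmatch(filename):
        return "single-lane"
    if LANELESS.fullmatch(filename):
        return "laneless"
    return "unrecognized"


for name in ("L2S357_15_L001_R1_001.fastq.gz", "GAIM01_S35_R1_001.fastq.gz"):
    print(name, "->", classify(name))
```

Running a check like this over a directory listing before importing may help decide which directory format, if any, the files actually match.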

Check out the fastq manifest format; you should be able to create a manifest for these files and import them!

Just curious, how did you stumble across the CasavaOneEightLanelessPerSampleDirFmt format? We haven't really documented it or advertised it, which is why there isn't much info floating around about it. Anyway, thanks, and keep us posted!

ylor · March 9, 2018, 3:18pm

Thank you for your response!

I stumbled upon CasavaOneEightLanelessPerSampleDirFmt because I was trying to figure out the best --source-format to use for my NextSeq data. I ran 'qiime tools import --show-importable-formats' in my terminal and it spat out the following:

AlignedDNAFASTAFormat
AlignedDNASequencesDirectoryFormat
AlphaDiversityDirectoryFormat
AlphaDiversityFormat
BIOMV100DirFmt
BIOMV100Format
BIOMV210DirFmt
BIOMV210Format
BooleanSeriesDirectoryFormat
BooleanSeriesFormat
CasavaOneEightLanelessPerSampleDirFmt
CasavaOneEightSingleLanePerSampleDirFmt
DNAFASTAFormat
DNASequencesDirectoryFormat
DeblurStatsDirFmt
DeblurStatsFmt
DistanceMatrixDirectoryFormat
EMPPairedEndCasavaDirFmt
EMPPairedEndDirFmt
EMPSingleEndCasavaDirFmt
EMPSingleEndDirFmt
FastqGzFormat
FirstDifferencesDirectoryFormat
FirstDifferencesFormat
HeaderlessTSVTaxonomyDirectoryFormat
HeaderlessTSVTaxonomyFormat
LSMatFormat
MultiplexedPairedEndBarcodeInSequenceDirFmt
MultiplexedSingleEndBarcodeInSequenceDirFmt
NewickDirectoryFormat
NewickFormat
OrdinationDirectoryFormat
OrdinationFormat
PairedDNASequencesDirectoryFormat
PairedEndFastqManifestPhred33
PairedEndFastqManifestPhred64
QIIME1DemuxDirFmt
QIIME1DemuxFormat
QualityFilterStatsDirFmt
QualityFilterStatsFmt
SingleEndFastqManifestPhred33
SingleEndFastqManifestPhred64
SingleLanePerSamplePairedEndFastqDirFmt
SingleLanePerSampleSingleEndFastqDirFmt
TSVTaxonomyDirectoryFormat
TSVTaxonomyFormat
TaxonomicClassiferTemporaryPickleDirFmt
UchimeStatsDirFmt
UchimeStatsFmt

I couldn't find documentation on how to use that specific --source-format, so I figured that I'd give it a shot because I had combined all four lanes of my NextSeq data into one lane. It seemed successful according to viewing the provenance, but I wasn't sure how to proceed with qiime demux summarize.

I also tried making a manifest file and tested it prior to posting on the forum, but was not able to successfully run it with qiime tools import. After doing some searching in the qiime2 forum, I found what made my manifest file work: deleting the quotation marks.

However, I am still confused as to how to determine whether I should use Phred33 or Phred64. Can you point me to a forum thread that discusses this, if one already exists? I guessed at using Phred33...

Thanks!

thermokarst · March 9, 2018, 9:22pm

Hi @ylor!

Cool! Thanks for the update - I was just curious!

Please review the fastq manifest link I provided above - there are details there about the difference between Phred 33 and Phred 64, as well as links to more detailed resources there. The chances are very high that your reads are Phred 33 offset.
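For readers who want a quick sanity check, the offset can often be guessed from the range of characters on the quality lines. The sketch below is a common heuristic, not part of QIIME 2; the function name `guess_phred_offset` and the cut-off values (59 and 74) are assumptions based on the usual ASCII ranges of the two encodings, and a handful of reads may be too few to decide.

```python
def guess_phred_offset(quality_lines):
    """Guess the Phred offset (33 or 64) from FASTQ quality strings.

    Returns None when the characters seen do not settle the question.
    """
    codes = [ord(ch) for line in quality_lines for ch in line]
    if not codes:
        return None
    # Characters below ASCII 59 cannot occur in the +64 encodings.
    if min(codes) < 59:
        return 33
    # Characters above 'J' (ASCII 74) are unusual for Phred 33 Illumina data.
    if max(codes) > 74:
        return 64
    # Everything fell in the overlap of both encodings: undecided.
    return None


# Hypothetical quality strings, for illustration only.
print(guess_phred_offset(["#/6<AE", "AAAEEE"]))
print(guess_phred_offset(["hhhhfffe"]))
```

If the function returns None, inspecting more reads (or the sequencing facility's documentation) is the safer route.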

Thanks!

ylor · March 15, 2018, 12:48pm

Hello,

I'm a little confused...I have 3 samples that were sequenced, but am only seeing one sample after importing my data when visualizing it. Is that correct? Should I be seeing only one summary for all three samples in the interactive quality plot?

I tried running qiime dada2 denoise-paired, but got an error message.

> nohup qiime dada2 denoise-paired \
> --i-demultiplexed-seqs /data/ylor/qiime2_test/qiime2/paired-end-demux/paired-end-demux.qza \
> --p-trunc-len-f 0 \
> --p-trunc-len-r 0 \
> --p-trim-left-f 70 \
> --p-trim-left-r 70 \
> --o-representative-sequences  /data/ylor/qiime2_test/qiime2/paired-end-demux/rep-seqs-dada2-paired.qza \
> --o-table  /data/ylor/qiime2_test/qiime2/paired-end-demux/table-dada2-paired.qza &

log:

> R version 3.4.1 (2017-06-30)
> Loading required package: Rcpp
> DADA2 R package version: 1.6.0
> 1) Filtering ...
> 2) Learning Error Rates
> Not all sequences were the same length.
> Not all sequences were the same length.
> 2a) Forward Reads
> Initializing error rates to maximum possible estimate.
> Sample 1 - 1867664 reads in 211613 unique sequences.
>    selfConsist step 2
>    selfConsist step 3
>    selfConsist step 4
>    selfConsist step 5
>    selfConsist step 6
>    selfConsist step 7
>    selfConsist step 8
>    selfConsist step 9
> Convergence after  9  rounds.
> 2b) Reverse Reads
> Initializing error rates to maximum possible estimate.
> Sample 1 - 1867664 reads in 211613 unique sequences.
>    selfConsist step 2
>    selfConsist step 3
>    selfConsist step 4
>    selfConsist step 5
>    selfConsist step 6
>    selfConsist step 7
>    selfConsist step 8
>    selfConsist step 9
> Convergence after  9  rounds.
> 
> 3) Denoise remaining samples Not all sequences were the same length.
> Not all sequences were the same length.
> .Not all sequences were the same length.
> Not all sequences were the same length.
> .
> 4) Remove chimeras (method = consensus)
> Error in isBimeraDenovoTable(unqs[[i]], ..., verbose = verbose) :
>   Input must be a valid sequence table.
> Calls: removeBimeraDenovo -> isBimeraDenovoTable
> In addition: Warning message:
> In is.na(colnames(unqs[[i]])) :
>   is.na() applied to non-(list or vector) of type 'NULL'
> Execution halted
> Running external command line application(s). This may print messages to stdout 
> and/or stderr.
> The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.
> 
> Command: run_dada_paired.R /tmp/tmp0efcaxm6/forward /tmp/tmp0efcaxm6/reverse /tmp/tmp0efcaxm6/output.tsv.biom /tmp/tmp0efcaxm6/filt_f /tmp/tmp0efcaxm6/filt_r 0 0 70 70 2.0 2 consensus 1.0 1 1000000
> 
> Traceback (most recent call last):
>   File "/opt/conda/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_dada2/_denoise.py", line 179, in denoise_paired
>     run_commands([cmd])
>   File "/opt/conda/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_dada2/_denoise.py", line 35, in run_commands
>     subprocess.run(cmd, check=True)
>   File "/opt/conda/envs/qiime2-2018.2/lib/python3.5/subprocess.py", line 398, in run
>     output=stdout, stderr=stderr)
> subprocess.CalledProcessError: Command '['run_dada_paired.R', '/tmp/tmp0efcaxm6/forward', '/tmp/tmp0efcaxm6/reverse', '/tmp/tmp0efcaxm6/output.tsv.biom', '/tmp/tmp0efcaxm6/filt_f', '/tmp/tmp0efcaxm6/filt_r', '0', '0', '70', '70', '2.0', '2', 'consensus', '1.0', '1', '1000000']' returned non-zero exit status 1
> 
> During handling of the above exception, another exception occurred:
> 
> Traceback (most recent call last):
>   File "/opt/conda/envs/qiime2-2018.2/lib/python3.5/site-packages/q2cli/commands.py", line 246, in __call__
>     results = action(**arguments)
>   File "<decorator-gen-354>", line 2, in denoise_paired
>   File "/opt/conda/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/sdk/action.py", line 228, in bound_callable
>     output_types, provenance)
>   File "/opt/conda/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/sdk/action.py", line 363, in _callable_executor_
>     output_views = self._callable(**view_args)
>   File "/opt/conda/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_dada2/_denoise.py", line 194, in denoise_paired
>     " and stderr to learn more." % e.returncode)
> Exception: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

-How come it seems to only be picking up that I have one sample, when I really have 3? Is it somehow analyzing all 3 samples as 1 sample instead?
-Am I using the trim vs trunc flags incorrectly? What is the difference between these two -- the descriptions seem the same? Are they both removing however many bases you specify?
-Any suggestions of how to resolve this issue?

Thanks!

thermokarst · March 16, 2018, 2:12am

Probably not - can you please provide us with the exact import command you used, and if it was a fastq manifest format, please provide your manifest file, too.

Something went wrong when importing and/or demultiplexing (although it sounds like your reads are already demultiplexed). Please provide all commands run.

Double-check the docs - trimming works on the 5' end while truncating works on the 3' end.
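As a rough illustration of that distinction, the sketch below applies both parameters to a single read held as a Python string. It is a simplification of what DADA2's filtering step is documented to do, not the plugin's code; the helper name `trim_and_truncate` is hypothetical, and the treatment of 0 as "no truncation" and of short reads as discarded follows the parameter descriptions.

```python
def trim_and_truncate(read, trim_left=0, trunc_len=0):
    """Apply a 3' truncation and a 5' trim to one read.

    trunc_len: position at which the read is cut on the 3' end
               (0 means no truncation and no length filtering).
    trim_left: number of bases removed from the 5' end.
    Returns None when the read is shorter than trunc_len (discarded).
    """
    if trunc_len:
        if len(read) < trunc_len:
            return None
        read = read[:trunc_len]   # drop everything after position trunc_len
    return read[trim_left:]       # drop the first trim_left bases


read = "ACGTACGTAC"
print(trim_and_truncate(read, trim_left=2, trunc_len=8))  # GTACGT
print(trim_and_truncate(read, trim_left=3))               # TACGTAC
```

With both parameters set, the surviving read has length trunc_len minus trim_left, so `--p-trim-left-f 70` with no truncation removes the first 70 bases and keeps the rest of each read.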

Once we get some more details on what you are doing to import we can provide more suggestions moving forward. Thanks!

ylor · March 16, 2018, 4:53pm

Thanks for getting back to me.

My manifest file consisted of:

sample-id,absolute-filepath,direction
GAIM01,/data/ylor/qiime2_test/qiime2/paired-end-demux/GAIM01_S35_R1_001.fastq.gz,forward
GAIM03,/data/ylor/qiime2_test/qiime2/paired-end-demux/GAIM03_S37_R1_001.fastq.gz,forward
GAIM05,/data/ylor/qiime2_test/qiime2/paired-end-demux/GAIM05_S39_R1_001.fastq.gz,forward
GAIM01,/data/ylor/qiime2_test/qiime2/paired-end-demux/GAIM01_S35_R1_001.fastq.gz,reverse
GAIM03,/data/ylor/qiime2_test/qiime2/paired-end-demux/GAIM03_S37_R1_001.fastq.gz,reverse
GAIM05,/data/ylor/qiime2_test/qiime2/paired-end-demux/GAIM05_S39_R1_001.fastq.gz,reverse

I looked again and did notice that my three samples showed up after I ran qiime demux summarize.

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path /data/ylor/qiime2_test/qiime2/paired-end-demux/manifest_mini_pe.csv \
  --output-path /data/ylor/qiime2_test/qiime2/paired-end-demux/paired-end-demux.qza \
  --source-format PairedEndFastqManifestPhred33

qiime demux summarize \
  --i-data /data/ylor/qiime2_test/qiime2/paired-end-demux/paired-end-demux.qza \
  --o-visualization /data/ylor/qiime2_test/qiime2/paired-end-demux/paired-end-demux.qza.qzv

Am I seeing the error message in dada2 denoise-paired because I don't have the lane value specified in my manifest file?

Also, I have been wondering if there is a way for me to import all 4 lanes of my NextSeq data (and combine them later, after importing). I tried listing all 4 lanes as is in my manifest file, but got an error message (below). This error is why I decided to pass --no-lane-splitting when running bcl2fastq to demultiplex my data (above manifest file). Are there any plans to support data generated from a NextSeq run in the near future?

manifest file:

sample-id,absolute-filepath,direction
GAIM01,/data/ylor/qiime2_test/qiime2/paired-end-lanes/GAIM01_S35_L001_R1_001.fastq.gz,forward
GAIM01,/data/ylor/qiime2_test/qiime2/paired-end-lanes/GAIM01_S35_L002_R1_001.fastq.gz,forward
GAIM01,/data/ylor/qiime2_test/qiime2/paired-end-lanes/GAIM01_S35_L003_R1_001.fastq.gz,forward
GAIM01,/data/ylor/qiime2_test/qiime2/paired-end-lanes/GAIM01_S35_L004_R1_001.fastq.gz,forward
GAIM03,/data/ylor/qiime2_test/qiime2/paired-end-lanes/GAIM03_S37_L001_R1_001.fastq.gz,forward
GAIM03,/data/ylor/qiime2_test/qiime2/paired-end-lanes/GAIM03_S37_L002_R1_001.fastq.gz,forward
GAIM03,/data/ylor/qiime2_test/qiime2/paired-end-lanes/GAIM03_S37_L003_R1_001.fastq.gz,forward
GAIM03,/data/ylor/qiime2_test/qiime2/paired-end-lanes/GAIM03_S37_L004_R1_001.fastq.gz,forward
GAIM05,/data/ylor/qiime2_test/qiime2/paired-end-lanes/GAIM05_S39_L001_R1_001.fastq.gz,forward
GAIM05,/data/ylor/qiime2_test/qiime2/paired-end-lanes/GAIM05_S39_L002_R1_001.fastq.gz,forward
GAIM05,/data/ylor/qiime2_test/qiime2/paired-end-lanes/GAIM05_S39_L003_R1_001.fastq.gz,forward
GAIM05,/data/ylor/qiime2_test/qiime2/paired-end-lanes/GAIM05_S39_L004_R1_001.fastq.gz,forward
GAIM01,/data/ylor/qiime2_test/qiime2/paired-end-lanes/GAIM01_S35_L001_R1_001.fastq.gz,reverse
GAIM01,/data/ylor/qiime2_test/qiime2/paired-end-lanes/GAIM01_S35_L002_R1_001.fastq.gz,reverse
GAIM01,/data/ylor/qiime2_test/qiime2/paired-end-lanes/GAIM01_S35_L003_R1_001.fastq.gz,reverse
GAIM01,/data/ylor/qiime2_test/qiime2/paired-end-lanes/GAIM01_S35_L004_R1_001.fastq.gz,reverse
GAIM03,/data/ylor/qiime2_test/qiime2/paired-end-lanes/GAIM03_S37_L001_R1_001.fastq.gz,reverse
GAIM03,/data/ylor/qiime2_test/qiime2/paired-end-lanes/GAIM03_S37_L002_R1_001.fastq.gz,reverse
GAIM03,/data/ylor/qiime2_test/qiime2/paired-end-lanes/GAIM03_S37_L003_R1_001.fastq.gz,reverse
GAIM03,/data/ylor/qiime2_test/qiime2/paired-end-lanes/GAIM03_S37_L004_R1_001.fastq.gz,reverse
GAIM05,/data/ylor/qiime2_test/qiime2/paired-end-lanes/GAIM05_S39_L001_R1_001.fastq.gz,reverse
GAIM05,/data/ylor/qiime2_test/qiime2/paired-end-lanes/GAIM05_S39_L002_R1_001.fastq.gz,reverse
GAIM05,/data/ylor/qiime2_test/qiime2/paired-end-lanes/GAIM05_S39_L003_R1_001.fastq.gz,reverse
GAIM05,/data/ylor/qiime2_test/qiime2/paired-end-lanes/GAIM05_S39_L004_R1_001.fastq.gz,reverse

error:

qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path /data/ylor/qiime2_test/qiime2/paired-end-lanes/manifest_mini_pe_03152018.csv --output-path /data/ylor/qiime2_test/qiime2/paired-end-lanes/paired-end-demux-lanes.qza --source-format PairedEndFastqManifestPhred33
Traceback (most recent call last):
  File "/opt/conda/envs/qiime2-2018.2/lib/python3.5/site-packages/q2cli/tools.py", line 116, in import_data
    view_type=source_format)
  File "/opt/conda/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/sdk/result.py", line 214, in import_data
    return cls.from_view(type, view, view_type, provenance_capture)
  File "/opt/conda/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/sdk/result.py", line 239, in _from_view
    result = transformation(view)
  File "/opt/conda/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/core/transform.py", line 59, in transformation
    new_view = transformer(view)
  File "/opt/conda/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_types/per_sample_sequences/_transformer.py", line 338, in _8
    single_end=False)
  File "/opt/conda/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_types/per_sample_sequences/_transformer.py", line 268, in _fastq_manifest_helper
    absolute=True)
  File "/opt/conda/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_types/per_sample_sequences/_transformer.py", line 158, in _parse_and_validate_manifest
    _validate_paired_end_fastq_manifest_directions(manifest)
  File "/opt/conda/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_types/per_sample_sequences/_transformer.py", line 219, in _validate_paired_end_fastq_manifest_directions
    '%s' % ', '.join(duplicated_ids_forward))
ValueError: Each sample id can have only one forward read record in a paired-end read manifest, but the following sample ids were associated with more than one forward read record: GAIM05, GAIM03, GAIM01

An unexpected error has occurred:

Each sample id can have only one forward read record in a paired-end read manifest, but the following sample ids were associated with more than one forward read record: GAIM05, GAIM03, GAIM01

See above for debug info.

Sorry for the super long post!

ChristianEdwardson · March 16, 2018, 9:34pm

Looks like you are specifying the same file for both forward and reverse reads. If you have paired reads you should have files that have *_R1_* in the name for the forward reads and files with *_R2_* in the name for reverse reads.
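One way to avoid this kind of copy-and-paste slip is to generate the manifest from the file names instead of typing it by hand. The following is a minimal sketch under a few assumptions: the helper `manifest_rows` is hypothetical, the sample ID is taken to be the text before the first underscore, and the R2 paths in the example are assumed to exist alongside the R1 files shown in the thread.

```python
import re

# Matches the read tag at the end of a CASAVA-style file name.
READ_TAG = re.compile(r"_R([12])_001\.fastq\.gz$")


def manifest_rows(paths):
    """Build fastq-manifest lines, mapping R1 to forward and R2 to reverse."""
    rows = ["sample-id,absolute-filepath,direction"]
    for path in sorted(paths):
        match = READ_TAG.search(path)
        if match is None:
            continue  # skip files that are not R1/R2 fastq.gz reads
        filename = path.rsplit("/", 1)[-1]
        sample_id = filename.split("_", 1)[0]  # assumption: ID precedes first "_"
        direction = "forward" if match.group(1) == "1" else "reverse"
        rows.append(f"{sample_id},{path},{direction}")
    return rows


base = "/data/ylor/qiime2_test/qiime2/paired-end-demux/"
example = [base + "GAIM01_S35_R1_001.fastq.gz",
           base + "GAIM01_S35_R2_001.fastq.gz"]  # R2 name is assumed
print("\n".join(manifest_rows(example)))
```

Each sample then gets exactly one forward and one reverse record, each pointing at a different file.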

ylor · March 19, 2018, 9:01pm

Good catch Christian! I’ll edit my manifest file and re-import.

ebolyen · March 20, 2018, 3:16pm

Hey @ylor,

One more note in addition to @ChristianEdwardson's: it looks like you are importing multiple lanes into the same artifact. If you intend to use DADA2, you should make each lane its own artifact so that each has an opportunity to train its own error model (you can merge the tables afterwards). In fact, you are probably going to see that same error until the lanes are separated, because the format is built with the expectation of a single lane, so it thinks you have "duplicate" sample IDs.
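A possible first step toward one artifact per lane is to split the file list by its lane field, so that a separate manifest can be written for each lane. The sketch below only does the grouping; the helper `group_by_lane` is hypothetical, it assumes bcl2fastq-style names containing `_L###_`, and the import, denoise, and merge steps themselves are not shown.

```python
import re
from collections import defaultdict

# Captures the three-digit lane number from names like ..._L001_R1_001.fastq.gz
LANE_TAG = re.compile(r"_L([0-9]{3})_R[12]_001\.fastq\.gz$")


def group_by_lane(paths):
    """Return a dict mapping a lane label (e.g. 'L001') to its file paths."""
    lanes = defaultdict(list)
    for path in paths:
        match = LANE_TAG.search(path)
        if match:  # files without a lane field are ignored
            lanes["L" + match.group(1)].append(path)
    return dict(lanes)


base = "/data/ylor/qiime2_test/qiime2/paired-end-lanes/"
files = [base + "GAIM01_S35_L001_R1_001.fastq.gz",
         base + "GAIM01_S35_L002_R1_001.fastq.gz",
         base + "GAIM03_S37_L001_R1_001.fastq.gz"]
for lane, members in sorted(group_by_lane(files).items()):
    print(lane, len(members))
```

Within each group every sample ID appears only once per direction, which is what the paired-end manifest validation expects.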

ylor · March 20, 2018, 9:11pm

Thanks for your suggestion, Evan.

I edited my manifest file as Christian suggested and re-imported my data successfully. I was also able to run qiime dada2 denoise-paired successfully. Since I have paired-end reads, they should now be merged in my rep-seqs.qza file after qiime dada2 denoise-paired, correct? After dada2 denoise-paired, I ran my feature data summaries and would now like to run taxonomy assignment, but I am not sure how to proceed. Because I am looking at fish communities, I cannot use pre-trained classifiers such as those from SILVA or Greengenes. What would you suggest I use or do? Should I train my own feature classifier?

I would also like to run OTU picking and then use qiime feature-classifier classify-consensus-blast to BLAST my results before doing taxonomy assignment, so that I can compare my results from DADA2 denoising and OTU picking in QIIME 2. But again, I am not sure how to format my data properly to start OTU picking.

I welcome any suggestions. Thanks!