Problem with deblur denoise-16S

I’m running deblur with the following command:

qiime2@qiime2core2017-4:~$ qiime deblur denoise-16S   --i-demultiplexed-seqs ~/Desktop/OWNew/paired/OWNew_paired_demux_filtered.qza   --p-jobs-to-start 4   --p-trim-length 275   --o-representative-sequences ~/Desktop/OWNew/paired/rep-seqs-deblur.qza   --o-table ~/Desktop/OWNew/paired/table-deblur.qza --verbose   --o-stats ~/Desktop/OWNew/paired/deblur-stats.qza

and getting the following error:

/miniconda3/lib/python3.5/site-packages/deblur/workflow.py:92: UserWarning: input file /tmp/tmp3nr7nsgx/all.seqs.fa.no_artifacts does not appear to be FASTA or FASTQ
  warnings.warn(msg, UserWarning)
Traceback (most recent call last):
  File "/miniconda3/lib/python3.5/site-packages/q2cli/commands.py", line 218, in __call__
    results = action(**arguments)
  File "<decorator-gen-209>", line 2, in denoise_16S
  File "/miniconda3/lib/python3.5/site-packages/qiime2/sdk/action.py", line 171, in callable_wrapper
    output_types, provenance)
  File "/miniconda3/lib/python3.5/site-packages/qiime2/sdk/action.py", line 272, in _callable_executor_
    provenance.fork())
  File "/miniconda3/lib/python3.5/site-packages/qiime2/sdk/result.py", line 216, in _from_view
    result = transformation(view)
  File "/miniconda3/lib/python3.5/site-packages/qiime2/core/transform.py", line 59, in transformation
    new_view = transformer(view)
  File "/miniconda3/lib/python3.5/site-packages/qiime2/core/transform.py", line 193, in wrapped
    new_view.file.write_data(file_view, self._wrapped_view_type)
  File "/miniconda3/lib/python3.5/site-packages/qiime2/plugin/model/directory_format.py", line 86, in write_data
    result = transformation(view)
  File "/miniconda3/lib/python3.5/site-packages/qiime2/core/transform.py", line 57, in transformation
    self.validate(view)
  File "/miniconda3/lib/python3.5/site-packages/qiime2/core/transform.py", line 114, in validate
    view.validate()
  File "/miniconda3/lib/python3.5/site-packages/qiime2/plugin/model/file_format.py", line 31, in validate
    % (self.path, self.__class__.__name__))
ValueError: InPath('/tmp/q2-DNAFASTAFormat-cornsvka') is not formatted as a DNAFASTAFormat file.

Plugin error from deblur:

  InPath('/tmp/q2-DNAFASTAFormat-cornsvka') is not formatted as a
  DNAFASTAFormat file.

See above for debug info.

My input data is the following format:
Type: SampleData[SequencesWithQuality]
Data format: SingleLanePerSampleSingleEndFastqDirFm

The seqs were imported as EMPSingleEndSequences AFTER pairing with PEAR, then demuxed and quality filtered as described in the moving pictures tutorial.

Any thoughts on what might be causing this error?

Thanks,
Christian

@ChristianEdwardson: did you use q2-quality-filter for your quality filtering step? Would you be willing to share OWNew_paired_demux_filtered.qza with us? The provenance would be really useful for us to take a closer look. If you are okay with sharing, but don’t want to share publicly (e.g. uploading here), feel free to send to me in a direct message. Thanks!

@ChristianEdwardson: I was able to reproduce this same error on my development machine. Nothing is jumping out at me though: @wasade, can you please take a look at this when you get a chance? I would really appreciate the help!
Thanks!

Hi @ChristianEdwardson, I ran your data locally, and also was able to recreate the issue. It appears you’ve uncovered an unchecked edge case (:beers: for :bug:s)! The reason for the exception is that none of the data which made it through the Deblur pipeline recruited coarsely to the reference (i.e., they did not appear to be 16S). The specific reason the exception was thrown is because we were erroneously checking if the all.seqs.fa file was empty instead of the reference-hit.seqs.fa file. We’ve opened a new issue here to address this scenario.

That being said, this edge case was encountered due to rather aggressive parameter settings. From spot checking your data, it appears that a large portion of the sequences are around 250nt, but a trim length of 275nt was specified at runtime (--p-trim-length 275). The end effect was that almost all of your sequence data got filtered out, because all reads shorter than 275nt were dropped. The few remaining reads did not appear to be 16S (spot checking one against nt shows a recruitment to 18S).
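To make the trim-length behavior concrete, here is a minimal sketch of the semantics described above: reads shorter than the trim length are dropped, and the rest are truncated to it. The function name `apply_trim_length` and the toy reads are hypothetical, made up for illustration; this is not Deblur's actual code.

```python
def apply_trim_length(reads, trim_length):
    """Illustrative only (hypothetical helper, not Deblur's implementation).

    Reads shorter than trim_length are dropped; the remaining reads
    are truncated to exactly trim_length bases.
    """
    return [read[:trim_length] for read in reads if len(read) >= trim_length]


# Toy data: two 250nt reads and one 280nt read.
reads = ["A" * 250, "C" * 250, "G" * 280]

kept_275 = apply_trim_length(reads, 275)  # only the 280nt read survives
kept_250 = apply_trim_length(reads, 250)  # all three reads survive

print(len(kept_275), len(kept_250))
```

With a trim length of 275 only the single long read is kept, whereas 250 keeps all three, which mirrors what happened to the ~250nt reads in this dataset.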

What I suggest is to dial back the trim length to 250nt. I can confirm this setting “works” but whether the data make sense is up to you :slight_smile:
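One way to sanity-check a candidate trim length before rerunning is to count read lengths and see what fraction of reads would survive. The sketch below assumes plain four-line FASTQ records; `length_counts`, `fraction_retained`, and the example records are hypothetical names and data, not part of QIIME 2 or Deblur.

```python
from collections import Counter


def length_counts(fastq_lines):
    """Count read lengths, assuming 4-line FASTQ records (sequence on line 2)."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the sequence line of each record
            counts[len(line.strip())] += 1
    return counts


def fraction_retained(counts, trim_length):
    """Fraction of reads at least trim_length long (the ones that are not dropped)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    kept = sum(n for length, n in counts.items() if length >= trim_length)
    return kept / total


# Toy in-memory records; a real run would pass an open FASTQ file handle instead.
example = [
    "@read1\n", "A" * 250 + "\n", "+\n", "I" * 250 + "\n",
    "@read2\n", "C" * 280 + "\n", "+\n", "I" * 280 + "\n",
]
counts = length_counts(example)
for candidate in (250, 275):
    print(candidate, fraction_retained(counts, candidate))
```

If the retained fraction collapses at a given trim length, as it did at 275 here, a shorter value is probably the safer choice.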

Thank you for your patience on this, and for discovering this bug.

Best,
Daniel


So, I think maybe I was misunderstanding the --p-trim-length parameter. Based on my reading of the Moving Pictures tutorial (“--p-trim-length which truncates the sequences at position n”), I thought that if I set it to 275, all reads would be truncated to that length, and any shorter reads would simply not be truncated. Now I understand that reads shorter than the trim length are dropped.

Thanks for your help. I am going to retry with 250, and also try running the pipeline with the forward reads only and compare.
