Deblur denoise-other error

Hi Everyone,

I am running qiime deblur denoise-other (with the SILVA 132 database as the reference) on 18S amplicon data. Beforehand, I merged the paired-end reads with the VSEARCH read-joining method in QIIME 2 (qiime vsearch join-pairs).

This is the command I used to run deblur:

qiime deblur denoise-other \
  --i-demultiplexed-seqs $INPUT \
  --i-reference-seqs $REF \
  --p-trim-length 300 \
  --p-sample-stats \
  --o-stats $STATS \
  --o-representative-sequences $REPSEQ \
  --o-table $REPTABLE
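
Deblur trims every read to the --p-trim-length value and discards reads shorter than that. A quick sanity check is to count how many merged reads actually reach 300 nt. The sketch below is a minimal, hypothetical helper: count_trimmable and open_fastq are illustrative names, not part of deblur or QIIME 2, and the code assumes unwrapped, four-line FASTQ records.

```python
import gzip
from collections import Counter


def open_fastq(path):
    """Open a plain or gzipped FASTQ file as text (illustrative helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def count_trimmable(lines, trim_length=300):
    """Return (total_reads, reads_at_least_trim_length) for FASTQ text lines.

    Assumes strict four-line records (header, sequence, '+', quality),
    so the lines at index 1, 5, 9, ... are the sequence lines.
    """
    lengths = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:
            lengths[len(line.strip())] += 1
    total = sum(lengths.values())
    kept = sum(n for length, n in lengths.items() if length >= trim_length)
    return total, kept
```

For example, pass the handle returned by open_fastq for one of the exported .fastq.gz files to count_trimmable. It returns the total number of reads and the number long enough to survive trimming at 300 nt.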

Unfortunately, I get the following error:

Traceback (most recent call last):
  File "/bigdata/biklab/shared/pkgs/conda/qiime2-2018.8/lib/python3.5/site-packages/skbio/io/registry.py", line 914, in wrapped_sniffer
    return sniffer(fh)
  File "/bigdata/biklab/shared/pkgs/conda/qiime2-2018.8/lib/python3.5/site-packages/skbio/io/format/fastq.py", line 320, in _fastq_sniffer
    if split_length == 10 and description[1] in 'YN':
IndexError: list index out of range

  FormatIdentificationWarning)
/bigdata/biklab/shared/pkgs/conda/qiime2-2018.8/lib/python3.5/site-packages/deblur/workflow.py:92: UserWarning: input file /tmp/qiime2-archive-_acuosa6/f851d4b9-c777-4e87-b7b6-e527f2b61f0e/data/NPRB3_WLK_02_125_L001_R1_001.fastq.gz does not appear to be FASTA or FASTQ
  warnings.warn(msg, UserWarning)
/bigdata/biklab/shared/pkgs/conda/qiime2-2018.8/lib/python3.5/site-packages/deblur/workflow.py:851: UserWarning: Problem removing artifacts from file /tmp/qiime2-archive-_acuosa6/f851d4b9-c777-4e87-b7b6-e527f2b61f0e/data/NPRB3_WLK_02_125_L001_R1_001.fastq.gz
  seqs_fp, UserWarning)
/bigdata/biklab/shared/pkgs/conda/qiime2-2018.8/lib/python3.5/site-packages/skbio/io/registry.py:922: FormatIdentificationWarning: '_fastq_sniffer' has encountered a problem.
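
The failing line compares a colon-split field count with the second colon-separated field of the read description. The function below is a hypothetical reconstruction of that check, inferred only from the traceback; it is not scikit-bio's actual source. It assumes the header is split at the first space into an ID and a description. Under that assumption, a header with ten colon-separated fields but no space, such as @M01533:383:000000000-B5KR6:1:1101:23033:1716:N:0:1/1, leaves the description empty and raises the same IndexError. A conventional Illumina 1.8 header passes.

```python
def illumina18_check(header):
    """Hypothetical mimic of the sniffer check seen in the traceback.

    Assumption: the ID is everything before the first space and the
    description is everything after it (empty if there is no space).
    """
    seq_id, _, description = header.lstrip("@").partition(" ")
    split_length = len((seq_id + " " + description).split(":"))
    description_fields = description.split(":")
    # With an empty description, description_fields == [''] and
    # indexing [1] raises IndexError, as in the traceback.
    return split_length == 10 and description_fields[1] in "YN"
```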

My files are in FASTQ format, so I don’t know what is causing this error.
When I run it with the --verbose flag, I get the following:

INFO(47056145847104)2018-12-18 15:36:11,096:launch_workflow for file /tmp/qiime2-archive-_acuosa6/f851d4b9-c777-4e87-b7b6-e527f2b61f0e/data/NPRB3_SF007_r2_111_L001_R1_001.fastq.gz
WARNING(47056145847104)2018-12-18 15:36:11,101:input file /tmp/qiime2-archive-_acuosa6/f851d4b9-c777-4e87-b7b6-e527f2b61f0e/data/NPRB3_SF007_r2_111_L001_R1_001.fastq.gz does not appear to be FASTA or FASTQ
INFO(47056145847104)2018-12-18 15:36:11,101:dereplicate seqs file /tmp/tmpx5iaokx6/deblur_working_dir/NPRB3_SF007_r2_111_L001_R1_001.fastq.gz.trim
ERROR(47056145847104)2018-12-18 15:36:11,120:Problem running vsearch dereplication on file /tmp/tmpx5iaokx6/deblur_working_dir/NPRB3_SF007_r2_111_L001_R1_001.fastq.gz.trim
INFO(47056145847104)2018-12-18 15:36:11,121:remove_artifacts_seqs file /tmp/tmpx5iaokx6/deblur_working_dir/NPRB3_SF007_r2_111_L001_R1_001.fastq.gz.trim.derep
WARNING(47056145847104)2018-12-18 15:36:11,121:file /tmp/tmpx5iaokx6/deblur_working_dir/NPRB3_SF007_r2_111_L001_R1_001.fastq.gz.trim.derep has size 0, continuing
WARNING(47056145847104)2018-12-18 15:36:11,121:remove artifacts failed, aborting
WARNING(47056145847104)2018-12-18 15:36:11,122:deblurring failed for file /tmp/qiime2-archive-_acuosa6/f851d4b9-c777-4e87-b7b6-e527f2b61f0e/data/NPRB3_SF007_r2_111_L001_R1_001.fastq.gz
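
With many samples, the --verbose output is easier to triage programmatically. This sketch assumes every log line follows the LEVEL(thread)timestamp:message layout shown above. It collects the input files for which "deblurring failed" was logged, along with all ERROR messages. summarize_log is a hypothetical name, not a deblur function.

```python
import re

# Matches lines like: WARNING(47056145847104)2018-12-18 15:36:11,122:message
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z]+)\((?P<thread>\d+)\)"
    r"(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}):(?P<message>.*)$"
)
FAILED_PREFIX = "deblurring failed for file "


def summarize_log(lines):
    """Return (failed_input_files, error_messages) from a verbose deblur log."""
    failed, errors = [], []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if not match:
            continue  # skip tracebacks and other non-log lines
        message = match.group("message")
        if message.startswith(FAILED_PREFIX):
            failed.append(message[len(FAILED_PREFIX):])
        if match.group("level") == "ERROR":
            errors.append(message)
    return failed, errors
```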

Any help would be appreciated.
Thank you.

Could you please take a look at that particular file? It sounds like the culprit.

Please post some example sequences here. My guess is that this file is empty, contains some bad sequences, and/or is in an unexpected format.
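
Those three possibilities (an empty file, bad sequences, an unexpected format) can be screened quickly. The sketch below is an illustrative checker, not a full FASTQ parser, and check_fastq is a hypothetical name. It assumes unwrapped four-line records and reports several problems: an empty input, a line count that is not a multiple of four, headers not starting with '@', separator lines not starting with '+', characters outside ACGTN, and sequence/quality length mismatches.

```python
VALID_BASES = set("ACGTN")


def check_fastq(lines):
    """Return a list of problems found in FASTQ text lines (empty list = OK).

    Assumes unwrapped, four-line records; reads all given lines into memory.
    """
    lines = [line.rstrip("\n") for line in lines]
    if not lines:
        return ["file is empty"]
    if len(lines) % 4:
        return [f"{len(lines)} lines is not a multiple of 4"]
    problems = []
    for start in range(0, len(lines), 4):
        header, seq, sep, qual = lines[start:start + 4]
        n = start // 4 + 1  # 1-based record number
        if not header.startswith("@"):
            problems.append(f"record {n}: header does not start with '@'")
        if not sep.startswith("+"):
            problems.append(f"record {n}: separator does not start with '+'")
        bad = set(seq.upper()) - VALID_BASES
        if bad:
            problems.append(f"record {n}: unexpected characters {sorted(bad)}")
        if len(seq) != len(qual):
            problems.append(f"record {n}: sequence and quality lengths differ")
    return problems
```

Because the checker holds its input in memory, a slice such as itertools.islice(handle, 400) (the first 100 records) is a convenient input for large files.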

I’ve been able to denoise them into ASVs using DADA2.

Here are a few sequences:

$ zcat NPRB3_SF007_r2_111_L001_R1_001.fastq.gz | head -n 24
@M01533:383:000000000-B5KR6:1:1101:23033:1716:N:0:1/1
ATGCATGTATCAGCACAAGCCTAAAAATGGTGAAGCCGCGAATAGCTCATTACAACAGTCGTAGTTTATTAGAAAGTACTCTATGGATAACTGTGGTAATCCTAGAGCTAATACATGTTCCAATCCTCGACTCACGGAGGGGTGCATTTATTAGAACAAGGCCGATCAGACTTTGTCTGTCTCAGGTTGACTCTGAATAACTTTGCTAATCGCACAGTCTTTGTACTGGCGATGTATCTTTCAAATGTCTGCCTTATCAACTGTTGATGGTAGATTATGCGCCTACCATGGTTGTAACGGGTAACGGAGAATCAGGGTTTGATTCCGGAGAGGGAGCCTGAGAAACAGCTACCACA
+
GGGGGGGGGGGGFGGGGDEFGGGCGGGGGGGFGGGGGDEGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGFFGGGGFGGGGGGGJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJHJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ=JJGGGGGGGGGGGGGGGGGGGGGFAGGGFAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGF?GGGGGGGGGGGGGGFFEFGGGGGGGGGGGGGGGGG
@M01533:383:000000000-B5KR6:1:1101:10705:2770:N:0:1/1
ATGCATGTCTAAGTACAAACTTTAACACAGTGAAACCGCGAATGGCTCATTAAATCAGTCAGGATTCCTTAGATCGTACTTTCCTACTTGGATAACTGTAGTAATTCTAGAGCTAATACATGCACGCAAGCTCCGACCTTCGGGGAAGAGCGCTTTTATTAGATCAAAACCAATCGGTCCGCAAGGGCCGTCTCATTGGGGACTCTGGATAACTTTGGGCTGATCGCACGGACTAGCTCCGGCGACGTATCTTTCAAATGTCTGCCCTATCAACTTTCGTTGGTACGTGATATGCCTACCAAGGTTGTAACGGGTAACGGGGAATCAGGGTTCGATTCCGGAGAGGGAGCATGAGAAACGGCTACCACA
+
GGGGGGGGGGGGFFGGGFFGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGFFGGGGGGGGGGGGGGGFFGGGGGGGGGGGGGGGGGGGGGGDGGCFGGGGDGGGGJJJJ?JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ?JJJJJJJJJJJJJJJJJJJJJJJJ>JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ?GDGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFEE
@M01533:383:000000000-B5KR6:1:1101:17636:3112:N:0:1/1
ATGCATGTCTTAGTACAGACTATCTCACAGTGAAACTGCGAATGGCTCATTAAATCAGCTAAGGTTCCTTAGATCGTACAATCCTACTTGGATAACTGTAGTAATTCTAGAGCTAATACATGCAACAAGCTCCGACCTCTCCTGGGAAGAGCGCTTTTATTAGATCAAAACCAATCGGTTCCCTCGGGTTCCGTCCTATTGGTGACTCTGGATAACTTTGTGCTGACCGCATGGCCACGAGCCGGCGACGTATCTTTCAAATGTCTGCCCTATCAACTTTCGATGGTACGTGATATGCCTACCATGGTTGTAACGGGTAACGGGGAATCAGGGTTCGATTCCGGAGAGGGAGCATGAGAAACGGCTACCACA
+
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJEJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@M01533:383:000000000-B5KR6:1:1101:24522:3426:N:0:1/1
GTGCATGTCTAAGTACAAACTTTAACACAGTGAAACCGCGAATGGCTCATTAAATCAGTCAGGATTCCTTAGATCGTACTTTCCTACTTGGATAACTGTAGTAATTCTAGAGCTAATACACGCAGCCAAGCTCCGACCGCGAGGGACGAGCGCATTTATTAGAACAAGACCAATCGGGCCTCGGCCTGTGTTTGGTGGATCTGAATAACTCAGTCGATCGCGCGGTCTCGCACCGGCGACGTATCTTCCAAGTGTCTGCCTTATCAACTTTTGATGGTAGTTTACGCGACTACCATGGTTGTAACGGGTAACGGAGAATAAGGGTTCGACTCCGGAGAGGGAGCATGAGAAACGGCTACCACA
+
GGGGGGGGGGGGGGGGGGGGGGFGGFGGGGGGGGGGGGGGFFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG7FGFGGGGGGFGGGGGGGGGGJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJGJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ?JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ>HJJJGGFGEAF8GGGGGGGGGGGFFGF7GGEGGFC<DGGGGGGGGGGGGGGGGGGDGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGGGGGGF
@M01533:383:000000000-B5KR6:1:1101:22405:3802:N:0:1/1
ATGCATGTCTAAGTACAAACTTTAACACAGTGAAACCGCGAATGGCTCATTAAATCAGTCAGGATTCCTTAGATCGTACTTTCCTACTTGGATAACTGTAGTAATTCTAGAGCTAATACATGCACGCAAGCTCCGACCTTCGGGGAAGAGCGCTTTTATTAGATCAAAACCAATCGGTCCGCAAGGGCCGTCTCATTGGTGACTCTGGATAACTTTGGGCTGATCGCACGGACTAGCTCCGGCCACGTATCTTTCAAATGTCTGCCCTATCAACTTTCGTTGGTACGTGATATGCCTACCATGGTTGTAACGGGTAACGGGGAATCAGGGTTCGATTCCGGAGAGGGAGCATGAGAAACGGCTACCACA
+
…GGGGGGGGGGGGGGGGGGGGGGGGGGFDGGGGGGGFGGGGGGJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ?JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ?GGGGGGGGGGFGGEFBGDGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@M01533:383:000000000-B5KR6:1:1101:8759:4836:N:0:1/1
ATGCATGTATCAGCACAAGCCTCAAAATGGTGAAGCCGCGAATAGCTCATTACAACAGTCGTAGTTTATTAGAAAGTATCTTCTGGATAACTGTGGTAATTCTAGAGCTAATACATGTTCTAAGCCCTGACTAACGGAAGGGTGCATTTATTAGAACAAAGCCAATCAGACTTCTGTCTGTCTCAGGTTGACTCTGAATAACTTTGCTAATCGCACAGTCTTTGCACTGGCGATGTATCTTTCAAATGTCTGCCTTATCAACTGTTGATGGTAGATTATGCGCCTACCATGGTTGTAACGGGTAACGGAGAATCAGGGTTTGATTCCGGAGAGGGAGCCTGAGAAACAGCTACCACA
+
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGFGGGFGGJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJGGGGGGGGGFFGGGGGGGGGGGDGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
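
Every header above packs the fields that normally follow a space in an Illumina 1.8 header (1:N:0:1) into the ID itself. They are joined by colons and carry a /1 suffix. If that layout turns out to be the problem (unconfirmed), one untested workaround would be to rewrite the headers into the conventional "ID read:filtered:control:index" form before importing the reads again. normalize_header is a hypothetical helper that assumes exactly the layout shown here and returns any other header unchanged.

```python
import re

# Hypothetical pattern for colon-joined headers such as
# @M01533:383:000000000-B5KR6:1:1101:23033:1716:N:0:1/1
JOINED_HEADER = re.compile(
    r"^@(?P<id>(?:[^:\s]+:){6}[^:\s]+):(?P<filtered>[YN]):(?P<control>\d+)"
    r":(?P<index>[^:/\s]+)/(?P<read>[12])$"
)


def normalize_header(header):
    """Rewrite a colon-joined header as '@ID read:filtered:control:index'."""
    header = header.rstrip("\n")
    match = JOINED_HEADER.match(header)
    if not match:
        return header  # leave unfamiliar headers untouched
    return "@{id} {read}:{filtered}:{control}:{index}".format(**match.groupdict())
```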

Very interesting, thank you for pointing that out.

Indeed, that file does not look empty at all, even though that is what the error message implied.

So the plot thickens! Please give me some more time to mull this over. Could you share a link to that sequence file? That will help us troubleshoot locally. Thanks!

@DeSantiago,
You could try running qiime tools validate on your demultiplexed-sequences QZA to see whether that can diagnose the issue.

Let us know what that reports.

Sorry for the late response.
Here is the Dropbox link to the sequencing files.

Also, qiime tools validate reported that the artifact is valid.

@DeSantiago,
One more test: could you try running deblur directly (the standalone deblur package, not the q2-deblur plugin)? That will help us determine whether this is an issue with deblur itself or one isolated to q2-deblur.
Thanks!

I finished running the deblur workflow command, and the log file didn’t contain any errors.

Hey there @DeSantiago, we have been looking into this; it hasn’t fallen off our radar. Thanks!
