Deblur denoise-other error

deblur

#1

Hi Everyone,

I am running deblur denoise-other (with the SILVA 132 database as the reference) on 18S amplicon data. I merged the paired-end reads beforehand using the VSEARCH end-joining tool in QIIME 2.

This is the code that I used to run deblur:

qiime deblur denoise-other \
  --i-demultiplexed-seqs $INPUT \
  --i-reference-seqs $REF \
  --p-trim-length 300 \
  --p-sample-stats \
  --o-stats $STATS \
  --o-representative-sequences $REPSEQ \
  --o-table $REPTABLE

Unfortunately, I get the following error:

Traceback (most recent call last):
  File "/bigdata/biklab/shared/pkgs/conda/qiime2-2018.8/lib/python3.5/site-packages/skbio/io/registry.py", line 914, in wrapped_sniffer
    return sniffer(fh)
  File "/bigdata/biklab/shared/pkgs/conda/qiime2-2018.8/lib/python3.5/site-packages/skbio/io/format/fastq.py", line 320, in _fastq_sniffer
    if split_length == 10 and description[1] in 'YN':
IndexError: list index out of range

  FormatIdentificationWarning)
/bigdata/biklab/shared/pkgs/conda/qiime2-2018.8/lib/python3.5/site-packages/deblur/workflow.py:92: UserWarning: input file /tmp/qiime2-archive-_acuosa6/f851d4b9-c777-4e87-b7b6-e527f2b61f0e/data/NPRB3_WLK_02_125_L001_R1_001.fastq.gz does not appear to be FASTA or FASTQ
  warnings.warn(msg, UserWarning)
/bigdata/biklab/shared/pkgs/conda/qiime2-2018.8/lib/python3.5/site-packages/deblur/workflow.py:851: UserWarning: Problem removing artifacts from file /tmp/qiime2-archive-_acuosa6/f851d4b9-c777-4e87-b7b6-e527f2b61f0e/data/NPRB3_WLK_02_125_L001_R1_001.fastq.gz
  seqs_fp, UserWarning)
/bigdata/biklab/shared/pkgs/conda/qiime2-2018.8/lib/python3.5/site-packages/skbio/io/registry.py:922: FormatIdentificationWarning: '_fastq_sniffer' has encountered a problem.

My files are in FASTQ format, so I don't know what is causing this error.
When I run it with the --verbose flag, I get the following:

INFO(47056145847104)2018-12-18 15:36:11,096:launch_workflow for file /tmp/qiime2-archive-_acuosa6/f851d4b9-c777-4e87-b7b6-e527f2b61f0e/data/NPRB3_SF007_r2_111_L001_R1_001.fastq.gz
WARNING(47056145847104)2018-12-18 15:36:11,101:input file /tmp/qiime2-archive-_acuosa6/f851d4b9-c777-4e87-b7b6-e527f2b61f0e/data/NPRB3_SF007_r2_111_L001_R1_001.fastq.gz does not appear to be FASTA or FASTQ
INFO(47056145847104)2018-12-18 15:36:11,101:dereplicate seqs file /tmp/tmpx5iaokx6/deblur_working_dir/NPRB3_SF007_r2_111_L001_R1_001.fastq.gz.trim
ERROR(47056145847104)2018-12-18 15:36:11,120:Problem running vsearch dereplication on file /tmp/tmpx5iaokx6/deblur_working_dir/NPRB3_SF007_r2_111_L001_R1_001.fastq.gz.trim
INFO(47056145847104)2018-12-18 15:36:11,121:remove_artifacts_seqs file /tmp/tmpx5iaokx6/deblur_working_dir/NPRB3_SF007_r2_111_L001_R1_001.fastq.gz.trim.derep
WARNING(47056145847104)2018-12-18 15:36:11,121:file /tmp/tmpx5iaokx6/deblur_working_dir/NPRB3_SF007_r2_111_L001_R1_001.fastq.gz.trim.derep has size 0, continuing
WARNING(47056145847104)2018-12-18 15:36:11,121:remove artifacts failed, aborting
WARNING(47056145847104)2018-12-18 15:36:11,122:deblurring failed for file /tmp/qiime2-archive-_acuosa6/f851d4b9-c777-4e87-b7b6-e527f2b61f0e/data/NPRB3_SF007_r2_111_L001_R1_001.fastq.gz

Any help would be appreciated.
Thank you.


(Nicholas Bokulich) #2

Could you please check out that particular file? Sounds like the culprit.

Please post some example sequences here. My guess is that this file is either empty, contains some bad sequences, or is in an unexpected format.
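
One way to check for an empty file or malformed records is to walk the file record by record. The following is only a minimal sketch and is not part of QIIME 2 or deblur: the function name `summarize_fastq` and the `max_records` cap are made up for illustration, and it assumes plain four-line FASTQ records with no line wrapping.

```python
import gzip


def summarize_fastq(path, max_records=1000):
    """Scan up to max_records four-line records of a gzipped FASTQ file.

    Returns (n_records, problems), where problems is a list of
    (record_number, message) tuples. Hypothetical helper for illustration.
    """
    n_records = 0
    problems = []
    with gzip.open(path, "rt") as handle:
        while n_records < max_records:
            header = handle.readline()
            if not header:
                break  # reached end of file
            seq = handle.readline().rstrip("\n")
            plus = handle.readline().rstrip("\n")
            qual = handle.readline().rstrip("\n")
            n_records += 1
            if not header.startswith("@"):
                problems.append((n_records, "header does not start with '@'"))
            if not plus.startswith("+"):
                problems.append((n_records, "separator line does not start with '+'"))
            if len(seq) != len(qual):
                problems.append((n_records, "sequence and quality lengths differ"))
    return n_records, problems
```

An `n_records` of 0 would point to an empty file, while a non-empty `problems` list would point to malformed records. If both look fine, the cause is probably somewhere else.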


#4

I’ve been able to cluster them into ASVs using DADA2.

Here are a few sequences:

$ zcat NPRB3_SF007_r2_111_L001_R1_001.fastq.gz | head -n 24
@M01533:383:000000000-B5KR6:1:1101:23033:1716:N:0:1/1
ATGCATGTATCAGCACAAGCCTAAAAATGGTGAAGCCGCGAATAGCTCATTACAACAGTCGTAGTTTATTAGAAAGTACTCTATGGATAACTGTGGTAATCCTAGAGCTAATACATGTTCCAATCCTCGACTCACGGAGGGGTGCATTTATTAGAACAAGGCCGATCAGACTTTGTCTGTCTCAGGTTGACTCTGAATAACTTTGCTAATCGCACAGTCTTTGTACTGGCGATGTATCTTTCAAATGTCTGCCTTATCAACTGTTGATGGTAGATTATGCGCCTACCATGGTTGTAACGGGTAACGGAGAATCAGGGTTTGATTCCGGAGAGGGAGCCTGAGAAACAGCTACCACA
+
GGGGGGGGGGGGFGGGGDEFGGGCGGGGGGGFGGGGGDEGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGFFGGGGFGGGGGGGJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJHJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ=JJGGGGGGGGGGGGGGGGGGGGGFAGGGFAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGF?GGGGGGGGGGGGGGFFEFGGGGGGGGGGGGGGGGG
@M01533:383:000000000-B5KR6:1:1101:10705:2770:N:0:1/1
ATGCATGTCTAAGTACAAACTTTAACACAGTGAAACCGCGAATGGCTCATTAAATCAGTCAGGATTCCTTAGATCGTACTTTCCTACTTGGATAACTGTAGTAATTCTAGAGCTAATACATGCACGCAAGCTCCGACCTTCGGGGAAGAGCGCTTTTATTAGATCAAAACCAATCGGTCCGCAAGGGCCGTCTCATTGGGGACTCTGGATAACTTTGGGCTGATCGCACGGACTAGCTCCGGCGACGTATCTTTCAAATGTCTGCCCTATCAACTTTCGTTGGTACGTGATATGCCTACCAAGGTTGTAACGGGTAACGGGGAATCAGGGTTCGATTCCGGAGAGGGAGCATGAGAAACGGCTACCACA
+
GGGGGGGGGGGGFFGGGFFGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGFFGGGGGGGGGGGGGGGFFGGGGGGGGGGGGGGGGGGGGGGDGGCFGGGGDGGGGJJJJ?JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ?JJJJJJJJJJJJJJJJJJJJJJJJ>JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ?GDGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFEE
@M01533:383:000000000-B5KR6:1:1101:17636:3112:N:0:1/1
ATGCATGTCTTAGTACAGACTATCTCACAGTGAAACTGCGAATGGCTCATTAAATCAGCTAAGGTTCCTTAGATCGTACAATCCTACTTGGATAACTGTAGTAATTCTAGAGCTAATACATGCAACAAGCTCCGACCTCTCCTGGGAAGAGCGCTTTTATTAGATCAAAACCAATCGGTTCCCTCGGGTTCCGTCCTATTGGTGACTCTGGATAACTTTGTGCTGACCGCATGGCCACGAGCCGGCGACGTATCTTTCAAATGTCTGCCCTATCAACTTTCGATGGTACGTGATATGCCTACCATGGTTGTAACGGGTAACGGGGAATCAGGGTTCGATTCCGGAGAGGGAGCATGAGAAACGGCTACCACA
+
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJEJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@M01533:383:000000000-B5KR6:1:1101:24522:3426:N:0:1/1
GTGCATGTCTAAGTACAAACTTTAACACAGTGAAACCGCGAATGGCTCATTAAATCAGTCAGGATTCCTTAGATCGTACTTTCCTACTTGGATAACTGTAGTAATTCTAGAGCTAATACACGCAGCCAAGCTCCGACCGCGAGGGACGAGCGCATTTATTAGAACAAGACCAATCGGGCCTCGGCCTGTGTTTGGTGGATCTGAATAACTCAGTCGATCGCGCGGTCTCGCACCGGCGACGTATCTTCCAAGTGTCTGCCTTATCAACTTTTGATGGTAGTTTACGCGACTACCATGGTTGTAACGGGTAACGGAGAATAAGGGTTCGACTCCGGAGAGGGAGCATGAGAAACGGCTACCACA
+
GGGGGGGGGGGGGGGGGGGGGGFGGFGGGGGGGGGGGGGGFFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG7FGFGGGGGGFGGGGGGGGGGJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJGJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ?JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ>HJJJGGFGEAF8GGGGGGGGGGGFFGF7GGEGGFC<DGGGGGGGGGGGGGGGGGGDGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGGGGGGF
@M01533:383:000000000-B5KR6:1:1101:22405:3802:N:0:1/1
ATGCATGTCTAAGTACAAACTTTAACACAGTGAAACCGCGAATGGCTCATTAAATCAGTCAGGATTCCTTAGATCGTACTTTCCTACTTGGATAACTGTAGTAATTCTAGAGCTAATACATGCACGCAAGCTCCGACCTTCGGGGAAGAGCGCTTTTATTAGATCAAAACCAATCGGTCCGCAAGGGCCGTCTCATTGGTGACTCTGGATAACTTTGGGCTGATCGCACGGACTAGCTCCGGCCACGTATCTTTCAAATGTCTGCCCTATCAACTTTCGTTGGTACGTGATATGCCTACCATGGTTGTAACGGGTAACGGGGAATCAGGGTTCGATTCCGGAGAGGGAGCATGAGAAACGGCTACCACA
+
[email protected]GGGGGGGGGGGGGGGGGGGGGGGGGGFDGGGGGGGFGGGGGGJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ?JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ?GGGGGGGGGGFGGEFBGDGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@M01533:383:000000000-B5KR6:1:1101:8759:4836:N:0:1/1
ATGCATGTATCAGCACAAGCCTCAAAATGGTGAAGCCGCGAATAGCTCATTACAACAGTCGTAGTTTATTAGAAAGTATCTTCTGGATAACTGTGGTAATTCTAGAGCTAATACATGTTCTAAGCCCTGACTAACGGAAGGGTGCATTTATTAGAACAAAGCCAATCAGACTTCTGTCTGTCTCAGGTTGACTCTGAATAACTTTGCTAATCGCACAGTCTTTGCACTGGCGATGTATCTTTCAAATGTCTGCCTTATCAACTGTTGATGGTAGATTATGCGCCTACCATGGTTGTAACGGGTAACGGAGAATCAGGGTTTGATTCCGGAGAGGGAGCCTGAGAAACAGCTACCACA
+
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGFGGGFGGJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJGGGGGGGGGFFGGGGGGGGGGGDGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
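
One thing that stands out in these headers: each has ten colon-separated fields and no space, and the traceback stops at a line that tests `split_length == 10` and then reads `description[1]`. The sketch below is a guess at that logic, not the scikit-bio source. It assumes the header is split into an ID and a description at the first whitespace, and the helper names are invented for illustration.

```python
def split_header(header_line):
    """Split a FASTQ header into (id, description) at the first whitespace.

    Assumption: this mirrors how the parser separates the two parts.
    """
    body = header_line.rstrip("\n")
    if body.startswith("@"):
        body = body[1:]
    parts = body.split(None, 1)
    seq_id = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return seq_id, description


def looks_like_illumina18(header_line):
    """Imitate the check shown on the traceback line.

    Deliberately left unguarded so that an empty description raises
    IndexError, as in the reported error.
    """
    seq_id, description = split_header(header_line)
    split_length = len((seq_id + description).split(":"))
    fields = description.split(":")
    return split_length == 10 and fields[1] in "YN"
```

Under these assumptions, the first header posted above (`@M01533:383:000000000-B5KR6:1:1101:23033:1716:N:0:1/1`) has an empty description, so `fields[1]` does not exist and the check raises `IndexError`. A hypothetical header of the form `@M01533:383:000000000-B5KR6:1:1101:23033:1716 1:N:0:1`, with a space before the last four fields, passes the same check. If the guess is right, the header layout of the joined reads, rather than an empty file, may be what trips the sniffer.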

(Nicholas Bokulich) #5

Very interesting, thank you for posting those.

Indeed, that file does not look empty at all, even though that is what the error message implied.

So the plot thickens. Please give me some more time to mull this over. Maybe you could share a link to that sequence file? That will help us troubleshoot locally. Thanks!


(Nicholas Bokulich) #9

@DeSantiago,
You could try running qiime tools validate on your demultiplexed sequences QZA to see if that can diagnose the issue.

Let us know what that reports.


#10

Sorry for the late response.
Here is the Dropbox link to the sequencing files.

Also, qiime tools validate reported the artifact as valid.


(Nicholas Bokulich) #11

@DeSantiago,
One more test: could you try running deblur directly (not through q2-deblur)? That will help diagnose whether this is an issue with deblur itself or is isolated to q2-deblur.
Thanks!