Plugin error from deblur: list index out of range

deblur
(UGG) #1

I am trying to analyze bacterial 16S rRNA data (SRA accession no. SRP095022) obtained by amplicon sequencing with Bakt_341-F and Bakt_805-R, using an Illumina MiSeq platform (2x300 reads). I merged the paired-end reads with PANDAseq and obtained reads of around 450 bp (I validated this merging step by confirming that I obtained the same result reported in the related paper). I imported these merged reads into QIIME2 (v.2019.4) with the command below:

qiime tools import \
  --input-path manifest-Nunez2017.csv \
  --output-path …/OUTPUTS/demux-joined.qza \
  --type 'SampleData[JoinedSequencesWithQuality]' \
  --input-format SingleEndFastqManifestPhred33

I am also attaching the manifest file: manifest-Nunez2017.csv (668 Bytes)

I then used deblur for denoising. The command is:

qiime deblur denoise-16S \
  --i-demultiplexed-seqs demux-joined.qza \
  --p-trim-length 400 \
  --o-representative-sequences rep-seqs-deblur.qza \
  --p-sample-stats \
  --o-stats deblur-stats.qza

However I get this error: Plugin error from deblur: list index out of range
Here is the log file:

Traceback (most recent call last):
  File "/home/ugg/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/q2cli/commands.py", line 311, in __call__
    results = action(**arguments)
  File "</home/ugg/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/decorator.py:decorator-gen-441>", line 2, in denoise_16S
  File "/home/ugg/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
    output_types, provenance)
  File "/home/ugg/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/qiime2/sdk/action.py", line 365, in callable_executor
    output_views = self._callable(**view_args)
  File "/home/ugg/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/q2_deblur/_denoise.py", line 99, in denoise_16S
    hashed_feature_ids=hashed_feature_ids)
  File "/home/ugg/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/q2_deblur/_denoise.py", line 196, in _denoise_helper
    stats = _gather_stats(demultiplexed_seqs, tmp)
  File "/home/ugg/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/q2_deblur/_denoise.py", line 227, in _gather_stats
    'trim.derep')
  File "/home/ugg/miniconda3/envs/qiime2-2019.4/lib/python3.6/site-packages/q2_deblur/_denoise.py", line 296, in _fasta_counts
    counts += int(size.split('=')[1])
IndexError: list index out of range

I have tried a couple of things and searched a lot, but I could not work out what the problem is, or whether it arises from the import step.

I will be really happy if anyone helps me solve this issue…
Thanks…

(Daniel McDonald) #3

Dear @UGG,

This is an unusual error, thank you for reaching out. Would it be possible to share the demux-joined.qza file by chance?

And if you have a moment, would it be possible to run the command with --p-no-sample-stats to see if it completes?

Thanks,
Daniel

(UGG) #5

Thank you very much, adding the --p-no-sample-stats flag works…
Since the file is very big, I am sharing the qzv file in case you want to look at it: demux-joined.qzv (290.9 KB)

(Daniel McDonald) #7

Thanks @UGG! Based on the execution path and traceback, what I’m wondering about right now is whether one of the samples ended up without any sequence data as a result of the trimming. When you have a moment, would it be possible to also send on the feature-table summary (e.g., qiime feature-table summarize) from the successful run when --p-no-sample-stats was used?
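
One way to check this hypothesis directly on the input files is sketched below. It assumes plain four-line FASTQ records in a gzipped file and relies on the fact that reads shorter than the trim length cannot be trimmed to it, so a sample with a count of zero would contribute no sequences. The function name count_reads_at_least and the demo reads are hypothetical and are not part of QIIME 2 or Deblur.

```python
import gzip
import os
import tempfile
from itertools import islice


def count_reads_at_least(fastq_gz_path: str, min_length: int) -> int:
    """Count FASTQ records whose sequence has at least min_length bases.

    Assumes plain four-line records (no wrapped sequence lines).
    """
    kept = 0
    with gzip.open(fastq_gz_path, "rt") as handle:
        while True:
            record = list(islice(handle, 4))  # header, sequence, '+', quality
            if len(record) < 4:
                break
            if len(record[1].strip()) >= min_length:
                kept += 1
    return kept


# Demo on a tiny synthetic file (made-up reads, not data from this thread).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.fastq.gz")
    with gzip.open(demo_path, "wt") as out:
        out.write("@read1\nACGTACGT\n+\nIIIIIIII\n")
        out.write("@read2\nACG\n+\nIII\n")
    demo_count = count_reads_at_least(demo_path, min_length=5)

print(demo_count)
```

For the data in this thread, the call would use min_length=400 on each per-sample FASTQ file listed in the manifest.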

(UGG) #9

Here is the qzv file: table-deblur.qzv (631.1 KB)
Thanks again…

(Daniel McDonald) #11

My suspicion was incorrect! I’m not sure what’s happening, this is very weird. Is there any possibility of sharing the .qza with the input sequence data?

(UGG) #13

I have sent them via email as a drive link since they are huge files… In the quality filtering stage, I set --p-min-quality to 2, because otherwise most of the reads were trimmed away…

(Daniel McDonald) #17

Thank you, @UGG!

I took a look at the filtered sequence data and ran one of the samples directly through vsearch using the dereplication command from Deblur. For some reason, these input files lead vsearch to produce slightly different output, which violates the expectations q2-deblur has when summarizing the data. Specifically:

$ vsearch --derep_fulllength A1_0_L001_R1_001.fastq.gz --output footest --sizeout --fasta_width 0 --minuniquesize 2 --threads 4
vsearch v2.7.0_macos_x86_64, 16.0GB RAM, 12 cores
https://github.com/torognes/vsearch

Reading file A1_0_L001_R1_001.fastq.gz 100%
44442854 nt in 108099 seqs, min 400, max 478, avg 411
Dereplicating 100%
Sorting 100%
43736 unique sequences, avg cluster 2.5, median 1, max 4385
Writing output file 100%
7522 uniques written, 36214 clusters discarded (82.8%)
$ head -n 4 footest
>SRR5113893:::91:0:0:0:;0.928841;size=4385
TGAGGAATTTTCCGCAATGGGCGAAAGCCTGACGGAGCAATGCCGCGTGAAGGAAGAAGGCTCACGGGTCGTAAACTTCTTTTCTCGGAGAAGAATAAATGACGGTATCTGAGGAATAAGCATCGGCTAACTCTGTGCCAGCAGCCGCGGTAAAACAGAGGATGCAAGCGTTATCCGGAATTATTGGGCGTAAAGTGTCTGTAGGTGGCTTTTCAAGTCCGTCGTCAAATCCCAGGGCTCAACCCTGGACAGGCGGTGGAAACTATCAAGCTAGAGTACGGTAGAGGCAGAGGGAATTTCCGGTGGAGCGGTGAAATGCGTTGAGATCGGGAGGAACACCAAGGGCGAAAGCACTCTGCTGGGCCGTTACTGACACTCAGAGACGAAAGCTAGGGGAGCAAATG
>SRR5113893:::84:0:0:0:;0.941986;size=1211
TGGGGAATATTGCACAATGGGCGGAAGCCTGATGCAGCGACGCCGCGTGAGGGATGACGGCCTTCGGGTTGTAAACCTCTTTCAGTAGGGAAGAAGCGAGAGTGACGGTACCTGCAGAAGAAGCGCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGCGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGAGCTCGTAGGCGGTTTGTCGCGTCTGCTGTGAAAGCCCGGGGCTCAACCCCGGGTCTGCAGTGGGTACGGGCAGACTAGAGTGCAGTAGGGGAGACTGGAATTCCTGGTGTAGCGGTGAAATGCGCAGATATCAGGAGGAACACCGATGGCGAAGGCAGGTCTCTGGGCTGTTACTGACGCTGAGGAGCGAAAGCATGGGGAGCGAACA

Normally, the record headers for the output files end with a semicolon (e.g., size=1211; vs size=1211) as seen in the Deblur tests. Similarly, we can see in q2-deblur that the method for gathering stats from the dereplicated output also assumes a trailing semicolon. The reason this expectation isn’t a problem for Deblur itself is that a regular expression is used to extract the size information. Conversely, q2-deblur is splitting the record identifier on ; which doesn’t work in this situation.
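
The difference between the two parsing strategies can be illustrated with a minimal sketch. Neither function below is copied from Deblur or q2-deblur; the names and the exact splitting logic are assumptions made for illustration, and the real code may differ in detail. The positional version assumes the size annotation is always the second ;-separated field, while the regular-expression version looks for ;size=N anywhere in the identifier.

```python
import re

# Hypothetical helpers; not the actual Deblur or q2-deblur implementations.
SIZE_PATTERN = re.compile(r";size=(\d+)(?:;|$)")


def size_by_position(record_id: str) -> int:
    """Assume the size annotation is the second ';'-separated field."""
    field = record_id.split(";")[1]
    return int(field.split("=")[1])


def size_by_regex(record_id: str) -> int:
    """Find ';size=N' anywhere, with or without a trailing ';'."""
    match = SIZE_PATTERN.search(record_id)
    if match is None:
        raise ValueError(f"no size annotation in {record_id!r}")
    return int(match.group(1))


expected_style = "abc123;size=1211;"  # illustrative identifier, not from the thread
observed_style = "SRR5113893:::84:0:0:0:;0.941986;size=1211"  # from the output above

print(size_by_position(expected_style))
print(size_by_regex(expected_style))
print(size_by_regex(observed_style))

try:
    size_by_position(observed_style)
except IndexError as err:
    # The second field is "0.941986", which contains no "=".
    print("positional parsing failed:", err)
```

Under these assumptions, the positional parser raises the same kind of IndexError seen in the traceback, because the identifier carries an extra ;-separated field before the size annotation.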

On further examination, it appears this issue has already been noted but hasn’t yet been fixed. I’ve expanded the existing issue to reference this post.

In the meantime, I advise processing without --p-sample-stats.

All the best,
Daniel
