Deblur denoise-other error

AhHua · August 30, 2018, 1:20pm

Hi everyone,

I am running denoise-other in QIIME 2 to denoise ITS amplicons. Below is my command.
qiime deblur denoise-other \
  --i-demultiplexed-seqs ./Deblur/Single-end-demux-filtered-Deblur.qza \
  --i-reference-seqs sh_refs_qiime_ver7_99_s_01.12.2017_dev.qza \
  --p-trim-length 100 \
  --o-representative-sequences ./Deblur/Single-end-rep-seqs-Deblur.qza \
  --o-table ./Deblur/Single-end-table-Deblur.qza \
  --p-sample-stats \
  --o-stats ./Deblur/Single-end-Deblur-stats.qza \
  --p-jobs-to-start 16 \
  --output-dir ./Deblur/denoise-other-Output \
  --verbose

I use the UNITE database sh_refs_qiime_ver7_99_s_01.12.2017_dev.qza for positive filtering. This reference worked well for taxonomy alignment in my DADA2 run, so I assume the error is not caused by the reference input.

Below is the error traceback.
File "/hwfssz1/ST_META/EE/jiayangyang/bin/Miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
results = action(**arguments)
File "", line 2, in denoise_other
File "/hwfssz1/ST_META/EE/jiayangyang/bin/Miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/qiime2/sdk/action.py", line 232, in bound_callable
output_types, provenance)
File "/hwfssz1/ST_META/EE/jiayangyang/bin/Miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/qiime2/sdk/action.py", line 367, in callable_executor
output_views = self._callable(**view_args)
File "/hwfssz1/ST_META/EE/jiayangyang/bin/Miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/q2_deblur/_denoise.py", line 124, in denoise_other
hashed_feature_ids=hashed_feature_ids)
File "/hwfssz1/ST_META/EE/jiayangyang/bin/Miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/q2_deblur/_denoise.py", line 189, in _denoise_helper
stats = _gather_stats(demultiplexed_seqs, tmp)
File "/hwfssz1/ST_META/EE/jiayangyang/bin/Miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/q2_deblur/_denoise.py", line 220, in _gather_stats
'trim.derep')
File "/hwfssz1/ST_META/EE/jiayangyang/bin/Miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/q2_deblur/_denoise.py", line 289, in _fasta_counts
counts += int(size.split('=')[1])
ValueError: invalid literal for int() with base 10: '0.9914'

Plugin error from deblur:

invalid literal for int() with base 10: '0.9914'

Below is the tail of the log file:
INFO(139760130447104)2018-08-30 17:51:33,829:total sequences 341, passing sequences 284, failing sequences 57
WARNING(139760130447104)2018-08-30 17:51:33,865:removed 48 samples with reads per sample<1
INFO(139760130447104)2018-08-30 17:51:33,882:wrote artifact only filtered biom table to /tmp/tmp5hgfrhnt/reference-non-hit.biom
INFO(139760130447104)2018-08-30 17:51:33,883:saved biom table sequences to fasta file /tmp/tmp5hgfrhnt/reference-non-hit.seqs.fa
INFO(139760130447104)2018-08-30 17:51:33,905:wrote 16s filtered biom table to /tmp/tmp5hgfrhnt/reference-hit.biom
INFO(139760130447104)2018-08-30 17:51:33,907:saved biom table sequences to fasta file /tmp/tmp5hgfrhnt/reference-hit.seqs.fa
INFO(139760130447104)2018-08-30 17:51:33,907:Keeping temp files
INFO(139760130447104)2018-08-30 17:51:33,907:deblur workflow finished
INFO(139760130447104)2018-08-30 17:51:33,907:output saved to /tmp/tmp5hgfrhnt/all.biom
INFO(139760130447104)2018-08-30 17:51:33,907:------------------

Does anyone know what is going on here? What am I missing?

Thanks~
YY

wasade · August 30, 2018, 6:10pm

Hi @AhHua,

Would it be possible to remove the --p-sample-stats argument and rerun?

My guess here is that the structure of the comments in the fasta/fastq records is unexpected. If the above command works, then that will help you proceed with your analysis, and we can in parallel try to figure out why the identifier structure is presenting a problem.

Best,
Daniel

AhHua · August 31, 2018, 2:28pm

Hi @wasade,

Thank you very much for your helpful reply.
Yes, removing the --p-sample-stats argument works!
Is there any other information I should provide so that we can fix the identifier structure issue?

Cheers
YY

wasade · August 31, 2018, 4:34pm

That's great!

Would it be possible to either share the Single-end-demux-filtered-Deblur.qza file, or to share the first few sequences of a sequence file contained within? To get one of the sequence files, it should be possible to perform a qiime tools export on the QZA. An example of doing this is below using demux.qza from the Moving Pictures tutorial. What would be helpful is to get those first few lines from the sequence file. My guess is the sequence identifiers have semicolons in them which might be a problem for the parsing expectations of the stats collector.

# first we'll export the qza
(qiime2-2018.6) $ qiime tools export demux.qza --output-dir example

# next, we'll take a look at just the first few files in the output directory
(qiime2-2018.6) $ ls -lh example/ | head
total 40320
-rw-r--r--  1 dtmcdonald  staff   824K Aug 31 09:22 L1S105_9_L001_R1_001.fastq.gz
-rw-r--r--  1 dtmcdonald  staff   659K Aug 31 09:22 L1S140_6_L001_R1_001.fastq.gz
-rw-r--r--  1 dtmcdonald  staff   842K Aug 31 09:22 L1S208_10_L001_R1_001.fastq.gz
-rw-r--r--  1 dtmcdonald  staff   619K Aug 31 09:22 L1S257_11_L001_R1_001.fastq.gz
-rw-r--r--  1 dtmcdonald  staff   670K Aug 31 09:22 L1S281_5_L001_R1_001.fastq.gz
-rw-r--r--  1 dtmcdonald  staff   875K Aug 31 09:22 L1S57_13_L001_R1_001.fastq.gz
-rw-r--r--  1 dtmcdonald  staff   749K Aug 31 09:22 L1S76_12_L001_R1_001.fastq.gz
-rw-r--r--  1 dtmcdonald  staff   867K Aug 31 09:22 L1S8_8_L001_R1_001.fastq.gz
-rw-r--r--  1 dtmcdonald  staff   752K Aug 31 09:22 L2S155_25_L001_R1_001.fastq.gz

# the sequence files in this case are compressed, so let's just take one of them and decompress it
(qiime2-2018.6) $ gunzip example/L1S105_9_L001_R1_001.fastq.gz

# and now let's examine the first few lines of the file we decompressed
(qiime2-2018.6) $ head example/L1S105_9_L001_R1_001.fastq
@HWI-EAS440_0386:1:25:4646:1592#0/1
AACGTAGGTCACAAGCGTTGTCCGGAATTACTGGGTGTAAAGGGAGCGCAGGCGGGAAGACAAGTTGGAAGTGAAATCTATGGGCTCAACCCATAAACTGCTTTCAAAACTGTTTTTCTTGAGTAGTGCAGAGGTAGGCGGAATTCTCGGGG
+
GGGGGGGGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHFHBHHECCHBDHFEHAE@@BDB=B@@A2D(5553DAAAAE@EEA>;)9;==A>7B@=@78;:82?C<><9?<>>>98=@########################
@HWI-EAS440_0386:1:25:3806:2202#0/1
AACGTAGGTCACAAGCGTTGTCCGGAATTACTGGTGTAAAGGGAGCGCAGGCGGGAGAACAAGTTGGAAGTGAAATCCATGGGCTCAACCCATGAACTGCTTTCAAAACTGTTTTTCTTGAGTAGTGCAGAGGTAGGCGGAATCCCCGGTGG
+
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGBGGGEGGGG@GGDGHHFGHHHHHHGGFEGBGGG3CE>EAABGDBFFEDFE>E>DCB@EFFBC>EDBDD+CAA>>9?=99DCC?ACABEBBEB<BBD@B@@@B################
@HWI-EAS440_0386:1:25:9735:2401#0/1
TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGGCGGACGCTTAAGTCAGTTGTGAAAGTTTGCGGCTCAACCGTAAAATTGCAGTTGATACTGGGTGTCTTGAGTACAGTAGAGGCAGGGGGGGGTGTGTGGG
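
The same check can also be sketched in Python without decompressing the file on disk. This is only a minimal sketch and not part of QIIME 2: the function name `peek_fastq_ids` is made up for illustration, the `example` directory is the export directory from the commands above, and the code assumes standard four-line FASTQ records.

```python
import gzip
from itertools import islice
from pathlib import Path


def peek_fastq_ids(path, n=3):
    """Return the header lines of the first n records in a gzipped FASTQ file.

    Assumes four lines per record (header, sequence, '+', quality).
    """
    with gzip.open(path, "rt") as handle:
        # Every fourth line, starting with the first, is a record header.
        headers = islice(handle, 0, 4 * n, 4)
        return [line.rstrip("\n") for line in headers]


if __name__ == "__main__":
    # 'example' is the export directory used above; only the first file is checked.
    for fastq in sorted(Path("example").glob("*.fastq.gz"))[:1]:
        for header in peek_fastq_ids(fastq):
            flag = "contains ';'" if ";" in header else "ok"
            print(f"{header}\t{flag}")
```

Any header flagged as containing a semicolon would be worth pasting into the thread.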

AhHua · September 3, 2018, 6:43pm

Hi Daniel,

Yes, you are correct. There are semicolons in the sequence identifiers...
Please see the image below.

Actually, the input I used here is the output of a previous VSEARCH quality control step. This is because we are using data generated on a BGISEQ instrument, whose format differs slightly from that of Illumina data.
My bad...

Thank you so much for the help and replies!

YY

wasade · September 4, 2018, 4:51pm

Ah, that would certainly do it! Thank you for the follow-up to verify the problem! A fix for this should be pretty straightforward. I'll open an issue with q2-deblur so we can get it corrected in a future release.

Best,
Daniel
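
For readers hitting the same error, here is a minimal sketch of how semicolons in identifiers can produce it. It is not the q2-deblur source or its eventual fix. `size_positional` is a hypothetical parser that reads the second `;`-separated field, which is consistent with the `int(size.split('=')[1])` line in the traceback. `size_by_key` is one possible, more tolerant alternative. The identifier `read1;ee=0.9914;size=5;` is an assumed example of a header that already carries a semicolon-delimited annotation. The actual headers in this thread were shared only as an image.

```python
import re

# Hypothetical identifiers; the second mimics an upstream tool having
# already added its own ';'-delimited annotation to the read name.
PLAIN = "read1;size=5;"
ANNOTATED = "read1;ee=0.9914;size=5;"


def size_positional(identifier):
    """Assume the abundance is always the second ';'-separated field."""
    field = identifier.split(";")[1]
    return int(field.split("=")[1])


# Match a 'size=<digits>' field delimited by ';' or the ends of the string.
_SIZE = re.compile(r"(?:^|;)size=(\d+)(?:;|$)")


def size_by_key(identifier):
    """Look up the 'size=' field by name, wherever it appears."""
    match = _SIZE.search(identifier)
    if match is None:
        raise ValueError(f"no size= field in {identifier!r}")
    return int(match.group(1))


if __name__ == "__main__":
    print(size_positional(PLAIN))
    print(size_by_key(ANNOTATED))
    try:
        size_positional(ANNOTATED)
    except ValueError as err:
        print(err)
```

Until a fix is released, dropping --p-sample-stats, as done earlier in this thread, avoids the stats parsing step altogether.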

wasade · September 4, 2018, 5:04pm

@AhHua, just to circle back around, please find the associated GitHub issue here.

AhHua · September 5, 2018, 3:18am

@wasade,
Ha, thanks a lot for doing this!
Cheers,
