Plugin errors when running deblur and dada2 on large single-end 16S dataset

ebolyen · October 11, 2018, 10:58pm

Just wanted to let you know that we can reproduce the issue. The parameters seem fine, so at this point it's looking like either the denoising is failing in both, or something about these particular reads makes them look chimeric to both DADA2 and Deblur. Still looking into it, though! Stay tuned.

Byron_C_Crump · October 14, 2018, 8:34pm

Great! I hope you are able to figure this out.

Byron_C_Crump · October 15, 2018, 1:47pm

Quick note - I tried eliminating the underscores from the sample names in the manifest file, but that did not change anything.

ebolyen · October 19, 2018, 8:01pm

OK! I think I have worked out what's going on with this particular example:

There is an off-by-one error in q2-dada2 where it thinks a feature table containing a single feature is empty when it sanity-checks the table. That's the error we are seeing. DADA2 runs fine; however, it is picking only one feature.
Doing a quick and dirty semi-global alignment of the input yields:

>1883.2006.194.Crump.Artic.LTREB.main.lane1.NoIndex_0
TACGTAGGACCCGAGCGTTGTCCGGATTTACTGGGTATAAAGGGTGCGTGGGCGGCCTTGTGCGTCAGAGGTGAAATATCCGGGCTTAACCCGGAGGGTGCCTTTGATACGGCAGGGCTTGAGTGCGAGACGGGAT-GATGG----------------------------------------------------------------------------------------------------------------------------
>1883.2006.194.Crump.Artic.LTREB.main.lane1.NoIndex_1
GACGGAGGGGGCTAGCGTTGTTCGGAATTACTGGGCGTAAAGCGCGCGTAGGCGGCCTATCAAGTCAGGTGTGAAAGCCCCGAGCTTAACTCGGGAACTGCATTTGATACTGTTGGGCTTGAGACCGGGAGAGGATAG--------------------------------------------------------------------------------------------------------------------------------
>1883.2006.194.Crump.Artic.LTREB.main.lane1.NoIndex_2
TACGTAGGACCCGAGCGTTGTCCGGATTTACTGGGTATAAAGGGTGCGTAGGCGGCCTTGTGCGTCAGAGGTGAAATATCCGGGCTTAACCCGGAGGGTGCCTTTGATACGGCAGGGCTTGAGTGCGAGAGAGGAT-GATGG----------------------------------------------------------------------------------------------------------------------------
>1883.2006.194.Crump.Artic.LTREB.main.lane1.NoIndex_3
TACGAAGGGGGCTAGCGTTGTTCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTTTGTAAGTCGGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCGTTCGAAACTGCAAGGCTAGAGTGTGTCAGAGGGAGGTAG-----------------------------------------------------------------------------------------------------------------------------
>1883.2006.194.Crump.Artic.LTREB.main.lane1.NoIndex_4
TACATAGGGTGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGAGCTCGTAGGTGGTTCGTCACGTCGGATGTGAAAATCTGGGGCTTAACCCCAGACCTGCATTCGATACGGGCGAGCTAGAGTGTGGTAGGGGAGACTGG-----------------------------------------------------------------------------------------------------------------------------
>1883.2006.194.Crump.Artic.LTREB.main.lane1.NoIndex_5
TACGGAGGGTGCAAGCGTTATCCGGAATCATTGGGTTTAAAGGGTCCGCAGGCGGTGCTATAAGTCAGTGGTGAAATCTCATAGCTTAACTATGAAACTGCCATTGATACTGTAGCACTTGAATTCGG------------------------------------------------------------------------------------------------------------------------------------------
>1883.2006.194.Crump.Artic.LTREB.main.lane1.NoIndex_6
TACGTAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTATACAAGACAGGCGTGAAATCCCCGGGCTCAACCTGGGAATTGCGTCTGTGACTGTATAGCTAGAGTGTGTC-----------------------------------------------------------------------------------------------------------------------------------------
>1883.2006.194.Crump.Artic.LTREB.main.lane1.NoIndex_7
---------------------------------------------------------------------------------------------------------------------------------------------GACAGAGGATGCAAGTGTTATCCGGAATTATTGGGCGTAAAGGGTCTGTCGGTTGTTTGATTAGTCATTTATAAAATATTGAGGCTTAACTTCAAAGAAGTATCTGAAACTCTTAAACTTGAGAG
>1883.2006.194.Crump.Artic.LTREB.main.lane1.NoIndex_8
GACGAAGGATCCAAGCGTTGTCCGGATTTACTGGGTTTAAAGGGTGCGTAGGCGGAAAATTAAGTCAGTGGTGAAAGCCCGCAGCTCAACTGTGGAACTGCCATTGAAACTGGTTTTCTTGAATATAGCTGAG-------------------------------------------------------------------------------------------------------------------------------------
>1883.2006.194.Crump.Artic.LTREB.main.lane1.NoIndex_9
GACGAAGGATCCAAGCGTTGTCCGGATTTACTGGGTTTAAAGGGTGCGTAGGCGGAAAATTAAGTCAGTGGTGAAAGCCCGCAGCTCAACTGTGGAACTGCCATTGAAACTGGTTTTCTTGAATATAGCTGAGGCAGATGG-----------------------------------------------------------------------------------------------------------------------------

which makes me think DADA2 is making a sensible choice in calling these all a single ASV. That being said, it has basically no opportunity to train an error model, so we're kind of in a garbage-in, garbage-out situation.
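To put a number on how similar two rows of a gapped alignment like the one above are, one simple option (not necessarily how the alignment above was judged) is percent identity over the columns where both rows have a base. A minimal sketch; the function name is hypothetical:

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over columns where neither aligned row has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where either row is a gap, e.g. the long '-' tails above.
    shared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not shared:
        return 0.0
    matches = sum(1 for x, y in shared if x == y)
    return 100.0 * matches / len(shared)

print(percent_identity("ACGT-A", "ACGTTA"))  # 100.0 (gap column ignored)
print(percent_identity("ACGT", "ACGA"))      # 75.0
```

Ignoring gap columns means the differing read lengths in the alignment do not count as mismatches; whether that is the right choice depends on what question you are asking of the alignment.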

I haven't looked into what is going on with deblur, but it may be the case that all reads are filtered.

Given all of that, this test dataset probably isn't useful for understanding what your original error was, so we might go back to the drawing board here.

Where are you currently in terms of using the EMP, and what are you looking to do (in the event your analysis has shifted at all)?

Byron_C_Crump · October 19, 2018, 8:43pm

Unfortunately, I don't understand a lot of what you wrote here ("off-by-one error", "feature-table of a single feature", "ASV", etc.).

That said, should I post a slightly larger dataset that contains a small number of sequences from more than one sample so you can dig into that?

ebolyen · October 19, 2018, 9:23pm

Hi @Byron_C_Crump,

Yes, the issue here is that there is only a single OTU in a single sample, which, by accident, makes QIIME 2 think the OTU table is empty. If there were two OTUs, there would not have been an issue, and we could start looking into the source of the original problem.
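An "off-by-one" error is a boundary mistake where a value is compared against a limit that is wrong by exactly one. The sketch below is a hypothetical illustration of that kind of bug, NOT the actual q2-dada2 code: it models a feature table as a plain dict of `{feature_id: {sample_id: count}}` and shows how a check written against 1 instead of 0 reports a one-feature table as empty.

```python
# Hypothetical sanity check; NOT the actual q2-dada2 source.
# A feature table is modeled as {feature_id: {sample_id: count}}.

def looks_empty_buggy(table):
    # Off-by-one: compares against 1 instead of 0, so a table with
    # exactly one feature is wrongly reported as empty.
    return len(table) <= 1

def looks_empty_fixed(table):
    # Empty means no features, or features that have no reads at all.
    return sum(sum(counts.values()) for counts in table.values()) == 0

one_asv = {"ASV_1": {"sample_1": 20}}
two_asvs = {"ASV_1": {"sample_1": 12}, "ASV_2": {"sample_1": 8}}

print(looks_empty_buggy(one_asv), looks_empty_fixed(one_asv))    # True False
print(looks_empty_buggy(two_asvs), looks_empty_fixed(two_asvs))  # False False
```

With two features, both checks agree, which is why a dataset that yields at least two OTUs sidesteps the problem.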

I think two samples from very different environments in the EMP should be enough to guarantee two OTUs are found.

Byron_C_Crump · October 22, 2018, 3:49pm

Here is a file containing 5 samples, 20 sequences each. Deblur did not work on this dataset, but dada2 appears to have worked.
single-end-demux_qiime2_forum.qza (17.4 KB)

I ran this command:
qiime deblur denoise-16S --i-demultiplexed-seqs single-end-demux_qiime2_forum.qza --p-trim-length 120 --p-sample-stats --verbose --o-representative-sequences deblur-rep-seqs_qiime2_forum.qza --o-table deblur-table_qiime2_forum.qza --o-stats deblur-stats_qiime2_forum.qza

Here is the deblur.log
deblur.log.txt (31.5 KB)

wasade · October 22, 2018, 6:30pm

Hi @Byron_C_Crump,

I was able to run those data through after adjusting a few parameters. I suspect there just isn't enough data for the default parameter settings. Specifically, I set --p-min-size to 1 and --p-min-reads to 1, for two reasons. First, since each sample had only 20 sequences, the likelihood of a cluster size greater than 1 was pretty low (--p-min-size). Second, the likelihood of a "real" sequence having at least 10 reads across all samples (--p-min-reads) was quite low, since there are only 100 sequences in total in these files.
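A simplified model of what the two thresholds described above do is sketched below. It is NOT deblur's implementation (deblur also trims, removes artifacts, and performs the deblurring itself, and may apply these filters at different stages); the function names are hypothetical, and the defaults of 2 and 10 simply mirror what the explanation above implies. It shows why a sample of 20 mostly unique reads can end up empty.

```python
from collections import Counter

def dereplicate(reads, min_size=2):
    """Collapse identical reads in one sample; drop sequences seen fewer than min_size times."""
    counts = Counter(reads)
    return {seq: n for seq, n in counts.items() if n >= min_size}

def filter_min_reads(samples, min_reads=10):
    """Keep only sequences whose total count across all samples is at least min_reads."""
    totals = Counter()
    for derep in samples.values():
        totals.update(derep)  # add this sample's counts to the running totals
    keep = {seq for seq, n in totals.items() if n >= min_reads}
    return {name: {s: n for s, n in derep.items() if s in keep}
            for name, derep in samples.items()}

reads = ["ACGT", "ACGT", "TTGA", "CCAG"]
print(dereplicate(reads))                                      # {'ACGT': 2}
print(dereplicate(reads, min_size=1))                          # {'ACGT': 2, 'TTGA': 1, 'CCAG': 1}
print(filter_min_reads({"s1": dereplicate(reads)}))            # {'s1': {}}
print(filter_min_reads({"s1": dereplicate(reads)}, min_reads=1))  # {'s1': {'ACGT': 2}}
```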

I do not advise altering the --p-min-size parameter for real data. You may want to set --p-min-reads to 1 when performing a meta-analysis, as you can filter low-abundance features later.

Can you try running all of the sequences in each of those samples through with the default parameter settings? I suspect it'll work just fine.

All the best,
Daniel

Byron_C_Crump · October 23, 2018, 12:58pm

Hi Daniel,
I set --p-min-reads to 1 and started the deblur run again with the full dataset, and will send the results as soon as it finishes. I noticed that for the first time it is returning warning messages without dying. There are many messages like this one:

/Applications/miniconda2/envs/qiime2-2018.8/lib/python3.5/site-packages/deblur/workflow.py:851: UserWarning: Problem removing artifacts from file /private/var/folders/14/2xdw_prn4_z9ds_35n20v8_80000gn/T/qiime2-archive-usb8k0z7/3bf2982b-4dee-4cc6-9c50-5723510b2e57/data/ltreb.2015.098.288_1246_L001_R1_001.fastq.gz
seqs_fp, UserWarning)

Here is one of the files that was the topic of this warning:
ltreb.2015.098.288_1246_L001_R1_001.fastq.gz (7.5 KB)
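One quick first check on a file that triggers this warning is to count its reads and how many are at least as long as the 120 nt trim length used earlier, since (as I understand deblur's trimming step) reads shorter than the trim length are discarded. A minimal sketch, assuming standard unwrapped four-line FASTQ records; the function name is hypothetical:

```python
import gzip

def read_length_summary(path, trim_length=120):
    """Return (total reads, reads >= trim_length) for a gzipped FASTQ file."""
    total = long_enough = 0
    with gzip.open(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each 4-line record is the sequence
                total += 1
                if len(line.rstrip("\n")) >= trim_length:
                    long_enough += 1
    return total, long_enough

# Usage:
# print(read_length_summary("ltreb.2015.098.288_1246_L001_R1_001.fastq.gz"))
```

If the second number is zero or very small, the sample has little or nothing left to deblur at that trim length.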

Byron_C_Crump · October 23, 2018, 1:01pm

Hi Daniel,
Here is another question. I ran dada2 on this same dataset and received an error. Here is the Debug Info file. I cannot figure out the problem. Is it similar to the potential problem you identified in my deblur run?

Command line:
qiime dada2 denoise-single --i-demultiplexed-seqs single-end-demux.qza --p-trim-left 0 --p-trunc-len 120 --p-n-threads 0 --p-chimera-method consensus --o-representative-sequences rep-seqs-dada2.qza --o-table table-dada2.qza --o-denoising-stats stats-dada2_cgrb.qza

Debug info:
qiime2-q2cli-err-amvvbw.log.txt (25.8 KB)

wasade · October 23, 2018, 4:09pm

Hi Byron,

To be clear, I do not advise using --p-min-reads=1 on the full dataset. Instead, I recommend the default parameters on the full dataset.

I'd need to see the log file entries for the relevant file to understand the source of the UserWarning.

Best,
Daniel

wasade · October 23, 2018, 4:13pm

Hi Byron,

I'm not very familiar with the DADA2 runtime. However, the error output in the log file states "Invalid derep$quals matrix. Quality values must be positive integers." This suggests that one of the input files has unusual PHRED scores, that the PHRED offset being used is incorrect, or that there is some other underlying issue.
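A rough way to look for unusual PHRED scores is to decode the quality lines yourself. The sketch below assumes unwrapped four-line FASTQ records; the function names are hypothetical, and the offset guess is only a heuristic (ASCII codes below 59 cannot occur in Phred+64 data, so their presence implies Phred+33). Negative scores after decoding with a given offset indicate that the offset does not match the file.

```python
import gzip

def quality_lines(path):
    """Yield the quality string (4th line) of each record in a gzipped FASTQ."""
    with gzip.open(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 3:
                yield line.rstrip("\n")

def quality_range(lines, offset=33):
    """Return (min, max) Phred scores after decoding with the given ASCII offset."""
    scores = [ord(ch) - offset for line in lines for ch in line.strip()]
    if not scores:
        raise ValueError("no quality values found")
    return min(scores), max(scores)

def guess_offset(lines):
    """Heuristic: any character below ASCII 59 implies Phred+33 encoding."""
    lowest = min(ord(ch) for line in lines for ch in line.strip())
    return 33 if lowest < 59 else 64

print(quality_range(["II#"]))             # (2, 40)
print(guess_offset(["II#"]))              # 33
print(quality_range(["II#"], offset=64))  # (-29, 9): wrong offset gives negative scores
```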

Best,
Daniel

Byron_C_Crump · October 23, 2018, 4:24pm

OK, but the deblur defaults do not work on the larger dataset, so perhaps running with --p-min-reads=1 will help identify the problem. The deblur log is too big to upload, so I copy/pasted two entries below. The first is for one of the files that produced a UserWarning. The second is for one that did not produce a UserWarning.

It sounds like there might be something wrong with the PHRED scores in the EMP dataset, based on the dada2 errors. Do you know of a tool to test for unusual PHRED scores?

INFO(140736167158656)2018-10-23 09:10:23,917:launch_workflow for file /private/var/folders/14/2xdw_prn4_z9ds_35n20v8_80000gn/T/qiime2-archive-usb8k0z7/3bf2982b-4dee-4cc6-9c50-5723510b2e57/data/ltreb.blankplate4.89_1998_L001_R1_001.fastq.gz
INFO(140736167158656)2018-10-23 09:10:23,927:dereplicate seqs file /private/var/folders/14/2xdw_prn4_z9ds_35n20v8_80000gn/T/tmpbrkf056b/deblur_working_dir/ltreb.blankplate4.89_1998_L001_R1_001.fastq.gz.trim
INFO(140736167158656)2018-10-23 09:10:23,939:remove_artifacts_seqs file /private/var/folders/14/2xdw_prn4_z9ds_35n20v8_80000gn/T/tmpbrkf056b/deblur_working_dir/ltreb.blankplate4.89_1998_L001_R1_001.fastq.gz.trim.derep
WARNING(140736167158656)2018-10-23 09:10:23,939:file /private/var/folders/14/2xdw_prn4_z9ds_35n20v8_80000gn/T/tmpbrkf056b/deblur_working_dir/ltreb.blankplate4.89_1998_L001_R1_001.fastq.gz.trim.derep has size 0, continuing
WARNING(140736167158656)2018-10-23 09:10:23,939:remove artifacts failed, aborting
WARNING(140736167158656)2018-10-23 09:10:23,940:deblurring failed for file /private/var/folders/14/2xdw_prn4_z9ds_35n20v8_80000gn/T/qiime2-archive-usb8k0z7/3bf2982b-4dee-4cc6-9c50-5723510b2e57/data/ltreb.blankplate4.89_1998_L001_R1_001.fastq.gz

INFO(140736167158656)2018-10-23 09:09:47,282:--------------------------------------------------------
INFO(140736167158656)2018-10-23 09:09:47,282:launch_workflow for file /private/var/folders/14/2xdw_prn4_z9ds_35n20v8_80000gn/T/qiime2-archive-usb8k0z7/3bf2982b-4dee-4cc6-9c50-5723510b2e57/data/ltreb.blankplate1.359_1985_L001_R1_001.fastq.gz
INFO(140736167158656)2018-10-23 09:09:47,429:dereplicate seqs file /private/var/folders/14/2xdw_prn4_z9ds_35n20v8_80000gn/T/tmpbrkf056b/deblur_working_dir/ltreb.blankplate1.359_1985_L001_R1_001.fastq.gz.trim
INFO(140736167158656)2018-10-23 09:09:47,442:remove_artifacts_seqs file /private/var/folders/14/2xdw_prn4_z9ds_35n20v8_80000gn/T/tmpbrkf056b/deblur_working_dir/ltreb.blankplate1.359_1985_L001_R1_001.fastq.gz.trim.derep
INFO(140736167158656)2018-10-23 09:09:48,813:total sequences 11, passing sequences 11, failing sequences 0
INFO(140736167158656)2018-10-23 09:09:48,813:multiple_sequence_alignment seqs file /private/var/folders/14/2xdw_prn4_z9ds_35n20v8_80000gn/T/tmpbrkf056b/deblur_working_dir/ltreb.blankplate1.359_1985_L001_R1_001.fastq.gz.trim.derep.no_artifacts
INFO(140736167158656)2018-10-23 09:09:49,038:deblurring 11 sequences
INFO(140736167158656)2018-10-23 09:09:49,039:3 unique sequences left following deblurring
INFO(140736167158656)2018-10-23 09:09:49,039:remove_chimeras_denovo_from_seqs seqs file /private/var/folders/14/2xdw_prn4_z9ds_35n20v8_80000gn/T/tmpbrkf056b/deblur_working_dir/ltreb.blankplate1.359_1985_L001_R1_001.fastq.gz.trim.derep.no_artifacts.msa.deblurto working dir /private/var/folders/14/2xdw_prn4_z9ds_35n20v8_80000gn/T/tmpbrkf056b/deblur_working_dir
INFO(140736167158656)2018-10-23 09:09:49,072:finished processing file
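Since the full log is too big to share, one option is to pull out just the files deblur reported as failed. The sketch below matches the "deblurring failed for file" wording copied from the first log entry above; the function name and the log file name in the usage comment are assumptions.

```python
import re

# Wording copied from the "deblurring failed for file ..." lines in the log above.
FAILED = re.compile(r"deblurring failed for file (\S+)")

def failed_files(log_lines):
    """Return the base names of input files that deblur logged as failed."""
    names = []
    for line in log_lines:
        match = FAILED.search(line)
        if match:
            # Keep only the file name, dropping the temporary archive path.
            names.append(match.group(1).rsplit("/", 1)[-1])
    return names

# Usage (log file name is an assumption):
# with open("deblur.log") as fh:
#     for name in failed_files(fh):
#         print(name)
```

The resulting list can then be cross-checked against the per-sample read counts to see whether the failures are all low-read samples such as the blanks.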