Just wanted to let you know that we can reproduce the issue. The parameters seem fine, so at this point it's looking like either the denoising is failing in both, or something about these particular reads makes them look chimeric to both DADA2 and Deblur. Still looking into it though! Stay tuned.
OK! I think I have worked out what's going on with this particular example:
There is an off-by-one error in q2-dada2 where it thinks a feature table with a single feature is empty when it is sanity-checking the table. That's the error we are seeing. DADA2 itself runs fine, but it is picking only one feature.
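To illustrate the kind of bug I mean, here is a hypothetical sketch. It is not the actual q2-dada2 source: the function names and the assumption of a tab-separated table with exactly one header line are mine, purely for illustration.

```python
import io

def is_empty_buggy(table_tsv):
    """Hypothetical check that wrongly treats a one-feature table as empty."""
    lines = [ln for ln in table_tsv if ln.strip()]
    # Off by one: one header line plus one feature line gives len(lines) == 2,
    # so "<= 2" flags a single-feature table as empty.
    return len(lines) <= 2

def is_empty_fixed(table_tsv):
    """Only a lone header line (or nothing at all) means there are no features."""
    lines = [ln for ln in table_tsv if ln.strip()]
    return len(lines) <= 1

# One header line and exactly one feature, as in this test dataset.
single_feature = "feature-id\tsample1\nACGT\t20\n"

print(is_empty_buggy(io.StringIO(single_feature)))  # True
print(is_empty_fixed(io.StringIO(single_feature)))  # False
```

With two or more features both versions agree, which is why the problem only shows up on a table this small.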
Doing a quick and dirty semi-global alignment of the input yields:
which makes me think DADA2 is making a sensible choice in calling these all a single ASV. That being said, it has basically no opportunity to train an error model, so we're kind of in a garbage-in, garbage-out situation.
I haven't looked into what is going on with deblur, but it may be the case that all reads are filtered.
Given all of that, this test dataset probably isn't useful for understanding what your original error was, so we might go back to the drawing board here.
Where are you currently in terms of using the EMP, and what are you looking to do (in the event your analysis has shifted at all)?
Yes, the issue here is that there is only a single OTU in a single sample, which, by accident, makes QIIME 2 think the OTU table is empty. If there were two OTUs there would not have been an issue, and we could start looking into the source of the original problem.
I think two samples from very different environments in the EMP should be enough to guarantee two OTUs are found.
Here is a file containing 5 samples, 20 sequences each. Deblur did not work on this dataset, but dada2 appears to have worked. single-end-demux_qiime2_forum.qza (17.4 KB)
I ran this command:
qiime deblur denoise-16S --i-demultiplexed-seqs single-end-demux_qiime2_forum.qza --p-trim-length 120 --p-sample-stats --verbose --o-representative-sequences deblur-rep-seqs_qiime2_forum.qza --o-table deblur-table_qiime2_forum.qza --o-stats deblur-stats_qiime2_forum.qza
I was able to run those data through after adjusting a few parameters. What I suspect is that there just isn't enough data for the default parameter settings. Specifically, I set --p-min-size to 1 and --p-min-reads to 1. I did this for two reasons. First, since each sample has only 20 sequences, the likelihood of a cluster size greater than 1 is pretty low (--p-min-size). Second, the likelihood of a "real" sequence having at least 10 reads across all samples (--p-min-reads) is quite low, since there are only 100 sequences in total in these files.
I do not advise altering the --p-min-size parameter for real data. You may want to set --p-min-reads to 1 when performing a meta-analysis, as you can filter low-abundance features later.
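For reference, here is a small sketch of the adjusted command, assembled in Python so the two changed parameters are easy to see. The file names and flags are the ones from the command posted above plus --p-min-size and --p-min-reads; the helper function name is just something I made up for this example.

```python
import shlex

def build_deblur_command(demux_qza, trim_length, min_size=None, min_reads=None):
    """Assemble the deblur denoise-16S call as an argument list (hypothetical helper)."""
    cmd = [
        "qiime", "deblur", "denoise-16S",
        "--i-demultiplexed-seqs", demux_qza,
        "--p-trim-length", str(trim_length),
        "--p-sample-stats",
        "--verbose",
    ]
    # Only override the defaults when a value is given explicitly.
    if min_size is not None:
        cmd += ["--p-min-size", str(min_size)]
    if min_reads is not None:
        cmd += ["--p-min-reads", str(min_reads)]
    cmd += [
        "--o-representative-sequences", "deblur-rep-seqs_qiime2_forum.qza",
        "--o-table", "deblur-table_qiime2_forum.qza",
        "--o-stats", "deblur-stats_qiime2_forum.qza",
    ]
    return cmd

cmd = build_deblur_command(
    "single-end-demux_qiime2_forum.qza", 120, min_size=1, min_reads=1
)
# Print the command so it can be pasted into a shell.
print(" ".join(shlex.quote(part) for part in cmd))
```

Passing the list to subprocess.run(cmd, check=True) inside an activated QIIME 2 environment should launch the run; leaving min_size and min_reads unset gives the default-parameter command.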
Can you try running all of the sequences in each of those samples through with the default parameter settings? I suspect it'll work just fine.
Hi Daniel,
I set --p-min-reads to 1 and started the deblur run again with the full dataset, and will send results as soon as it ends. I noticed that for the first time it is returning warning messages without dying. There are many messages like this one:
/Applications/miniconda2/envs/qiime2-2018.8/lib/python3.5/site-packages/deblur/workflow.py:851: UserWarning: Problem removing artifacts from file /private/var/folders/14/2xdw_prn4_z9ds_35n20v8_80000gn/T/qiime2-archive-usb8k0z7/3bf2982b-4dee-4cc6-9c50-5723510b2e57/data/ltreb.2015.098.288_1246_L001_R1_001.fastq.gz
seqs_fp, UserWarning)
Hi Daniel,
Here is another question. I ran dada2 on this same dataset and received an error. Here is the Debug Info file. I cannot figure out the problem. Is it similar to the potential problem you identified in my deblur run?
I'm not very familiar with the DADA2 runtime. However, the error output in the log file states "Invalid derep$quals matrix. Quality values must be positive integers." This suggests one of the input files either has unusual PHRED scores, the PHRED offset being used is incorrect, or perhaps some other underlying issue.
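One rough way to check this is to look at the range of raw quality characters in each FASTQ file. The sketch below is only a heuristic and makes some assumptions of mine: records are exactly four lines each (no wrapped sequence lines), and the thresholds follow the usual PHRED+33 and PHRED+64 ASCII ranges. The function names are made up for this example.

```python
import gzip

def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")

def quality_char_range(lines):
    """Return (lowest, highest) ASCII code seen on the quality lines."""
    lo = hi = None
    for i, line in enumerate(lines):
        if i % 4 != 3:  # the quality string is the 4th line of each record
            continue
        codes = [ord(c) for c in line.rstrip("\r\n")]
        if not codes:
            continue
        lo = min(codes) if lo is None else min(lo, min(codes))
        hi = max(codes) if hi is None else max(hi, max(codes))
    return lo, hi

def describe(lo, hi):
    """Heuristic guess at the encoding from the observed character range."""
    if lo is None:
        return "no quality lines found"
    if lo < 33:
        return "characters below ASCII 33: not valid for either offset"
    if lo < 59:
        return "consistent with PHRED+33"
    return "may be PHRED+64 (or uniformly high-quality PHRED+33)"

# Tiny in-memory record: '#' is ASCII 35, '5' is 53, 'I' is 73.
demo = ["@read1\n", "ACGT\n", "+\n", "II#5\n"]
lo, hi = quality_char_range(demo)
print(lo, hi, describe(lo, hi))  # 35 73 consistent with PHRED+33
```

For a real file, pass the handle from open_fastq(path) to quality_char_range. If a file looks like PHRED+33 but is decoded with an offset of 64, any character below ASCII 64 turns into a negative score, which would be one way to end up with the "must be positive integers" complaint.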
OK, but the deblur defaults do not work on the larger dataset, so perhaps running with --p-min-reads=1 will help identify the problem. The deblur log is too big to upload, so I copied and pasted two entries below. The first is for one of the files that produced a UserWarning. The second is for one that did not produce a UserWarning.
It sounds like there might be something wrong with the PHRED scores in the EMP dataset, based on the dada2 errors. Do you know of a tool to test for unusual PHRED scores?