Comparison of DADA2 with Deblur

Mark_Gillingham · March 21, 2018, 3:54pm

Dear all,

I have two related questions concerning DADA2 and Deblur. I understand that both methods have their merits and drawbacks and for the moment at least there is no consensus regarding which method is better suited when. I therefore decided to apply both methods to my large avian cloacal swab 16S dataset and compare the results to get an idea of which method may be more appropriate when. Below is my script:

source activate qiime2-2018.2

qiime tools import
--type 'SampleData[PairedEndSequencesWithQuality]'
--input-path /home/mark/Dropbox/MicrobiomeHM/QIIME2/FinalAnalysis/RawData
--source-format CasavaOneEightSingleLanePerSampleDirFmt
--output-path /home/mark/Dropbox/MicrobiomeHM/QIIME2/FinalAnalysis/qiime2output/Deblur/demux-paired-end.qza

qiime cutadapt trim-paired
--i-demultiplexed-sequences demux-paired-end.qza
--p-cores 8
--p-front-f NNNNGTGCCAGCNGCCGCGGTAA
--p-front-r GGACTACNVGGGTWTCTAAT
--verbose
--output-dir trimmed
#qiime tools validate demux.qza

qiime dada2 denoise-paired
--i-demultiplexed-seqs trimmed/trimmed_sequences.qza
--p-trim-left-f 0
--p-trunc-len-f 219
--p-trunc-len-r 198
--p-n-threads 8
--o-representative-sequences trimmed/DADA2/rep-seqs-dada2.qza
--o-table trimmed/DADA2/table-dada2.qza

qiime vsearch join-pairs
--i-demultiplexed-seqs trimmed/trimmed_sequences.qza
--verbose
--o-joined-sequences trimmed/deblur/demux-joined.qza

qiime quality-filter q-score-joined
--i-demux trimmed/deblur/ndemux-joined.qza
--o-filtered-sequences trimmed/deblur/demux-joined-filtered.qza
--o-filter-stats trimmed/deblur/demux-joined-filter-stats.qza

qiime deblur denoise-16S
--i-demultiplexed-seqs trimmed/deblur/demux-joined-filtered.qza
--p-trim-length 253
--p-jobs-to-start 8
--p-sample-stats
--o-representative-sequences trimmed/deblur/rep-seqs.qza
--o-table trimmed/deblur/table.qza
--o-stats trimmed/deblur/deblur-stats.qza

Following this script I am removing primers using cutadapt within QIIME 2. I apply DADA2 directly on the unmerged paired-end sequences following the moving pictures tutorial. For Deblur I am merging my pair-end sequences using vsearch and applying a quality filter prior to denoising following the Alternative methods tutorial.

My dataset initially contains 30,761,377 sequences. After denoising using DADA2 I obtain 21,637,825 and after filtering for bacterial sequences I get 20,937,471 sequences and 15,042 features. On the other hand, after denoising using Deblur I obtain 7,749,8954 and 9,373 features. That is in my mind a very big difference in the number of sequences and features using the same dataset from the two denoising methods.

Digging deeper into the data of course all sequences from Deblur are 253bp (since I specified that all sequences are trimmed at 253bp since Deblur requires that all sequences are of the same length). On the hand, sequences of features from DADA2 are of varying length as follows:

feature length	frequency	Proportion
219	4	0.000
221	3	0.000
227	3	0.000
234	1	0.000
237	1	0.000
240	64	0.004
249	1	0.000
250	1	0.000
251	24	0.002
252	1171	0.078
253	12883	0.856
254	797	0.053
255	49	0.003
256	3	0.000
257	2	0.000
258	3	0.000
267	1	0.000
269	2	0.000
270	4	0.000
272	3	0.000
273	10	0.001
274	7	0.000
276	3	0.000
302	1	0.000
318	1	0.000

Therefore only 12883 features (85.6%) are of the expected amplicon length. However this is insufficient to explain the large discrepancy in the number of features between DADA2 and Deblur. When I continue downstream analyses, Unifrac (weighed and unweighted) PCoA plots looks very similar between methods and alpha diversity measures (although lower in absolute terms for Deblur) also give extremely similar results (when comparing between sampling sites for example). Similarly I get very similar phylum, class and genus composition of the micobiome (e.g. proportion of each phylum across sampling sites) regardless of the denoising dataset used. So it seems that biological interpretation (at least superficially) would be very similar regardless of the method I choose to use.

So my questions are (sorry for the long post):

If I choose to use DADA2, should I filter for sequences that are of the expected amplicon length? I noticed that in the R package there is the option to filter for sequence length using maxLen and minLen. Why are these features not included in the QIIME 2 plug-in? To me all features that are not 253bp are likely to be sequencing/PCR artefacts...
I understand that both denoising methods use very different methods but are such large differences in the absolute number of reads and features expected? Did I do something wrong? Have you experienced similar results? (I tried my script on another dataset from a colleague and get a very similar discrepancy in the results suggesting that it is not dataset dependent)

Many thanks for any feedback you may provide!

Best,

Mark

tanaes · March 21, 2018, 5:05pm

Hey Mark,

This sounds consistent with my expectations for the two programs, and follows from differences in the principles of how they operate. As I understand it (and someone please jump in to correct me if I'm wrong), the large differences in the number of sequences remaining arises from a difference in what happens to 'erroneous' sequences. In DADA2, those sequences are changed to match the sequence from which they're inferred to have arisen. In Deblur, those sequences are removed.

The close matches in estimated diversity patterns also fit my experience, and are good to hear!

Cheers,
-jon

colinbrislawn · March 22, 2018, 12:37am

Thank you for this great post, Mark. I appreciate your detailed comparison.

Hopefully the deblur devs and dada2 devs will 'qiime in' to compare their methods.

And thank goodness for that!

Colin

lkursell · March 22, 2018, 1:58pm

Hi Mark - I'm willing to wager that if you performed a very basic feature filtering along the lines of dropping singletons or features with less than 10 sequences across your run, the number of ASVs would be nearly identical. This preprint also shows that the two methods are nearly identical for all features expect the ones with the least amount of sequences, which are probably the ones you'll remove due to lack of coverage anyway. In practice, I think, just pick your favorite and minimize effort spent trying to pick out potentially small but ultimately insignificant differences.

Mark_Gillingham · March 22, 2018, 2:19pm

Hi @lkursell, Thanks for your suggestion. I already read your paper Very interesting.

I tried removing features with less than 10 sequences with the following script:

qiime feature-table filter-features
--i-table trimmed/DADA2/table-dada2.qza
--p-min-frequency 10
--o-filtered-table trimmed/DADA2/dada2-feature-frequency-filtered-table.qza

qiime feature-table summarize
--i-table trimmed/DADA2/dada2-feature-frequency-filtered-table.qza
--o-visualization trimmed/DADA2/dada2-feature-frequency-filtered-table.qzv
--m-sample-metadata-file mapping_flamingo_QIIME.txt

qiime feature-table filter-seqs
--i-data trimmed/DADA2/rep-seqs-dada2.qza
--i-table trimmed/DADA2/dada2-feature-frequency-filtered-table.qza
--o-filtered-data trimmed/DADA2/dada2-feature-frequency-filtered-rep-seqs.qza

It does significantly decrease the number of features from the DADA2 dataset from 15,042 to 11326. The distribution of feature length was as follows:

feature length	frequency	Proportion
219	3	0.000
221	1	0.000
227	2	0.000
240	64	0.006
249	1	0.000
250	1	0.000
251	18	0.002
252	1030	0.091
253	9597	0.847
254	559	0.049
255	25	0.002
256	3	0.000
257	2	0.000
258	2	0.000
270	2	0.000
272	3	0.000
273	6	0.001
274	5	0.000
276	1	0.000
318	1	0.000

Which brings me back to my initial first question. Should I filter by feature length? If I do that I get a very similar number of features to Deblur (9597 vs 9373; not that this is justification for doing these filtering setps in itself ).

wasade · March 22, 2018, 4:10pm

Hi @Mark_Gillingham,

If the sequence lengths differ systematically between samples, then you may bias subsequent diversity calculations. Normalizing sequence lengths are an important factor when performing meta analysis and can drive bias. I think it's discussed in either Debelius et al or Lozupone et al.

Best,
Daniel

benjjneb · March 23, 2018, 12:34am

This is a completely valid option (which I'm assuming there is a command to do in Q2?). "Cutting a band in silico" to choose amplicons of the expected length is a good way to clean up certain kinds of residual artefacts that can crop up. The DADA2 pipeline does not do this by default because in some cases those non-target length amplicons are real interesting things. For example, Trichomonas vaginalis, the protozoan cause of trichomoniasis, is detected by the EMP primer set, but is ~290 nts rather than 251-255 expected V4 length.

No! There is biological length variation in the V4 region! (as in most amplicons)

V4 is a pretty tight distribution, about 251-255 (EMP primer set) captures almost all the V4 sequences. So pruning to 251-255 is legit, but throwing away all sequences that aren't exactly 253nts will throw away real sequences and some taxa preferentially.

This is probably because of the difference in how errors are handled, DADA2 corrects errors and Deblur removes errors. This difference has a big effect on the total reads that make it to the end, but a much smaller impact on the frequencies, which are more important.

Edit: Should have read the thread first, @tanaes already covered the total reads question!

Mark_Gillingham · March 23, 2018, 10:17am

Many thanks for your extensive feedback which has answered most of my queries! As highlighted by @benjjneb, is there way to filter for sequence length as done by the maxLen and minLen functions within DADA2 of the R package? This remains the only unanswered issue from my initial questions. I cannot find an easy way to do this within QIIME2.

Cheers!

Nicholas_Bokulich · March 23, 2018, 5:10pm

Alas, no. We have an open issue so this is on our radar, and we will report here when this makes it into a future release. Maybe it would also make sense to expose those parameters in q2-dada2 so this can be done in-line — what do you think @benjjneb?

For now, if you know the ID of those features, you could just remove them with filter-features, or set a low abundance threshold to remove singletons/doubletons from your data.

Mark_Gillingham · March 23, 2018, 5:43pm

@Nicholas_Bokulich ok thanks for letting me know. As a final update, filtering for features that are classified at the phylum level reduce feature length distributions considerably:

Length	Frequency	Proportion
227	1	0.000
251	15	0.001
252	970	0.089
253	9350	0.860
254	515	0.047
255	21	0.002
256	3	0.000
257	1	0.000

best

Mark

benjjneb · March 23, 2018, 5:56pm

IMO the best way to do this would be through the feature-table plugin, so the functionality is available no matter how you generate the sequence table, rather than being stuck inside the dada2 plugin workflow.

How hard would it be to add something that can filter features based on sequence lengths, as can be done with taxonomy?

Nicholas_Bokulich · March 23, 2018, 6:08pm

Agreed — I was just wondering if additionally exposing in dada2 would be worthwhile. It should not be hard at all to incorporate something like this in q2-feature-table, developer time is the bottleneck here — it's on our radar.

Awesome — that is a very good indicator that those sequences were probably garbage (or at least non-target DNA), anyway, if they do not receive a phylum-level classification.

system · April 24, 2018, 12:08am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.