Denoising with Deblur (only forward reads?)

arwqiime · February 11, 2019, 4:35pm

I'm analyzing V3-V4 amplicon data (Illumina 150 bp or 300 bp paired-end) from 16S rDNA loci and used deblur to denoise the raw data (after qiime quality-filter q-score).
In the deblur script it is stated that "Only forward reads are supported at this time". Does this mean that the reverse reads are ignored (like analyzing only 150 bp single end data)?

I realized that deblur returns more 'representative_sequences' than DADA2, which tries to join paired-end sequences (which can be difficult if the overlap is not good enough). Even 300 bp paired-end data sets are often not 'true' 300 bp, because the quality tends to drop beyond 200 bp, so the reads should be truncated with --p-trunc-len-f and --p-trunc-len-r, respectively.

So, does it make sense to use deblur on paired-end reads, if only the forward strand is used?
Is there an option to avoid losing non-joined sequences after the DADA2 step, so that they remain available for subsequent analyses?

Best regards

Nicholas_Bokulich · February 12, 2019, 1:26pm

Yes. If you input paired-end data, the reverse reads will be dropped. You need to join the reads before running deblur (see here).

No, there is no option to keep non-joined sequences after the DADA2 step.

Good luck!

arwqiime · February 13, 2019, 9:57am

Thank you for the explanations on deblur and dada2.
I ran into this situation with a dataset of 150 bp paired-end reads on V3-V4 loci, which do not overlap (in contrast to the 300 bp paired-end MiSeq data of my other projects).
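
A rough back-of-the-envelope check of why these reads cannot be joined can be sketched as follows. The amplicon length of 460 bp and the minimum overlap of 12 bp are assumptions chosen for illustration (a typical V3-V4 amplicon and a typical joining threshold), not values measured on this dataset, and the function names are hypothetical.

```python
def expected_overlap(trunc_len_f: int, trunc_len_r: int, amplicon_len: int) -> int:
    """Number of bases shared by the forward and reverse read.

    A negative value means there is a gap between the two reads.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


def can_join(trunc_len_f: int, trunc_len_r: int, amplicon_len: int, min_overlap: int) -> bool:
    """True if the reads overlap by at least min_overlap bases."""
    return expected_overlap(trunc_len_f, trunc_len_r, amplicon_len) >= min_overlap


AMPLICON_LEN = 460  # assumption: approximate V3-V4 amplicon length in bp
MIN_OVERLAP = 12    # assumption: minimum overlap required for joining

# (forward length, reverse length) after truncation
for f_len, r_len in [(150, 150), (250, 200), (300, 300)]:
    overlap = expected_overlap(f_len, r_len, AMPLICON_LEN)
    verdict = "joinable" if can_join(f_len, r_len, AMPLICON_LEN, MIN_OVERLAP) else "not joinable"
    print(f"{f_len}+{r_len} bp: overlap {overlap:+d} bp, {verdict}")
```

Under these assumptions, 2 x 150 bp leaves a gap of roughly 160 bp, and even 300 bp reads stop overlapping once they are truncated too aggressively.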

Therefore, it seems that, at the moment, only a deblur analysis of the forward reads alone is possible if I want to get ASVs instead of OTUs.

To find out whether any other pipeline can analyse paired-end sequences that are not joined, I looked into the "Clustering sequences into OTUs using q2-vsearch" tutorial and read that vsearch can be run on "demultiplexed, quality-controlled sequence data" (qiime quality-filter q-score accepts [PairedEndSequencesWithQuality], if I read it correctly). This step produces a [SequencesWithQuality] artifact, which should be accepted by the next step: "qiime vsearch dereplicate-sequences" expects [JoinedSequencesWithQuality] | SampleData[SequencesWithQuality] | SampleData[Sequences].

So, would qiime vsearch be an alternative for deblur and dada2 based pipelines, if read joining is not possible?

Best regards

Nicholas_Bokulich · February 13, 2019, 1:40pm

No, I believe the workflow you described would result in the same behavior: the reverse reads will be ignored, and dropped during OTU clustering (if not earlier).

You could attempt to concatenate your paired reads (outside of QIIME 2, before importing), but I do not know how this will impact downstream steps (could be dangerous!), so I do not want to recommend it. That said, it would not hurt to try. If you do, let us know how it turns out. I'd recommend paying close attention to the taxonomy classification results to see if they make sense, and comparing the concatenated and forward-read-only analyses using beta diversity (via Procrustes) and taxonomy classification (via q2-quality-control and barplots).
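
For illustration only, a minimal sketch of what concatenating one read pair outside of QIIME 2 could look like. The function names are hypothetical, and two design choices here are assumptions rather than a recommendation: the reverse read is reverse-complemented so both halves share the forward orientation, and an optional spacer of N bases (given the lowest Phred+33 quality character, "!") can mark the unsequenced gap.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def concatenate_pair(fwd_seq: str, fwd_qual: str,
                     rev_seq: str, rev_qual: str,
                     spacer: str = "") -> tuple[str, str]:
    """Concatenate a non-overlapping read pair into one sequence.

    The reverse read is reverse-complemented and its quality string is
    reversed, so that qualities stay aligned with their bases.
    """
    if len(fwd_seq) != len(fwd_qual) or len(rev_seq) != len(rev_qual):
        raise ValueError("sequence and quality strings must have equal length")
    seq = fwd_seq + spacer + reverse_complement(rev_seq)
    # Spacer bases get "!" (Phred score 0 in Phred+33 encoding).
    qual = fwd_qual + "!" * len(spacer) + rev_qual[::-1]
    return seq, qual


# Toy example with made-up reads.
seq, qual = concatenate_pair("ACGT", "IIII", "AACC", "ABCD", spacer="NN")
print(seq)
print(qual)
```

Whether a spacer helps or hurts downstream taxonomy classification is exactly the kind of effect that would need to be checked against the forward-read-only analysis.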

Good luck!

arwqiime · February 15, 2019, 3:26pm

I have tested concatenating the reads outside q2, but this did not yield the taxonomic annotations that I see with deblur and dada2 (single-end). Interestingly, both single-end analyses are very similar to the deblur paired-end analysis that I mistakenly conducted before. It could well be that the reverse reads were dropped at some step, and the 'paired-end' data set was in reality just single-end. Interesting to see this!
Running q2 quality-control evaluate-taxonomy on the two classifications (deblur single-end vs. deblur 'paired-end') resulted in the following visualization, where the 'reference taxonomy' was deblur single-end and the 'observed taxonomy' was deblur paired-end:
comp-tax-Bact1-se-pe-deblur.qza.qzv (288.1 KB)
I assume the two taxonomy results are very similar...

I thought I could use this great 'quality-control' tool to compare other analyses: I took a single-end dataset, analyzed it with deblur and dada2, and then compared the two taxonomy classifications. I got this visualization:
comp-Bact1-Mi1i2-deblur-dada2.qzv (255.4 KB)
Here, I'm afraid that this result is not correct. It would indicate a perfect match, but I doubt it.
The evaluate-taxonomy script needs two FeatureData[Taxonomy] artifacts as input. Is it plausible that deblur and dada2 really produce such similar taxonomy results? Of course they should agree, but I would expect 'very similar' taxonomic classifications, not 'identical' ones, right?

Thanks for your great support!

Nicholas_Bokulich · February 15, 2019, 3:37pm

Yes, that indicates that the results are very similar. However, you are correct that these are not truly paired-end:

Processing paired-end sequences with q-score or deblur causes the reverse read to be silently dropped. So the similarity makes sense. The classifications are not identical because you are using two different classifiers to classify these sequences.

A perfect match does seem unlikely, but it is certainly possible. I recommend inspecting the taxonomic results (maybe spot check a few) to confirm. You should be able to use qiime metadata tabulate to conveniently merge your two taxonomic results for easy comparison.
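
The same kind of spot check can also be sketched in plain Python, assuming both taxonomy results were exported to TSV files with "Feature ID" and "Taxon" columns. The feature IDs and taxa below are made up, and the function names are hypothetical. Counting how many feature IDs the two tables actually share may help put a perfect score in context.

```python
import csv
import io


def read_taxonomy(handle) -> dict[str, str]:
    """Map feature ID to taxon string from a tab-separated taxonomy table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Feature ID"]: row["Taxon"] for row in reader}


def compare_taxonomies(tax_a: dict[str, str], tax_b: dict[str, str]) -> dict:
    """Summarize agreement between two taxonomy tables on shared feature IDs."""
    ids_a, ids_b = set(tax_a), set(tax_b)
    shared = sorted(ids_a & ids_b)
    mismatches = [(fid, tax_a[fid], tax_b[fid])
                  for fid in shared if tax_a[fid] != tax_b[fid]]
    return {
        "shared": len(shared),
        "only_a": len(ids_a - ids_b),
        "only_b": len(ids_b - ids_a),
        "mismatches": mismatches,
    }


# Made-up example tables standing in for two exported taxonomy files.
deblur_tsv = io.StringIO(
    "Feature ID\tTaxon\tConfidence\n"
    "f1\tk__Bacteria; p__Firmicutes\t0.99\n"
    "f2\tk__Bacteria; p__Bacteroidetes\t0.95\n"
    "f3\tk__Bacteria; p__Proteobacteria\t0.90\n"
)
dada2_tsv = io.StringIO(
    "Feature ID\tTaxon\tConfidence\n"
    "f1\tk__Bacteria; p__Firmicutes\t0.98\n"
    "f2\tk__Bacteria; p__Actinobacteria\t0.80\n"
    "f4\tk__Bacteria; p__Proteobacteria\t0.91\n"
)

summary = compare_taxonomies(read_taxonomy(deblur_tsv), read_taxonomy(dada2_tsv))
print(summary)
```

If only a small fraction of features is shared between the two tables, agreement on that subset says little about the features that appear in just one of them.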
