Oh, that’s a bummer. I saw that DADA2 added single-end support, so I went ahead with merging in ITSxpress on the bad assumption that merged reads could be used equally well.
One solution could be to use unsupervised HMM training to estimate emission and transition matrices for the merged and unmerged regions based on the pattern of quality scores. The HMM could then be applied to segment the reads and learn three different error rates (forward-only, overlap, and reverse-only). It’s not trivial though.
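To make the segmentation idea concrete, here is a minimal sketch of the decoding half only. It assumes a three-state, left-to-right HMM with Gaussian emissions over per-base quality scores, and it assumes the means and standard deviations are already known; in the unsupervised version they would be estimated (for example with Baum-Welch). The function names, the `stay` probability, and the quality values are all made up for illustration, and none of this is part of DADA2 or ITSxpress.

```python
import math

# Hypothetical states for a merged read, in left-to-right order.
STATES = ("forward_only", "overlap", "reverse_only")


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution, used as the emission model."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi_segment(quals, means, sds, stay=0.98):
    """Return the most likely state index for every position in quals.

    The chain must start in state 0 and can only stay or advance by one
    state, which matches the forward / overlap / reverse layout.
    """
    n_states = len(means)
    neg_inf = float("-inf")
    log_stay = math.log(stay)
    log_move = math.log(1.0 - stay)

    score = [neg_inf] * n_states
    score[0] = gaussian_logpdf(quals[0], means[0], sds[0])
    back = []

    for q in quals[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            # The last state is absorbing, so staying there costs nothing.
            stay_score = score[s] + (log_stay if s < n_states - 1 else 0.0)
            move_score = score[s - 1] + log_move if s > 0 else neg_inf
            if move_score > stay_score:
                best, ptr = move_score, s - 1
            else:
                best, ptr = stay_score, s
            new_score.append(best + gaussian_logpdf(q, means[s], sds[s]))
            pointers.append(ptr)
        score = new_score
        back.append(pointers)

    # Trace back from the best-scoring final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy quality profile: the overlap is assumed to have boosted qualities.
quals = [30] * 10 + [40] * 8 + [25] * 10
path = viterbi_segment(quals, means=(30, 40, 25), sds=(2, 2, 2))
print([STATES[s] for s in path])
```

With the segments in hand, a separate error model could be fit to each of the three regions. The hard part, which this sketch skips, is learning the emission parameters from real data.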
How are error rates learned for unpaired sequences since they cannot be merged? Are similar reads clustered then compared?
I wanted to follow up with a question about Deblur for ITS. @wasade and @gregcaporaso, in general, what are your thoughts on the appropriateness of using merged data in Deblur? How does Deblur handle merged data and does merging impact the performance of the Deblur denoise-other algorithm? Also what is an appropriate positive filter file for ITS regions using denoise-other?
Does ITSxpress work on single-end reads? If so, why not use it to trim ITS in forward/reverse separately, and then denoise with dada2?
Deblur can handle pre-merged reads. In fact, paired-end reads must be joined prior to passing them to q2-deblur.
This is just to perform a rough positive filter. I've used the UNITE sequences clustered at 97% (mostly because pre-clustered seqs existed at that level) but you could probably go lower. For 16S I think the Greengenes 88% OTUs are used.
The way it works in DADA2 is that the forward reads and the reverse reads are denoised separately (so the error model for each is consistent, e.g. it's the forward-read error model across the full forward reads). Then the reads are merged.
It's a solvable problem, but also not entirely trivial, and we just don't have the time to devote to it given how well the merge-later workflow works, including for ITS. If we get time (i.e. $upport) it's something I'd like to revisit though, because merge-first is more convenient for ITS in particular.
@Adam_Rivers, I think some of your questions for me were already answered in the discussion here, but I wanted to follow up to be sure that you're not waiting on input. Please let me know if I've missed anything.
I think this would be a very useful workflow to support.
Yes, just to clarify, if SampleData[PairedEndSequencesWithQuality] is provided to denoise-single, the reverse reads are just ignored. This is for convenience so the user can create one SampleData[PairedEndSequencesWithQuality] artifact, and use it with denoise methods that take single or paired end reads.
Pre-joined reads aren't accepted by DADA2 (I think that was already clear from some of the other discussion on this thread, but just wanted to reply to this question specifically).
Yes, that should be the only change that you need to make.
Does ITSxpress work on single-end reads? If so, why not use it to trim ITS in forward/reverse separately, and then denoise with dada2?
That could be done, but it would more than double the running time and would not allow validation that the beginning and end are present for the selected ITS region. After thinking about it more, I realised I could calculate the 5' trimming positions of the reads from the merged sequences, so I will do that instead.
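As a rough illustration of that calculation (not the actual ITSxpress code), the sketch below assumes the forward read starts at position 0 of the merged sequence, the reverse-complemented reverse read ends at the last merged position, and the ITS region occupies the half-open interval `[its_start, its_end)` in merged coordinates. The function names are hypothetical, and staggered pairs, where one read overhangs the other, are not handled.

```python
def five_prime_trims(merged_len, its_start, its_end):
    """Return (forward_trim, reverse_trim): bases to remove from the 5' end
    of each unmerged read so that both begin at an ITS boundary.

    Assumes the forward read begins at merged position 0 and the reverse
    read, in its own orientation, begins at the last merged position.
    """
    if not 0 <= its_start < its_end <= merged_len:
        raise ValueError("ITS interval must lie inside the merged read")
    forward_trim = its_start              # bases before the ITS start
    reverse_trim = merged_len - its_end   # bases after the ITS end
    return forward_trim, reverse_trim


def trim_pair(fwd, rev, merged_len, its_start, its_end):
    """Apply the 5' trims to a forward/reverse read pair (sequence strings)."""
    f_trim, r_trim = five_prime_trims(merged_len, its_start, its_end)
    return fwd[f_trim:], rev[r_trim:]


# Toy example: a 300 bp merged read with the ITS region at positions 20..280.
print(five_prime_trims(300, 20, 280))
```

Because the ITS boundaries are found once on the merged sequence, both ends of the region are still validated, and the unmerged reads that come out can be denoised separately and merged afterwards.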
@Adam_Rivers
Thank you for providing this plugin for ITS. I think it is worth including this step in the pre-processing of ITS.
I wonder whether it is now possible to run ITSxpress on the forward and reverse reads rather than on merged reads.
Could you please update us regarding this matter?
Yes, I added support for the output of unmerged reads last month. Update itsxpress and q2-itsxpress and you should be all set. The ITSxpress tutorial has also been updated with instructions on outputting unmerged reads for DADA2.