Q2-ITSxpress: A tutorial on a QIIME 2 plugin to trim ITS sequences

Oh, that's a bummer. I saw that DADA2 had added single-end support, so I went ahead with merging in ITSxpress, based on my mistaken assumption that merged reads would work equally well.

One solution could be to use unsupervised HMM training to estimate emission and transition matrices for the merged and unmerged regions from the pattern of quality scores. The HMM could then segment each read into its forward-only, overlapping and reverse-only regions, and a separate error rate could be learned for each of the three. It's not trivial, though.
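A minimal sketch of the segmentation step only, assuming a three-state, left-to-right HMM (forward-only, overlap, reverse-only) whose emissions are coarse Phred-score bins. The state names, bin thresholds and all probabilities are hypothetical, hand-set values, and the sketch assumes, purely for illustration, that quality scores in the overlap are higher after merging. The unsupervised training step (e.g. Baum-Welch) is not shown; this is plain Viterbi decoding with fixed parameters.

```python
import math

# Hypothetical states of a merged read, in the order they must occur.
STATES = ("forward", "overlap", "reverse")

def q_bin(q):
    """Discretise a Phred score into a coarse symbol: 0 = low, 1 = mid, 2 = high."""
    if q < 20:
        return 0
    if q < 38:
        return 1
    return 2

# Assumed, hand-set parameters (a real approach would learn these without supervision).
START = [1.0, 0.0, 0.0]
TRANS = [
    [0.98, 0.02, 0.00],  # forward-only can stay or move into the overlap
    [0.00, 0.98, 0.02],  # overlap can stay or move into reverse-only
    [0.00, 0.00, 1.00],  # reverse-only is absorbing
]
EMIT = [
    [0.10, 0.70, 0.20],  # forward-only: mostly mid quality
    [0.02, 0.18, 0.80],  # overlap: assumed to be mostly high quality
    [0.20, 0.70, 0.10],  # reverse-only: mostly mid quality
]

def log(p):
    return math.log(p) if p > 0 else float("-inf")

def viterbi(quals):
    """Return the most likely region label for each base of a merged read."""
    obs = [q_bin(q) for q in quals]
    n = len(STATES)
    score = [log(START[s]) + log(EMIT[s][obs[0]]) for s in range(n)]
    back = []
    for o in obs[1:]:
        new_score, ptr = [], []
        for s in range(n):
            # Best predecessor state for landing in state s at this base.
            prev = max(range(n), key=lambda p: score[p] + log(TRANS[p][s]))
            new_score.append(score[prev] + log(TRANS[prev][s]) + log(EMIT[s][o]))
            ptr.append(prev)
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]
```

With labels in hand, the bases in each segment could be pooled to fit a separate error model per segment, which is the "three different error rates" idea.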

How are error rates learned for unpaired sequences, given that they cannot be merged? Are similar reads clustered and then compared?

I wanted to follow up with a question about Deblur for ITS. @wasade and @gregcaporaso, in general, what are your thoughts on whether it is appropriate to use merged data in Deblur? How does Deblur handle merged data, and does merging affect the performance of the Deblur denoise-other algorithm? Also, what is an appropriate positive filter file for ITS regions when using denoise-other?

Does ITSxpress work on single-end reads? If so, why not use it to trim ITS in forward/reverse separately, and then denoise with dada2?

Deblur can handle pre-merged reads; in fact, paired-end reads must be joined before they are passed to q2-deblur.

The reference is only used for a rough positive filter. I've used the UNITE sequences clustered at 97% (mostly because pre-clustered sequences were available at that level), but you could probably go lower. For 16S, I think the Greengenes 88% OTUs are used.
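To illustrate why a coarse, clustered reference is enough for a rough positive filter, here is a sketch in which reads that resemble nothing in the reference are discarded. As a stand-in, it uses a simple k-mer containment test rather than the alignment-based search that deblur itself performs; the function names, `k` and `min_shared` are hypothetical.

```python
def kmers(seq, k):
    """All overlapping k-mers of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def positive_filter(reads, references, k=8, min_shared=0.5):
    """Keep reads that share at least `min_shared` of their k-mers with some reference.

    A crude stand-in for an alignment-based positive filter; thresholds are hypothetical.
    """
    ref_kmers = [kmers(r, k) for r in references]
    kept = []
    for read in reads:
        rk = kmers(read, k)
        if not rk:
            continue  # too short to score, so drop it
        best = max((len(rk & ref) / len(rk) for ref in ref_kmers), default=0.0)
        if best >= min_shared:
            kept.append(read)
    return kept
```

Because a read only has to resemble some reference loosely, collapsing the reference to 97% (or lower) clusters mainly shrinks the search space rather than changing which reads survive.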

In DADA2, the forward reads and the reverse reads are denoised separately, so each error model is consistent (i.e. the forward-read error model applies across the full length of the forward reads). The reads are merged only after denoising.
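To make the denoise-then-merge order concrete, here is a minimal sketch of the final merging step, operating on already-denoised forward and reverse sequences. The function names are hypothetical, and the defaults (a 12-nt minimum overlap, no mismatches) only loosely echo DADA2's `mergePairs` defaults; this is not DADA2's implementation. It also only handles amplicons longer than each read, so read-through past the opposite end, which can happen with short ITS amplicons, is not covered.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def merge_pair(fwd, rev, min_overlap=12, max_mismatch=0):
    """Merge a denoised forward sequence with its denoised reverse sequence.

    Tries the longest overlap first and returns the merged sequence,
    or None if no overlap of at least `min_overlap` bases is acceptable.
    """
    rc = reverse_complement(rev)
    for olap in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        tail, head = fwd[-olap:], rc[:olap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch:
            return fwd + rc[olap:]
    return None
```

Since merging happens after denoising, each error model only ever sees reads from one sequencing direction, position by position.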

It's a solvable problem, but not an entirely trivial one, and we just don't have the time to devote to it given how well the merge-later workflow works, including for ITS. If we get time (i.e. $upport), it's something I'd like to revisit, though, because merge-first is more convenient for ITS in particular.

@Adam_Rivers, I think some of your questions for me were already answered in the discussion here, but I wanted to follow up to be sure that you're not waiting on input. Please let me know if I've missed anything.

I think this would be a very useful workflow to support.

Yes, just to clarify: if SampleData[PairedEndSequencesWithQuality] is provided to denoise-single, the reverse reads are simply ignored. This is a convenience, so that users can create a single SampleData[PairedEndSequencesWithQuality] artifact and use it with denoise methods that take either single-end or paired-end reads.
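The behaviour described here can be pictured with a small, hypothetical in-memory model: each sample holds either a list of single-end reads or a (forward, reverse) pair, and a single-end method only ever looks at the forward reads. This illustrates the described behaviour; it is not q2-dada2's actual code, and the names are made up.

```python
from typing import Dict, List, Tuple, Union

Reads = List[str]
Sample = Union[Reads, Tuple[Reads, Reads]]

def single_end_view(samples: Dict[str, Sample]) -> Dict[str, Reads]:
    """Return only the forward reads of each sample, ignoring reverse reads if present."""
    out: Dict[str, Reads] = {}
    for name, data in samples.items():
        if isinstance(data, tuple):
            forward, _reverse = data  # reverse reads are simply ignored
            out[name] = forward
        else:
            out[name] = data
    return out
```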

Pre-joined reads aren't accepted by DADA2 (I think that was already clear from some of the other discussion on this thread, but I just wanted to reply to this question specifically).

Yes, that should be the only change that you need to make.

Thanks for all the feedback, @gregcaporaso, @benjjneb and @Nicholas_Bokulich.

So I will:

  1. Add an option to export unpaired reads for DADA2 in the format SampleData[PairedEndSequencesWithQuality];

  2. Add an option to export SampleData[JoinedSequencesWithQuality] for Deblur; and

  3. Remove the ability to export SampleData[SequencesWithQuality].

To answer @Nicholas_Bokulich's question:

Does ITSxpress work on single-end reads? If so, why not use it to trim ITS in forward/reverse separately, and then denoise with dada2?

That could be done, but it would more than double the running time and would not allow us to validate that both the beginning and the end of the selected ITS region are present. After thinking about it more, I realised I could calculate the 5' trimming positions of the reads from the merged sequences, so I will do that instead.
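A minimal sketch of that idea, assuming the ITS start and end have already been located on the merged read (0-based, end-exclusive) and that the merged read spans exactly from the first base of the forward read to the first base of the reverse read, with nothing trimmed or extended during merging. Under that assumption, the forward read's 5' trim equals the ITS start, and the reverse read's 5' trim equals the distance from the ITS end to the end of the merged read. The names are hypothetical, and this is not ITSxpress's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class TrimPositions:
    forward_5p: int  # bases to remove from the 5' end of the forward read
    reverse_5p: int  # bases to remove from the 5' end of the reverse read

def trim_positions(merged_len, its_start, its_end):
    """Map ITS boundaries found on a merged read back onto the unmerged pair.

    its_start is 0-based inclusive and its_end is 0-based exclusive,
    both measured on the merged read.
    """
    if not 0 <= its_start < its_end <= merged_len:
        raise ValueError("ITS coordinates fall outside the merged read")
    return TrimPositions(forward_5p=its_start, reverse_5p=merged_len - its_end)

def trim_pair(fwd, rev, pos):
    """Apply the 5' trims to both reads of an unmerged pair."""
    return fwd[pos.forward_5p:], rev[pos.reverse_5p:]
```

Because only 5' positions are needed, the region boundaries can be found once on the merged read, while the unmerged reads handed to DADA2 keep their original per-direction quality profiles.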

An off-topic reply has been split into a new topic: How to install q2-itsxpress in a virtual machine?

Please keep replies on-topic in the future.

@Adam_Rivers
Thank you for providing this plugin for ITS. I think this step is worth including in ITS pre-processing.
I wonder whether it is now possible to run ITSxpress on forward and reverse reads rather than on merged reads.
Could you please give us an update on this?

Yes, I added support for outputting unmerged reads last month. Update itsxpress and q2-itsxpress and you should be all set. The ITSxpress tutorial has also been updated with instructions on outputting unmerged reads for DADA2.

An off-topic reply has been split into a new topic: Q2-itsxpress param help

A post was split to a new topic: q2-itsxpress: can we visualize the outputs as a QZV?

A post was split to a new topic: does ITSxpress assume that primers/barcodes have been removed from sequences?

4 posts were split to a new topic: does q2-itsxpress remove 5.8S from full ITS amplicons?

A post was split to a new topic: q2-itsxpress error

2 posts were split to a new topic: q2-itsxpress bbmerge error

An off-topic reply has been split into a new topic: q2-itsxpress bbmerge error 2

An off-topic reply has been split into a new topic: ITS samples sequenced on a NovaSeq machine

An off-topic reply has been split into a new topic: unsupervised HMM training

An off-topic reply has been split into a new topic: trimmed_exact.qza in ITSxpress
