Q2-ITSxpress: A tutorial on a QIIME 2 plugin to trim ITS sequences

Hello Adam_Rivers,
When I used your data, I got an error.
When I run “deblur denoise-16S” to produce table.qza and rep-seqs.qza, both files come out empty, and the data itself does not seem to be the cause.

I'm not the Deblur developer, but it looks like qiime deblur denoise-16S is designed for 16S sequences, not ITS sequences. You can follow the tutorial and use Dada2, or use qiime deblur denoise-other and provide it with an appropriate ITS positive filter file.

The help for this command says:

Usage: qiime deblur denoise-16S [OPTIONS]

Perform sequence quality control for Illumina data using the Deblur
workflow with a 16S reference as a positive filter. Only forward reads are
supported at this time. The specific reference used is the 88% OTUs from
Greengenes 13_8. This mode of operation should only be used when data were
generated from a 16S amplicon protocol on an Illumina platform. The
reference is only used to assess whether each sequence is likely to be 16S
by a local alignment using SortMeRNA with a permissive e-value; the
reference is not used to characterize the sequences.

Thanks for reviewing, @gregcaporaso. I'll work through the comments. I did have questions about the output data types in point 7.

I was looking at q2_dada2's input types and it looks like:

I don't see anything about the type SampleData[JoinedSequencesWithQuality] being accepted by dada2. Is it a subtype or something?

If I did change types, can I just change my data output type from SampleData[SequencesWithQuality] to SampleData[JoinedSequencesWithQuality] without any additional changes?

I think that Dada2 has the ability to learn error rates from single-end data as well as paired-end data now, but I don't know the method. @benjjneb, can you provide guidance on this? How are error profiles calculated in denoise_single? Will merged scores interfere with the error rate estimation procedures?

Merging by BBMerge does change the quality scores since most positions are verified by two reads. The score change is shown for one ITS1 sample here:
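As a rough illustration of why agreement between two reads raises the quality score in the overlap, here is a minimal sketch using a simple Bayesian model. This is not BBMerge's scoring method (that is not described in this thread); the function names, the independence of errors, and the assumption that a miscalled base is equally likely to be any of the other three bases are all assumptions for illustration.

```python
import math


def phred_to_prob(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def prob_to_phred(p):
    """Convert an error probability to a Phred quality score."""
    return -10 * math.log10(p)


def merged_quality_agree(q_fwd, q_rev):
    """Posterior Phred score for a position where both reads call the same base.

    Illustrative model only (assumed, not taken from BBMerge):
    errors in the two reads are independent, and a wrong call is
    equally likely to be any of the 3 other bases.
    """
    p1 = phred_to_prob(q_fwd)
    p2 = phred_to_prob(q_rev)
    both_wrong_same_base = p1 * p2 / 3      # both reads wrong, yet agreeing
    both_right = (1 - p1) * (1 - p2)        # both reads correct
    p_err = both_wrong_same_base / (both_wrong_same_base + both_right)
    return prob_to_phred(p_err)


q_overlap = merged_quality_agree(20, 20)
```

Under this toy model, two agreeing Q20 calls give roughly Q45, which is why the overlapping part of a merged read no longer has the same score-to-error relationship as the unmerged ends.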

Changing to a paired-end output requires a major rewrite of ITSxpress, so I'd like to explore other options first.

I'm not quite sure I understand point 8, but I can remove that input type. Do I then need to add a third command for SampleData[JoinedSequencesWithQuality] that outputs the same type?

Adam

In short, DADA2 does not recommend processing pre-merged reads, because the different regions of pre-merged reads (Forward-only, overlapping, reverse-only) have different relationships between the assigned quality scores and the error rates, which can lead to false-positive ASV inference. You can see more discussion of this here: Fungal ITS pre-processing suggestion · Issue #327 · benjjneb/dada2 · GitHub
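To make the three regions concrete, here is a small sketch that computes where they fall on a merged read from the read lengths alone. The function name and the 0-based half-open coordinates are my own choices, and it assumes the merger did not trim either read and that the amplicon is longer than each read (no read-through).

```python
def merged_read_regions(fwd_len, rev_len, merged_len):
    """Return half-open (start, end) intervals on the merged read.

    Assumes neither read was trimmed during merging and that the
    merged read is at least as long as each input read.
    """
    overlap = fwd_len + rev_len - merged_len
    if overlap <= 0:
        raise ValueError("reads do not overlap")
    if merged_len < max(fwd_len, rev_len):
        raise ValueError("read-through case is not handled in this sketch")
    overlap_start = merged_len - rev_len
    return {
        "forward_only": (0, overlap_start),
        "overlapping": (overlap_start, fwd_len),
        "reverse_only": (fwd_len, merged_len),
    }


regions = merged_read_regions(fwd_len=250, rev_len=250, merged_len=400)
```

For two 250 bp reads merged into a 400 bp sequence, the first 150 positions carry forward-read scores, the middle 100 carry combined scores, and the last 150 carry reverse-read scores, so a single error model has to cover three different score regimes.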

This is a bit annoying, and we would like to support pre-merged reads, but when we evaluated this possibility again recently we were not able to attain the same accuracy on such reads as we could in our recommended merge-later workflow.

Oh, that’s a bummer. I saw that Dada2 added single-end support so I went forward with using merging in ITSxpress based on my bad assumption that merged reads could be used equally well.

One solution to the issue could be to use unsupervised HMM training to estimate emission and transition matrices for the merged and unmerged regions based on the pattern of quality scores. Then the HMM could be applied to segment the reads and learn three different error rates. It’s not trivial though.

How are error rates learned for unpaired sequences since they cannot be merged? Are similar reads clustered then compared?

I wanted to follow up with a question about Deblur for ITS. @wasade and @gregcaporaso, in general, what are your thoughts on the appropriateness of using merged data in Deblur? How does Deblur handle merged data, and does merging impact the performance of the Deblur denoise-other algorithm? Also, what is an appropriate positive filter file for ITS regions using denoise-other?

Does ITSxpress work on single-end reads? If so, why not use it to trim ITS in forward/reverse separately, and then denoise with dada2?

deblur can handle pre-merged reads — actually, paired-end reads must be joined prior to passing to q2-deblur.

This is just to perform a rough positive filter. I've used the UNITE sequences clustered at 97% (mostly because pre-clustered seqs existed at that level) but you could probably go lower. For 16S I think the Greengenes 88% OTUs are used.
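For anyone wanting to try this, here is a hedged sketch that only assembles the command line for denoise-other with a custom positive filter; it does not run anything. The option names are as I recall them from the q2-deblur help, so please confirm with `qiime deblur denoise-other --help`. The file names and the trim length of 150 are placeholders, not recommendations, and the reference is assumed to be a UNITE set already imported as a sequence artifact.

```python
import shlex


def deblur_its_command(seqs_qza, reference_qza, trim_length, out_prefix="deblur"):
    """Build (but do not run) a `qiime deblur denoise-other` command.

    All paths are hypothetical placeholders; option names should be
    verified against the installed q2-deblur version.
    """
    return [
        "qiime", "deblur", "denoise-other",
        "--i-demultiplexed-seqs", seqs_qza,       # joined, ITS-trimmed reads
        "--i-reference-seqs", reference_qza,      # positive filter, e.g. UNITE 97%
        "--p-trim-length", str(trim_length),
        "--o-table", f"{out_prefix}-table.qza",
        "--o-representative-sequences", f"{out_prefix}-rep-seqs.qza",
        "--o-stats", f"{out_prefix}-stats.qza",
    ]


cmd = deblur_its_command("joined-trimmed.qza", "unite-97-ref-seqs.qza", 150)
print(shlex.join(cmd))
```

Printing the command first makes it easy to check the arguments before pasting it into a shell.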

The way it works in dada2 is the forward reads are denoised and the reverse reads denoised separately (so the error model for each is consistent, e.g. it's the forward-read error model across the full forward reads). Then reads are merged.
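To show what the final merge step looks like once each direction has been denoised on its own, here is a toy exact-match overlap merge. It is only an illustration of the idea and not DADA2's actual merging code; the function names and the `min_overlap` default are assumptions, and real mergers tolerate mismatches.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse-complement a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd, rev, min_overlap=12):
    """Merge a forward read with the reverse complement of its mate.

    Tries the longest overlap first and requires an exact match.
    Returns the merged sequence, or None if no overlap is found.
    """
    rc = reverse_complement(rev)
    for olen in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        if fwd[-olen:] == rc[:olen]:
            return fwd + rc[olen:]
    return None


# Toy amplicon: forward read covers the first 20 bases, reverse read the last 20.
amplicon = "ACGTTGCAAGGCTTAACCGGATCGATTACA"
fwd_read = amplicon[:20]
rev_read = reverse_complement(amplicon[10:])
merged = merge_pair(fwd_read, rev_read, min_overlap=8)
```

The point of the ordering is that each denoising pass sees reads from one direction only, so quality scores mean the same thing along the whole read, and merging happens afterwards on already-denoised sequences.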

It's a solvable problem, but also not entirely trivial, and we just don't have the time to devote to it given how well the merge-later workflow works, including for ITS. If we get time (i.e. $upport) it's something I'd like to revisit though because merge-first is more convenient for ITS in particular.

@Adam_Rivers, I think some of your questions for me were already answered in the discussion here, but I wanted to follow up to be sure that you're not waiting on input. Please let me know if I've missed anything.

I think this would be a very useful workflow to support.

Yes, just to clarify, if SampleData[PairedEndSequencesWithQuality] is provided to denoise-single, the reverse reads are just ignored. This is for convenience so the user can create one SampleData[PairedEndSequencesWithQuality] artifact, and use it with denoise methods that take single or paired end reads.

Pre-joined reads aren't accepted by DADA2 (I think that was already clear from some of the other discussion on this thread, but just wanted to reply to this question specifically).

Yes, that should be the only change that you need to make.

Thanks for all the feedback @gregcaporaso @benjjneb and @Nicholas_Bokulich

So I will:

  1. Add an option to export unpaired reads for Dada2 in the format SampleData[PairedEndSequencesWithQuality]

  2. Add an option to export SampleData[JoinedSequencesWithQuality] for Deblur and

  3. Remove the ability to export SampleData[SequencesWithQuality]

To answer @Nicholas_Bokulich's question:

Does ITSxpress work on single-end reads? If so, why not use it to trim ITS in forward/reverse separately, and then denoise with dada2?

That could be done, but it would more than double the running time and would not allow validation that the beginning and end are present for the selected ITS region. After thinking about it more I realised I could calculate the 5' trimming positions of the reads from the merged sequences, so I will do that instead.
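The arithmetic behind that idea can be sketched as follows. This is not the ITSxpress implementation, just the coordinate conversion as I understand it; the function name, the 0-based half-open coordinates, and the assumption that merging does not trim the read ends are all mine.

```python
def five_prime_trims(merged_len, its_start, its_end, fwd_len, rev_len):
    """Map an ITS region found on a merged read back to the unmerged reads.

    The region is [its_start, its_end) on the merged read. Returns the
    number of bases to trim from the 5' end of the forward read and of
    the reverse read. Assumes the merged read starts at the first base
    of the forward read and ends at the first base of the reverse read.
    """
    if not 0 <= its_start < its_end <= merged_len:
        raise ValueError("ITS region must lie within the merged read")
    fwd_trim = its_start                 # bases before the region on the forward read
    rev_trim = merged_len - its_end      # bases after the region, seen from the reverse read
    if fwd_trim >= fwd_len or rev_trim >= rev_len:
        raise ValueError("ITS boundary lies beyond the end of a read")
    return fwd_trim, rev_trim


trims = five_prime_trims(merged_len=400, its_start=30, its_end=380,
                         fwd_len=250, rev_len=250)
```

Because both boundaries are found once on the merged sequence, the region is validated end to end, and the unmerged reads only need a 5' trim each before being passed on for denoising.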

@Adam_Rivers
Thank you for providing this plugin for ITS. I think it is worth including this step in the pre-processing of ITS.
I wonder whether by now it is possible to run itsxpress on forward and reverse reads, and not merged reads.
Could you please update us regarding this matter?

Yes, I added support for the output of unmerged reads last month. Update itsxpress and q2-itsxpress and you should be all set. The itsxpress tutorial has also been updated with instructions on outputting unmerged reads for Dada2.
