Q2-ITSxpress: A tutorial on a QIIME 2 plugin to trim ITS sequences

Looks like bioconda didn’t get added correctly. Try:

conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda

Thanks for your time on this! I have uninstalled and reinstalled qiime2, and I ran the commands you suggested, but I still get a similar error:

Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

- q2-itsxpress

Current channels:

- https://conda.anaconda.org/conda-forge/osx-64
- https://conda.anaconda.org/conda-forge/noarch
- https://conda.anaconda.org/bioconda/osx-64
- https://conda.anaconda.org/bioconda/noarch
- https://repo.anaconda.com/pkgs/main/osx-64
- https://repo.anaconda.com/pkgs/main/noarch
- https://repo.anaconda.com/pkgs/free/osx-64
- https://repo.anaconda.com/pkgs/free/noarch
- https://repo.anaconda.com/pkgs/r/osx-64
- https://repo.anaconda.com/pkgs/r/noarch
- https://repo.anaconda.com/pkgs/pro/osx-64
- https://repo.anaconda.com/pkgs/pro/noarch

Sorry, I made a mistake in the install instructions, but I just updated the tutorial to correct the issue. The correct installation commands are:

conda config --add channels bioconda
conda install itsxpress
pip install q2-itsxpress

This install works fine for me now. The plugin, however, doesn’t appear to be compatible with ‘EMPPairedEndSequences’ type files as produced by the following command:

qiime tools import \
  --type EMPPairedEndSequences \
  --input-path raw-seqs \
  --output-path paired-end-sequences.qza

Where I get this error:

Plugin error from itsxpress:

Argument to parameter ‘per_sample_sequences’ is not a subtype of SampleData[PairedEndSequencesWithQuality].

Debug info has been saved to /var/folders/x4/jr46ngl90px4376_dbktnc6jmk70ht/T/qiime2-q2cli-err-sv581_bv.log

Is this something that this plugin will accept in the future? I am happy to help you troubleshoot things as much as I am able, as I think this is a valuable tool for QIIME (and me :)).

Cheers,

–Lorinda

@Lorinda — I am not involved with the development of this plugin, but it looks like you are trying to use multiplexed sequences where demultiplexed sequences are expected.

Use qiime demux emp-paired to demultiplex your reads, then provide its output as input to itsxpress.

Again, I am not involved with the development of this plugin, but I would suggest that maybe that kind of functionality doesn’t make sense in the context of this plugin. I think the general design idea here in QIIME 2 is to compose modular building blocks of functionality. Each plugin/method should do one thing, and one thing well.

Ah, I see. I was incorrectly adapting this to fit my protocol. It seems to work using the demultiplexed file. Thanks!


Thanks @thermokarst, I’ll add a note to the tutorial explaining that demultiplexed sequences are needed.


@Adam_Rivers, thanks a lot for your work on q2-itsxpress and for this tutorial! I worked through it and I have a few suggestions for improvements. There are only two changes that I think might be critical to make as soon as possible (noted as critical below) - the rest you should just think of as suggestions that you can take or leave.

  1. Would it be possible to provide wget or curl commands for tutorial file downloads? That will help us to automate testing of this tutorial in the future, which will help you to ensure that the commands in the tutorial never become out of date (e.g., because of changing interfaces or broken file download links).

  2. I recommend replacing or expanding your text “paired-end sequences with quality” or “sequences with quality” with literal blocks containing the names of these types (i.e., SampleData[PairedEndSequencesWithQuality] or SampleData[SequencesWithQuality]) as that may be more clear for users.

  3. I recommend replacing your conda config and conda install commands with conda install -c bioconda itsxpress. This affects only the current installation and doesn’t change which channels conda searches at other times (e.g., when the user is using conda to install something unrelated).

  4. It looks like you don’t have citations officially associated with your plugin:

    (qiime2-2018.6) 13:25:23 temp$ qiime itsxpress --citations
    No citations found.
    (qiime2-2018.6) 13:25:38 temp$ qiime itsxpress trim-pair --citations
    No citations found.
    

    See here and here for an example of how to add a citation to your plugin. Without this information, users will have a harder time knowing how to cite your work.

  5. I’ve attached a manifest file (manifest.txt) that you could include in your tutorial. It won’t require the user to make edits before importing, as long as manifest.txt is in the same directory as the fastq files.

  6. This one is probably just personal preference, but I recommend making all of the commands in your tutorial single-threaded (i.e., drop --p-threads 2 and any similar parameters from commands that use them). Since that parameter depends somewhat on the environment where the command is running, and since it’s not required, I usually opt not to specify options for parallel processing in tutorials.

  7. critical: “The merged sequences can be fed directly into Dada2 using the denoise-single command.” I don’t think this is correct. While this will technically execute, I think you’re violating assumptions that DADA2 makes about the error profiles, so DADA2 itself won’t work as expected. The output semantic type for the trim-pair method should instead be SampleData[JoinedSequencesWithQuality]. These data can be passed to methods that accept this input, such as qiime quality-filter q-score-joined or qiime deblur denoise-other, but my understanding is that it is technically incorrect to pass pre-joined data to DADA2. (If you could trim but not join the reads, and output a SampleData[PairedEndSequencesWithQuality], then that should be OK to pass to qiime dada2 denoise-paired.)

  8. critical: SampleData[JoinedSequencesWithQuality] should not be an allowed input type to trim-single. This is a bit annoying, and something we’re going to fix soon, but because you have to specify the output type as SampleData[SequencesWithQuality], the information that these reads have been joined would be lost in the process of executing trim-single. In the future, methods will support mapping of input types onto output types so this information could be preserved. In the meantime, I recommend just not allowing the user to pass pre-joined sequences to the itsxpress methods, since they can join either during or after running itsxpress.

  9. Looks like you left the time prefix in one of your commands: time qiime dada2 denoise-single.
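Point 1 (scripted downloads) could look something like the minimal sketch below, which assumes curl is installed. fetch_tutorial_file is a hypothetical helper name, not part of QIIME 2 or ITSxpress, and the actual tutorial download URLs are not reproduced here.

```shell
#!/usr/bin/env bash
# fetch_tutorial_file URL DEST (hypothetical helper): download URL to DEST.
# -f makes curl exit non-zero on HTTP errors instead of saving an error page,
# so a broken link fails an automated test run loudly.
fetch_tutorial_file() {
  local url="$1" dest="$2"
  # Skip files that already exist and are non-empty, so reruns are cheap.
  if [ -s "$dest" ]; then
    echo "skip: $dest already exists" >&2
    return 0
  fi
  # Download to a temporary name first so a failed transfer never leaves
  # a truncated file under the final name.
  if curl -fsSL -o "$dest.part" "$url"; then
    mv "$dest.part" "$dest"
  else
    rm -f "$dest.part"
    echo "error: could not download $url" >&2
    return 1
  fi
}
```

Usage would be along the lines of fetch_tutorial_file "<tutorial file URL>" sample1_R1.fastq.gz, with the URL placeholder filled in from the tutorial.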
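For point 5, one way to get a manifest that needs no editing is to generate it with paths written as a literal $PWD, which (as I understand QIIME 2's CSV FASTQ manifest formats of this era) is expanded at import time. This is a sketch, not the attached manifest.txt: write_manifest is a hypothetical helper, and the <sample>_R1.fastq.gz / <sample>_R2.fastq.gz naming scheme is an assumption about the tutorial files.

```shell
#!/usr/bin/env bash
# write_manifest [DIR] (hypothetical helper): print a paired-end CSV manifest
# for FASTQ pairs found in DIR (default: current directory).
# ASSUMPTION: files are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.
# Paths use a literal $PWD, so the manifest is meant to sit next to the reads
# and be imported from that directory.
write_manifest() {
  local dir="${1:-.}" r1 sample
  echo "sample-id,absolute-filepath,direction"
  for r1 in "$dir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue              # glob matched nothing
    sample=$(basename "$r1" _R1.fastq.gz)
    if [ ! -e "$dir/${sample}_R2.fastq.gz" ]; then
      echo "warning: no R2 file for $sample, skipping" >&2
      continue
    fi
    # Single-quoted format keeps $PWD unexpanded in the output.
    printf '%s,$PWD/%s_R1.fastq.gz,forward\n' "$sample" "$sample"
    printf '%s,$PWD/%s_R2.fastq.gz,reverse\n' "$sample" "$sample"
  done
}
```

Running write_manifest . > manifest.txt in the directory holding the reads, then importing from that same directory with a paired-end manifest format, is the intended workflow; samples missing a reverse file are skipped with a warning rather than written half-complete.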

Thanks again for this contribution @Adam_Rivers! We’re very excited to have this be part of the QIIME 2 ecosystem! Please let me know if you have any questions about this.


Hello Adam_Rivers,
When I used your data, I got an error.
When I used “deblur denoise-16S” to generate “table.qza” and “rep-seqs.qza”, the two files were empty, but it doesn’t seem to be because of the data.

I’m not the Deblur developer, but it looks like qiime deblur denoise-16S is designed for 16S sequences, not ITS sequences. You can follow the tutorial and use Dada2, or use qiime deblur denoise-other and provide it an appropriate ITS positive filter file.

The help for this command says:

Usage: qiime deblur denoise-16S [OPTIONS]

Perform sequence quality control for Illumina data using the Deblur
workflow with a 16S reference as a positive filter. Only forward reads are
supported at this time. The specific reference used is the 88% OTUs from
Greengenes 13_8. This mode of operation should only be used when data were
generated from a 16S amplicon protocol on an Illumina platform. The
reference is only used to assess whether each sequence is likely to be 16S
by a local alignment using SortMeRNA with a permissive e-value; the
reference is not used to characterize the sequences.


Thanks for reviewing, @gregcaporaso. I’ll work through the comments. I did have questions about the output data types in point 7.

I was looking at q2_dada2’s input types and it looks like:

I don’t see anything about the type SampleData[JoinedSequencesWithQuality] being accepted by dada2. Is it a subtype or something?

If I did change types, can I just change my data output type from SampleData[SequencesWithQuality] to SampleData[JoinedSequencesWithQuality] without any additional changes?

I think that Dada2 can now learn error rates from single-end data as well as paired-end data, but I don’t know the method. @benjjneb can you provide guidance on this? How are error profiles calculated in denoise_single? Will merged scores interfere with the error rate estimation procedures?

Merging by BBMerge does change the quality scores since most positions are verified by two reads. The score change is shown for one ITS1 sample here:

Changing to a paired-end output would require a major rewrite of ITSxpress, so I’d like to explore other options first.
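To reproduce the kind of quality-score comparison described above for BBMerge-merged reads, a minimal sketch like the following computes the mean Phred score at each read position, so merged and unmerged reads from the same sample can be compared side by side. It assumes uncompressed FASTQ with four lines per record and Phred+33 encoding; mean_quality_by_position is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# mean_quality_by_position FASTQ (hypothetical helper): print
# "position<TAB>mean Phred score" for a 4-line-per-record, Phred+33 FASTQ.
mean_quality_by_position() {
  awk '
    BEGIN {
      # awk has no ord(); build a character -> ASCII code lookup table.
      for (c = 33; c < 127; c++) ord[sprintf("%c", c)] = c
      maxlen = 0
    }
    NR % 4 == 0 {                       # every 4th line is a quality string
      n = length($0)
      for (i = 1; i <= n; i++) {
        sum[i] += ord[substr($0, i, 1)] - 33
        cnt[i]++
      }
      if (n > maxlen) maxlen = n
    }
    END {
      for (i = 1; i <= maxlen; i++)
        printf "%d\t%.2f\n", i, sum[i] / cnt[i]
    }
  ' "$1"
}
```

In bash, a gzipped file can be passed as mean_quality_by_position <(gzip -dc merged.fastq.gz), where merged.fastq.gz is a placeholder name. Because merged reads vary in length, later positions average over fewer reads.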

I’m not quite sure I understand point 8, but I can remove that input type. Do I then need to add a third command for SampleData[JoinedSequencesWithQuality] that outputs the same type?

Adam

In short, DADA2 does not recommend processing pre-merged reads, because the different regions of pre-merged reads (Forward-only, overlapping, reverse-only) have different relationships between the assigned quality scores and the error rates, which can lead to false-positive ASV inference. You can see more discussion of this here: https://github.com/benjjneb/dada2/issues/327#issuecomment-400022629

This is a bit annoying, and we would like to support pre-merged reads, but when we evaluated this possibility again recently we were not able to attain the same accuracy on such reads as we could in our recommended merge-later workflow.
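The three regions of a merged read mentioned above can be located with simple arithmetic when the read lengths are known. In the overlap, merged quality scores are derived from two reads, while the flanking regions come from one read each, which is why a single score-to-error model fits them poorly. The sketch below assumes untrimmed reads that overlap end to end with no read-through (the amplicon is at least as long as each read); merged_regions is a hypothetical helper, not part of DADA2.

```shell
#!/usr/bin/env bash
# merged_regions F_LEN R_LEN MERGED_LEN (hypothetical helper): print
# "region<TAB>start<TAB>end<TAB>length" (1-based, inclusive) for the
# forward-only, overlap, and reverse-only segments of a merged read.
# ASSUMPTIONS: untrimmed reads, no read-through, so that
# max(F_LEN, R_LEN) <= MERGED_LEN <= F_LEN + R_LEN.
merged_regions() {
  local f=$1 r=$2 m=$3
  if [ "$m" -lt "$f" ] || [ "$m" -lt "$r" ] || [ "$m" -gt $((f + r)) ]; then
    echo "lengths violate max(F,R) <= MERGED <= F+R" >&2
    return 1
  fi
  # An empty segment is reported with end = start - 1 and length 0.
  printf 'forward-only\t%d\t%d\t%d\n' 1 $((m - r)) $((m - r))
  printf 'overlap\t%d\t%d\t%d\n' $((m - r + 1)) "$f" $((f + r - m))
  printf 'reverse-only\t%d\t%d\t%d\n' $((f + 1)) "$m" $((m - f))
}
```

For 2 x 250 reads merged into a 300 bp read, merged_regions 250 250 300 reports positions 1-50 as forward-only, 51-250 as overlap (200 bp), and 251-300 as reverse-only.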

Oh, that’s a bummer. I saw that Dada2 added single-end support so I went forward with using merging in ITSxpress based on my bad assumption that merged reads could be used equally well.

One solution to the issue could be to use unsupervised HMM training to estimate an emission and transition matrix for the merged and unmerged regions based on the pattern of quality scores. Then the HMM could be applied to segment the reads and learn three different error rates. It’s not trivial, though.
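As a much cruder baseline than the HMM proposed above, one could split each quality string into runs above and below a fixed Phred threshold and check whether a high-scoring run lines up with the overlap. The sketch below is a simple threshold heuristic, not an HMM: no emission or transition matrices are learned. segment_quality and the threshold value are hypothetical choices.

```shell
#!/usr/bin/env bash
# segment_quality THRESHOLD QUAL_STRING (hypothetical helper): split a
# Phred+33 quality string into runs at/above (H) or below (L) THRESHOLD and
# print "label<TAB>start<TAB>end" per run (1-based, inclusive).
# NOTE: fixed-threshold heuristic only, NOT the HMM described above.
segment_quality() {
  # Pass the string via the environment: awk -v would interpret backslashes.
  Q="$2" awk -v t="$1" '
    BEGIN {
      for (c = 33; c < 127; c++) ord[sprintf("%c", c)] = c
      q = ENVIRON["Q"]
      n = length(q)
      for (i = 1; i <= n; i++) {
        lab = (ord[substr(q, i, 1)] - 33 >= t) ? "H" : "L"
        if (i == 1) { cur = lab; start = 1 }
        else if (lab != cur) {          # label changed: close the current run
          printf "%s\t%d\t%d\n", cur, start, i - 1
          cur = lab; start = i
        }
      }
      if (n > 0) printf "%s\t%d\t%d\n", cur, start, n
    }'
}
```

For example, segment_quality 30 'II++II' prints three runs: H for positions 1-2, L for 3-4, and H for 5-6 (with Phred+33, I is Q40 and + is Q10). An HMM would replace the hard threshold with learned state assignments that tolerate isolated noisy positions.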

How are error rates learned for unpaired sequences since they cannot be merged? Are similar reads clustered then compared?

I wanted to follow up with a question about Deblur for ITS. @wasade and @gregcaporaso, in general, what are your thoughts on the appropriateness of using merged data in Deblur? How does Deblur handle merged data and does merging impact the performance of the Deblur denoise-other algorithm? Also what is an appropriate positive filter file for ITS regions using denoise-other?

Does ITSxpress work on single-end reads? If so, why not use it to trim ITS in forward/reverse separately, and then denoise with dada2?

deblur can handle pre-merged reads — actually, paired-end reads must be joined prior to passing to q2-deblur.

This is just to perform a rough positive filter. I’ve used the UNITE sequences clustered at 97% (mostly because pre-clustered seqs existed at that level), but you could probably go lower. For 16S, I think the Greengenes 88% OTUs are used.


The way it works in dada2 is that the forward reads and the reverse reads are denoised separately (so the error model for each is consistent, e.g. it’s the forward-read error model across the full forward reads). Then the reads are merged.

It’s a solvable problem, but also not entirely trivial, and we just don’t have the time to devote to it given how well the merge-later workflow works, including for ITS. If we get time (i.e. $upport), it’s something I’d like to revisit, though, because merge-first is more convenient for ITS in particular.


@Adam_Rivers, I think some of your questions for me were already answered in the discussion here, but I wanted to follow up to be sure that you’re not waiting on input. Please let me know if I’ve missed anything.

I think this would be a very useful workflow to support.

Yes, just to clarify, if SampleData[PairedEndSequencesWithQuality] is provided to denoise-single, the reverse reads are just ignored. This is for convenience so the user can create one SampleData[PairedEndSequencesWithQuality] artifact, and use it with denoise methods that take single or paired end reads.

Pre-joined reads aren’t accepted by DADA2 (I think that was already clear from some of the other discussion on this thread, but just wanted to reply to this question specifically).

Yes, that should be the only change that you need to make.
