Non-chimeric reference dataset selection for uchime-ref (reference-based chimera detection) for fungal ITS sequences

Hello everyone,
I’m working on reference-based chimera detection for fungal ITS sequences with uchime-ref in QIIME 2. As far as I know, there are two kinds of datasets available from UNITE (https://unite.ut.ee/repository.php), but I’m not sure which one should be used in QIIME 2. I’m aware that the vsearch plugin in QIIME 2 is responsible for this analysis, and that vsearch is similar to usearch. However, there is no description of which datasets to use with vsearch.
Could anyone give me a suggestion?
Thanks
Fang

Brief description of datasets
Dataset1. UCHIME reference dataset
This is a release of UNITE/INSDC representative/reference sequences for use in reference-based chimera detection of fungal ITS sequences in UCHIME and similar programs.
Dataset2. USEARCH/UTAX release (https://doi.org/10.15156/BIO/786345)
Reference file for USEARCH/UTAX.

Hi @Fairytale,

You should download the QIIME release from the page that you linked to. Those sequences will need to be imported into QIIME 2 as described in this tutorial, which also describes how to train a naive Bayes classifier on those sequences if you want to use one. You could also download a pre-trained classifier from this tutorial:

Regarding the vsearch plugin being responsible for this analysis: that is incorrect. We do have a taxonomy classification method that uses vsearch for alignment followed by a native LCA algorithm to find consensus taxonomies, but it is in the q2-feature-classifier plugin (along with the other taxonomy classifiers), NOT in the q2-vsearch plugin.

Good luck!

Hello, Nicholas,
Thank you so much for your detailed reply. I just found it.
Sorry, I may not have made myself clear. I mean that I'm doing "chimera checking" with q2-vsearch for fungal ITS sequences, not classification (Training feature classifiers with q2-feature-classifier — QIIME 2 2019.1.0 documentation). There is a tutorial for de novo chimera checking (Identifying and filtering chimeric feature sequences with q2-vsearch — QIIME 2 2019.1.0 documentation), but I'd like to use reference-based chimera checking, and there is no tutorial or description of reference dataset selection apart from this page (uchime-ref: Reference-based chimera filtering with vsearch. — QIIME 2 2019.1.0 documentation).

Yes, I am aware. If you are using reference-based chimera checking you need to download the sequences in the correct format. You were looking at usearch format, which is not what you want. Use the QIIME release format from UNITE.

Hello @Nicholas_Bokulich
Thanks a lot for your suggestion. Sorry that my first reply was only half finished (it was also my first time replying). I posted it by accident and don't know how to retract or edit it, but the main ideas are all there.

I'm still a little confused. I originally used the QIIME release for reference-based chimera checking, but then I noticed that the description of the UCHIME/USEARCH/UTAX reference datasets (UNITE - Resources) says they are specially prepared for "reference-based chimera detection of fungal ITS sequences in UCHIME and similar programs". So I'm wondering whether QIIME 2 (q2-vsearch) counts as one of those "similar programs", or is closer to USEARCH. Or do you mean that the QIIME release should be used both for training the classifier and for reference-based chimera checking?

Best
Fang

No. It is fair to be confused about that, since the method is called qiime vsearch uchime-ref. But that is the wrong format for use with QIIME 2.

Yes. The same QIIME formatted files should be used for both. You would use the fasta file (imported as a FeatureData[Sequence] artifact in QIIME 2) as the reference with the qiime vsearch uchime-ref method.
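Before importing, it may be worth checking that the reference FASTA contains only unaligned IUPAC DNA characters, since depending on the QIIME 2 version the FeatureData[Sequence] import can reject other characters (for example lowercase bases). The Python below is a minimal sketch of such a check, not part of QIIME 2: read_fasta, find_invalid and uppercase are hypothetical helper names, and the accepted alphabet is an assumption. Once the file passes, it would be imported with `qiime tools import --type 'FeatureData[Sequence]'` and the resulting artifact passed to `qiime vsearch uchime-ref` via `--i-reference-sequences`.

```python
# Hypothetical pre-import check for a reference FASTA (not part of QIIME 2).
# Assumption: the reference is an unaligned DNA FASTA such as the UNITE
# QIIME release; accepted characters are the uppercase IUPAC DNA codes.
from typing import Dict, Iterable, List, Optional, Tuple

IUPAC_DNA = set("ACGTRYSWKMBDHVN")


def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA lines into (id, sequence) pairs; id = first header token."""
    records: List[Tuple[str, str]] = []
    seq_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Flush the previous record before starting a new one.
            if seq_id is not None:
                records.append((seq_id, "".join(chunks)))
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            chunks = []
        else:
            chunks.append(line)
    if seq_id is not None:
        records.append((seq_id, "".join(chunks)))
    return records


def find_invalid(records: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map each offending record id to its sorted non-IUPAC characters."""
    bad: Dict[str, str] = {}
    for seq_id, seq in records:
        invalid = sorted(set(seq) - IUPAC_DNA)
        if invalid:
            bad[seq_id] = "".join(invalid)
    return bad


def uppercase(records: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return a copy of the records with every sequence uppercased."""
    return [(seq_id, seq.upper()) for seq_id, seq in records]


if __name__ == "__main__":
    example = [">ref1 Fungi", "ACGTNRY", ">ref2", "acgtacgt", "GGCC"]
    records = read_fasta(example)
    print(find_invalid(records))             # {'ref2': 'acgt'}
    print(find_invalid(uppercase(records)))  # {}
```

Uppercasing only addresses case differences; any other flagged characters (such as alignment gaps) would need to be handled separately before import.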

Good luck!


Thank you @Nicholas_Bokulich, I get it now.
So kind of you. Thanks for all your help. :grinning:

Best
Fang


Hello, @Nicholas_Bokulich,
I have one more question about reference dataset selection. I know the “developer” reference sequences are recommended for training the classifier (see the last part of https://docs.qiime2.org/2019.1/tutorials/feature-classifier/). I’m wondering whether there is any recommendation on which reference dataset (the developer sequences or the alternative one within the QIIME-compatible release) to use for reference-based chimera detection and OTU clustering. Or should all three analyses use the same reference dataset? Thanks~

Best
Fang

Either should work fine. You do not need to use the developer version.
