Hello everyone,
I’m working on reference-based chimera detection for fungal ITS sequences with uchime-ref in QIIME 2. As far as I know, there are two kinds of datasets available from UNITE (https://unite.ut.ee/repository.php), but I’m not sure which one should be used in QIIME 2. I am aware that the vsearch plugin in QIIME 2 is responsible for this analysis, and that vsearch is similar to usearch. However, the vsearch documentation does not describe which dataset to use.
Could anyone give me a suggestion?
Thanks
Fang
Brief description of datasets
Dataset1. UCHIME reference dataset
This is a release of UNITE/INSDC representative/reference sequences for use in reference-based chimera detection of fungal ITS sequences in UCHIME and similar programs.
Dataset2. USEARCH/UTAX release (https://doi.org/10.15156/BIO/786345)
Reference file for USEARCH/UTAX.
You should download the QIIME release from the page that you linked to. Those sequences will need to be imported into QIIME 2 as described in this tutorial (which also describes how to train a classifier on those sequences, if you want to use a naive Bayes classifier). You could also download a trained classifier from this tutorial:
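As a rough sketch of the import step, assuming the FASTA file from the UNITE QIIME release has been downloaded and unpacked locally: the file names below are placeholders, not the actual UNITE file names, and the small `run` helper is only there so the command is printed rather than executed when QIIME 2 is not on your PATH.

```shell
# Placeholder file names (assumptions); substitute the FASTA file
# from the UNITE QIIME release that you actually downloaded.
REF_FASTA="unite-qiime-release.fasta"
REF_QZA="unite-ref-seqs.qza"

# Hypothetical helper: run the command if QIIME 2 is available,
# otherwise only print it (dry run).
run() {
  if command -v qiime >/dev/null 2>&1; then
    "$@"
  else
    echo "[dry run] $*"
  fi
}

# Import the reference sequences as a FeatureData[Sequence] artifact.
run qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path "$REF_FASTA" \
  --output-path "$REF_QZA"
```

The resulting artifact can serve as the reference for chimera checking and as the sequence input when training a classifier.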
That is incorrect. We do have a taxonomy classification method that uses vsearch for alignment followed by a native LCA algorithm to find consensus taxonomies, but that is in the q2-feature-classifier plugin (along with other taxonomy classifiers), NOT the q2-vsearch plugin.
Yes, I am aware. If you are using reference-based chimera checking, you need to download the sequences in the correct format. You were looking at the usearch format, which is not what you want. Use the QIIME release format from UNITE.
Hello @Nicholas_Bokulich
Thanks a lot for your suggestion. I'm sorry that my first reply was posted half finished (it was also my first time replying) because of an accidental operation. I don't know how to recall or re-edit it, but the main ideas are all there.
I'm still a little confused. I originally used the QIIME release for reference-based chimera checking, but then I found that the UCHIME/USEARCH/UTAX reference datasets (UNITE - Resources) are described as specially prepared for "reference-based chimera detection of fungal ITS sequences in UCHIME and similar programs". So I'm wondering whether QIIME 2 (q2-vsearch) counts as one of the "similar programs", or is more similar to the USEARCH program? Or do you mean that the QIIME release should be used both for training a classifier and for reference-based chimera checking?
No. It is fair to be confused about that, since the method is called qiime vsearch uchime-ref. But the UCHIME/USEARCH/UTAX release is the wrong format for use with QIIME 2.
Yes. The same QIIME-formatted files should be used for both. You would use the FASTA file (imported as a FeatureData[Sequence] artifact in QIIME 2) as the reference with the qiime vsearch uchime-ref method.
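A minimal sketch of that call, assuming the reference FASTA has already been imported as a FeatureData[Sequence] artifact and that you have representative sequences plus the matching feature table from your own run. All file and directory names here are placeholders, and the `run` helper is a hypothetical convenience that prints the command instead of executing it when QIIME 2 is not on your PATH.

```shell
# Placeholder artifact names (assumptions); replace with your own files.
REF_QZA="unite-ref-seqs.qza"   # imported UNITE QIIME release (FeatureData[Sequence])
REP_SEQS="rep-seqs.qza"        # your representative ITS sequences
TABLE="table.qza"              # feature table matching those sequences
OUT_DIR="uchime-ref-out"       # output directory; must not exist yet

# Hypothetical helper: run the command if QIIME 2 is available,
# otherwise only print it (dry run).
run() {
  if command -v qiime >/dev/null 2>&1; then
    "$@"
  else
    echo "[dry run] $*"
  fi
}

# Reference-based chimera checking against the imported UNITE sequences.
run qiime vsearch uchime-ref \
  --i-sequences "$REP_SEQS" \
  --i-table "$TABLE" \
  --i-reference-sequences "$REF_QZA" \
  --output-dir "$OUT_DIR"
```

With `--output-dir`, the chimeric sequences, non-chimeric sequences, and summary statistics are written as separate artifacts inside that directory.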