Should I use aligned or non-aligned representative sequences to train my own classifier?

Kevin · April 17, 2020, 2:10pm

When I checked the two files, it seemed that rep_set is the right one.

Kevin · April 17, 2020, 2:10pm

@Mehrbod_Estaki Hi, I want to train my own classifier, so I downloaded the Greengenes (16S rRNA) 13_8 reference dataset from the Data resources page of QIIME 2. There are two folders of reference sequences, rep_set and rep_set_aligned. Could you please tell me which one I should use? Thanks

Mehrbod_Estaki · April 18, 2020, 1:00am

Hi @Kevin,
Sounds like you already figured it out: use the non-aligned one.
Good luck!

KMaki · April 29, 2020, 5:55pm

Hi @Mehrbod_Estaki
Just out of curiosity, what would happen if you tried to import the aligned rep sequences from Greengenes and use them with the VSEARCH taxonomy classifier?

Mehrbod_Estaki · April 30, 2020, 6:22am

Hi @KMaki,
The classifiers require a specific (non-aligned) input type, FeatureData[Sequence], so if you tried to provide a FeatureData[AlignedSequence] artifact instead, you would get an error telling you that the input is not compatible with that plugin. This is one of the important design goals of QIIME 2 semantic types: to prevent the wrong inputs from being used. Give it a try!
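
If you are ever unsure which of the two FASTA files is the aligned one before importing, a quick check for gap characters can help. This is only a minimal sketch, not part of QIIME 2: the function name `looks_aligned` is hypothetical, and it assumes plain FASTA input where alignment gaps are written as `-` or `.`. The records below are toy data, not real Greengenes sequences.

```python
from typing import Iterable

# Assumption: alignment gaps are written as '-' or '.'
GAP_CHARS = set("-.")


def looks_aligned(fasta_lines: Iterable[str]) -> bool:
    """Return True if any sequence line contains a gap character."""
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA header lines
        if any(ch in GAP_CHARS for ch in line):
            return True
    return False


# Toy records for illustration only
unaligned = [">seq1", "ACGTACGT", ">seq2", "ACGTTT"]
aligned = [">seq1", "ACGT--ACGT", ">seq2", "ACGT....TT"]

print(looks_aligned(unaligned))  # False
print(looks_aligned(aligned))    # True
```

For a real file you could pass an open file handle instead of a list, since the function only iterates over lines. A file that comes back `False` is the kind you would import as FeatureData[Sequence] for training.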
