Should I use aligned or non-aligned representative sequences to train my own classifier?

When I check the two files, it seems rep_set is the right one.

@Mehrbod_Estaki Hi, I want to train my own classifier, and I downloaded the Greengenes (16S rRNA) 13_8 reference dataset from the Data resources page of QIIME 2. There are two folders of reference sequences, rep_set and rep_set_aligned. Could you please tell me which one I should use? Thanks

Hi @Kevin,
Sounds like you already figured it out: use the non-aligned one (rep_set).
Good luck!

Hi @Mehrbod_Estaki
Just out of curiosity, what would happen if you tried to import the aligned rep sequences from Greengenes and use them with the VSEARCH taxonomy classifier?

Hi @KMaki,
The classifiers require unaligned sequences, which have the semantic type FeatureData[Sequence]. If you tried to provide a FeatureData[AlignedSequence] artifact instead, you would get an error telling you that the input type is not compatible with that action. This is one of the important design goals of QIIME 2 semantic types: preventing the wrong kind of input from being used. Give it a try :slight_smile:
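If you want a quick look at a FASTA file before importing it, here is a minimal Python sketch that checks whether any sequence contains gap characters, which is what usually sets an aligned file apart from an unaligned one. This is only a heuristic and is not QIIME 2's own validation code. The function names (`read_fasta`, `looks_aligned`), the gap characters chosen (`-` and `.`), and the toy records are all assumptions made for illustration.

```python
# Heuristic sketch only: this is NOT how QIIME 2 validates its inputs.
# Assumption: aligned FASTA files pad sequences with '-' or '.' gap characters.
GAP_CHARS = {"-", "."}


def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped, so collect them piece by piece.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def looks_aligned(records):
    """Return True if any sequence contains a gap character."""
    return any(ch in GAP_CHARS for _, seq in records for ch in seq)


# Hypothetical toy records, not taken from Greengenes.
unaligned = ">seq1\nACGTACGT\n>seq2\nACGTAC\n"
aligned = ">seq1\nACGTACGT\n>seq2\nACGT--AC\n"

print(looks_aligned(read_fasta(unaligned)))
print(looks_aligned(read_fasta(aligned)))
```

For a real file you would read its contents into a string and pass that to `read_fasta`. A file that comes back as aligned would be expected to import as FeatureData[AlignedSequence] rather than FeatureData[Sequence], so it is not the one to use for training a classifier.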
