Classifier training parameters

Sorry for struggling with this…

I want to assign taxonomy to my reads, say myguanoseqs.qza.

Am I correct that I start by training the classifier, using my reference sequences, say COIdbSeqs.qza, and reference taxonomy, COIdbTax.qza?

To do this, I run

qiime feature-classifier fit-classifier-naive-bayes

with the COIdb*.qza as inputs, and generate a classifier.qza file.

The next step in the tutorial talks about testing the classifier. Maybe that's where I'm mistaken. I wanted to do that, but I also want to actually classify my guano sequences, myguanoseqs.qza.
I have mock communities as well. Is each of these tests just a separate input to this function?

qiime feature-classifier classify-sklearn

My concern about doing this wrong comes from the fact that, thus far, I've only supplied the reference sequences as input for every step (either 2 million or about 1.6 million sequences, depending on the database), yet in each case I got back a test result with only 10,000 sequences… If I trained with 2 million sequences and tested with 2 million, shouldn't my results include 2 million classifications?

Thanks for the help!

P.S. The full code executed for a particular database was as follows:

REFSEQ=/path/to/COIdbSeqs.qza
REFTAX=/path/to/COIdbTax.qza

## train the classifier
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads "$REFSEQ" \
  --i-reference-taxonomy "$REFTAX" \
  --o-classifier classifier_all_raw.qza

## test the classifier
qiime feature-classifier classify-sklearn \
  --i-classifier classifier_all_raw.qza \
  --i-reads "$REFSEQ" \
  --o-classification classifierTax_all_raw.qza

## export data for analysis
qiime tools export --input-path classifierTax_all_raw.qza --output-path classifierTax_all_raw

"test" may be a misleading term here. You definitely should NOT input the sequences used to train as input (I mean you could but it is meaningless and technically incorrect from a "testing" standpoint).

The input here should be some other sequences, whether these are unknown query sequences or whether you are testing with a mock community.
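
For example, to classify the actual guano reads mentioned above with the classifier you already trained (the output filename here is just illustrative):

## classify the query reads, not the reference reads used for training
qiime feature-classifier classify-sklearn \
  --i-classifier classifier_all_raw.qza \
  --i-reads myguanoseqs.qza \
  --o-classification myguanoseqs_taxonomy.qza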

That should not be happening — you should get the same # of classifications as query sequences you input. Could you share the inputs and outputs? cc: @BenKaehler

Thanks!

Am I correct, then, in thinking that what I should have done after training the classifier is to input a different set of representative sequences, like my mock community?

I take it this is where the tax-credit piece comes in, where we can use both cross-validation and mock communities to evaluate our taxonomy assignments?

Given that I have multiple mock communities (I have 3 different mock communities, though several mock members overlap among the 3), this seems likely to be what I should be doing. My only worry is that each of these mock communities has only about 20-25 members, so it's not a particularly long list to evaluate classification against (though they are quite diverged taxa).

Correct. So any sort of query sequences, but not the reference sequences used to train that classifier.

Correct — at least for the cross-validated part. Mock communities you can test directly in QIIME 2, using q2-quality-control to evaluate accuracy for each mock community.
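
Roughly along these lines (the filenames are placeholders; the inputs need to be your expected and observed mock compositions as relative-frequency feature tables collapsed to matching taxonomic levels):

## compare observed vs. expected composition for one mock community
qiime quality-control evaluate-composition \
  --i-expected-features mock1_expected_taxa.qza \
  --i-observed-features mock1_observed_taxa.qza \
  --o-visualization mock1_evaluation.qzv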

Yep, that's the problem with mock communities. This is where cross-validation and tax-credit come into play...

Thanks @devonorourke and @Nicholas_Bokulich!

I agree with Nick that testing your classifier on your reference data won't tell you much (like classification accuracy) unless you use full cross-validation (like we did for q2-clawback). As a unit test (used to detect whether basic functionality is correct), though, it's not terrible. It appears to have done its job here: you should be getting classifications for many more of your original reference sequences.

The sort of testing that you should be doing depends on the questions you’re trying to answer. If you’re wondering whether this approach works at all for your data, then the mock communities are great and q2-quality-control should tell you what you need to know. If you want to train hyperparameters, full cross validation is probably better.

Are you please able to share your reference data set with us so that we can debug why you’re seeing so few classified sequences?

Thanks!

Crap. I have everything backwards... Three things:

  1. @BenKaehler and @Nicholas_Bokulich should have received a direct message from me with a link to the reference sequences and taxonomy files. Hopefully this helps with troubleshooting.

  2. I was wrong with regard to which of my classifier tests actually finished:

The classifier script that finished was for the trimmed references, not the untrimmed ones. It is this trimmed dataset that is apparently vastly smaller: just 10,073 sequences as opposed to 2 million. Both the input to classifier training and the eventually finished classifier test that I thought had too few reads (the ~10k reads, not the 2 million) came from this trimmed dataset. So, as you'd expect, my trained classifier and input read set were both only ~10k sequences. No technical glitches, just a dumb user.

Yet I'm unclear why so few sequences made the cutoff - the parameters I used were:

qiime feature-classifier extract-reads \
  --i-sequences "$REFSEQ" \
  --p-f-primer GGTCAACAAATCATAAAGATATTGG \
  --p-r-primer GGWACTAATCAATTTCCAAATCC \
  --p-trunc-len 181 \
  --p-min-length 160 \
  --p-max-length 220 \
  --o-reads ref_seqs_all_trim.qza

I expect that my amplicons are all about 180 bp long, so I chose those parameters to try to target the correct fragment sizes. Perhaps I'm making a mistake with how the forward and reverse primers are supposed to be supplied. Are the f-primer and r-primer both supposed to be in the 5' --> 3' orientation? Is the r-primer supposed to be the reverse complement, like in most read-trimming programs? It's not clear in the documentation, but perhaps that's just one of those things you're supposed to know.

  3. Am I correct that neither the extract-reads nor the fit-classifier-naive-bayes functions are multithreaded?

Thanks!


Glad you sorted out the sequence count issue @devonorourke!

Either many of your reference sequences are not hit by the primers, are outside of the min/max length, or are in mixed orientations.

Yes, both primers should be supplied in the 5' --> 3' orientation.

No, the reverse primer should not be reverse complemented.

Good point! That would help. I have opened an issue to clarify the docs.

Correct, neither is multithreaded.

It looks like I set my primers in the expected orientations. I should have looked harder into your actual python code which shows that you take the reverse complement of the reverse primer...

I think where the results got wonky was my max-length parameter, like you suggested. It looks like the region my primers span is usually 229 bases, and I specified the max to be 220.

Just to clarify, say I have a reference like this:

NNNNN[forwardPrimermatch]{coiAmplicon}[reversePrimermatch]NNNN

The total length from the left edge of [forwardPrimermatch] to the right edge of [reversePrimermatch] is 229 bases. The {coiAmplicon} internal span is 181 bases.

I set my parameters initially as:

qiime feature-classifier extract-reads \
  --p-trunc-len 181 \
  --p-min-length 160 \
  --p-max-length 220 \

But perhaps that was a mistake. I thought the min/max lengths applied to the amplicon itself that remains after trimming the primers - the {coiAmplicon} region. At least, that seems to be the idea in the documentation:

 --p-min-length INTEGER RANGE    Minimum amplicon length. Shorter amplicons
                                  are discarded. Applied after trimming and
                                  truncation, so be aware that trimming may
                                  impact sequence retention.

If that's correct, then perhaps the problem was the trunc-len parameter? I think I shouldn't have used it in the first place and should have just left it at zero.

Yep, everything you have written is correct (and you can look further down in the Python code you linked to to see where trimming occurs) — though I am not certain whether trunc-len is the issue. I'd suggest giving it a try (dropping that param) and seeing what happens.
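
i.e., something along these lines, keeping your primers and length bounds but leaving trunc-len at its default of zero:

## re-extract reference reads without truncation
qiime feature-classifier extract-reads \
  --i-sequences "$REFSEQ" \
  --p-f-primer GGTCAACAAATCATAAAGATATTGG \
  --p-r-primer GGWACTAATCAATTTCCAAATCC \
  --p-min-length 160 \
  --p-max-length 220 \
  --o-reads ref_seqs_all_trim.qza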

I meant the read orientation, not the primer orientation. If your reference sequences are in mixed orientations, then the primers will only hit the sequences that are in the matching orientation, not the reverse-complemented ones.

Just curious if there is any idea out there for how long one might expect a job to take if you’re trying to classify 2 million sequences with the naive-Bayes approach. This thread made it sound like I could be waiting a long time.

I know that the 2 million sequences are unique, but their associated taxonomies are not necessarily unique (i.e., one taxon can be represented multiple times by distinct sequences). Perhaps I should try to pare down the number of samples.

When the classifiers were trained for microbiome communities, how many samples were included?

Yep, it will take a long time for that many. I do not know off-hand... but we benchmarked classification time in the q2-feature-classifier paper if you want to take a look there (read length is another factor not accounted for there so your results may differ). Multithreading is available for classification though!
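
For example, something like this for the classification step (the job count here is arbitrary):

## classification (not training) can be run with multiple jobs
qiime feature-classifier classify-sklearn \
  --i-classifier classifier_all_raw.qza \
  --i-reads myguanoseqs.qza \
  --p-n-jobs 4 \
  --o-classification classifierTax_all_raw.qza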

That's fine. That's a good thing.

This is why we still often use OTU clustering of reference databases — to reduce size/complexity while still retaining most information. That's just an aside, not a recommendation at this stage... I believe we've already had that conversation on a separate topic.

The number of samples does not matter. It is only the # of sequences (and also sequence length, etc., hence why extracting reads is helpful).

Sorry you may be waiting for many hours! Try multithreading to speed things up a bit

I'm going in circles I think.

Multithreading is available for classification with vsearch, but it's not available in the qiime feature-classifier fit-classifier-naive-bayes method, right? That's the part I'm wondering about, as the job has been running for about 36 hours (happy to wait 360 if it needs to, just wondering how long to expect). The runtime estimate, I think, is from Fig. 5b in your paper, which suggests my 2 million reads should only take about 12 hours. Like you mentioned, it's going to vary by read length, but most of my references are about 650 bp. I'm not sure whether your calculations were based on full-length 16S or a portion thereof, but it's not as if my sequences are likely any longer than yours, so I'm concerned something is going on with my parameters (though all I used were the bare-bones defaults):

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads refseqs.qza \
  --i-reference-taxonomy reftax.qza \
  --o-classifier classifier_all_raw.qza

Here's hoping you can invoke some holiday cheer by pointing out the errors of my ways and help me discover how to implement multithreading with the above command :christmas_tree:

Oh, you said classification, so I thought you were talking about classify-sklearn. That can be parallelized, but not training.

That's for classification only, not training. Training is usually a bit faster... but 2 million is more than I have tried previously!

Sorry, I think you'll just need to wait it out :frowning: Please let us know how long it takes, for future reference!

It appears the training finished after about 48 hours.
The classifier hasn't been working, though: it keeps running out of memory. I'm unclear which parameter might be helpful (is it the batch size?). Most of our compute nodes have 128 GB of RAM with up to 24 threads, but it has failed even on our big nodes with up to 800 GB of RAM.

Thanks!

:champagne:

48 hr doesn't sound too bad, given the massive size of this dataset.

Not surprising, given the massive size of the dataset... SILVA is the largest database I work with, often needs ~32 GB to classify a large-ish # of query seqs, and contains only 700k ref seqs. So your database would take somewhat more memory...

There's lots of advice on the forum already regarding memory issues with this. Adjusting batch size will help. Don't run with too many jobs — each job reads a new instance of the classifier into memory, so if you have 128 GB total on the node you will max out quickly. Talk to the system admins to see if you can access a high-memory node to tackle this job... or else use fewer jobs and wait it out.
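
As a rough sketch (the batch size is just an example starting point, and the output name is illustrative):

## one job = one copy of the classifier in memory; smaller batches = lower peak memory
qiime feature-classifier classify-sklearn \
  --i-classifier classifier_all_raw.qza \
  --i-reads myguanoseqs.qza \
  --p-n-jobs 1 \
  --p-reads-per-batch 1000 \
  --o-classification guano_taxonomy.qza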

I hope that helps!

Am I understanding this right:
Each node used the max memory?
We use SLURM for managing our jobs, and I can configure the number of CPUs and the memory allocation directly (I don't need a sys admin to request memory; the last job that failed was using the 1 TB RAM node).

I’ll look into the links you’ve provided and see if anyone has experience with using SLURM to manage this particular task.

Thanks

The good news is I got the classifier to finally complete, but the bad news is I'm not sure I fully understand why it was failing in the first place (suspicions outlined below).

It looks like many issues posted about classifier memory are from users testing the pre-trained SILVA classifier on a laptop with 8-16 GB of RAM, followed by replies indicating that they need more memory and suggesting they try a compute cluster or go the AWS route. My process was seg faulting on our cluster using either 120, 500, or 800 GB of memory, so while I'm still getting an error that quite distinctly seems memory-related, I don't understand how it could be crashing on a high-memory node.

slurmstepd: error: Job 36840 exceeded memory limit (408895836160 > 134209339392), being killed
slurmstepd: error: Exceeded job memory limit
slurmstepd: error: *** JOB 36840 ON node108 CANCELLED AT 2018-12-08T00:48:02

One thing that would be valuable to understand is how the inputs provided to the script ultimately get read into memory for the process:

  1. My reference classifier is just over 500 MB; I'm guessing this is read into memory entirely before even one sequence is queried? I'm curious how much decompression inflates that value. The entire classifier must get read in at once, right?
  2. My representative sequence dataset contains about 10,000 sequence variants to be classified. It seems unlikely that this is causing a significant memory issue, but maybe it should be tuned down with smaller batches? Is this the only place where batching reads can reduce the memory footprint (at the cost of taking longer when there are fewer reads per batch)?

My suspicion as to why it finally completed without the memory error… I think I'm misinterpreting the documentation describing how I should be allocating CPUs. With our SLURM job manager we define the number of CPUs to use per task and the number of tasks (which I'm interpreting as "threads" and "CPUs", respectively). I wonder whether the --p-n-jobs parameter is supposed to refer to the number of CPUs, or to the number of threads per CPU. When I think about multithreading (like with vsearch) I usually think of that --p-n-jobs parameter as referring to the number of "threads", but maybe that's where I'm mistaken. Is it referring to the number of CPUs instead? If it is, I'm wondering what to specify for the number of tasks per CPU (the threads thing); it doesn't seem like there is a parameter in this script for that.

The program would always crash, no matter how much memory I gave it (up to 800 GB), if I specified 24 CPUs in my SLURM script and set the --p-n-jobs parameter to -1. I did that thinking that was how to use all the threads, but perhaps my mistake is that it's trying to use all the CPUs on the cluster, which my SLURM script is specifying not to do…

Once I switched that parameter back to 1, the program finished the job on the regular 128 GB RAM node after about 8 hours.
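
For anyone else juggling this with SLURM, here is a sketch of the kind of pairing that seems to behave (the SBATCH values and filenames are illustrative, not a recommendation):

#!/bin/bash
## classify-sklearn runs as a single task; give that task a few CPUs and plenty of memory
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=128G

## keep the QIIME job count matched to the CPUs SLURM actually allocated
## (or drop to 1 if memory is tight, since each job loads its own copy of the classifier)
qiime feature-classifier classify-sklearn \
  --i-classifier classifier_all_raw.qza \
  --i-reads myguanoseqs.qza \
  --p-n-jobs "$SLURM_CPUS_PER_TASK" \
  --o-classification guano_taxonomy.qza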

Thanks for any input you can provide about how that --p-n-jobs parameter is supposed to be interpreted with regard to threads vs. CPUs.

Yes, the entire classifier is read into memory before one sequence is queried. So that explains why you had immediate memory issues before... you were reading in N classifiers to memory (where N = number of jobs you are running). 500 MB would decompress to a much larger size (not sure how large), so all in all it is not unimaginable that 24 jobs would eat up 1 TB of RAM in no time.

Yep, a large number of queries will impact memory usage, and this is where batch size matters. (and more batches does increase time a little bit).

You should read up on what scikit-learn does in this regard — sorry, I don't know the actual answer so you should get it straight from the horse's mouth! (and then let me know — I've made the same assumptions you have so would like to know if I'm wrong) :horse:

Hi @Nicholas_Bokulich,
Here is the horse's documentation. Looking through the QIIME script implementing scikit-learn, it looks like the n_jobs parameter is implemented beginning at line 40, where you define the predict function.
In that code, you incorporate the n_jobs parameter but do not set the backend parameter, which changes how n_jobs is interpreted. The very beginning of the joblib.Parallel documentation makes it clear that n_jobs can be interpreted either as the number of CPUs (worker processes) to use or as the number of threads, depending on how that backend parameter is specified:

Parameters:	
n_jobs: int, default: None
The maximum number of concurrently running jobs, such as the 
number of Python worker processes when backend=”multiprocessing” 
or the size of the thread-pool when backend=”threading”. 

So I think, if I'm understanding any of this properly, there is more to it than QIIME's current implementation. If you leave it as-is, it looks like it defaults to process-based (CPU) workers, but if you want to be able to use this script and specify threads, then more needs to be added to the code. I think, at a minimum, it requires adding the prefer argument to joblib.Parallel, but again, I'm not certain. Whoever in QIIME made this plugin possible will probably understand exactly what modification is necessary, and my guess is it won't be a huge addition.

Hope this sheds some light on the situation rather than muddying the waters, but who knows.

Thanks!


Thanks @devonorourke

You're looking at him. Yep, sounds like a simple modification. I will need to mull this a bit more before deciding what is appropriate.