Feature-classifier extract-reads

How large is midori? My guess is "very large" because this method can take some time on very large databases :grin:

Absolutely! Longer primers and more degeneracy will increase runtime even more.
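To get a rough feel for how degenerate a primer is, you can count how many concrete sequences its IUPAC codes stand for. This is only an illustrative sketch: the `degeneracy` helper and the toy primer are made up for this example, and it does not describe how `extract-reads` handles degenerate bases internally.

```python
# Number of concrete bases each IUPAC nucleotide code represents.
IUPAC_COUNTS = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(primer: str) -> int:
    """Return how many non-degenerate sequences a primer expands to."""
    total = 1
    for base in primer.upper():
        if base not in IUPAC_COUNTS:
            raise ValueError(f"Not an IUPAC nucleotide code: {base!r}")
        total *= IUPAC_COUNTS[base]
    return total


# Toy primer (hypothetical): W stands for 2 bases, N for 4, so 2 * 4 = 8.
print(degeneracy("GGWACN"))
```

The count grows multiplicatively with every degenerate position, so a handful of extra ambiguous bases can make a primer far less specific.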

You can use the --p-n-jobs parameter to run this in parallel, which reduces the wait (though 48 hr in, I don't know whether it's better to just let the current run finish, or how much longer you have left to wait :man_shrugging:)
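For reference, here is a minimal sketch of what a parallel run could look like, wrapped in Python so the command is easy to reuse. The file names, primer placeholders, and the choice of 8 jobs are assumptions for illustration, so swap in your own values; the `extract_reads_cmd` helper is hypothetical and just assembles the command line.

```python
import os
import shutil
import subprocess


def extract_reads_cmd(seqs, f_primer, r_primer, out, n_jobs=1):
    """Build the argument list for `qiime feature-classifier extract-reads`."""
    return [
        "qiime", "feature-classifier", "extract-reads",
        "--i-sequences", seqs,
        "--p-f-primer", f_primer,
        "--p-r-primer", r_primer,
        "--p-n-jobs", str(n_jobs),  # number of parallel jobs
        "--o-reads", out,
    ]


# Hypothetical inputs: replace with your own artifact and primer sequences.
cmd = extract_reads_cmd(
    seqs="midori-seqs.qza",
    f_primer="YOUR_FORWARD_PRIMER",
    r_primer="YOUR_REVERSE_PRIMER",
    out="midori-trimmed.qza",
    n_jobs=8,
)
print(" ".join(cmd))

# Only run when QIIME 2 is on the PATH and the input artifact exists.
if shutil.which("qiime") and os.path.exists("midori-seqs.qza"):
    subprocess.run(cmd, check=True)
```

You can also just copy the printed command and run it directly in your QIIME 2 environment.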

Generally trimming is beneficial: there are a few publications (for 16S) showing improved classification accuracy. It will also reduce runtime downstream, so it is a worthwhile step if this is a trimmed database that you will use again and again (for classification, alignment, filtering, or whatever).

For COI specifically, @devonorourke tested this here:

and describes steps to reproduce that database here:
