How to create a dereplicated sequence reference database for taxonomy classification: case of COI

colinbrislawn · November 1, 2018, 7:49pm

Hello Devon,

I agree with Nick that this is outside the scope of QIIME 2, but I also saw your comment on the vsearch GitHub issues and I wanted to 'qiime in' about your desire to, as you say,

preserve as much taxonomic information

Are you sure that method won't over-classify your reads?

Robert Edgar, the author of MUSCLE and USEARCH, argues that over-classification is one of the biggest pitfalls of many classifiers: they produce genus-level identifications that are essentially worthless and mislead researchers.

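While you focus on minimizing Type II errors (reads left unclassified when a correct name was available), I'm equally worried about increasing your Type I error rate (reads assigned a name that is wrong or more specific than the data support).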
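To make the trade-off concrete, here is a minimal sketch, not any real classifier's method. It assumes hypothetical lineages written as lists of names from the top rank down, per-rank confidence scores, and a truncation rule that cuts the lineage at the first rank below a confidence threshold (similar in spirit to the cutoffs many classifiers expose). The "truth" for a query whose genus is absent from the reference is `None` at that rank, so naming a genus there counts as over-classification. The names `truncate`, `tally`, and the example queries are all made up for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical lineage: names from the top rank down; None = unassigned.
Lineage = Sequence[Optional[str]]


def truncate(names: Sequence[str], confidences: Sequence[float],
             threshold: float) -> List[Optional[str]]:
    """Keep names down to the first rank whose confidence is below threshold.

    Assumed truncation rule for illustration; real classifiers differ.
    """
    kept: List[Optional[str]] = []
    for name, conf in zip(names, confidences):
        if conf < threshold:
            break
        kept.append(name)
    # Pad the remaining (deeper) ranks with None, i.e. unassigned.
    return kept + [None] * (len(names) - len(kept))


def tally(pairs: Sequence[Tuple[Lineage, Lineage]], rank: int) -> dict:
    """Count outcomes at one rank (0 = top rank) over (truth, predicted) pairs.

    Type I: a name was assigned, but it is wrong or the truth is unassigned
            at this rank (over-classification or misclassification).
    Type II: the truth has a name at this rank, but none was assigned
             (under-classification).
    """
    totals = {"correct": 0, "type_I": 0, "type_II": 0}
    for truth, predicted in pairs:
        t, p = truth[rank], predicted[rank]
        if p is None:
            totals["correct" if t is None else "type_II"] += 1
        elif p == t:
            totals["correct"] += 1
        else:
            totals["type_I"] += 1
    return totals


# Made-up queries: (true lineage, classifier's raw names, per-rank confidences).
queries = [
    # Genus is in the reference and the classifier is confident.
    (["Insecta", "Diptera", "Aedes"], ["Insecta", "Diptera", "Aedes"], [1.0, 0.95, 0.9]),
    # Genus is NOT in the reference; the classifier weakly guesses one anyway.
    (["Insecta", "Diptera", None], ["Insecta", "Diptera", "Culex"], [1.0, 0.9, 0.6]),
    # Genus is in the reference, but the classifier is unsure of it.
    (["Insecta", "Coleoptera", "Carabus"], ["Insecta", "Coleoptera", "Carabus"], [1.0, 0.9, 0.65]),
]

if __name__ == "__main__":
    for threshold in (0.8, 0.5):
        pairs = [(truth, truncate(names, confs, threshold))
                 for truth, names, confs in queries]
        print(threshold, tally(pairs, rank=2))
    # 0.8 {'correct': 2, 'type_I': 0, 'type_II': 1}
    # 0.5 {'correct': 2, 'type_I': 1, 'type_II': 0}
```

Lowering the threshold here removes the Type II error on the uncertain *Carabus* read, but only by also accepting a confident-looking *Culex* label for a read whose genus isn't in the reference at all, which is exactly the over-classification Edgar warns about.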

This is a tricky problem and I'm happy you brought it to the QIIME forums!

Colin