Do QIIME 2 steps downstream of feature-classifier need E-values and bitscores?

I am wondering whether the QIIME 2 steps after the classifier use the E-value and bitscore columns of the BLAST tabular output (outfmt 6) in any way, for example for alignment quality filtering.
If they do not, it may be possible to substitute minimap2 for blastn/vsearch for long-read assignments. minimap2 does not produce E-values or bitscores, though, so if these are required, it does not qualify.
Thanks for the info

Good to see you again, Stephane,

Let’s open up the source code and see what that plugin is doing under the hood. :blue_book: -> :open_book:

It looks like after running vsearch or BLAST, the `_consensus_assignments` function is called.

That in turn calls `_compute_consensus_annotations`, which is defined further down in the same file.
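To make the consensus idea concrete, here is a simplified, hypothetical re-implementation. It is not the plugin's actual code: the name `consensus_annotation`, the `"; "`-separated taxonomy format, and the `"Unassigned"` label are assumptions. It walks the ranks of the accepted hits top-down and keeps a rank only while at least `min_consensus` of the hits agree on it. Note that it only needs the taxonomy strings of the hits, so E-values and bitscores never enter the calculation.

```python
from collections import Counter


def consensus_annotation(hit_taxonomies, min_consensus=0.51,
                         unassignable_label="Unassigned"):
    """Simplified majority consensus over the taxonomy strings of one
    query's accepted hits (hypothetical helper, not the plugin's code)."""
    if not hit_taxonomies:
        return unassignable_label, 0.0
    # Split "k__Bacteria; p__Firmicutes; ..." into per-rank labels.
    ranks_per_hit = [[r.strip() for r in t.split(";")] for t in hit_taxonomies]
    n_hits = len(ranks_per_hit)
    consensus, confidence = [], 0.0
    # zip() walks the ranks top-down and stops at the shortest taxonomy.
    for labels_at_rank in zip(*ranks_per_hit):
        label, count = Counter(labels_at_rank).most_common(1)[0]
        fraction = count / n_hits
        if fraction < min_consensus:
            break  # no sufficient majority at this rank: stop descending
        consensus.append(label)
        confidence = fraction
    if not consensus:
        return unassignable_label, 0.0
    return "; ".join(consensus), confidence
```

With two *Bacillus* hits and one *Listeria* hit, the default threshold keeps the genus (2/3 agreement), while `min_consensus=0.9` stops at the phylum.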

In that consensus step I don’t see any ‘alignment quality filtering’, but keep in mind that both vsearch and BLAST let you set filters that control what they count as a hit in the first place.

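```python
import pandas as pd
from q2_types.feature_data import DNAFASTAFormat


def classify_consensus_vsearch(query: DNAFASTAFormat,
                               reference_reads: DNAFASTAFormat,
                               reference_taxonomy: pd.Series,
                               maxaccepts: int = 10,
                               perc_identity: float = 0.8,  # minimum identity of a hit
                               query_cov: float = 0.8,      # minimum query coverage of a hit
                               strand: str = 'both',
                               min_consensus: float = 0.51,
                               unassignable_label: str = 'Unassigned',
                               search_exact: bool = False,
                               top_hits_only: bool = False,
                               threads: str = 1) -> pd.DataFrame:
    ...  # body omitted; this excerpt only shows the signature and defaults
```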

You can see that vsearch will not report a hit unless it is at least 80% identical to the query (`perc_identity=0.8`) and covers at least 80% of the query (`query_cov=0.8`). I think this is the filtering you were asking about.

Keep in mind that vsearch does not report E-values either, so minimap2’s lack of them should not be a conflict.


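P.S. I feel a little pedantic saying this, but E-values are different from bitscores. A bitscore is the raw alignment score (a weighted total of the matches, mismatches, and gaps) normalized by the statistical parameters of the scoring system, so that scores are comparable across scoring schemes. An E-value is the number of alignments with a score at least that good that you would expect to find by chance alone when searching a database of that size. E-values therefore depend on the size of the database and the length of the query, and, through the scoring statistics, on sequence composition. I think this is why modern programs often don’t report E-values.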
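The difference is easiest to see in the standard Karlin-Altschul relations: the bitscore is S' = (λS − ln K) / ln 2 for a raw score S, and the E-value is E = m · n · 2^(−S') for query length m and database length n. A minimal sketch, assuming those textbook formulas and ignoring the effective-length corrections BLAST actually applies; the function names and the λ and K values in the usage block are made up for illustration:

```python
import math


def bit_score(raw_score, lam, k):
    """Karlin-Altschul normalised score in bits: S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue_from_bits(bits, query_len, db_len):
    """Expected number of chance hits scoring >= bits: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


if __name__ == "__main__":
    # lam and k are illustrative placeholders, not real BLAST parameters.
    bits = bit_score(raw_score=60, lam=1.3, k=0.7)
    small_db = evalue_from_bits(bits, query_len=1500, db_len=10**6)
    big_db = evalue_from_bits(bits, query_len=1500, db_len=10**8)
    print(f"bitscore={bits:.1f}  E(1 Mb)={small_db:.2e}  E(100 Mb)={big_db:.2e}")
```

The same alignment keeps the same bitscore, but its E-value grows in proportion to the database size. A mapper that never models the database as a whole has no natural n to plug in.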


Thanks a lot, Colin,

So if I manage to convert minimap2’s PAF output into something close to BLAST outfmt 6, I may be able to plug minimap2 into QIIME 2, provided I can also build a stringency filter that keeps only long matches (80%) from the mapping results. This is still some way off, but I’ll try.

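PAF is a lot like blast6. Here are all the PAF columns (the 12 mandatory ones described in the minimap2 manual; optional SAM-like tags may follow):

1. Query sequence name
2. Query sequence length
3. Query start (0-based)
4. Query end (0-based, end-exclusive)
5. Relative strand: `+` or `-`
6. Target sequence name
7. Target sequence length
8. Target start on the original strand (0-based)
9. Target end on the original strand (0-based, end-exclusive)
10. Number of residue matches
11. Alignment block length (matches, mismatches, and gaps)
12. Mapping quality (0-255; 255 for missing)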
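A minimal sketch of the PAF-to-blast6 conversion described above, with the vsearch-style identity and coverage filter built in. Everything here is an assumption rather than a tested recipe. The name `paf_to_blast6` is hypothetical. Identity is taken as PAF column 10 / column 11, which may differ from vsearch’s own identity definition. Mismatches are approximated, and the E-value and bitscore columns are filled with placeholders (-1 and 0). Without minimap2’s `-c` (base-level alignment), columns 10 and 11 are themselves approximate.

```python
def paf_to_blast6(paf_line, perc_identity=0.8, query_cov=0.8):
    """Turn one PAF record into a BLAST outfmt 6-like line (hypothetical helper).

    Returns None when the hit fails the identity/coverage thresholds, which
    default to vsearch's perc_identity and query_cov defaults (0.8).
    """
    f = paf_line.rstrip("\n").split("\t")
    qname, qlen, qstart, qend, strand = f[0], int(f[1]), int(f[2]), int(f[3]), f[4]
    tname, tstart, tend = f[5], int(f[7]), int(f[8])
    matches, block_len = int(f[9]), int(f[10])

    identity = matches / block_len      # column 10 / column 11
    coverage = (qend - qstart) / qlen   # fraction of the read that aligned
    if identity < perc_identity or coverage < query_cov:
        return None

    # PAF coordinates are 0-based and end-exclusive; BLAST's are 1-based
    # and inclusive, with sstart > send for minus-strand hits.
    if strand == "+":
        sstart, send = tstart + 1, tend
    else:
        sstart, send = tend, tstart + 1

    mismatches = block_len - matches  # approximation: also counts gap columns
    gapopen = 0                       # placeholder: not in the 12 mandatory columns
    evalue, bitscore = -1, 0          # placeholders: minimap2 computes neither
    fields = [qname, tname, f"{100 * identity:.1f}", block_len, mismatches,
              gapopen, qstart + 1, qend, sstart, send, evalue, bitscore]
    return "\t".join(str(x) for x in fields)
```

Because the consensus step works best with several candidate hits per read (`maxaccepts` defaults to 10 in the vsearch signature), you would probably also want minimap2 to keep secondary alignments rather than only the primary one. I have not checked how such output would be fed back into QIIME 2.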

I wanted to ask a little more about your project. I’ve found vsearch plenty fast, even at a very large scale. How many reads are you looking to assign taxonomy to? Have you reduced their number by dereplication, denoising, or high-identity clustering?
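Exact-match dereplication is the cheapest of these reductions: only one copy of each distinct sequence has to be classified, and the counts carry the abundance information. A minimal sketch of the idea, in the spirit of `vsearch --derep_fulllength` but not a reimplementation of it. The helper name `dereplicate` is hypothetical, and treating upper and lower case as identical is an assumption:

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical sequences (hypothetical helper).

    Returns (sequence, count) pairs, most abundant first, so only one copy of
    each distinct sequence has to go through taxonomy assignment.
    """
    # Upper-casing so 'acgt' and 'ACGT' collapse together is an assumption.
    return Counter(seq.upper() for seq in reads).most_common()
```

With error-prone long reads, exact duplicates may be rare, which is where denoising or high-identity clustering earns its keep.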