Picking the Right Alignment Method

TurboQiimer · December 15, 2020, 12:21pm

Dear admins,
Many researchers work on the 16S gene. What factors determine whether to apply global (VSEARCH) or local (BLAST+) alignment? What criteria should I use to pick the right one?
Thanks
Qiimer

ChrisKeefe · December 15, 2020, 3:50pm

The papers (BLAST, VSEARCH) are a good place to start. The VSEARCH paper in particular spends some time comparing its (optimal) algorithm to BLAST's optimal-with-certain-parameters approach.

TurboQiimer · December 16, 2020, 1:08pm

Thanks, sir.
I just have a simple question. Many people use the VSEARCH classifier for 16S analysis. Would it be problematic to use the BLAST+ classifier to analyze 16S reads?

ChrisKeefe · December 16, 2020, 4:51pm

I'm not an expert on this, @TurboQiimer, but BLAST has been around for a long time, and I'd be surprised if there's a "problem" with using it in this context. As with most things, I suspect your decision is going to be one of "what's the best tool for the job?"

TurboQiimer · December 17, 2020, 4:30pm

BLAST is a popular tool on the NCBI website. There is no doubt that it is a useful and effective alignment tool, but from a survey of the QIIME 2 forum, I noticed that VSEARCH is widely used for classification. I read the two papers you suggested. The part I needed was presented in a mathematical or algorithmic way that I could not follow. Yeah... you are right! I want simply to know which one is the best, VSEARCH or BLAST, although the BLAST output suggests that it works well in my case! I am in a real dilemma!
Thanks in advance for your guidance.
Qiimer

colinbrislawn · December 17, 2020, 6:39pm

I'm not really sure how to answer this question, because it depends on how you define 'best'. If you were asking about protein alignment, I would say blast (because vsearch does not do protein alignment).

In your comparison, are you using positive controls with a known taxonomic composition? How do the vsearch and blast+ classifications compare to your expected results?

Colin

TurboQiimer · December 22, 2020, 1:23pm

If you look at this, you will see that BLAST generates many more taxa than VSEARCH!!!
As a researcher, which do you prefer and recommend? The results are very different!!!
As the old saying goes, "A picture is worth a thousand words".
By the way, in reality, which result do I have in my samples? The one from BLAST+ or the one from VSEARCH?
Qiimer

colinbrislawn · December 22, 2020, 3:04pm

This is the big question. More is not always better.

Are these positive controls with a known composition?

Colin

TurboQiimer · December 22, 2020, 5:26pm

What do you mean by that? Do you mean the VSEARCH result is the accepted one?

Could you please explain a little bit more? I am not sure I understood your question!

Thank you very much for replying, especially with the new year around the corner!
Qiimer

colinbrislawn · December 22, 2020, 6:20pm

Sure thing.

You have been asking about the taxonomy classification results of blast and vsearch, and have shown that they produce different results.

in reality, which result do I have in my samples?

One way to answer this question is to use a mock community with a known mix of microbes. For example, in this paper, they simulate microbial samples so they know the exact composition of each sample. Then they analyse these samples in different ways, and just like you, they observed that different analysis methods produced slightly different results.

So which result is best? Well, because they knew the true composition of their mock samples, they could find the analysis method that most closely matches it.

Here is an example of a mock sample being classified by two programs:

Taxonomy	True % in mock sample	program1	program2
taxa1	50%	49%	35%
taxa2	40%	39%	35%
taxa3	10%	12%	30%

I want simply to know which one is the best

Based on these example results, I would say program1 is the best, as it most closely matches the expected composition of the mock sample.
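If you want to put a number on "most closely matches", here is a rough Python sketch using the percentages from the mock table above. The scoring function `total_abs_error` is my own illustrative choice (the sum of absolute differences from the known composition), not something taken from the paper or from QIIME 2; other distance measures would work too.

```python
# Percentages copied from the mock-sample example table.
truth = {"taxa1": 50, "taxa2": 40, "taxa3": 10}
program1 = {"taxa1": 49, "taxa2": 39, "taxa3": 12}
program2 = {"taxa1": 35, "taxa2": 35, "taxa3": 30}


def total_abs_error(expected, observed):
    """Sum of absolute differences in percent across all taxa.

    Hypothetical helper for illustration: a taxon missing from one
    side is treated as 0%, so spurious or missed taxa are penalized.
    """
    taxa = set(expected) | set(observed)
    return sum(abs(expected.get(t, 0) - observed.get(t, 0)) for t in taxa)


# Score each program against the known mock composition.
errors = {
    "program1": total_abs_error(truth, program1),
    "program2": total_abs_error(truth, program2),
}

# Lower error means closer to the known composition.
best = min(errors, key=errors.get)
print(errors, "closest to truth:", best)
```

The score can only be computed because the true composition of the mock sample is known.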

Here is an example of a real sample being classified by the same two programs:

Taxonomy	True % in Sample1	program1	program2
taxa1	?	55%	25%
taxa2	?	27%	27%
taxa3	?	18%	48%

I want simply to know which one is the best

We don't know which program is best because we do not know the true composition of Sample1.

So are we stuck? Based on these two examples, would you use program1 or program2?