Placing filtered ASV sequences into guide tree, per clade

Many years ago, I used QIIME 1 with PyNAST to filter sequence sets prior to EPA placement with RAxML (now I would use epa-ng), and I'm looking for a practical way to do the same thing today.

inputs: a set of ASVs that are pre-filtered based on interest, guide alignments and trees for specific clades, again selected based on interest

caveats: assume that we don't know a priori which ASVs fall w/in which clade, and are looking for a practical way to filter the set of ASVs prior to alignment and placement w/in each clade of interest. I'd like to do this for 8+ clades and two genes, so a smooth approach that isn't too hacky or labour intensive would be ideal

goals: given the input ASVs, align them to a template alignment based on similarity (i.e. dissimilar sequences are discarded during the alignment process) and use the resulting alignment to place the ASVs into a guide tree

does anyone have a working method for doing this? specifically: I don't know of a good piece of software for discarding dissimilar ASVs when aligning to a template. tree placement is trivial once I have the correct inputs for that step

thanks in advance

Hello @morien,

I'm not familiar with these tools or such a workflow, but if I'm understanding correctly maybe the following would work?

  • obtain a representative sequence from each clade
  • align each asv to each clade's representative sequence
  • for each asv, pick the clade whose representative sequence gave the best alignment, and place the asv into that clade's tree

There are obviously different ways to perform alignment (blast, vsearch, etc.) and assess alignment quality (blast tool suite, samtools, etc.).
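The three steps above can be sketched in Python, assuming the alignment step has already produced one score per ASV/reference pair (for example a BLAST bit score). The function name `assign_clades` and the `ref_to_clade` mapping are hypothetical, introduced only for illustration:

```python
from collections import defaultdict


def assign_clades(hits, ref_to_clade):
    """Assign each ASV to the clade whose reference gave the highest-scoring hit.

    hits: iterable of (asv_id, ref_id, score) tuples, e.g. BLAST bit scores.
    ref_to_clade: dict mapping reference sequence ID -> clade name
                  (hypothetical mapping, built from your own guide alignments).
    Returns a dict mapping clade name -> sorted list of ASV IDs.
    """
    best = {}  # asv_id -> (best score seen so far, clade of that hit)
    for asv_id, ref_id, score in hits:
        clade = ref_to_clade[ref_id]
        if asv_id not in best or score > best[asv_id][0]:
            best[asv_id] = (score, clade)

    by_clade = defaultdict(list)
    for asv_id, (_score, clade) in best.items():
        by_clade[clade].append(asv_id)
    return {clade: sorted(ids) for clade, ids in by_clade.items()}
```

ASVs with no hit at all simply never appear in the output, so they drop out before alignment and placement. Ties keep the first hit seen, which may or may not be what you want.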

This is a reasonable approach, and it is basically what I ended up doing: blastn against the reference sequence set, then keep only the ASVs that pass a %similarity cutoff in the BLAST results before aligning them to the reference alignment. I was hoping there was still a purpose-built tool for this that someone knew about, but this method worked okay.
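A minimal Python sketch of that filtering step, assuming the blastn results were written in the default tabular format (`-outfmt 6`, where the first four columns are qseqid, sseqid, pident, length). The function names and the cutoff values (`min_pident=90.0`, `min_length=100`) are placeholders for illustration, not the values I actually used:

```python
def parse_outfmt6(lines):
    """Yield (qseqid, sseqid, pident, length) from BLAST tabular (-outfmt 6) lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        yield fields[0], fields[1], float(fields[2]), int(fields[3])


def asvs_passing(lines, min_pident=90.0, min_length=100):
    """Return the set of ASV IDs with at least one hit above both cutoffs.

    The default cutoffs are arbitrary placeholders; tune them per clade and gene.
    """
    keep = set()
    for qseqid, _sseqid, pident, length in parse_outfmt6(lines):
        if pident >= min_pident and length >= min_length:
            keep.add(qseqid)
    return keep


def filter_fasta(fasta_text, keep_ids):
    """Return only the FASTA records whose ID (first word of the header) is in keep_ids."""
    out = []
    writing = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].split()
            writing = bool(header) and header[0] in keep_ids
        if writing:
            out.append(line)
    return "\n".join(out) + ("\n" if out else "")
```

Running this once per clade (one BLAST result file per clade reference set) gives one filtered FASTA per clade, which can then go to the aligner and on to placement.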

Maybe one of these two plugins?

https://docs.qiime2.org/2024.10/plugins/available/rescript/extract-seq-segments/
--o-extracted-sequence-segments returns matches to database

or

https://docs.qiime2.org/2024.10/plugins/available/quality-control/exclude-seqs/
--o-sequence-hits returns matches to database

Aligning to a database and then returning only the reads that align is definitely possible, but there are many different ways to do this...

Let us know if either of those existing plugins seem like a good fit!