Hi! I am interested in using the q2-fragment-insertion plugin and have some questions about it. I want to give context first so you can fully understand my problem and perhaps suggest alternatives if this isn't the best approach. I am working on a meta-analysis focused on the diversity and distribution of anaerobic ciliates. I have run ~30 different Illumina 18S rRNA amplicon datasets through QIIME2 preprocessing and Deblur and now have ASVs from both the V4 and V9 regions of the 18S rRNA gene. I took steps to reduce study-specific bias, such as trimming to the same primer set for the two 18S regions and trimming to the same length when running Deblur.
Currently I am only interested in sequences that originate from anaerobic ciliates, so I would like to pull only those ASVs out. I know that I can use
qiime taxa filter-table to filter by taxonomy. However, this is not ideal, because despite my best efforts I know that my reference database is not complete for this group. I would also love to potentially discover novel lineages from this data, which I couldn't accomplish this way. I came across q2-fragment-insertion as a great option in my meta-analysis to do phylogeny-based diversity analyses, even across different 18S regions. I also think it's a great option to filter my dataset to only include the sequences I care about, by running
qiime fragment-insertion sepp using my reference database.
The issue is that anaerobic ciliates are weird and while some lineages are completely anaerobic at the class level, other classes have both anaerobic and aerobic representatives, even differing at the genus level. So it seems tough to pick a % sequence identity threshold that would work across the board and output only anaerobic sequences. What I am thinking of doing is using a reference database that includes all ciliates (likely EukRef-Ciliophora with my own manually curated sequences added), and running my merged ASV table against that database with
qiime fragment-insertion sepp. Then I could examine the resulting phylogeny and identify which features are definitely or at least extremely likely anaerobic.
My main question is: once I have identified the features that I want to pull out to analyze further in this way, how do I go about pulling those out? Is there a way to make selections from the tree and obtain only those features? Or trim unwanted sequences out of the tree? And then use the resulting filtered tree with
qiime fragment-insertion filter-features to get my filtered feature table with only sequences of interest? If this is not possible within QIIME2, do you know of any other programs where it's possible to input a tree, identify/highlight sequences that you want based on the phylogeny (by eye, like in some kind of GUI), and get a list of features corresponding to those phylogenetic positions, or the filtered tree containing only those features?
Now that I am typing this, I think that the "pruning" functionality in iTOL (https://itol.embl.de/) might be a good fit - described in the following way on their website:
"Pruning the tree: pruning is a process of selecting one or several branches from the original tree and creating a new, smaller tree. You can access this function directly by pressing the key 'P' while clicking on a branch/leaf."
Would there be any issues with utilizing that tool to obtain my desired tree for downstream analysis?
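To make the step after pruning concrete, here is a rough sketch of what I imagine doing with the pruned tree once it is exported as Newick. It pulls the tip labels out of the tree, keeps only the ones that are ASVs from my table (dropping the reference sequences), and builds a one-column feature metadata file. The function names and the example IDs are made up for illustration, and the regex only handles plain, unquoted tip labels, so treat it as an assumption-laden sketch rather than a tested workflow.

```python
import re

# Matches an unquoted tip label: the text that directly follows "(" or ",".
# Internal node labels (e.g. support values) follow ")" and are skipped.
_TIP_RE = re.compile(r"[(,]\s*([^\s(),:;']+)")


def tip_labels(newick):
    """Return the tip labels of a Newick string (unquoted labels only)."""
    return _TIP_RE.findall(newick)


def features_to_keep(newick, feature_ids):
    """Keep tips that are ASV feature IDs, dropping reference sequences."""
    wanted = set(feature_ids)
    return [tip for tip in tip_labels(newick) if tip in wanted]


def metadata_text(ids):
    """Build a one-column feature metadata TSV with a 'feature-id' header."""
    return "feature-id\n" + "".join(f"{i}\n" for i in ids)


if __name__ == "__main__":
    # Hypothetical pruned tree: asv* are my features, ref* are references.
    pruned = "((asv1:0.1,refA:0.2)0.95:0.05,(asv2:0.3,refB:0.1):0.2,asv3:0.4);"
    table_ids = ["asv1", "asv2", "asv3", "asv4"]
    keep = features_to_keep(pruned, table_ids)
    with open("features-to-keep.tsv", "w") as fh:
        fh.write(metadata_text(keep))
```

If I understand correctly, a file like features-to-keep.tsv could then be passed to qiime feature-table filter-features via --m-metadata-file (together with --i-table and --o-filtered-table) to get a table containing only those features, but please correct me if that is not the right way to do it.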
Sorry for the very long post! I hope what I am asking makes sense. Thanks for reading!