Using q2-fragment-insertion to filter sequences

Hi! I am interested in using the q2-fragment-insertion plugin and have some questions about it. I want to give context first so you can fully understand my problem and perhaps suggest alternatives if this isn't the best approach. I am working on a meta-analysis focused on the diversity and distribution of anaerobic ciliates. I have run ~30 different Illumina 18S rRNA amplicon datasets through QIIME 2 preprocessing and Deblur, and now have ASVs from both the V4 and V9 regions of the 18S rRNA gene. I took steps to reduce study-specific bias, such as trimming to the same primer set for each of the two 18S regions and trimming to the same length when running Deblur.

Currently I am only interested in sequences that originate from anaerobic ciliates, so I would like to pull only those ASVs out. I know that I can use qiime taxa filter-table to filter by taxonomy. However, this is not ideal, because despite my best efforts I know that my reference database is not complete for this group. I would also love to potentially discover novel lineages from this data, which I couldn't accomplish this way. I came across q2-fragment-insertion as a great option in my meta-analysis to do phylogeny-based diversity analyses, even across different 18S regions. I also think it's a great option to filter my dataset to only include the sequences I care about, by running qiime fragment-insertion sepp using my reference database.

The issue is that anaerobic ciliates are weird: while some lineages are completely anaerobic at the class level, other classes have both anaerobic and aerobic representatives, which can differ even at the genus level. So it seems tough to pick a single % sequence identity threshold that would work across the board and output only anaerobic sequences. What I am thinking of doing is using a reference database that includes all ciliates (likely EukRef-Ciliophora with my own manually curated sequences added) and running my merged ASV sequences against that database with qiime fragment-insertion sepp. Then I could examine the resulting phylogeny and identify which features are definitely, or at least extremely likely, anaerobic.

My main question is: once I have identified the features that I want to pull out for further analysis, how do I go about pulling them out? Is there a way to make selections from the tree and obtain only those features, or to trim unwanted sequences out of the tree? Could I then use the resulting filtered tree with qiime fragment-insertion filter-features to get a filtered feature table containing only the sequences of interest? If this is not possible within QIIME 2, do you know of any other programs where I can input a tree, identify/highlight the sequences I want based on the phylogeny (by eye, e.g. in some kind of GUI), and get either a list of the features at those phylogenetic positions or a filtered tree containing only those features?
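One scripted alternative to clicking through a GUI is to pick two or more reference sequences that bracket the lineage of interest and take every ASV inside the smallest clade containing them. Below is a minimal, stdlib-only Python sketch of that idea. It is not part of QIIME 2, and it rests on several assumptions. The tree must be plain Newick, for example exported from the SEPP output. Labels must be unquoted and contain no spaces. The IDs used here (refA, asv1, and so on) are hypothetical.

```python
from typing import List, Optional, Set


class Node:
    """Minimal tree node: a label (None if absent) and a list of children."""

    def __init__(self, name: Optional[str] = None) -> None:
        self.name = name
        self.children: List["Node"] = []


def parse_newick(text: str) -> Node:
    """Parse plain Newick: unquoted labels, optional ':length', no comments."""
    text = "".join(text.split())  # assumption: labels contain no whitespace
    pos = 0

    def read_label() -> Optional[str]:
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",():;":
            pos += 1
        label = text[start:pos]
        if pos < len(text) and text[pos] == ":":  # skip the branch length
            pos += 1
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
        return label or None

    def read_node() -> Node:
        nonlocal pos
        node = Node()
        if text[pos] == "(":
            pos += 1  # consume "("
            node.children.append(read_node())
            while text[pos] == ",":
                pos += 1  # consume ","
                node.children.append(read_node())
            pos += 1  # consume ")"
        node.name = read_label()
        return node

    return read_node()


def tips(node: Node) -> List[str]:
    """All tip labels at or below this node."""
    if not node.children:
        return [node.name] if node.name else []
    found: List[str] = []
    for child in node.children:
        found.extend(tips(child))
    return found


def smallest_clade(node: Node, anchors: Set[str]) -> Optional[Node]:
    """Deepest node whose tips include every anchor label, or None if absent."""
    if not anchors.issubset(tips(node)):
        return None
    for child in node.children:
        deeper = smallest_clade(child, anchors)
        if deeper is not None:
            return deeper
    return node


def features_in_clade(tree: Node, anchors: Set[str], asv_ids: Set[str]) -> List[str]:
    """ASV IDs (reference tips excluded) inside the smallest clade spanning the anchors."""
    clade = smallest_clade(tree, anchors)
    if clade is None:
        return []
    return sorted(t for t in tips(clade) if t in asv_ids)
```

As an example, take the toy tree `((refA:0.1,(asv1:0.05,refB:0.02):0.1):0.2,(refC:0.1,asv2:0.3):0.1);`. Anchoring on refA and refB returns only asv1, while anchoring on refA and refC returns both ASVs. Because the selection follows the reference phylogeny, mixed aerobic/anaerobic classes could be handled by running it once per narrower anaerobic clade and merging the results.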

Now that I am typing this, I think that the "pruning" functionality in iTOL (https://itol.embl.de/) might be a good fit - described in the following way on their website:
"Pruning the tree: pruning is a process of selecting one or several branches from the original tree and creating a new, smaller tree. You can access this function directly by pressing the key 'P' while clicking on a branch/leaf."
Would there be any issues with utilizing that tool to obtain my desired tree for downstream analysis?

Sorry for the very long post! I hope what I am asking makes sense. Thanks for reading :slight_smile:

Hi @anna-schrecengost, welcome back!

Thanks for your very detailed question, the context is incredibly helpful. I consulted with @Nicholas_Bokulich on this, and have a couple of thoughts to share below.

This is probably your best approach, but we'd also suggest using exclude-seqs (qiime quality-control exclude-seqs) at around a 90-94% identity threshold. You can then review and confirm the accuracy of your results by manually inspecting the phylogeny.
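exclude-seqs aligns each query against the reference sequences with BLAST or vsearch and splits the queries into hits and misses by percent identity. The toy Python sketch below shows what a 90% cutoff means in practice. It only illustrates the threshold idea and does not reimplement exclude-seqs. Its identity definition is an assumption that differs from BLAST/vsearch: sequences are already aligned, gap-gap columns are ignored, and any other gap counts as a mismatch.

```python
from typing import Dict, List, Sequence, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences, ignoring gap-gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def split_by_identity(
    queries: Dict[str, str], references: Sequence[str], threshold: float = 90.0
) -> Tuple[List[str], List[str]]:
    """(hits, misses): queries whose best identity to any reference is >= threshold, and the rest."""
    hits: List[str] = []
    misses: List[str] = []
    for feature_id, seq in sorted(queries.items()):
        best = max(percent_identity(seq, ref) for ref in references)
        (hits if best >= threshold else misses).append(feature_id)
    return hits, misses
```

An ASV that matches a reference at 9 of 10 aligned positions (90%) is a hit at a 90% cutoff but a miss at 94%. That is why the exact value within the suggested range matters for borderline lineages.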

iTOL is not something that we are familiar with, but its functionality does sound like it could be useful as well. You could test it against your results from the approach above to see how the resulting phylogenies differ, which could guide your decision on which tool is best for this use case.

Hopefully this helps!

Cheers,
Liz


Thank you so much for your response @lizgehret! I am glad this seems like a reasonable approach, and thanks for pointing me toward exclude-seqs, I will definitely utilize that.

After I initially run qiime fragment-insertion sepp and get the resulting phylogeny, I know I will have a lot of non-target sequences in it, so I will want to manually inspect the tree and select my target sequences based on their phylogenetic positions. Do you know if there is a way in QIIME 2 to filter ASV tables by a list of feature IDs that I copy from the tree?

Or, if I prune the tree in a different program and obtain a tree with only my target ASVs (plus the reference sequences), I could use that tree with qiime fragment-insertion filter-features to obtain an ASV table with only target sequences, and then re-run qiime fragment-insertion sepp on just those target sequences. Does that seem reasonable?

Hi @anna-schrecengost,

I'm not sure if this is exactly what you're looking for, but this forum post does go over filtering on specific ASVs in QIIME 2 - which may be useful if you go the QIIME 2 route!
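For the QIIME 2 route, one common pattern is to put the chosen IDs in a metadata file whose only column is a recognized ID header such as feature-id. That file can then be passed to the filtering commands. The small Python helper below builds that file's text from a copied list of IDs. The helper name is hypothetical, and stripping quotes reflects an assumption about how some tree viewers copy labels.

```python
from typing import Iterable


def feature_id_metadata_tsv(feature_ids: Iterable[str]) -> str:
    """Build a QIIME 2 metadata TSV with a single 'feature-id' ID column.

    Surrounding whitespace and quotes are stripped, and blanks and
    duplicates are dropped; IDs are written in sorted order.
    """
    cleaned = {fid.strip().strip("'\"") for fid in feature_ids}
    cleaned.discard("")
    return "\n".join(["feature-id", *sorted(cleaned)]) + "\n"
```

The text could be saved with `Path("target-ids.tsv").write_text(...)` and then used with `qiime feature-table filter-features --i-table table.qza --m-metadata-file target-ids.tsv --o-filtered-table target-table.qza`. The same file also works with `qiime feature-table filter-seqs --i-data rep-seqs.qza --m-metadata-file target-ids.tsv --o-filtered-data target-seqs.qza`. All file names here are placeholders.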

This seems reasonable to me, so if you don't have any luck with the suggestion in that separate forum post above, I would recommend taking this route!
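Before running filter-features on a tree pruned elsewhere (e.g. in iTOL), it may help to check which ASVs the pruned tree still contains. As we understand it, qiime fragment-insertion filter-features keeps only the table features that appear as tips in the tree, so a local comparison like the hypothetical Python sketch below should predict what it keeps and drops. The sketch assumes the pruned tree was exported as plain Newick with unquoted labels, and that reference tips simply don't match any ASV ID.

```python
import re
from typing import List, Set, Tuple

# A tip label is the unquoted text right after "(" or ","; internal-node
# labels (e.g. support values) follow ")" and are therefore not matched.
_TIP_LABEL = re.compile(r"[(,]\s*([^\s(),:;\[\]']+)")


def tip_labels(newick: str) -> List[str]:
    """Tip labels of a plain Newick tree, in file order."""
    return _TIP_LABEL.findall(newick)


def split_features_by_tree(newick: str, asv_ids: Set[str]) -> Tuple[List[str], List[str]]:
    """(kept, dropped): table features that are / are not tips of the pruned tree."""
    tree_tips = set(tip_labels(newick))
    kept = sorted(asv_ids & tree_tips)
    dropped = sorted(asv_ids - tree_tips)
    return kept, dropped
```

If the "dropped" list contains ASVs you expected to keep, that points to a pruning or export issue worth fixing before re-running sepp on the target sequences.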

Cheers,
Liz


Hi @lizgehret, great, thank you so much!