Help with filtering phylogeny based on feature table: Not a single fragment of your table is part of your tree. The resulting table would be empty.

Hi, I'm running QIIME2 v2024.5.1 on our university computer cluster. I've got a problem that I'm not sure you can solve, but I'm hoping someone can give me some tools to help diagnose where the issue is.

I've gone through a pipeline of making my feature tables, rep-seqs, metadata sets, and tree using sepp. In this next step, I wish to filter my tree such that it only includes sequences in my table. However, I keep getting this error:

Plugin error from fragment-insertion:

  Not a single fragment of your table is part of your tree. The resulting table would be empty.

Debug info has been saved to /tmp/qiime2-q2cli-err-npyvhk05.log

I think it is a formatting issue with the metadata files somewhere along the pipeline, and I suspect there's a hidden character or extra space that is the wrench in the system. It was kind of my fault: for a good reason, I had to change the feature names (I know this is not recommended, but I can't get around it).

My question is whether there is a way to explore or extract the sequence fragment names in the .qza tree so I can match them manually against my table. I can get my sequence fragment names from the feature table by making a .qzv file, opening it in view.qiime2.org, and downloading the .csv. Is there another way?

I have no idea what to do with the tree except export it, and I'm not sure if I'm introducing (or missing) hidden characters by doing the export. I did some exploring of the exported Newick tree, and using grep I can find features in it that also appear in the Feature Detail tab of my feature table .qzv file. So my sequences were definitely inserted into the tree. In fact, grep says that exactly 10,779 of my 11,375 features were inserted into my tree.

Suggestions on how to troubleshoot this in qiime?

Hi @kristend,
Could you send the command that you are running?

Hi @kristend - Here are some commands in q2 that might be able to help with your problem.

Export the feature IDs from your feature table as below:

qiime tools export \
  --input-path feature-table.qza \
  --output-path exported-feature-table

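There's a .biom file in that directory. Convert it to tab-separated text to read it (biom convert has no CSV option, so use --to-tsv):

biom convert \
  -i exported-feature-table/feature-table.biom \
  -o exported-feature-table/feature-table.tsv \
  --to-tsv

Then extract the feature IDs. The TSV begins with header lines that start with "#" (a "# Constructed from biom file" comment and the "#OTU ID" column header), so drop those first:

grep -v '^#' exported-feature-table/feature-table.tsv | cut -f1 > feature-table-ids.txt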
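Since you suspect a hidden character or a stray space, it may be worth looking at the ID list before comparing anything. Here is a rough sketch, assuming the IDs should contain only visible ASCII characters; find_suspect_ids is just a name I made up for this post, not a QIIME 2 or biom command:

```shell
# Hypothetical helper: list every line of an ID file that contains
# something other than visible ASCII (carriage return, space, tab,
# non-ASCII byte), and render those hidden characters visibly.
find_suspect_ids() {
    # LC_ALL=C makes [:graph:] mean "printable, non-space ASCII".
    # grep -n prefixes each hit with its line number.
    # cat -vet shows CR as ^M, tab as ^I, and marks line ends with $.
    LC_ALL=C grep -n '[^[:graph:]]' "$1" | cat -vet
}
```

Running find_suspect_ids feature-table-ids.txt prints nothing if the list is clean. A line such as "12:someid^M$" would point to Windows line endings, and a space before the "$" would point to trailing whitespace.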

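Since you have already exported your Newick tree, you can pull the tip labels out of it with the command below. It takes each label that directly follows "(" or ",", so branch lengths and internal node labels are skipped, and tr removes the single quotes that may surround some labels. The list will also include the reference tips your sequences were inserted alongside, which does not matter for this comparison:

grep -oP '(?<=[(,])[^():,;]+(?=[,:)])' exported-tree/tree.nwk | tr -d "'" > tree-feature-ids.txt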

Finally, compare the lists:

sort feature-table-ids.txt > sorted-feature-table-ids.txt
sort tree-feature-ids.txt > sorted-tree-feature-ids.txt
comm -23 sorted-feature-table-ids.txt sorted-tree-feature-ids.txt > missing-ids.txt
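If the comparison reports far more missing IDs than you expect, stray whitespace is a likely suspect. The sketch below repeats the comparison after stripping carriage returns and surrounding whitespace from both lists; clean_ids and compare_ids are hypothetical helper names, and it assumes one ID per line in each file:

```shell
# Hypothetical helper: normalise an ID list by dropping carriage returns,
# leading/trailing whitespace and blank lines, then sort it for comm.
clean_ids() {
    tr -d '\r' < "$1" \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' \
        | LC_ALL=C sort -u
}

# Hypothetical helper: count IDs shared by both lists and IDs found
# only in the first (table) list, after normalisation.
compare_ids() {
    table_ids=$(mktemp)
    tree_ids=$(mktemp)
    clean_ids "$1" > "$table_ids"
    clean_ids "$2" > "$tree_ids"
    # comm needs both inputs sorted in the same locale, hence LC_ALL=C.
    echo "shared: $(LC_ALL=C comm -12 "$table_ids" "$tree_ids" | wc -l | tr -d ' ')"
    echo "table only: $(LC_ALL=C comm -23 "$table_ids" "$tree_ids" | wc -l | tr -d ' ')"
    rm -f "$table_ids" "$tree_ids"
}
```

Call it as compare_ids feature-table-ids.txt tree-feature-ids.txt. If the shared count is much higher here than in the raw comparison, hidden whitespace in one of the lists is probably the cause.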

Once you have the IDs that are missing from the tree, you can check whether they are low-quality or chimeric sequences. I'm not sure whether such sequences are accepted when the tree is built.

Hope some of that helps!