Errors when importing biomtable and tree files created by Ucluster into QIIME2

Thanks @karren_owl!

That error is warning you that your feature IDs in your phylogenetic tree do not match the feature IDs found in your feature table --- they need to match in order for the phylogenetic metrics to look up phylogenetic distances in the tree when evaluating your feature table.
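To make the requirement concrete, here is a minimal sketch of the kind of check described above. It is not QIIME 2's actual implementation; the helper name `missing_tip_names` and the small ID lists are assumptions made up for illustration.

```python
def missing_tip_names(feature_ids, tip_names):
    """Return the feature IDs that have no identically named tip in the tree."""
    return sorted(set(feature_ids) - set(tip_names))


# Hypothetical data for illustration only (not taken from the real files).
table_ids = ["Jul27.50v_1414", "Jul27.50v_1972", "Jun22.0v_1022"]
tree_tips = ["Jul27.50v_1414", "Jun22.0v_1022", "May12.50v_9315"]

missing = missing_tip_names(table_ids, tree_tips)
print(f"feature_ids not corresponding to tip names (n={len(missing)}): {missing}")
```

Extra tips in the tree are not a problem in this sketch; only table IDs without a matching tip are reported.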

What do the first few lines of /path/to/tree.nwk look like? What are some of the feature IDs listed in /path/to/feature_table.qzv?

Hi Matthew,

I think the feature IDs in my phylogenetic tree match the feature IDs in my feature table. I checked some of the IDs shown in the error message, and they are present in both my feature table and my tree. The feature table and the tree also have the same number of IDs. I am really not sure what the problem is.

The first few lines of my tree look like:
((((((May12.50v_9315:0.03482,((Jun22.75v_1205:0.07334,(Jul11.0v_7885:0.00053,(Jun22.0v_1022:0.04590,Jul27.5v_6359:0.01931)0.442:0.00054)0.872:0.00053)0.982:0.03354,(((Jul11.50v_3904:0.07224,(Jul11.5v_1691:0.05089,Aug5.75v_18546:0.03200)0.932:0.04150)0.885:0.03220,((((((Jul27.50v_1972:0.00055,(Jun22.0v_2832:0.03786,Jul11.50v_2380:0.04936)0.745:0.00379)0.999:0.08336,(((((May12.75v_1550:0.05003,Jun22.75v_1877:0.03310)0.871:0.01112,(Jun8.50v_20071:0.01548,(May12.75v_2489:0.00055,Jun8.50v_19871:0.01154)0.730:0.00055)0.329:0.00382)0.744:0.00491,Sep8.0v_6764:0.03565)0.890:0.02016,(Sep8.50v_5265:0.01376,(May12.25v_1963:0.01914,May12.50v_8190:0.08329)0.994:0.06513)0.246:0.01181)0.938:0.04263,Jun22.50v_1553:0.07598)0.595:0.03386)0.697:0.03495,((((May12.50v_1767:0.00781,((((((((Jul11.25v_8391:0.03125,Jul27.25v_1884:0.02335)0.723:0.00371,(Sep8.5v_1714:0.03924,(Jul27.5v_8705:0.05389,Sep8.25v_11666:0.04018)0.899:0.00995)0.000:0.00054)0.868:0.00378,((Jul11.25v_1108:0.05726,((Jun8.75v_2647:0.02690,Sep8.50v_313:0.05828)0.753:0.00431,Jul27.5v_10561:0.02713)0.796:0.00647)0.431:0.00790,Sep8.5v_5943:0.03480)0.837:0.00054)0.929:0.00054,((Jul27.50v_1414:0.00055,((Aug5.5v_20921:0.03982,Jun22.5v_11452:0.05805)0.838:0.00791,((Jul11.0v_190:0.04850,(Jul27.5v_7858:0.05089,Sep8.0v_10072:0.06589)0.451:0.01482)0.000:0.00054,Jul11.5v_920:0.04032)0.000:0.00053)0.903:0.00055)0.000:0.00055,Jul27.75v_2165:0.04137)0.734:0.00746)1.000:0.06

The feature IDs in the feature_table.qzv look like:
Jul27.50v_1353
Jun8.75v_6612
Jun22.75v_1425
May12.75v_4714
Jun22.5v_4333
Aug5.25v_4805
May12.75v_1961
Jul27.50v_1414
Aug5.25v_4666
Jul11.5v_6158
Jun22.50v_6292

The error above mentions 476 mismatched features --- just curious, how many features are present in your feature table? You can use the viz you generated above (although, your command is all messed up, looks like copy-and-paste mangled a few commands together into one...)

I ask because you have a lot of similar feature IDs, but they aren't quite identical. For example:

  • Jul27.50v_1972
  • Jul27.50v_1414
  • Jul27.50v_1353

Both my feature table and the tree have 923 features. I found no mismatched features when I looked through the feature table and the tree.
The similar feature IDs represent different sequences. For example, Jul27.50v_1972 is one of the sequences from the July 27th, 50 m depth sample, and Jul27.50v_1414 is another sequence from that same sample.

The last piece of my command looks like this:

qiime feature-table summarize \
  --i-table /path/to/feature_table.qza \
  --o-visualization /path/to/feature_table.qzv \
  --m-sample-metadata-file /path/to/metadata 
qiime diversity core-metrics-phylogenetic \
  --i-phylogeny  /path/to/rooted_tree.qza \
  --i-table /path/to/feature_table.qza \
  --p-sampling-depth 380 \
  --m-metadata-file /path/to/metadata \
  --output-dir /path/to/core-metrics-results

Sorry for the unclear paste.
Do you mean that QIIME2 is confused by the feature IDs?

Hmm, well, that isn't what QIIME 2 is seeing --- the error you posted above indicates that the feature IDs are mismatched between the two Artifacts.

In QIIME 2, ID matching is performed by directly comparing the ID string, character for character --- the IDs must match 100% in order to be collated.
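As a rough illustration of what "character for character" means, the sketch below locates the first position where two IDs differ. The helper `first_difference` and the trailing-space example are assumptions for illustration, not something taken from the files in this thread.

```python
def first_difference(a, b):
    """Return the index of the first differing character, or None if equal."""
    if a == b:
        return None
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i
    # One string is a prefix of the other, so they differ at the shorter length.
    return min(len(a), len(b))


table_id = "Jul27.50v_1972"
tip_name = "Jul27.50v_1972 "  # hypothetical: an invisible trailing space

idx = first_difference(table_id, tip_name)
# repr() makes whitespace and other hard-to-see characters visible.
print(table_id == tip_name, idx, repr(tip_name[idx:]))
```

Printing IDs with `repr()` is often enough to expose a difference that is invisible when eyeballing the files.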

When we see mismatched ID issues here it is usually because of a bookkeeping issue when importing data.

If you can't find any problems with your file bookkeeping, please send me a DM with download links to the 3 files necessary to re-run your qiime diversity core-metrics-phylogenetic command above. Thanks!

Hi Matthew,

I still can't find any problems with my file bookkeeping. Could you please have a look at my files?
The three files necessary to run my qiime diversity core-metrics-phylogenetic command are attached below.

Thank you!

feature-table.qza (30.9 KB)
rooted-tree.qza (17.9 KB)
anotop_rtpr_qiime2_matadata_period.txt (6.5 KB)

The IDs for your features in the table have underscores:

 'Sep8.5v_9015',
 'Sep8.5v_9043',
 'Sep8.5v_9109',
 'Sep8.5v_9804',

While the tree has spaces:

 'Sep8.5v 9015'
 'Sep8.5v 9043'
 'Sep8.5v 9109'
 'Sep8.5v 9804'
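A quick way to confirm this pattern across a whole table is to look for feature IDs that match a tip only after underscores are swapped for spaces. This is a minimal sketch; the helper name `underscore_space_matches` is hypothetical, and the IDs are the four shown above.

```python
def underscore_space_matches(feature_ids, tip_names):
    """Map each unmatched feature ID to a tip that matches only after
    replacing underscores with spaces."""
    tips = set(tip_names)
    pairs = {}
    for fid in feature_ids:
        if fid in tips:
            continue  # exact match, nothing to report
        candidate = fid.replace("_", " ")
        if candidate in tips:
            pairs[fid] = candidate
    return pairs


table_ids = ["Sep8.5v_9015", "Sep8.5v_9043", "Sep8.5v_9109", "Sep8.5v_9804"]
tree_tips = ["Sep8.5v 9015", "Sep8.5v 9043", "Sep8.5v 9109", "Sep8.5v 9804"]

pairs = underscore_space_matches(table_ids, tree_tips)
for fid, tip in pairs.items():
    print(repr(fid), "->", repr(tip))
```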

That’s weird, because I can see that the IDs have underscores in both the unrooted tree file I imported into QIIME2 and the rooted tree file I exported from QIIME2.

Can you send the unrooted tree, too?

By the way, if your tree is already rooted, import it as such and skip midpoint rooting.

Here it is.
new_tree.tre.zip (11.7 KB)

My tree was not rooted.

Thanks @karren_owl --- I found the culprit. q2-phylogeny is using scikit-bio behind the scenes to do the midpoint rooting --- I was reading up on the Newick file format (used to represent the tree here), check this tidbit out:

In this format, underscores in unquoted labels are treated as spaces. To keep a literal underscore, surround the entire ID with single quotes.
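One possible way to apply the single-quote approach is to wrap every tip label in quotes before importing the tree. The sketch below uses only a regular expression and makes several assumptions: the Newick string has no whitespace, no comments, and no labels that are already quoted. The helper `quote_tip_labels` is hypothetical, and this only handles the quoting step; it is not a guarantee that the downstream ID mismatch goes away.

```python
import re

# An unquoted label that directly follows "(" or "," is a tip label.
# Internal-node labels (e.g. support values) follow ")" and are left alone.
TIP_LABEL = re.compile(r"(?<=[(,])([^\s():,;'\[\]]+)")


def quote_tip_labels(newick):
    """Wrap each unquoted tip label in single quotes."""
    return TIP_LABEL.sub(lambda m: "'" + m.group(1) + "'", newick)


# Small tree built from IDs that appear in this thread.
newick = (
    "((Jul11.0v_7885:0.00053,Jun22.0v_1022:0.04590)0.872:0.00053,"
    "May12.50v_9315:0.03482);"
)
quoted = quote_tip_labels(newick)
print(quoted)
```

Branch lengths and support values are untouched because they follow ":" or ")" rather than "(" or ",".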

You can learn more about the format here: Newick format (skbio.io.format.newick) — scikit-bio 0.5.5 documentation

Hi Matthew,

I tried the QIIME 2 import again using a new BIOM table and tree file with the IDs wrapped in single quotes. However, it runs into the same error again:
All feature_ids must be present as tip names in phylogeny. feature_ids not corresponding to tip names (n=515): ‘May12.50v_1549’ ‘Jul11.25v_4600’ ‘Jun22.0v_5442’ ‘Aug5.50v_20401’ ‘May12.75v_1961’ ‘Jul11.50v_9708’ ‘Jul27.25v_4983’ ‘Jul27.50v_1972’