Errors when importing biomtable and tree files created by Ucluster into QIIME2

Hello everyone,

I am working on homologous amino acid sequences of a functional protein. I would like to analyze my data at the amino acid level instead of the nucleotide level. I know that QIIME2 does not take amino acid sequences, but it can take a BIOM table and a tree built outside of QIIME and run analyses on them: Import amino acid sequences?
I clustered my sequences using USEARCH (cluster_fast) and got a uc file and a centroid FASTA file. The uc file was converted to a BIOM file using uc2biom from the website http://biom-format.org/index.html. The sequences in the centroid file were used as OTUs, and a phylogenetic tree was built from the OTUs with FastTree. The BIOM table and the phylogenetic tree were converted into QIIME2 artifacts successfully. However, computing the core metrics failed, and the error said that some sequence names in the BIOM table are missing from the tree. I checked both my BIOM table and my tree file, and as far as I can tell they match each other. I could also successfully import my OTU table (converted from the BIOM table) and the phylogenetic tree into phyloseq in R. Could this be a file format problem? What can I do to import my data into QIIME2? It would be great if I could use QIIME2 for some downstream analysis.

Hey there @karren_owl! Can you provide some more specifics? What commands did you run and what were the error messages you received?

Hi Matthew,

The command I used to cluster my sequences is:

usearch -cluster_fast /path/to/sequence_file -id 0.95 -centroids /path/to/centroid_output -uc /path/to/uc_output

The command I used to convert the uc_output to biomtable is:

biom from-uc -i /path/to/uc_output -o /path/to/biom

The command I used to align the centroids is:

mafft /path/to/centroid_output > /path/to/centroid_align_output

The command I used to build the phylogenetic tree is:

FastTree /path/to/centroid_align_output > /path/to/tree.nwk

The command I used to import my biomtable and tree files into Qiime2 is:

qiime tools import \
  --input-path /path/to/biom \
  --type 'FeatureTable[Frequency]' \
  --source-format BIOMV100Format \
  --output-path /path/to/feature_table.qza
qiime tools import \
  --input-path /path/to/tree.nwk \
  --output-path /path/to/tree.qza \
  --type 'Phylogeny[Unrooted]'
qiime phylogeny midpoint-root \
  --i-tree  /path/to/tree.qza \
  --o-rooted-tree  /path/to/rooted_tree.qza
qiime feature-table summarize \
  --i-table /path/to/feature_table.qza \
  --o-visualization /path/to/feature_table.qzv \
  --m-sample-metadata-file /path/to/metadata \
  --i-phylogeny  /path/to/rooted_tree.qza \
  --i-table /path/to/feature_table.qza \
  --p-sampling-depth 380 \
  --m-metadata-file /path/to/metadata \
  --output-dir /path/to/core-metrics-results

The error message is:

All ``feature_ids`` must be present as tip names in ``phylogeny``. ``feature_ids`` not corresponding to tip names (n=476): Jun8.75v_2066 May12.75v_1069 Jun22.0v_4164 Jul11.25v_1903 May12.25v_4952 Sep8.25v_1815 Aug5.50v_12201 Jul27.75v_635 ...

Thanks @karren_owl!

That error is warning you that your feature IDs in your phylogenetic tree do not match the feature IDs found in your feature table --- they need to match in order for the phylogenetic metrics to look up phylogenetic distances in the tree when evaluating your feature table.

What do the first few lines of /path/to/tree.nwk look like? What are some of the feature IDs listed in /path/to/feature_table.qzv? :qiime2:
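
For a quick check outside QIIME 2, the two ID sets can be compared directly. Below is a minimal Python sketch, assuming the tree is plain Newick text with unquoted tip labels (as FastTree writes it) and that the feature IDs have already been listed, for example copied from feature_table.qzv. The helper names `newick_tip_names` and `compare_ids` are made up for illustration and are not part of QIIME 2. It only inspects the raw text, so it shows what is written in the file rather than what any particular parser reports.

```python
import re


def newick_tip_names(newick):
    """Return tip labels exactly as written in the Newick text.

    Tip labels follow '(' or ','. Labels that follow ')' belong to
    internal nodes (FastTree puts support values there) and are skipped.
    Assumption: labels are unquoted.
    """
    return set(re.findall(r"(?<=[(,])[^():,;\s]+", newick))


def compare_ids(feature_ids, tip_names):
    """Return (IDs only in the table, IDs only in the tree), both sorted."""
    feature_ids, tip_names = set(feature_ids), set(tip_names)
    return sorted(feature_ids - tip_names), sorted(tip_names - feature_ids)


# Toy data built from IDs quoted in this thread; not the real files.
tree_text = (
    "((Jul27.50v_1972:0.00055,Jun22.0v_2832:0.03786)0.745:0.00379,"
    "Jul11.50v_2380:0.04936);"
)
table_ids = ["Jul27.50v_1972", "Jun22.0v_2832", "Jul27.50v_1414"]

missing_from_tree, missing_from_table = compare_ids(
    table_ids, newick_tip_names(tree_text)
)
print("in table but not in tree:", missing_from_tree)
print("in tree but not in table:", missing_from_table)
```

If both lists come back empty for the real files, the text on disk agrees and the difference is more likely introduced when the files are parsed or imported.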

Hi Matthew,

I think the feature IDs in my phylogenetic tree match the feature IDs in my feature table. I checked some of the IDs shown in the error message, and they are actually in both my feature table and my tree. The feature table and the tree also have the same number of IDs. I am really not sure what the problem is.

The first few lines of my tree look like:
((((((May12.50v_9315:0.03482,((Jun22.75v_1205:0.07334,(Jul11.0v_7885:0.00053,(Jun22.0v_1022:0.04590,Jul27.5v_6359:0.01931)0.442:0.00054)0.872:0.00053)0.982:0.03354,(((Jul11.50v_3904:0.07224,(Jul11.5v_1691:0.05089,Aug5.75v_18546:0.03200)0.932:0.04150)0.885:0.03220,((((((Jul27.50v_1972:0.00055,(Jun22.0v_2832:0.03786,Jul11.50v_2380:0.04936)0.745:0.00379)0.999:0.08336,(((((May12.75v_1550:0.05003,Jun22.75v_1877:0.03310)0.871:0.01112,(Jun8.50v_20071:0.01548,(May12.75v_2489:0.00055,Jun8.50v_19871:0.01154)0.730:0.00055)0.329:0.00382)0.744:0.00491,Sep8.0v_6764:0.03565)0.890:0.02016,(Sep8.50v_5265:0.01376,(May12.25v_1963:0.01914,May12.50v_8190:0.08329)0.994:0.06513)0.246:0.01181)0.938:0.04263,Jun22.50v_1553:0.07598)0.595:0.03386)0.697:0.03495,((((May12.50v_1767:0.00781,((((((((Jul11.25v_8391:0.03125,Jul27.25v_1884:0.02335)0.723:0.00371,(Sep8.5v_1714:0.03924,(Jul27.5v_8705:0.05389,Sep8.25v_11666:0.04018)0.899:0.00995)0.000:0.00054)0.868:0.00378,((Jul11.25v_1108:0.05726,((Jun8.75v_2647:0.02690,Sep8.50v_313:0.05828)0.753:0.00431,Jul27.5v_10561:0.02713)0.796:0.00647)0.431:0.00790,Sep8.5v_5943:0.03480)0.837:0.00054)0.929:0.00054,((Jul27.50v_1414:0.00055,((Aug5.5v_20921:0.03982,Jun22.5v_11452:0.05805)0.838:0.00791,((Jul11.0v_190:0.04850,(Jul27.5v_7858:0.05089,Sep8.0v_10072:0.06589)0.451:0.01482)0.000:0.00054,Jul11.5v_920:0.04032)0.000:0.00053)0.903:0.00055)0.000:0.00055,Jul27.75v_2165:0.04137)0.734:0.00746)1.000:0.06

The feature IDs in the feature_table.qzv look like:
Jul27.50v_1353
Jun8.75v_6612
Jun22.75v_1425
May12.75v_4714
Jun22.5v_4333
Aug5.25v_4805
May12.75v_1961
Jul27.50v_1414
Aug5.25v_4666
Jul11.5v_6158
Jun22.50v_6292

The error above mentions 476 mismatched features --- just curious, how many features are present in your feature table? You can use the viz you generated above (although, your command is all messed up, looks like copy-and-paste mangled a few commands together into one...)

I ask because you have a lot of similar feature IDs, but they aren't quite identical. For example:

  • Jul27.50v_1972
  • Jul27.50v_1414
  • Jul27.50v_1353

Both my feature table and my tree have 923 features. I found no mismatched features when I looked through the feature table and the tree.
The similar feature IDs represent different sequences. For example, Jul27.50v_1972 is one of the sequences from the July 27th sample at 50 m depth, and Jul27.50v_1414 is another sequence from the same sample.

The last piece of my command looks like this:

qiime feature-table summarize \
  --i-table /path/to/feature_table.qza \
  --o-visualization /path/to/feature_table.qzv \
  --m-sample-metadata-file /path/to/metadata 
qiime diversity core-metrics-phylogenetic \
  --i-phylogeny  /path/to/rooted_tree.qza \
  --i-table /path/to/feature_table.qza \
  --p-sampling-depth 380 \
  --m-metadata-file /path/to/metadata \
  --output-dir /path/to/core-metrics-results

Sorry for the unclear paste.
Do you mean that QIIME2 is confused by the feature IDs?

:+1:

Hmm, well, that isn't what QIIME 2 is seeing --- the error you posted above indicates that the feature IDs are mismatched between the two Artifacts.

In QIIME 2, ID matching is performed by directly comparing the ID string, character for character --- the IDs must match 100% in order to be collated.
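
Because the comparison is an exact string match, a difference that is hard to see by eye (a different separator, trailing whitespace) is enough to break it. Here is a small Python sketch for pinpointing where two IDs diverge. `first_difference` and `describe` are hypothetical helpers, not QIIME 2 functions, and the second ID in the example is invented purely for illustration.

```python
import unicodedata


def first_difference(a, b):
    """Return (index, char_in_a, char_in_b) at the first mismatch, or None.

    If one string is a prefix of the other, the missing character is
    reported as an empty string.
    """
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i, ca, cb
    if len(a) != len(b):
        i = min(len(a), len(b))
        return i, a[i:i + 1], b[i:i + 1]
    return None


def describe(ch):
    """Give a readable name for a character, e.g. '_' -> 'LOW LINE'."""
    if not ch:
        return "<end of string>"
    return unicodedata.name(ch, repr(ch))


# The second ID is made up to demonstrate a near-miss.
diff = first_difference("Jul27.50v_1414", "Jul27.50v-1414")
if diff is not None:
    index, left, right = diff
    print(f"position {index}: {describe(left)} vs {describe(right)}")
```

Running this on one ID from the error message and its apparent twin from the other artifact shows exactly which character differs.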

When we see mismatched ID issues here it is usually because of a bookkeeping issue when importing data.

If you can't find any problems with your file bookkeeping, please send me a DM with download links to the 3 files necessary to re-run your qiime diversity core-metrics-phylogenetic command above. Thanks! :qiime2:

Hi Matthew,

I still can't find any problems with my file bookkeeping. Could you please take a look at my files?
The three files necessary to run my qiime diversity core-metrics-phylogenetic command are attached below.

Thank you!

feature-table.qza (30.9 KB)
rooted-tree.qza (17.9 KB)
anotop_rtpr_qiime2_matadata_period.txt (6.5 KB)


The IDs for your features in the table have underscores:

 'Sep8.5v_9015',
 'Sep8.5v_9043',
 'Sep8.5v_9109',
 'Sep8.5v_9804',

While the tree has spaces:

 'Sep8.5v 9015'
 'Sep8.5v 9043'
 'Sep8.5v 9109'
 'Sep8.5v 9804'

That’s weird, because I can see underscores in the IDs in both the unrooted tree file I imported into QIIME2 and the rooted tree file I exported from QIIME2.
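
One possible explanation, not confirmed in this thread: by Newick convention an underscore in an unquoted label stands for a space, and some parsers apply that conversion when they read tip names. A file that shows underscores on disk can therefore still yield tip names with spaces once it is parsed. If that is what is happening here, wrapping each tip label in single quotes before import should keep the underscores literal. A minimal Python sketch under that assumption, where `quote_tip_labels` is a hypothetical helper and the tree is assumed to contain only unquoted labels:

```python
import re


def quote_tip_labels(newick):
    """Wrap each unquoted tip label in single quotes.

    Tip labels follow '(' or ','. Labels after ')' are internal-node
    labels (FastTree support values here) and are left untouched, as
    are labels that are already quoted.
    """
    return re.sub(r"(?<=[(,])([^():,;'\s]+)", r"'\1'", newick)


# Toy tree using IDs quoted in this thread; not the real file.
example = "((Sep8.5v_9015:0.1,Sep8.5v_9043:0.2)0.9:0.3,Sep8.5v_9109:0.4);"
print(quote_tip_labels(example))
```

The rewritten text would then be saved to a new .nwk file and imported in place of the original, after which the tip names can be checked again against the feature table.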

Can you send the unrooted tree, too?

By the way, if your tree is already rooted, import it as such; then you can skip midpoint rooting.

Here it is.
new_tree.tre.zip (11.7 KB)

My tree was not rooted.