Beta diversity fails without filtering feature table

John_Chase · October 27, 2016, 10:05pm

Running beta diversity with a phylogenetic metric results in a MissingNodeError. This is not surprising, because the sequences that went into building the tree were filtered (essentially following the moving pictures tutorial).

dm = diversity.methods.beta_phylogenetic(phylogeny=rooted_tree.rooted_tree, 
                                         table=feat_qiime_100.rarefied_table, 
                                         metric='weighted_unifrac')

MissingNodeError: All ``otu_ids`` must be present as tip names in ``tree``. ``otu_ids`` not corresponding to tip names (n=765): GCA...

The best solution I can think of is something like:

feat_table_filt = feat_table.filter(items=rooted_tree.rooted_tree.??)

This would result in the table containing only sequences that are also in the tree. However, I can't figure out how to get the tip names from the tree object.

I have a couple of additional thoughts on this. First, I can see many users running into this issue, so it could be nice for anything that requires overlapping IDs to offer the option of taking the intersection of the two.

Second, while I can't speak for others, this type of filtering or grouping is something pandas would be nice for. However, doing it here would require creating a pandas object, munging it, and then returning it to a QIIME object. At the moment it is not clear how to "roundtrip" to pandas and back (this could be cleared up once the documentation is updated).

Being able to operate on a view would be fantastic, though this may entail too much overhead or simply not be possible:

feat_qiime_100.rarefied_table.view(pd.DataFrame).drop(my_bad_seqs, inplace=True, axis=1)

Finally, IMHO otu_ids should likely be changed to feature_ids in the error message.

gregcaporaso · October 27, 2016, 10:08pm

@John_Chase, if you're using the 2.0.5 release you can perform this filtering with the filter method in q2-feature-table. If you're using the latest development versions, this is now the filter-table method in q2-phylogeny.

John_Chase · October 27, 2016, 10:19pm

@gregcaporaso will filter be deprecated? Should I not be using it? If I can use it, how do I get the names of the tips in the tree in order to filter the table?

feat_table_filt = feat_table.filter(items=rooted_tree.rooted_tree.??)

gregcaporaso · October 27, 2016, 10:26pm

No, you can use it, it's just moving. If you're using q2_feature_table's filter in 2.0.5, it is being replaced with q2_phylogeny's filter_table in 2.0.6 (the change is already made in the development version). This method takes a Phylogeny and FeatureTable as input, and filters the FeatureTable to only contain the feature ids that are present as tips in the tree (which I think is what you're trying to accomplish, right?).
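The effect of filtering a table down to the feature ids that are tips in the tree can be sketched with plain pandas. This is only an illustration of the intersection logic, not the q2_phylogeny implementation: the toy table, the `tip_names` set, and the samples-as-rows orientation are all assumptions made for the example.

```python
import pandas as pd

# Toy feature table: rows are samples, columns are feature ids (assumed orientation).
table = pd.DataFrame(
    {"feat_a": [3, 0], "feat_b": [1, 4], "feat_c": [2, 2]},
    index=["sample_1", "sample_2"],
)

# Hypothetical tip names. With a scikit-bio TreeNode these could be collected
# with something like {tip.name for tip in tree.tips()}.
tip_names = {"feat_a", "feat_c", "feat_d"}

# Keep only the features that are also tips in the tree, preserving column order.
shared = [fid for fid in table.columns if fid in tip_names]
filtered = table[shared]

# Features dropped because the tree does not contain them.
dropped = sorted(set(table.columns) - tip_names)
```

Keeping the list of dropped ids around makes it explicit how much of the table was removed before any diversity metric is computed.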

If that's not right, can you let me know which version of QIIME you're using now (2.0.5 or the development version)?

John_Chase · October 27, 2016, 10:43pm

Yes, that gets me the first part of my question. How do I get the development version? It looks like the latest update in the changelog is 2.0.5. Am I not looking at the right repository?

thermokarst · October 27, 2016, 10:54pm

Hi @John_Chase, that is the right repo, we have just been bad about updating the changelog lately.

gregcaporaso · October 27, 2016, 10:56pm

@John_Chase, if you don't need stuff that's in the development version, you might just want to wait for 2.0.6 next week. Once 2.0.6 is out, we're committed to putting all interface changes in the ChangeLogs, so it'll be easier for users to track this stuff.

If you do want to install development versions, you would do that by pip installing from the repo zip files (e.g., pip install https://github.com/qiime2/qiime2/archive/master.zip). You'll need to do this for the framework and the plugins that you're using (as well as q2cli if you're using that, but I don't think you are).

John_Chase · October 27, 2016, 10:59pm

That's what I did. The version is still noted as 2.0.5 in the setup.py, so I was confused about why the version hadn't changed after I installed it. I already upgraded, so I'll likely stick with that.

gregcaporaso · October 27, 2016, 11:19pm

Ok, sounds good. Get in touch if you run into other issues. (We'll also be sure to be more careful about the post-release version number updates beginning with 2.0.6. Sorry for the confusion.)

In reply to your other questions:

I agree - I created an issue for this.

This is what QIIME 1 did, but it did it behind the scenes so users wouldn't know that their UniFrac calculation (for example) wasn't being computed on all the OTUs in their OTU table. This could also be problematic because the table wouldn't be rarefied anymore (if some OTUs are dropped, then some samples could have more sequences than others). In QIIME 2 I want the users to be aware of this issue, hence the explicit filtering of the table, which should be applied before rarefaction.
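A small sketch of why silently dropping features after rarefaction is a problem. The counts and the `tip_names` set are made up for illustration, and the dictionaries stand in for a real feature table.

```python
# Toy rarefied table: every sample has exactly 10 sequences.
rarefied = {
    "sample_1": {"otu_a": 5, "otu_b": 3, "otu_c": 2},
    "sample_2": {"otu_a": 1, "otu_b": 1, "otu_c": 8},
}

# Hypothetical set of ids present as tips in the tree; otu_c is missing.
tip_names = {"otu_a", "otu_b"}


def depths(table):
    """Return the total sequence count per sample."""
    return {sample: sum(counts.values()) for sample, counts in table.items()}


# Drop features absent from the tree *after* rarefaction.
filtered = {
    sample: {otu: n for otu, n in counts.items() if otu in tip_names}
    for sample, counts in rarefied.items()
}

depths_before = depths(rarefied)  # both samples at depth 10
depths_after = depths(filtered)   # depths now differ between samples
```

Here the two samples end up at depths 8 and 2, so the table no longer has even sampling depth. Filtering first and rarefying afterwards avoids this.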

John_Chase · October 27, 2016, 11:30pm

That's a good point, I 100% agree. Even a flag doesn't entirely make it clear what is happening.

What would really be ideal is to be able to munge the data with pandas and then easily return to QIIME. Filtering is one of the simpler tasks; however, we will often group samples with somewhat more complex pandas code and then use the resulting table in the diversity analyses. This is really more of a want than a need at the moment, however.

The filter-table worked perfectly, thanks!

gregcaporaso · October 27, 2016, 11:48pm

It's possible to do this right now. The way this works is that you view the table as pd.DataFrame, do your manipulations, and then import it into a new artifact. All provenance will be lost when you do this because we can't track what happens to the data outside of QIIME (and real artifact provenance is coming with 2.0.6, so you'll see why you care about that soon), but what you describe is functionality that we've explicitly designed the system to support.

Here's an example:

# Load the artifact
import qiime
a = qiime.Artifact.load('table.qza')

# Get a view of it as a DataFrame
import pandas as pd
df = a.view(pd.DataFrame)

# Load it back into QIIME 2.
b = qiime.Artifact.import_data("FeatureTable[Frequency]", df)

# Confirm equality
import biom
a.view(biom.Table) == b.view(biom.Table)
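The manipulation step that would sit between the view and the import is ordinary pandas. As a rough sketch of the sample-grouping case mentioned earlier in the thread, using a toy DataFrame in place of `a.view(pd.DataFrame)` and a hypothetical `body_site` mapping (the samples-as-rows orientation is an assumption):

```python
import pandas as pd

# Stand-in for a.view(pd.DataFrame): rows are samples, columns are features (assumed).
df = pd.DataFrame(
    {"feat_a": [3, 0, 1], "feat_b": [1, 4, 2]},
    index=["s1", "s2", "s3"],
)

# Hypothetical mapping from sample id to a grouping category.
body_site = pd.Series({"s1": "gut", "s2": "gut", "s3": "tongue"})

# Sum feature counts within each group; the group labels become the new sample ids.
grouped = df.groupby(body_site).sum()
```

The grouped frame could then be handed to `import_data` in the same way as `df` in the example above.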

UPDATE: The requested functionality is now supported through the development version of q2-types, so I updated this reply to reflect that. I also updated it to include @ebolyen's suggestion below.

ebolyen · October 27, 2016, 11:52pm

[quote="gregcaporaso, post:11, topic:65"]

from q2_types.feature_table import FeatureTable, Frequency
b = qiime.Artifact.import_data(FeatureTable[Frequency], df)

[/quote]

You can also get away with just this (assuming q2_types is installed):

b = qiime.Artifact.import_data("FeatureTable[Frequency]", df)