Question on DEICODE and Songbird

cxf514 · October 4, 2019, 6:51pm

Both DEICODE and Songbird are really helpful, and I have several questions about them.
First
I would like to know whether DEICODE can provide a ranked list of features, like differential.tsv in Songbird.
Second
I would like to know whether Songbird's differential ranking, like DEICODE, is not appropriate for a table.qza that has been collapsed to genus level.
Third
Is there another way to pick out meaningful ASVs from the rank results, apart from taking the top five or bottom five? After all, a top five and a bottom five always exist.

Thanks for any reply.

fedarko · October 6, 2019, 2:22am

Hi @cxf514,

The best people to answer this question are probably @cmartino and @mortonjt, who developed DEICODE and Songbird respectively! Since they're currently unavailable, you get me instead.

First
I would like to know whether DEICODE can provide a ranked list of features, like differential.tsv in Songbird.

The answer is: sort of! When you run DEICODE, you get an ordination file (along with a distance matrix).¹ The ordination file contains sample loadings and feature loadings. The feature loadings are similar to the differential output you get from Songbird, in that you can rank features from lowest to highest loading along a given axis. The general takeaway from these feature loadings is that very highly ranked or very lowly ranked features seem to be somehow associated with variation in the dataset.
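As a rough sketch of that ranking step (plain Python, with made-up feature IDs and loading values standing in for the feature loadings you would read out of a real DEICODE ordination file; the function name `rank_features` is just for illustration):

```python
# Hypothetical feature loadings: feature ID -> (PC1 loading, PC2 loading).
# These IDs and numbers are invented for illustration only.
feature_loadings = {
    "ASV_1": (0.42, -0.10),
    "ASV_2": (-0.35, 0.22),
    "ASV_3": (0.03, 0.51),
    "ASV_4": (-0.01, -0.47),
}


def rank_features(loadings, axis=0):
    """Return feature IDs sorted from lowest to highest loading on one axis."""
    return sorted(loadings, key=lambda feature_id: loadings[feature_id][axis])


pc1_ranking = rank_features(feature_loadings, axis=0)
pc2_ranking = rank_features(feature_loadings, axis=1)
print(pc1_ranking)
print(pc2_ranking)
```

The two ends of each list are the features that would sit at the far left and far right of a rank plot for that axis.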

The DEICODE paper (should be open access) goes into detail about these loadings: see figure 2F and figure 5 for examples of looking at feature loadings. If you want to do this sort of thing yourself (looking at your feature loadings in order to compare log-ratios of features with sample metadata), this is possible using Qurro; in QIIME 2, use the qiime qurro loading-plot visualizer. Qurro doesn't support visualizing DEICODE sample loadings alongside feature log-ratios yet, like in fig. 5 of the DEICODE paper, but that's an open issue I'd like to address eventually.

Of course, something to note is that DEICODE doesn't know anything about your sample metadata (all you pass in to DEICODE when you run qiime deicode rpca is a feature table)—so unlike Songbird differentials, which are generated using the feature table along with the formula and sample metadata you pass in, the feature loadings in DEICODE output don't necessarily mean anything about your sample metadata.

¹ You can use this ordination file to create a fancy biplot in Emperor, if you want! It's possible to combine this with a Qurro visualization of the feature loadings; see the Qurro tutorial for an example.

Second
I would like to know whether Songbird's differential ranking, like DEICODE, is not appropriate for a table.qza that has been collapsed to genus level.

I'm not an expert here, but my feeling is that it is best to use the uncollapsed table for Songbird. I do know that DEICODE explicitly recommends against using collapsed tables, and if you want to do stuff like compare your DEICODE and Songbird results then it definitely seems best to just use uncollapsed tables for both.

Third
Is there another way to pick out meaningful ASVs from the rank results, apart from taking the top five or bottom five? After all, a top five and a bottom five always exist.

This is a complicated question! Generally speaking, these high-ranked/low-ranked features (in the context of DEICODE feature loadings or Songbird differentials) are ranked that way because they seem to be associated with variation in the dataset in some way. I guess you could filter out the top/bottom ranked features and then rerun DEICODE or Songbird, but I wouldn't recommend doing that.

For further information about how to interpret these rankings, I'd recommend looking over the Songbird paper (should also be open access) -- in particular, I've found the section on "Interpreting ranks" useful when looking at feature rankings.

This isn't super related to the "ranking" side of things, but you might also want to check out this cool open-access paper, which proposes a different way of automatically selecting log-ratios of features.

Hope this helps answer some of your questions!

cxf514 · October 6, 2019, 4:14pm

Dear fedarko~!
I really cannot find the words to express my gratitude for your patience and time.
I have read both the DEICODE paper and the DR paper.

Your suggestion solved the first question.
I extracted the ordination file and sorted the feature loadings of PC1. Thanks again~

When it comes to the second issue, I would like to state my doubt more clearly.
What I am working on
I am working on a V3-V4 region 16S sequencing project. I followed the QIIME 2 pipeline and everything worked well until the differential analysis. First, ANCOM, which gave some significant results, but with W below 25% of the total number of features. Second, DR, which I think is reliable, and I successfully got my differential.tsv. Third, 'APCR', from which I now have the ranking of the feature loadings.
What confuses me
As we all know, genus-level 16S results are relatively reliable, but species-level results are not. Yet the notices from both APCR and DR tell me not to use a collapsed table, without giving an understandable reason.
In the DEICODE paper, Fig. 5 C and D show rank plots at the genus level, and even at the family level. I am really confused.
While replying to you, I had an idea.
We could first run the analyses (DR and DEICODE) at level 7, then sum the feature loadings at a higher taxonomic level and re-sort them to get a new ranked list at that higher level.
I think it will be feasible in DR; what about the results in the DR differential.tsv?

Wahahah
Is it feasible, in your opinion?

Paying tribute to your Qurro
It is a really useful plugin for QIIME 2.
Emmmmm, in my work, actually, I got seven thousand ASVs after filtering, and I cannot intuitively see the features in the rank plot on the left (I know how to pick the numerator and denominator); there are just too many features.

Looking forward to your next reply.

fedarko · October 7, 2019, 8:20pm

No problem, and thanks for the detailed response!

In the DEICODE paper, Fig. 5 C and D show rank plots at the genus level, and even at the family level. I am really confused.

I'm not sure why this is the case -- it seems strange to me that there would be only one Synechococcus in a marine dataset, so I'm also somewhat confused. I didn't contribute to this paper; @cmartino, would you mind commenting on this? My guess is that just the most positively or most negatively ranked features from certain genera or families are highlighted on these rank plots.

Regardless of the paper figures, you can still look at log-ratios involving more than two features in your dataset -- that is, on the order of genera/families/etc. -- while still using the uncollapsed and unrarefied table as input to DEICODE. There are multiple ways you can calculate the log-ratio, but the way Qurro does it is, for every sample:

Sum all features in the numerator of the log-ratio (e.g. all features in the genus Synechococcus)
Sum all features in the denominator of the log-ratio (e.g. all features in the genus Nitrosopumilus)
Just take the log-ratio of these sums.
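The three steps above can be sketched in plain Python as follows. This is only an illustration of the arithmetic, not Qurro's actual code: the counts and feature IDs are made up, the natural log is assumed, and returning None when either sum is zero is a choice made for this sketch.

```python
import math

# Hypothetical counts: sample ID -> {feature ID -> count}. Values are invented.
table = {
    "S1": {"ASV_1": 10, "ASV_2": 30, "ASV_3": 5, "ASV_4": 15},
    "S2": {"ASV_1": 0, "ASV_2": 0, "ASV_3": 8, "ASV_4": 2},
}

# Hypothetical groupings, e.g. all ASVs assigned to one genus on each side.
numerator_ids = ["ASV_1", "ASV_2"]
denominator_ids = ["ASV_3", "ASV_4"]


def log_ratio(sample_counts, numerator, denominator):
    """Log of (summed numerator counts / summed denominator counts) for one sample."""
    num = sum(sample_counts.get(feature_id, 0) for feature_id in numerator)
    den = sum(sample_counts.get(feature_id, 0) for feature_id in denominator)
    if num <= 0 or den <= 0:
        return None  # the log-ratio is undefined for this sample
    return math.log(num / den)


log_ratios = {
    sample_id: log_ratio(counts, numerator_ids, denominator_ids)
    for sample_id, counts in table.items()
}
print(log_ratios)
```

Here S1 gets log(40 / 20), while S2 has no counts on the numerator side and so has no defined log-ratio.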

An advantage of this is that you can highlight multiple features (ASVs, sOTUs, ...) on the rank plot, to see how all features identified as belonging to say a certain genus are ranked. In Qurro, you can do this sort of computation automatically using the filtering controls at the bottom right of the screen (e.g. searching by taxonomy).

I think the rank plot with all features in a given genus highlighted should look similar to the rank plot from your idea of summing and re-sorting the feature loadings after the fact... I haven't tried that idea out, though! It's a cool idea, but I think doing things the recommended way through DEICODE is likely best.
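For reference, the sum-and-re-sort idea could look something like the sketch below. To be clear, this is just the untested idea from this thread written out, not a recommended DEICODE or Songbird workflow; the loadings, genus labels, and the helper name `sum_by_group` are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical per-ASV loadings (or differentials) for one axis/column.
loadings = {"ASV_1": 0.42, "ASV_2": -0.35, "ASV_3": 0.03, "ASV_4": -0.01}

# Hypothetical taxonomy lookup: feature ID -> genus label.
genus_of = {
    "ASV_1": "g__Synechococcus",
    "ASV_2": "g__Nitrosopumilus",
    "ASV_3": "g__Synechococcus",
    "ASV_4": "g__Nitrosopumilus",
}


def sum_by_group(values, groups):
    """Sum per-feature values within each group (e.g. each genus)."""
    totals = defaultdict(float)
    for feature_id, value in values.items():
        totals[groups[feature_id]] += value
    return dict(totals)


genus_totals = sum_by_group(loadings, genus_of)
# Re-sort the groups from lowest to highest summed value.
genus_ranking = sorted(genus_totals, key=genus_totals.get)
print(genus_ranking)
```

One thing to keep in mind with this approach: positive and negative values within the same genus cancel out when summed, so a genus containing strongly ranked ASVs at both ends could land in the middle of the new list.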

Paying tribute to your Qurro
It is a really useful plugin for QIIME 2.
Emmmmm, in my work, actually, I got seven thousand ASVs after filtering, and I cannot intuitively see the features in the rank plot on the left (I know how to pick the numerator and denominator); there are just too many features.

Thanks! Seven thousand is a lot of ASVs, so I can see how that would make things hard to interpret... more data is a good problem to have, though.

I'm currently working on improving this: the next release of Qurro should have some useful tools for automatically selecting features from the top and bottom of the rank plot.

In the meantime, you can try the following approaches:

When you generate a Qurro visualization, try specifying --p-extreme-feature-count (maybe 500 would be an ok place to start, depending on how many columns of differentials or feature loadings you have?). If you specify this option, Qurro will only display a limited number of features from both sides of the rank plot -- this can make the visualization a lot easier to work with (at the cost of losing some features from the middle of the rankings).
Try checking the "Fit bar widths to the plot's default width?" box underneath the rank plot. This should at least squeeze the rank plot into a smaller image, and speed up the application (although it will make it hard to select features from the plot).
Using the filtering controls at the bottom right of the screen, you should be able to select features based on their differential or feature loading values (you'll need to use the "numeric searching" options). This will let you just look at features where the corresponding differential/loading is above or below a certain threshold on the rank plot y-axis.
- These controls are kind of difficult to use, sorry... if you have any questions about them, let me know!
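If you'd rather narrow things down outside of Qurro first, the same two ideas (keeping only the extremes of a ranking, or keeping features past a threshold) can be sketched in plain Python. This is not Qurro's implementation; it assumes a single ranking column, and the function names and values are made up for illustration.

```python
def extreme_features(ranks, count):
    """Keep `count` features from each end of one ranking, lowest values first."""
    if count <= 0:
        return []
    ordered = sorted(ranks, key=ranks.get)
    if 2 * count >= len(ordered):
        return ordered  # the two ends overlap, so nothing is dropped
    return ordered[:count] + ordered[-count:]


def features_beyond(ranks, lower=None, upper=None):
    """Keep features whose value is below `lower` or above `upper`."""
    kept = []
    for feature_id in sorted(ranks, key=ranks.get):
        value = ranks[feature_id]
        below = lower is not None and value < lower
        above = upper is not None and value > upper
        if below or above:
            kept.append(feature_id)
    return kept


# Hypothetical differentials / feature loadings for one column.
ranks = {"ASV_1": -2.0, "ASV_2": -0.5, "ASV_3": 0.1, "ASV_4": 1.5, "ASV_5": 3.0}

print(extreme_features(ranks, 1))
print(features_beyond(ranks, lower=-1.0, upper=1.0))
```

Either way, the features in the middle of the ranking are what gets dropped, which is the same trade-off mentioned above.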

I think that's all I know how to answer for now. Let me know if you have any further questions, and I'll see if I can help any more.

cmartino · October 9, 2019, 3:08am

The log-ratios used in the paper figure were of single sOTUs/ASVs. The taxonomy labels used are the lowest classified level for that ASV. Those ASVs can be chosen from the arrows in the biplot.

Like you pointed out @fedarko, you can also group by lowest common taxonomy or high/low rank groupings. That is where Qurro makes life easy.