Doubt about alpha rarefaction plot data

marctormo · November 2, 2020, 11:44am

Hi there,

I'm confused about how to use the alpha-rarefaction data (downloaded as csv).
First of all, the alpha-rarefaction plot is plotting an average of the 10 iterations? or is it plotting only an average of the 10th?

Second, I am unable to reproduce the plot from this data, and when I look at a specific point (the second one, depth-8110), the averages that I obtain are: 13.xx for HC and 12.xx for UC (depending on how many iterations I took to make the average). Those values are different than what qiime2 had plotted:

In this plot, UC value is higher than HC. Do you know what it's happening here? Maybe I am not taking into account something important, but I don't know what.

Many thanks!
Marc

andrewsanchez · November 2, 2020, 8:58pm

Hi, @marctormo!

To quote from both the moving pictures tutorial and the Parkinson's mice tutorial:

At each sampling depth step, 10 rarefied tables will be generated, and the diversity metrics will be computed for all samples in the tables. The number of iterations (rarefied tables computed at each sampling depth) can be controlled with --p-iterations . Average diversity values will be plotted for each sample at each even sampling depth

By default, 10 rarefied tables are calculated at each sampling depth to provide an error estimate.

Does that make sense? Let me know if your second question still stands after reading this. Because of the random re-sampling which occurs in rarefaction, it is feasible to see different results.

marctormo · November 3, 2020, 10:53am

Hi Andrew,

Thanks for your answer, but sadly it isn't clear for me yet.
The thing is that I'm trying to do the same plot in R with the values from the downloaded csv, I'm not trying to do the rarefaction again (here I understand that I would get different values).
If I do an average for each depth and grouped by a metadata column HC/UC (I tried it with the 10th iteration of all the samples of the group and the average of all the iterations of all the samples for that group) I get that the average of UC is lower than the average of HC samples, but it's not what I see in the rarefaction plot from qiime2.

Do you know how to reproduce the rarefaction plot from the csv? Am I wrong somewhere in my explanation?

Many thanks!
Marc

andrewsanchez · November 4, 2020, 7:07pm

Hi, @marctormo,

Important: after digging into this further, I discovered that the tutorial which I quoted above is incorrect in stating that the averages are plotted. In fact, the curves are drawn based on the median.. This fact is correctly documented in the visualization help text:

I think were you may be going wrong is that you need to group by metadata columns first. To be clear, all iterations at a specified depth are taken into account when plotting the curves.

Rather than trying to reproduce the plot by mimicking QIIME 2's operations on the raw, rarefaction results in metric*.csv, you could export the visualization and read in/plot the raw data in the metric*.jsonp files, e.g.:

➜ qiime tools export --input-path alpha_rarefaction_curves.qzv --output-path alpha_rarefaction_curves
Exported alpha_rarefaction_curves.qzv as Visualization to directory alpha_rarefaction_curves
[I] ➜ ls alpha_rarefaction_curves
dist                                              observed_features-donor_status.jsonp              q2templateassets                                  shannon-genotype.jsonp
index.html                                        observed_features-genotype.jsonp                  shannon-barcode.jsonp                             shannon-genotype_and_donor_status.jsonp
observed_features-barcode.jsonp                   observed_features-genotype_and_donor_status.jsonp shannon-cage_id.jsonp                             shannon-mouse_id.jsonp
observed_features-cage_id.jsonp                   observed_features-mouse_id.jsonp                  shannon-donor.jsonp                               shannon.csv
observed_features-donor.jsonp                     observed_features.csv                             shannon-donor_status.jsonp

That being said, I will try to put the basic, high-level steps of our Python implementation for generating the data used to plot the curves when grouping on a categorical metadata column into plain language. The "real," answer is, of course, in the source code, here.

You must do the following for each metadata column:

Reindex the data frame on the metadata column.
Sort the index by rows.
Group by the metadata column.
Calculate the median on the grouped data frame.

To illustrate, prior to the first step, your data frame would look something like this:

_alpha_rarefaction_depth_column_    1            ...       200               pet
iter                                1    2    3  ...         9        10        
S1                               -0.0 -0.0 -0.0  ...  0.998846  0.998846    russ
S2                               -0.0 -0.0 -0.0  ...  0.998196  0.999351    milo
S3                               -0.0 -0.0 -0.0  ...  1.000000  0.999928  peanut

After the four steps above, you would be working with something like this:

_alpha_rarefaction_depth_column_  1              ...       200                    
iter                               1    2    3   ...        8         9         10
pet                                              ...                              
milo                             -0.0 -0.0 -0.0  ...  0.999928  0.998196  0.999351
peanut                           -0.0 -0.0 -0.0  ...  0.999711  1.000000  0.999928
russ                             -0.0 -0.0 -0.0  ...  0.997402  0.998846  0.998846

The next step is to stack the data frame from columns to index, which would look like this:

iter                                           1         2   ...        9         10
pet    _alpha_rarefaction_depth_column_                      ...                    
milo   1                                -0.000000 -0.000000  ... -0.000000 -0.000000
       23                                0.886541  0.998636  ...  0.932112  0.965636
       45                                0.999644  0.982474  ...  0.996792  0.982474
       67                                0.992112  0.999839  ...  0.999839  0.999839
       89                                0.995533  0.995533  ...  0.992611  0.995533
       111                               0.999473  0.999473  ...  0.999941  0.999473
       133                               0.999959  0.999633  ...  0.999959  0.999959
       155                               0.998528  0.998528  ...  0.999730  0.999970
       177                               0.999424  0.999793  ...  0.999977  0.999977
       200                               0.999351  1.000000  ...  0.998196  0.999351
peanut 1                                -0.000000 -0.000000  ... -0.000000 -0.000000
       23                                0.987693  0.987693  ...  0.998636  0.998636
       45                                0.991076  0.999644  ...  0.999644  0.982474
       67                                0.972670  0.980468  ...  0.999839  0.999839
       89                                0.997722  0.995533  ...  0.997722  0.984554
       111                               0.995253  0.998536  ...  0.995253  0.998536
       133                               0.999633  0.999959  ...  0.995060  0.999959
       155                               0.998528  0.999970  ...  0.997567  0.997567
       177                               0.999793  0.996105  ...  0.999793  0.999977
       200                               1.000000  0.999351  ...  1.000000  0.999928
russ   1                                -0.000000 -0.000000  ... -0.000000 -0.000000
       23                                0.965636  0.987693  ...  0.886541  0.987693
       45                                0.999644  0.999644  ...  0.999644  0.996792
       67                                0.992112  0.998553  ...  0.999839  0.995979
       89                                0.999180  0.999909  ...  0.984554  0.979412
       111                               0.992904  0.995253  ...  0.992904  0.998536
       133                               0.990805  0.999633  ...  0.999633  0.995060
       155                               0.998528  0.986718  ...  0.999970  0.999730
       177                               0.997212  0.994813  ...  0.999793  0.996105
       200                               0.999351  0.998846  ...  0.998846  0.998846

It is on this stacked version of the data frame that we calculate the five-number summaries, used to plot the data for a given metadata column.

marctormo · November 5, 2020, 10:27am

Thanks Andrew,

With the median it looks really better

Best,
Marc

system · December 6, 2020, 4:27pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.