alpha diversity analysis in QIIME2 vs Jupyter Notebook (python)

malenaamer · June 28, 2023, 4:28am

Hello,

I'm trying to make a table of the Kruskal-Wallis results for 35 different variables. It would take a long time to copy and paste each value from the .qzv/view files. So I created a script in python using the tsv files produced in QIIME2 of Shannon, FaithPD, Evenness and ObservedOTUs metrics.

However, I noticed they don't produce the exact same results. Could you help me explain why? Are there any nuances I should take into account?

I'm using these commands in python:

import pandas as pd
from scipy.stats import kruskal
        # Perform Kruskal-Wallis test
        H, p = kruskal(category1_values, category2_values)

Captura de Pantalla 2023-06-27 a la(s) 21.21.14

And I'm using this command in QIIME2:

!qiime diversity alpha-group-significance \
  --i-alpha-diversity faith_pd_vector.qza \
  --m-metadata-file sample-metadata-v2.tsv \
  --o-visualization faith-pd-group-significance_230625.qzv

I appreciate your help!
Malena

gregcaporaso · June 28, 2023, 2:00pm

Hi @malenaamer, You should be getting the same result. I suspect there may be a difference in how the lists of values being compared are created. Could you provide more context on how you're computing your category1_values and category2_values? Ideally you could share code showing this starting from the same artifact that you're providing to alpha-group-significance.

If you haven't used QIIME 2 artifacts from within Python before, you can load your Faith PD vector and sample metadata into pandas objects (a pd.Series and pd.DataFrame, respectively) as follows:

In [1]: import qiime2

In [3]: import pandas as pd

In [4]: faith_pd = qiime2.Artifact.load('./faith_pd_vector.qza').view(pd.Series)

In [6]: sample_metadata = qiime2.Metadata.load('./sample-metadata.tsv').to_dataframe()

malenaamer · June 29, 2023, 11:20pm

Thank you so much for the quick response, Greg!

I'm glad I asked the question because I definitely had a mistake in the list preparation as you suggested, I was actually skipping a row:

import pandas as pd
from scipy.stats import kruskal

# Read the Excel
#MISTAKE! -->data = pd.read_excel('Manually_made_file.xls', sheet_name='AlphaGraphs', header=0, skiprows=[1])
data = pd.read_excel('Manually_made_file.xls', sheet_name='AlphaGraphs', header=0)
columns_to_analyze = data.columns[7:73].tolist()
print(columns_to_analyze)
alpha_diversity_outcomes = ['Shannon', 'Observed_OTUs', 'FaithPD', 'Evenness']

# Kruskal-Wallis
results = []
for variable in columns_to_analyze:
    for outcome in alpha_diversity_outcomes:
        category1_values = data.loc[data[variable] == 0, outcome]
        category2_values = data.loc[data[variable] == 1, outcome]
        #print(category1_values)
        #print(category2_values)
        
        H, p = kruskal(category1_values, category2_values)

        results.append({
            'Testing Variable': variable,
            'Outcome': outcome,
            'H': H,
            'p-value': p, #only two categories, no q
        })

As you can notice in my code, I used a "Manually_made_file.xls". I made it by downloading all the .TSV files made from the alpha_diveristy.qzv files and copying the data into different columns in excel. Then I also merged the metadata file.

I had not called the QIIME2 artifacts from python yet, but I am going to update my script now so I don't have to manually copy and paste the downloaded .TSV file from the view.qiime2 page. I'm having trouble with the automatic merge because the .qza file has as the index the SampleIDs and then the data as a column. So I need to find a way to merge them properly.

timanix · June 30, 2023, 6:17am

Glad that you found an error - I silently followed your post since I am also applying Kruskal-Wallis test outside of qiime2 since it is faster for me and I don't remember when I checked for the last time if results are the same with qiime2.

I just add all alpha diversity metrics to metadata file:

def add_alpha(qza):  
    a = !unzip $qza
    out = a[1].split('/')[0].replace('  inflating: ', '')
    inf = f'{out}/data/alpha-diversity.tsv'
    df = pd.read_csv(inf, sep='\t', index_col=0)
    !rm -rf $out
    return df 


META = pd.read_csv('metadata.tsv', sep='\t', index_col=0)
ALPHAS = {
    'shannon': 'Shannon entropy', 
    'observed_features': 'Observed features',
    'faith_pd': 'Faith\'s PD',
    'evenness': 'Pielou\'s evenness'}


COREM = 'Results/Core-metrics'
for alpha in ALPHAS:
    data = add_alpha(f'{COREM}/{alpha}_vector.qza')
    META[alpha] = data.iloc[:,  0]

META.to_csv('alpha.tsv', sep='\t')

This file then can be used for different tests and plots.
I use dictionary for alpha metrics instead of list just because I added metric names as values that I usually plot on figures instead of metrics itself.

malenaamer · July 8, 2023, 5:10am

Thank you so much for sharing this, I'll give it a try!
Have you run a Lefse analysis in python?

timanix · July 9, 2023, 11:47am

Hi!
Yes, I run lefse analysis couple of times on the data front Qiime2 in Jupyter lab. Now I prefer Ancom-BC that is already installed in the latest versions of Qiime2 since I like this analysis more

malenaamer · July 10, 2023, 5:54pm

I didn't know about this analysis, why do you like it better? Can you link a tutorial?
I'm trying to make prettier graphs with my Lefse results, but it's a slow process for a beginner

timanix · July 10, 2023, 9:17pm

I like this comparison of DA tests:

And, of course, this one:

Besides, Lefse should be installed into different environment and one will need to export feature table in Qiime2, add metadata information like class and subclass and then switch to the other environment, convert it to the right format for lefse and then run lefse.

I think Ancom-BC is more appropriate for compositional microbiome data, have an advantage of formulas for multiple factors and is better for comparisons with more than two levels.

Best,

malenaamer · July 14, 2023, 4:00pm

Great, thank you so much for sharing this! I'll add it to my "To Learn" list

system · August 27, 2023, 7:33am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.