Hi @dann818,
Recently I too have been playing around with the LEfSe + QIIME 2 combination, so I was very happy to come across your post.
I'm not sure I understood 100% of what you wrote, but I generally agree that preparing an input file for LEfSe can be tricky, especially because of the distinction between relative abundance and cumulative relative abundance. For example, in the example input file provided by the original authors (lefse · biobakery/biobakery Wiki · GitHub), the columns (one per sample) do not sum to 1, because the value for a higher-rank taxon such as Bacteria is the cumulative relative abundance of all the taxa beneath it (Bacteria|Acidobacteria, Bacteria|Bacteroidetes, ...). Therefore, that file is not the same as a QIIME 2 feature table of type FeatureTable[RelativeFrequency], whose columns do sum to 1. I think this is the point you were making in your post (correct me if I am wrong).
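To make the difference concrete, here is a small shell sketch on made-up numbers (the two-sample tables and the column_sums helper are hypothetical, not part of QIIME 2 or LEfSe). A genus-only relative frequency table sums to 1 per sample, but once a row is added for every higher rank, in the style of the LEfSe example file, each sample sums to 3 here, one per rank.

```shell
#!/usr/bin/env bash
# Hypothetical toy data: tab-separated, first column = taxon, one column per sample.
genus_only=$'taxon\tS1\tS2
Bacteria|Firmicutes|Streptococcus\t0.6\t0.2
Bacteria|Bacteroidetes|Bacteroides\t0.4\t0.8'

# Same data with a row for every higher rank, like the LEfSe example input file.
with_ranks=$'taxon\tS1\tS2
Bacteria\t1.0\t1.0
Bacteria|Firmicutes\t0.6\t0.2
Bacteria|Firmicutes|Streptococcus\t0.6\t0.2
Bacteria|Bacteroidetes\t0.4\t0.8
Bacteria|Bacteroidetes|Bacteroides\t0.4\t0.8'

# Print the per-sample (per-column) sums, skipping the header row and the taxon column.
column_sums() {
  awk -F'\t' '
    NR > 1 { for (i = 2; i <= NF; i++) s[i] += $i; n = NF }
    END    { for (i = 2; i <= n; i++) printf "%s%g", (i > 2 ? "\t" : ""), s[i]; print "" }'
}

column_sums <<< "$genus_only"   # prints 1 and 1 (tab-separated)
column_sums <<< "$with_ranks"   # prints 3 and 3 (tab-separated)
```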
The good news is that I believe LEfSe is smart enough to calculate cumulative relative abundances on its own when it is given a FeatureTable[RelativeFrequency]-style table that contains no cumulative values. I have written a short tutorial for performing LEfSe with a QIIME 2 feature table, which I copy below. Compare the input_table.tsv file with the formatted_table.tsv file: the first has only 220 taxa, all at the genus level, and is the input file for LEfSe; the second is what LEfSe creates after formatting the first, and it has 429 taxa at ranks ranging from kingdom to genus.
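The idea behind that expansion can be sketched in a few lines of shell: every taxon string is split at its | separators and its values are added to each ancestor rank. This is only a hypothetical illustration of the arithmetic (expand_ranks is my own helper, not LEfSe code), but on a two-genus toy table it turns 2 rows into 5, in the same spirit as 220 genera becoming 429 taxa.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given a tab-separated table whose first column is a
# genus-level taxon string (ranks joined by "|"), emit one row per rank prefix,
# each holding the sum of all genera beneath it.
expand_ranks() {
  awk -F'\t' -v OFS='\t' '
    NR == 1 { print; next }                      # keep the header row as-is
    {
      n = split($1, rank, "[|]")
      name = ""
      for (k = 1; k <= n; k++) {
        name = (k == 1) ? rank[1] : name "|" rank[k]
        if (!(name in seen)) { seen[name] = 1; order[++m] = name }
        for (i = 2; i <= NF; i++) total[name, i] += $i
      }
      nf = NF
    }
    END {
      for (j = 1; j <= m; j++) {                 # taxa in first-seen order
        row = order[j]
        for (i = 2; i <= nf; i++) row = row OFS total[order[j], i]
        print row
      }
    }'
}

# Hypothetical toy input: two genera, two samples.
genus_only=$'taxon\tS1\tS2
Bacteria|Firmicutes|Streptococcus\t0.6\t0.2
Bacteria|Bacteroidetes|Bacteroides\t0.4\t0.8'

expand_ranks <<< "$genus_only"
```

LEfSe's own implementation may differ in details, for example in how it names features or handles missing ranks.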
I hope you find my tutorial helpful and please let me know if you have any questions.
LEfSe
In this section, I will walk you through how I run the LEfSe (linear discriminant analysis effect size) tool. But before I do that, it is important to keep the following in mind:
The LEfSe method is more of a discriminant analysis method than a DA (differential abundance) method. (Lin and Peddada, 2020; PMID: 33268781)
In order to use LEfSe, you will need to open two Terminal windows: one for your usual QIIME 2 environment and another for running LEfSe. For the latter, you should create a new conda environment and install LEfSe as described below.
- Terminal for running QIIME 2 and Dokdo:
$ conda activate qiime2-2020.8
- Terminal for running LEfSe:
$ conda create -n lefse -c conda-forge python=2.7.15
$ conda activate lefse
$ conda install -c bioconda -c conda-forge lefse
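Before moving on, it can be worth confirming that the LEfSe scripts used later in this tutorial are actually on your PATH in the lefse environment. The check_tools function below is a hypothetical convenience, not part of LEfSe; it only relies on the shell built-in command -v.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether each named executable is on PATH.
# Returns non-zero if any of them is missing.
check_tools() {
  local tool status=0
  for tool in "$@"; do
    if command -v "$tool" > /dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      status=1
    fi
  done
  return "$status"
}

# Run inside the activated lefse environment.
check_tools lefse-format_input.py run_lefse.py lefse-plot_res.py lefse-plot_cladogram.py ||
  echo "Some LEfSe scripts are missing from this environment." >&2
```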
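After you have both terminals set up, you can create an input file for LEfSe from a QIIME 2 feature table. We will use the data from the "Moving Pictures" tutorial as an example. Here -c body-site and -u subject choose the metadata columns used as the class and the subject, and -w keeps only the tongue, gut and left palm samples (run the following in the QIIME 2 terminal):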
$ dokdo prepare-lefse \
-t data/moving-pictures-tutorial/table.qza \
-x data/moving-pictures-tutorial/taxonomy.qza \
-m data/moving-pictures-tutorial/sample-metadata.tsv \
-o output/Useful-Information/input_table.tsv \
-c body-site \
-u subject \
-w "[body-site] IN ('tongue', 'gut', 'left palm')"
Click here to view the input_table.tsv file.
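A quick way to sanity-check the generated file before formatting it is to list its class labels and count its taxon rows. The sketch below assumes, based on the -c 1 -u 2 options used in the formatting step, that row 1 of input_table.tsv holds each sample's body site and row 2 its subject, with row labels in the first column; class_labels and taxon_count are hypothetical helper names. With the -w filter above, only tongue, gut and left palm should appear, and if the layout assumption holds, the taxon count should match the 220 genus-level taxa mentioned earlier.

```shell
#!/usr/bin/env bash
# Assumption: row 1 = class (body site) per sample, row 2 = subject,
# remaining rows = one taxon each; column 1 holds the row labels.

# Print the distinct class labels found in row 1.
class_labels() {
  head -n 1 "$1" | cut -f 2- | tr '\t' '\n' | sort -u
}

# Print the number of taxon rows (all rows after the first two).
taxon_count() {
  awk 'END { print NR - 2 }' "$1"
}

table=output/Useful-Information/input_table.tsv
if [ -f "$table" ]; then
  class_labels "$table"
  taxon_count "$table"
fi
```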
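Next, we need to format the input table (run the following in the LEfSe terminal). The -c 1 and -u 2 options tell lefse-format_input.py that the class labels are in row 1 and the subject IDs are in row 2 of the input table, and -o 1000000 sets the per-sample normalization value to 1,000,000: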
$ lefse-format_input.py \
output/Useful-Information/input_table.tsv \
output/Useful-Information/formatted_table.in \
-c 1 \
-u 2 \
-o 1000000 \
--output_table output/Useful-Information/formatted_table.tsv
Click here to view the formatted_table.in file. Click here to view the formatted_table.tsv file.
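To illustrate the arithmetic behind a per-sample normalization value such as -o 1000000, the hypothetical scale_columns function below divides each sample column by its total and multiplies by the target, so a relative frequency of 0.6 becomes 600000. LEfSe's own normalization may define the per-sample total differently (for instance, over a particular rank), so treat this as a sketch of the idea rather than a reimplementation.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: rescale each sample column so that it sums to $1.
# The header row and the taxon column are passed through unchanged.
scale_columns() {
  awk -F'\t' -v OFS='\t' -v target="$1" '
    { line[NR] = $0 }                                        # buffer every row
    NR > 1 { for (i = 2; i <= NF; i++) sum[i] += $i }       # per-sample totals
    END {
      print line[1]
      for (r = 2; r <= NR; r++) {
        n = split(line[r], f, "\t")
        row = f[1]
        for (i = 2; i <= n; i++)
          row = row OFS sprintf("%.10g", (sum[i] ? f[i] / sum[i] * target : 0))
        print row
      }
    }'
}

# Hypothetical toy input: two taxa, two samples.
scale_columns 1000000 <<< $'taxon\tS1\tS2\nA\t0.6\t0.2\nB\t0.4\t0.8'
```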
We can then run LEfSe itself (run the following in the LEfSe terminal):
$ run_lefse.py \
output/Useful-Information/formatted_table.in \
output/Useful-Information/output.res
This prints:
Number of significantly discriminative features: 199 ( 199 ) before internal wilcoxon
Number of discriminative features with abs LDA score > 2.0 : 199
Click here to view the output.res file.
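Since output.res is a plain tab-separated file, the summary line above can be cross-checked with awk. To my understanding, each row holds the feature name, the log10 of the highest class mean, the class with that highest mean, the LDA score and the p-value, and the class and LDA fields are empty for features that are not discriminative; that column layout is an assumption here, as is the helper name count_discriminative.

```shell
#!/usr/bin/env bash
# Assumed output.res columns (tab-separated):
#   1 feature  2 log10(highest class mean)  3 class  4 LDA score  5 p-value
# Count features that have a class and an LDA score above the threshold
# given as $2 (default 2.0).
count_discriminative() {
  awk -F'\t' -v min="${2:-2.0}" '
    $3 != "" && $4 != "" && $4 + 0 > min + 0 { n++ }
    END { print n + 0 }' "$1"
}

res=output/Useful-Information/output.res
if [ -f "$res" ]; then
  count_discriminative "$res" 2.0
fi
```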
We can then plot the discriminative features and their LDA scores as a bar chart (run the following in the LEfSe terminal):
$ lefse-plot_res.py \
output/Useful-Information/output.res \
output/Useful-Information/output.pdf \
--format pdf
Click here to view the output.pdf file.
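If you also want the ranking behind that bar chart as text, the discriminative rows of output.res can be sorted by LDA score. This sketch assumes output.res is tab-separated with the class in column 3 and the LDA score in column 4 (both empty for non-discriminative features); top_features is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Assumption: output.res is tab-separated, class in column 3, LDA score in column 4.
# Print the N (default 10) features with the largest LDA scores as: feature, class, LDA.
top_features() {
  awk -F'\t' -v OFS='\t' '$3 != "" && $4 != "" { print $1, $3, $4 }' "$1" |
    LC_ALL=C sort -t $'\t' -k3,3nr |
    head -n "${2:-10}"
}

res=output/Useful-Information/output.res
if [ -f "$res" ]; then
  top_features "$res" 10
fi
```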
Finally, you can create a cladogram of the discriminative features (run the following in the LEfSe terminal):
$ lefse-plot_cladogram.py \
output/Useful-Information/output.res \
output/Useful-Information/output.cladogram.pdf \
--format pdf
Click here to view the output.cladogram.pdf file.
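In the cladogram, discriminative features are colored by the class in which they are most abundant. To see how many features each body site contributes, the class column of output.res can be tallied. This assumes output.res is tab-separated with the class in column 3, left empty for non-discriminative features; features_per_class is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Assumption: output.res is tab-separated and column 3 holds the class
# (empty for features that LEfSe did not find discriminative).
# Print "class<TAB>count" for each class, sorted by class name.
features_per_class() {
  awk -F'\t' -v OFS='\t' '$3 != "" { n[$3]++ } END { for (c in n) print c, n[c] }' "$1" |
    LC_ALL=C sort
}

res=output/Useful-Information/output.res
if [ -f "$res" ]; then
  features_per_class "$res"
fi
```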