Converting QIIME 2 Output to VALENCIA Format (ASV Taxon and Read Count Table)

MSrinivasan · October 17, 2024, 3:01am

I’m currently working on analyzing microbial community data, and I need help converting QIIME 2 outputs into a format suitable for VALENCIA analysis. Specifically, I need to generate two files:

ASV Taxon Names Key: A file that links ASV IDs to their respective taxon names.
ASV Read Count Table: A table where samples are rows and condensed taxa are columns, with the read counts per sample.

I tried using the feature-table in CSV format, but this only contains feature IDs and read counts for each sample, without the corresponding taxon names. I would appreciate any advice on how to get these files.

What I have tried so far:

Exported my feature table to CSV, which contains feature IDs and read counts per sample.
Used qiime tools export to export the data, but this doesn’t give me the taxon names.

What I want to create: an output where each sample’s ASVs are linked to their taxonomic information, with the final table having samples as rows and taxa (collapsed to higher levels like genus or species) as columns.

Could you provide guidance or a script that can take QIIME 2 output (e.g., feature table and taxonomy) and convert it to this VALENCIA-ready format?

Thank you so much for your help!

salias · October 17, 2024, 1:06pm

Hello @MSrinivasan ,

When you say VALENCIA, do you mean this tool?: GitHub - ravel-lab/VALENCIA: VAginaL community state typE Nearest CentroId clAssifier

If so¹, the README says that convert_qiime.py should produce the input file required by the pipeline. To run convert_qiime.py you need two CSV files:

The taxonomy file (we'll obtain it from the taxonomy QZA).
The ASV table (we'll obtain it from the feature table QZA).

I'll give you some example code. Assuming your taxonomy file is taxonomy.qza and your ASV table is table.qza:

# QZA to TSV (taxonomy) / BIOM (ASV table)

qiime tools export \
  --input-path taxonomy.qza \
  --output-path exported

qiime tools export \
  --input-path table.qza \
  --output-path exported

# Extra step for ASV table: BIOM to TSV

biom convert \
  -i exported/feature-table.biom \
  -o exported/feature-table.tsv \
  --to-tsv

You'll end up with exported/taxonomy.tsv and exported/feature-table.tsv. These are TSV files instead of CSV files. Now you can either convert them to CSV (e.g. import to Excel -> export in CSV)² or change lines 7 and 14 in convert_qiime.py before running it:

github.com

ravel-lab/VALENCIA/blob/8559d454387479f7155333693d854961463c3b15/convert_qiime.py#L7

#!/usr/bin/env python3
import pandas as pd
import sys


#reading in the qiime taxon key
taxon_key = pd.read_csv(sys.argv[1],sep=",",index_col=0)

taxon_key.columns = ['k','p','c','o','f','g','s']
taxon_key = taxon_key[taxon_key.columns[::-1]]

#replacing taxon_key with 
#reading in the table of counts
counts_table = pd.read_csv(sys.argv[2],sep=",",index_col=0)

#function that determines the highest level of taxonimc specifity and then formats the condensed name
#only provides species assignments for focal taxa used by valencia

github.com

ravel-lab/VALENCIA/blob/8559d454387479f7155333693d854961463c3b15/convert_qiime.py#L14

counts_table = pd.read_csv(sys.argv[2],sep=",",index_col=0)

Change sep="," to sep="\t" in both lines, and you should be good to go.
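If you would rather convert the TSV files to CSV without Excel, Python's standard csv module can do it in a few lines. This is only a sketch: tsv_to_csv and convert_file are hypothetical helper names, and the paths you pass are whatever your own export produced.

```python
import csv
import io


def tsv_to_csv(tsv_text: str) -> str:
    """Re-encode tab-separated text as comma-separated text."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    # csv.writer quotes any field that itself contains a comma.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


def convert_file(tsv_path: str, csv_path: str) -> None:
    """Read a TSV file and write the same rows as a CSV file."""
    with open(tsv_path, newline="") as src:
        text = src.read()
    with open(csv_path, "w", newline="") as dst:
        dst.write(tsv_to_csv(text))


if __name__ == "__main__":
    demo = "Feature ID\tTaxon\tConfidence\nasv1\td__Bacteria; p__Firmicutes\t0.99\n"
    print(tsv_to_csv(demo), end="")
```

Calling convert_file("exported/taxonomy.tsv", "taxonomy.csv") would then write a comma-separated copy next to your other files.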

Best,

Sergio

--

Footnotes

¹ If you take a look at the issues section of the VALENCIA repository, it seems like VALENCIA may be abandonware. QIIME 2 also includes methods for classifying samples. While they don't include the "thirteen reference centroids" of the VALENCIA tool, they may work better and are supported by the QIIME 2 team. In case you are interested, see: sample-classifier — QIIME 2 2024.5.0 documentation

² I'm not a big fan of Excel-based solutions in bioinformatics, but I think it's worth mentioning.

MSrinivasan · October 22, 2024, 12:10am

Thank you @salias
I ended up with exported/taxonomy.tsv and exported/feature-table.tsv, but after that I wasn't able to change lines 7 and 14 in convert_qiime.py.

When I use this code:

#!/usr/bin/env python3
import pandas as pd
import sys
#reading in the qiime taxon key
taxon_key = pd.read_csv(sys.argv[1],sep=",",index_col=0)
taxon_key.columns = ['k','p','c','o','f','g','s']
taxon_key = taxon_key[taxon_key.columns[::-1]]

I get this error: FileNotFoundError: [Errno 2] No such file or directory: '-f'

I attached my exported TSV files. Please guide me on how to fix this error.

Thank you!
feature-table.csv (119.6 KB)
taxonomy.csv (324.5 KB)
feature-table.tsv (158.2 KB)
taxonomy-2.tsv (339.3 KB)

salias · October 22, 2024, 10:09am

Hello!

What I meant by "change the lines" was: download convert_qiime.py, open it with any text editor, change sep="," to sep="\t" in both lines 7 and 14, and save the script. That will allow you to use TSV files. Anyway, both your CSV and TSV files look fine. You may want to remove the first row of your feature table files (the one that says # Constructed from biom file,,,,,,,,,). You can use those CSV files with convert_qiime.py directly, without editing it.
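That first row can also be dropped with a few lines of Python instead of by hand. A minimal sketch, assuming the file came from biom convert --to-tsv, so the banner line starts with "# Constructed from biom file" and the "#OTU ID" header row follows it; drop_biom_comment is a hypothetical helper name:

```python
BANNER = "# Constructed from biom file"


def drop_biom_comment(lines):
    """Return the lines without the biom banner row.

    The '#OTU ID' header row is kept, because it holds the sample names.
    """
    return [line for line in lines if not line.startswith(BANNER)]


if __name__ == "__main__":
    demo = [
        "# Constructed from biom file\n",
        "#OTU ID\tsample1\tsample2\n",
        "asv1\t10\t0\n",
    ]
    print("".join(drop_biom_comment(demo)), end="")
```

This works on both the TSV and the CSV version, since it only checks how each line starts.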

And regarding the error:

The correct way of invoking convert_qiime.py is, according to the README:

python3 /path/to/convert_qiime.py /path/to/taxon_key.csv /path/to/asv_count_table.csv

I suspect you are prepending the filenames with -f, so the script treats the string "-f" itself as the filename.
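To illustrate why this happens: a script that reads sys.argv[1] and sys.argv[2] directly takes whatever comes first on the command line as a path, flags included. The sketch below is not part of convert_qiime.py; positional_inputs is a hypothetical helper showing how such a script could reject flags up front.

```python
import sys


def positional_inputs(argv):
    """Return (taxonomy_path, counts_path) from an argv-style list.

    Raises ValueError when something that looks like an option flag is
    passed, because a script indexing sys.argv directly would otherwise
    try to open the flag text as a file.
    """
    args = argv[1:]  # argv[0] is the script name
    flags = [arg for arg in args if arg.startswith("-")]
    if flags:
        raise ValueError(f"unexpected option(s) {flags}; pass the two file paths only")
    if len(args) != 2:
        raise ValueError("expected exactly two arguments: taxonomy CSV, count table CSV")
    return args[0], args[1]


if __name__ == "__main__":
    try:
        print(positional_inputs(sys.argv))
    except ValueError as err:
        print(f"error: {err}")
```

With a call such as python3 script.py -f taxonomy.csv table.csv, sys.argv[1] is "-f", which matches the filename in your FileNotFoundError.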

Best,

Sergio

MSrinivasan · October 22, 2024, 3:58pm

Thank you very much; however, VALENCIA is not working for me.

I’m trying to run the VALENCIA analysis tool with the following command:

(qiime2-amplicon-2024.5) srinivasan@CT-Ubuntu22:~/microbiome/VALENCIA$ python Valencia.py -i /home/srinivasan/microbiome/feature-table.csv -o /home/srinivasan/microbiome/valencia_output -r /home/srinivasan/microbiome/VALENCIA/CST_centroids_012920.csv

Error:

Input file expected to be a CSV with first two column headers: sampleID, read_count

Should I use the feature table or the taxonomy file for the VALENCIA analysis, and how should I format them?

I encountered the following error while trying to run the convert_qiime.py script:

python3 convert_qiime.py taxonomy.csv featureasv-table.csv
Traceback (most recent call last):
  File "/home/srinivasan/microbiome/convert_qiime.py", line 9, in <module>
    taxon_key.columns = ['k', 'p', 'c', 'o', 'f', 'g', 's']
  File "/home/srinivasan/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/pandas/core/generic.py", line 6313, in __setattr__
    return object.__setattr__(self, name, value)
  File "properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
  File "/home/srinivasan/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/pandas/core/generic.py", line 814, in _set_axis
    self._mgr.set_axis(axis, labels)
  File "/home/srinivasan/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 238, in set_axis
    self._validate_set_axis(axis, new_labels)
  File "/home/srinivasan/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/pandas/core/internals/base.py", line 98, in _validate_set_axis
    raise ValueError(
ValueError: Length mismatch: Expected axis has 0 elements, new values have 7 elements

This error indicates that the script is expecting a DataFrame with columns that match the taxonomy levels (kingdom, phylum, class, order, family, genus, species), but it seems the input DataFrame is empty or improperly structured. Could you help me understand and resolve this issue?

salias · October 23, 2024, 10:35am

Hello again,

The input file that Valencia.py requires is not the ASV table but the CSV that convert_qiime.py outputs (or any other CSV, built manually or with a custom script from your QIIME 2 files, that matches the required format).
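Before running Valencia.py, you can check whether a CSV at least starts with the two headers named in your error message. A minimal sketch using only the standard library; has_valencia_header is a hypothetical helper, and the taxon column name in the demo is made up for illustration:

```python
import csv
import io

# Taken from the error message: "first two column headers: sampleID, read_count"
EXPECTED = ["sampleID", "read_count"]


def has_valencia_header(csv_text: str) -> bool:
    """Return True if the first two column headers are sampleID, read_count."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return header[:2] == EXPECTED


if __name__ == "__main__":
    good = "sampleID,read_count,Some_taxon\nsample1,100,90\n"
    bad = "#OTU ID,sample1,sample2\nasv1,10,0\n"
    print(has_valencia_header(good), has_valencia_header(bad))
```

A raw feature table fails this check because its first header is the feature ID column and its remaining headers are sample names.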

It looks like convert_qiime.py is not as QIIME-friendly as we thought... someone opened an issue¹ two years ago describing the same problem, and it is still unanswered.

It looks like the script expects a CSV with one column per taxonomic level. Currently, there is no way of achieving this within QIIME 2. The easiest approach, IMO, would be to reformat your taxonomy QZA file in R using qiime2R. Briefly, you would need to do the following (double-check with the qiime2R tutorial, since I took all the info from there):


if (!requireNamespace("devtools", quietly = TRUE)){install.packages("devtools")}
devtools::install_github("jbisanz/qiime2R")

library(qiime2R)

# Import taxonomy
taxonomy <- read_qza("taxonomy.qza")

# Parse taxonomy
taxonomy <- parse_taxonomy(taxonomy$data)

# Remove first column (I think convert_qiime.py is assuming
# your taxonomy and table files have the same ASV 
# order so we wouldn't need ASV ID, if this is not the
# case feel free to try again skipping this step).
taxonomy <- taxonomy[ , -1]

# Export
write.csv(taxonomy, "taxonomy.csv", row.names = FALSE)

Try convert_qiime.py with this new taxonomy CSV, and now it should work!
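For anyone who would rather stay in Python, the same reshaping can be sketched with the standard library. This assumes the exported taxonomy.tsv has the usual "Feature ID", "Taxon" and "Confidence" columns and that ranks are separated by ";" with prefixes such as d__ or k__. The helper names are hypothetical. The output keeps the feature ID as the first column followed by seven rank columns, which is what the read_csv(..., index_col=0) call and the seven-name column assignment in the excerpt quoted earlier in this thread appear to need.

```python
import csv
import io
import re

RANKS = ["k", "p", "c", "o", "f", "g", "s"]
PREFIX = re.compile(r"^[a-z]__")  # strips rank prefixes such as d__, k__, g__


def split_taxon(taxon: str) -> list:
    """Split one taxonomy string into exactly seven rank fields."""
    parts = [PREFIX.sub("", part.strip()) for part in taxon.split(";")]
    parts = parts[:len(RANKS)]
    # Pad missing lower ranks with empty strings (read back as NaN by pandas).
    return parts + [""] * (len(RANKS) - len(parts))


def taxonomy_to_rank_csv(tsv_text: str) -> str:
    """Turn taxonomy.tsv text into CSV text with one column per rank."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Feature ID"] + RANKS)
    for row in reader:
        writer.writerow([row["Feature ID"]] + split_taxon(row["Taxon"]))
    return out.getvalue()


if __name__ == "__main__":
    demo = (
        "Feature ID\tTaxon\tConfidence\n"
        "asv1\td__Bacteria; p__Firmicutes; c__Bacilli\t0.99\n"
    )
    print(taxonomy_to_rank_csv(demo), end="")
```

Whether the condensed names that convert_qiime.py builds from these columns match what VALENCIA expects still depends on your reference database, so treat the result as something to inspect rather than a guaranteed fix.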

Best,

Sergio

--

¹ Remember:

MSrinivasan · October 23, 2024, 10:02pm

I don't know how to proceed. I have the current file as per the QIIME 2 pipeline. Could you please help me format the feature table or guide me on how to obtain the required ASV table format?

MSrinivasan · October 23, 2024, 10:07pm

This code runs fine, but the new taxonomy.csv file still does not work with convert_qiime.py.

salias · October 24, 2024, 8:34am

Hi,

At this point we have already tried all the options I had. I think what you should do now is either try to contact the VALENCIA developers or find another way/tool to analyze your data (like I outlined in a previous post).

Good luck,

Sergio

MSrinivasan · October 24, 2024, 11:14pm

Hi @salias
Thank you very much for your help. I will post if I fixed!

MSrinivasan · November 1, 2024, 12:16am

Hi @salias
Here’s the script I used to fix the taxonomy file error I was getting with convert_qiime.py!

Thank you very much for your direction!

# Load necessary libraries
library(qiime2R)
# Import taxonomy
taxonomy <- read_qza("taxonomy-esr1.qza")
# Parse taxonomy
taxonomy_data <- parse_taxonomy(taxonomy$data)
# Create a data frame that includes Feature IDs as the first column
final_taxonomy <- data.frame("Feature ID" = rownames(taxonomy_data), 
                              taxonomy_data, 
                              stringsAsFactors = FALSE)
# Set column names explicitly
colnames(final_taxonomy) <- c("Feature ID", "Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")
# Print the final taxonomy to inspect
print("Final Taxonomy Data with Feature IDs:")
print(head(final_taxonomy))
# Check if the Species column exists, if not, fill with NA
if (!"Species" %in% colnames(final_taxonomy)) {
  final_taxonomy$Species <- NA
}
# Export the final taxonomy data to a CSV file
write.csv(final_taxonomy, "taxonomy_with_feature_ids_and_species.csv", row.names = FALSE)
# Confirmation message
print("Exported taxonomy_with_feature_ids_and_species.csv successfully.")

salias · November 4, 2024, 8:35am

Happy you got it working @MSrinivasan !
