Identifying ASV Taxon Names Key and ASV Read Count Table Files

Hello there!

I am currently conducting analyses on vaginal microbiome samples using QIIME 2. I have successfully completed several steps in my analysis pipeline, and now I would like to classify these samples into CSTs (Community State Types).

I would like to use VALENCIA, since several papers use it for this purpose (GitHub repository: ravel-lab/VALENCIA, VAginaL community state typE Nearest CentroId clAssifier). However, I am having difficulty identifying which files generated by QIIME 2 correspond to its required inputs: the ASV taxon names key and the ASV read count table.

I would greatly appreciate any assistance or guidance you can provide in locating these files within my QIIME 2 analysis results.

Thank you very much for your help!

Hi @Julia_Botto,
Welcome to the :qiime2: forum!

It sounds like you need to run qiime tools extract on your taxonomy and your feature table; that should give you files that will work with Valencia.

Let me know if this works. I have never used Valencia, so it's possible we will need to do more brainstorming.

Hope that helps!


Thank you very much for your help, @cherman2!

I tried using the taxonomy and the feature table, but the feature table file is in biom format, and the Valencia code asks for csv. I tried running it with the biom file anyway, but I got this error:
ValueError: Length mismatch: Expected axis has 2 elements, new values have 7 elements

As a test, I also assembled the input csv table manually. I used the "sample-frequency" file and the level 7 csv file from the bar plots, and Valencia produced an output classifying the CSTs. But it is a lot of work, because the taxonomy format generated by QIIME 2 differs from the one Valencia uses. For example, QIIME 2 names a taxon like this: d__Bacteria;p__Firmicutes_D;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus;s__Lactobacillus iners
And Valencia asks for Lactobacillus_iners
I don't know if I used the right input files, but I tested with some samples and got a CST classification.
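The renaming described above, from a full QIIME 2 lineage string to a label such as Lactobacillus_iners, can probably be scripted rather than done by hand. Here is a minimal Python sketch. The function name to_valencia_style and the "Unassigned" fallback are made up for illustration, and the sketch assumes the deepest classified rank is the label you want. The names Valencia expects for taxa that are not resolved to species should be checked against its reference files.

```python
import re

def to_valencia_style(taxon):
    """Return the deepest classified rank of a QIIME 2 lineage, with underscores.

    Hypothetical helper: it only reproduces the example from this thread.
    """
    # Split the lineage on ';' and strip the rank prefix ('d__', 'g__', 's__', ...).
    levels = [re.sub(r"^[a-z]__", "", part.strip()) for part in taxon.split(";")]
    # Unclassified ranks become empty strings once the prefix is gone; drop them.
    levels = [level for level in levels if level]
    if not levels:
        return "Unassigned"  # assumed fallback label, not taken from Valencia
    # Keep the deepest classified rank and replace spaces with underscores.
    return levels[-1].replace(" ", "_")

lineage = (
    "d__Bacteria;p__Firmicutes_D;c__Bacilli;o__Lactobacillales;"
    "f__Lactobacillaceae;g__Lactobacillus;s__Lactobacillus iners"
)
print(to_valencia_style(lineage))  # Lactobacillus_iners
```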

Thanks again for your help, hope we can figure it out together! :grinning:

It looks like you successfully exported the feature table as a biom file. That is not the end goal, but it is a necessary step. You now need to convert the biom file to a tsv file, and then the tsv to csv (or maybe it can be exported directly to csv? :thinking:).
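For the first step, the biom command-line tool can write a tab-separated table, for example: biom convert -i feature-table.biom -o feature-table.tsv --to-tsv (the file names here are placeholders). The second step can be sketched in Python with the standard library. This is only a sketch, assuming the tsv starts with the usual "# Constructed from biom file" comment line followed by a "#OTU ID" header row; the function name biom_tsv_to_csv is made up for illustration.

```python
import csv
import io

def biom_tsv_to_csv(tsv_file, csv_file):
    """Copy a biom-derived TSV table to CSV, dropping the leading comment line."""
    reader = csv.reader(tsv_file, delimiter="\t")
    writer = csv.writer(csv_file, lineterminator="\n")
    rows_written = 0
    for row in reader:
        # Skip blank lines and the "# Constructed from biom file" comment.
        # The header row ("#OTU ID ...") has no space after '#', so it is kept.
        if not row or row[0].startswith("# "):
            continue
        writer.writerow(row)
        rows_written += 1
    return rows_written

# Small in-memory example; with real files, pass open(...) handles instead.
example_tsv = io.StringIO(
    "# Constructed from biom file\n"
    "#OTU ID\tsample1\tsample2\n"
    "asv_a\t10.0\t0.0\n"
    "asv_b\t3.0\t7.0\n"
)
csv_out = io.StringIO()
n_rows = biom_tsv_to_csv(example_tsv, csv_out)
print(csv_out.getvalue())
```

In this layout the rows are ASVs and the columns are samples. Whether Valencia's conversion script wants that orientation or the transposed one is something to check in its README.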


Thank you for your help, now we know the input files!!
I think the files are right now: the table and the taxonomy, both in csv.
But I'm having a problem converting them with the command "python3 /path/to/ /path/to/taxon_key.csv /path/to/asv_count_table.csv"
For the command to work, I had to change the taxonomy file so that each taxonomic level is in its own column. It creates a table that joins the data from the feature table and the taxonomy, with SampleID as the first column and read_counts as the second. But apparently it is only classifying by genus and not by species, so the table looks like this: "g_ g__Lactobacillus, g_ g__Gardnerella..."
Thank you very much for your help. I'm running out of ideas, and I think it might be necessary to ask the author of the program for help :confused:

Hi @Julia_Botto,

It looks like the author lists contact information on their GitHub repository for questions and feedback. That might be your best bet (unless any other mods have ideas)!

Best of luck! Cheers :lizard:

Hi Julia, I was dealing with the same problem, and I think I found the solution. You need to remove all the rank prefixes ("d__", "g__", "o__", "f__", and so on). For example, at the domain level, use "Bacteria" instead of "d__Bacteria". In the same way, remove the prefix at every other level. This worked for me. All the rows should be prepared following the same rule.
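The prefix removal described above, combined with putting each taxonomic level in its own column, could be sketched in Python as follows. This is an illustration under assumptions, not Valencia's documented input format: the column names in RANKS, the feature_id header, and the helper names split_lineage and write_taxon_key are all made up for the example.

```python
import csv
import io
import re

# Assumed column names, one per rank in a seven-level QIIME 2 lineage.
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]

def split_lineage(taxon):
    """Split a lineage on ';', strip rank prefixes, and pad to one cell per rank."""
    parts = [re.sub(r"^[a-z]__", "", part.strip()) for part in taxon.split(";")]
    # Lineages that stop above species level get empty cells for the missing ranks.
    parts += [""] * (len(RANKS) - len(parts))
    return parts[:len(RANKS)]

def write_taxon_key(taxonomy_rows, csv_file):
    """Write (feature id, lineage) pairs as a CSV with one column per rank."""
    writer = csv.writer(csv_file, lineterminator="\n")
    writer.writerow(["feature_id"] + RANKS)
    for feature_id, taxon in taxonomy_rows:
        writer.writerow([feature_id] + split_lineage(taxon))

# Example rows; the feature ids are placeholders.
rows = [
    ("asv_a", "d__Bacteria;p__Firmicutes_D;c__Bacilli;o__Lactobacillales;"
              "f__Lactobacillaceae;g__Lactobacillus;s__Lactobacillus iners"),
    ("asv_b", "d__Bacteria;p__Firmicutes_D;c__Bacilli"),
]
key_out = io.StringIO()
write_taxon_key(rows, key_out)
print(key_out.getvalue())
```

Applying the same function to every row keeps the file consistent, which matches the advice that all rows follow the same rule.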