Diversity analysis with otu table and taxonomy data

Hi,

I am just starting to use QIIME 2 and am trying to do a diversity analysis. The aim is to find the relationship between diversity and status.
I have received some datasets from collaborators:
OTU_table (OTU ID and sample identifier)
OTU_mapping file (OTU_id, acc, full_name, taxonomy)
Sample mapping file (ids, type, etc.)
Following the tutorial, I first generated a BIOM file with the sample identifiers, OTU IDs, and taxonomy, then imported it into QIIME 2. I think this is the feature table, because it gives me the counts for each sample and each feature.

First question: I checked the import example, feature-table-v210.biom, and it seems that this BIOM file doesn't contain taxonomy. When I run the following command, I get an error:
biom validate-table -i feature-table-v210.biom
Unknown BIOM type: Table
The input file is not a valid BIOM-formatted file.
But my BIOM file is a valid one. Why does this happen? Is it because I generated my BIOM file from an OTU table that contains taxonomy?
Besides, I cannot view my-feature-table directly:
qiime tools view my-feature-table.qzv
Usage: qiime tools view [OPTIONS] VISUALIZATION_PATH
Error: Visualization viewing is currently not supported in headless environments. You can view Visualizations (and Artifacts) at https://view.qiime2.org.
But it can be viewed online.

As the next step, I want to get the Feature Data, but I do not know how to proceed. It seems that I should use the OTU mapping file, which contains the OTU ID and taxonomy. When I import it into QIIME 2, what I get is just the feature ID and taxon. Should I train it? Am I missing something?

Could anyone give me some suggestions? Thanks!!!

Hi @pumpkin,
Thanks for posting!

Feature tables in QIIME 2 do not contain taxonomy; any taxonomy stored in your BIOM file will be discarded upon import. Taxonomy is stored in a separate FeatureData[Taxonomy] artifact. If you or your collaborators have a list of taxonomy assignments for each feature ID, you can import a file in the following format into QIIME 2:

Feature ID	Taxon
229854	k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Legionellales; f__Legionellaceae; g__Legionella; s__
367523	k__Bacteria; p__Bacteroidetes; c__Flavobacteriia; o__Flavobacteriales; f__Flavobacteriaceae; g__Flavobacterium; s__
239330	k__Bacteria; p__Proteobacteria; c__Deltaproteobacteria; o__Desulfuromonadales; f__Geobacteraceae; g__Geobacter; s__

Import like this:

qiime tools import \
  --input-path taxonomy.tsv \
  --output-path taxonomy.qza \
  --type 'FeatureData[Taxonomy]'

Alternatively, if you only have the reference sequences associated with each feature ID, or want to classify using the taxonomy classifiers in QIIME2, import as described here.

The biom validate-table error is an issue with BIOM formats and is out of scope for the QIIME 2 forum. You can get support with BIOM on the QIIME 1 forum. The issue is probably that you have an old version of biom installed, or that you need to specify the BIOM format (e.g., 2.1.0) that you are using.

As for the viewing error, it sounds like you may be trying to visualize on a remote computing cluster. See this post for more details.

Perfect, it sounds like you already figured out how to import FeatureData[Taxonomy] as I described above (I've left that advice there in case I misunderstood your post). This is how taxonomy information is handled in QIIME 2. If you want to visualize taxonomic abundances in a barplot, use barplot (qiime taxa barplot). If you want to collapse your feature table to taxonomic groups and use those groups for downstream analyses, e.g., testing whether taxa are differentially abundant between groups, use collapse (qiime taxa collapse) and proceed with other commands.
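As a rough sketch of those two steps, the calls could be wrapped in a small shell function. Everything named here is an assumption for illustration: summarize_taxa is a hypothetical helper, the output file names are placeholders, and --p-level 6 assumes a seven-rank taxonomy string like the example above, where level 6 would be genus.

```shell
# Hypothetical helper: summarize_taxa TABLE TAXONOMY METADATA
# All file names are placeholders; adjust them to your own artifacts.
summarize_taxa() {
  table="$1"; taxonomy="$2"; metadata="$3"

  # Interactive barplot of taxonomic composition per sample.
  qiime taxa barplot \
    --i-table "$table" \
    --i-taxonomy "$taxonomy" \
    --m-metadata-file "$metadata" \
    --o-visualization taxa-bar-plots.qzv

  # Collapse features that share the same taxonomy at the chosen level
  # (6 is assumed here to mean genus in a seven-rank taxonomy string).
  qiime taxa collapse \
    --i-table "$table" \
    --i-taxonomy "$taxonomy" \
    --p-level 6 \
    --o-collapsed-table table-l6.qza
}
```

On a machine with QIIME 2 installed, this would be called as something like: summarize_taxa table.qza taxonomy.qza sample-metadata.tsv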

I hope that fixes all of your (non-BIOM-related) problems! Please let us know if these do not resolve the issues you are having.

Hi @Nicholas_Bokulich,

Thank you for your detailed and clear explanation.
It seems that I have some misunderstandings about the feature ID and the OTU_ID. As I explained, I have an OTU table that contains the OTU IDs and the sample identifiers. I also have the OTU mapping file, which contains the OTU_ID, acc, and taxonomy. The OTU IDs look like CsrLava3, CsrNeon2, etc., not just numbers. The acc is the identifier assigned by SILVA.
Here is my question.
At first, I thought the OTU ID was the feature ID. But now it seems that the acc is the feature ID? Does that mean the OTU ID is just an index, and that I should use the acc and treat it as the feature ID?
In addition, there are some data available from the website. Could I use 97_otus.tre as the phylogenetic tree? If the tree is already available, why do we need to generate a phylogenetic tree as the tutorial does?

Thank you!!!

Hi @pumpkin,

The taxonomy assignments should only contain two columns before importing to QIIME (see the example I gave above). If I am correct, it sounds like you need to delete the acc column before importing so that the OTU IDs (which can have letters in the name, they don't need to just be numbers) are mapped to the predicted taxonomy.
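A minimal sketch of that cleanup with awk, assuming the mapping file is tab-separated with a header row and the column order OTU_id, acc, full_name, taxonomy from the first post. The file names and the two sample rows (accessions included) are made up so that the snippet runs on its own; replace them with your real file.

```shell
# Made-up stand-in for the collaborators' mapping file (placeholder values).
printf 'OTU_id\tacc\tfull_name\ttaxonomy\n' > otu_mapping.tsv
printf 'CsrLava3\tAB000001\texample name 1\tk__Bacteria; p__Proteobacteria\n' >> otu_mapping.tsv
printf 'CsrNeon2\tAB000002\texample name 2\tk__Bacteria; p__Bacteroidetes\n' >> otu_mapping.tsv

# Keep column 1 (OTU ID) and column 4 (taxonomy), skip the old header,
# and write the two-column header used in the example format above.
awk -F'\t' 'BEGIN { OFS = "\t"; print "Feature ID", "Taxon" }
            NR > 1 { print $1, $4 }' otu_mapping.tsv > taxonomy.tsv
```

The resulting taxonomy.tsv should then be importable with the qiime tools import command shown earlier in this thread.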

OTU ID should be the feature ID (after all, these are the IDs used in your OTU/feature table).

Using 97_otus.tre as your tree probably will not work unless your OTU IDs correspond to the labels on the tree, e.g., if you used a closed-reference OTU picking method that assigns OTU IDs that correspond to the reference sequences. It sounds like that probably isn't the case.
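One way to check this yourself is to compare your OTU IDs against the tip labels of the tree. The sketch below is only an illustration: the tiny Newick tree, the ID list, and all file names are made up, and the grep pattern assumes unquoted tip labels (quoted labels containing commas or parentheses would need a real Newick parser).

```shell
# Use one fixed sort order so that sort and comm agree.
export LC_ALL=C

# Made-up tree and OTU ID list so the sketch runs on its own.
printf '((4479944:0.1,CsrLava3:0.2)0.9:0.05,229854:0.3);\n' > tree.nwk
printf 'CsrLava3\nCsrNeon2\n' > otu_ids.txt

# Tip labels are the names that directly follow "(" or ",";
# internal node labels follow ")" and are skipped.
grep -oE '[(,][^(),:;]+' tree.nwk | tr -d '(,' | sort -u > tip_labels.txt

# OTU IDs from the table that have no matching tip in the tree.
sort -u otu_ids.txt | comm -23 - tip_labels.txt > missing_from_tree.txt
```

If missing_from_tree.txt comes out empty, every OTU ID has a matching tip. With IDs like CsrLava3 and a reference tree that uses numeric IDs, most or all of them would likely end up in that file.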

The trees prepared in SILVA and other reference sequence databases are phylogenetic trees based on alignment of all reference sequences in that database. In QIIME 2 we typically make our own alignments/trees because, while OTU picking strategies do exist, we recommend using the dada2 or deblur denoising algorithms, which remove erroneous sequences and deliver sequence variants (think of these as OTUs clustered at 100% similarity). The dada2/deblur feature IDs will not correspond to the IDs used by the reference database.

You may want to consider asking your collaborators for the raw sequence data and processing those data through QIIME 2, since this alleviates some of the awkwardness of trying to squeeze data generated on different platforms into QIIME 2. You would also benefit from using methods like dada2 or deblur for denoising/dereplicating, which are more sensitive than OTU picking methods (and yield fewer sequence variants than OTUs, because many OTUs are spurious/noisy, which makes taxonomy classification and other downstream steps faster).

Hope that helps!

Thank you @Nicholas_Bokulich
It is very clear to me now!!!
