Help importing and using BIOM table with taxonomy data in QIIME2

MicrobeManager · May 5, 2021, 7:48pm

I'm new to using QIIME2 and BIOM format, so I'm probably way off track here with what I'm trying to do, so any help on this is welcome!

Our lab recently requested shotgun metagenomic sequencing for some fecal microbiome samples. Because of a filtering issue, I was only able to get the bacteria-specific data from the sequencing service in the form of individual TSV files from each sample. These came with full taxonomic classification, so I consolidated the results into one large sheet in Excel and converted it back to a TSV file with the hope that I could convert it to a BIOM format and import into QIIME2 for further analysis. Here's a very small version of my TSV file (using relative abundances).

To convert this to BIOM, I had to invert the rows and columns in Excel first (for some reason the final qzv showed the taxa as column names and samples as rows). I converted to BIOM 2.10 format using biom convert, then imported it to QIIME2 using qiime tools import, with type 'FeatureTable[Frequency]' and input-format BIOMV210Format. This must be where I'm going wrong, since the visualization I produced from this using qiime metadata tabulate looked odd and seemed to read the taxa information as just basic metadata.

After more reading on the forum I used qiime tools import again and set the type to 'FeatureData[Taxonomy]', but this produced the "table must have observation metadata" error, and this is about as far as I could get.

What should I be doing to convert this TSV to BIOM and have QIIME2 recognize the taxa already present in the file? Should I create a separate TSV file for the taxa data and import with my sample metadata? I'm using QIIME2 2021.4 in a Conda environment on VirtualBox running Ubuntu 20.04. Any help would be greatly appreciated!

thermokarst · May 10, 2021, 2:55pm

Hi there @MicrobeManager!

I see your point, but that doesn't look off to me - you provided QIIME 2 with an input file where the IDs are all taxonomic identifiers (at least according to your first screenshot). QIIME 2 doesn't automatically create IDs for samples or features - that's up to the individual investigator to provide. If you want different IDs, you'll have to modify your Excel table to include those as the first column (the one called OTUID).

It depends - if you add the taxon strings to the BIOM file as observation metadata, then you can import like this:

qiime tools import \
  --input-path my_hdf5.biom \
  --output-path taxonomy.qza \
  --input-format BIOMV210Format \
  --type "FeatureData[Taxonomy]"

Otherwise, you can save a new TSV with just the taxon information (using new feature IDs that you come up with yourself):

Feature ID      Taxon
f1        k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Bacteroidaceae; g__Bacteroides; s__
f2        k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Neisseriales; f__Neisseriaceae; g__Neisseria
f3        k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Streptococcaceae; g__Streptococcus
f4        k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Bacteroidaceae; g__Bacteroides; s__
...

and import using:

qiime tools import \
  --input-path taxa.tsv \
  --output-path taxonomy.qza \
  --type "FeatureData[Taxonomy]"
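
If it helps, the split described above (one file with new feature IDs and abundances, one file mapping those IDs to taxon strings) can be sketched with awk. This is only a rough sketch under some assumptions: the combined TSV has the taxon string in the first column and one column per sample, and the names combined.tsv, feature-table.tsv, qiime_split_demo, sampleA, and sampleB are all made up for illustration. The demo data is invented too.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical working directory and file names, for illustration only.
mkdir -p qiime_split_demo

# Made-up combined table: taxon string in column 1, one column per sample.
printf 'Taxon\tsampleA\tsampleB\n' > qiime_split_demo/combined.tsv
printf 'k__Bacteria; p__Bacteroidetes; g__Bacteroides\t0.60\t0.25\n' >> qiime_split_demo/combined.tsv
printf 'k__Bacteria; p__Firmicutes; g__Streptococcus\t0.40\t0.75\n' >> qiime_split_demo/combined.tsv

# Write two files:
#   feature-table.tsv : generated feature IDs (f1, f2, ...) plus the abundances
#   taxa.tsv          : the same feature IDs mapped to the original taxon strings
awk -F'\t' -v OFS='\t' \
    -v table="qiime_split_demo/feature-table.tsv" \
    -v taxa="qiime_split_demo/taxa.tsv" '
NR == 1 {
    # Header row: rename the first column for the table, start the taxonomy file.
    $1 = "#OTU ID"
    print > table
    print "Feature ID", "Taxon" > taxa
    next
}
{
    id = "f" (NR - 1)      # f1 for the first data row, f2 for the second, ...
    print id, $1 > taxa    # feature ID -> taxon string
    $1 = id                # swap the taxon string for the new ID
    print > table
}' qiime_split_demo/combined.tsv

cat qiime_split_demo/feature-table.tsv
cat qiime_split_demo/taxa.tsv

# Possible next steps (not run here; they need biom and QIIME 2 installed):
#   biom convert -i feature-table.tsv -o feature-table.biom \
#     --table-type="OTU table" --to-hdf5
#   qiime tools import --input-path feature-table.biom \
#     --input-format BIOMV210Format \
#     --type 'FeatureTable[RelativeFrequency]' --output-path table.qza
```

The taxa.tsv written here has the same two-column layout as the example above, so it should be usable with the FeatureData[Taxonomy] import command. Since your values are relative abundances, FeatureTable[RelativeFrequency] is probably a better fit for the table than FeatureTable[Frequency], but treat that as a suggestion to check against your downstream plugins rather than a requirement.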

Please let us know how it goes.
