Differences between OTU table (table.qza) and taxa collapse output

Hi, I have a question: when I export table.qza to .biom format, I have ~1800 OTUs. However, when I collapse the table to species level (--p-level 7) with taxa collapse and taxonomy.qza, I am left with ~600 taxonomically categorized OTUs. Is there something I'm missing here? Below is the command I use for taxa collapse:

qiime taxa collapse \
  --i-table table.qza \
  --i-taxonomy taxonomy.qza \
  --p-level 7 \
  --o-collapsed-table table-l7.qza

My question is: if table.qza and taxonomy.qza are both used for the taxa collapse, why is there such a large difference between the number of OTUs in table.qza and the number of features in the collapsed output, given that every sequence in table.qza is unique? Shouldn't the level-7 count be the same?

Hi @schmeltzy,
This sounds like intended behavior. Let me explain:

taxa collapse merges OTUs that share the same taxonomic label at level X, summing their counts. So if you collapse at level 7, the features you get out the other end are species, not OTUs. You have ~1800 OTUs and ~600 species, which sounds about right.

Even though the sequences are unique, many of them classify to the same taxonomic groups. Remember also that many may not receive species-level classifications at all (because there is not enough sequence information to provide a confident classification at species level), which further decreases the number of "species" you observe after collapsing. E.g., two OTUs that belong to the same genus but are in reality different species may both receive ambiguous species-level classifications, in which case they will both be collapsed into the same species-level feature.
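To make the counting concrete, here is a minimal Python sketch of what collapsing does, assuming a toy in-memory table ({feature_id: {sample_id: count}}) and semicolon-separated lineages. The function `collapse` and all data are hypothetical, and this is not the q2-taxa implementation; in particular, it simply truncates each lineage to the first `level` ranks and may not match how q2-taxa labels lineages that have fewer ranks than requested.

```python
from collections import defaultdict

def collapse(table, taxonomy, level):
    """Sum counts of features whose lineages agree on the first `level` ranks.

    table: {feature_id: {sample_id: count}}
    taxonomy: {feature_id: "k__...; p__...; ..."}
    """
    collapsed = defaultdict(lambda: defaultdict(int))
    for feature_id, counts in table.items():
        ranks = [rank.strip() for rank in taxonomy[feature_id].split(";")]
        label = ";".join(ranks[:level])  # truncated lineage becomes the new feature ID
        for sample_id, count in counts.items():
            collapsed[label][sample_id] += count
    return {label: dict(per_sample) for label, per_sample in collapsed.items()}

# Hypothetical toy data: three unique OTUs, two of which share an
# unresolved species label ("s__") within the same genus.
LACTO = ("k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
         "f__Lactobacillaceae; g__Lactobacillus; s__")
ECOLI = ("k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
         "o__Enterobacteriales; f__Enterobacteriaceae; g__Escherichia; s__coli")
table = {
    "otu1": {"S1": 10, "S2": 0},
    "otu2": {"S1": 5, "S2": 7},
    "otu3": {"S1": 1, "S2": 2},
}
taxonomy = {"otu1": LACTO, "otu2": LACTO, "otu3": ECOLI}

collapsed = collapse(table, taxonomy, level=7)
print(len(table), "OTUs ->", len(collapsed), "level-7 features")
```

Three unique OTUs become two level-7 features because otu1 and otu2 share the same unresolved species label; in this sketch their counts are summed, so no reads are lost, only the number of features shrinks.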

It sounds like what you may have been trying to do with collapse was to merge taxonomy information into your feature table. In QIIME 2, taxonomy information (and all other feature data) is kept separate from the feature table and is provided as a separate input to any command that consumes feature data (e.g., see any of the actions in the taxa plugin). If you can provide some more information on your use case, I may be able to recommend an alternative or point you in the right direction.
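If the goal is just a table with a taxonomy column (for example, to load into R), the merge is a join on feature ID. A minimal Python sketch, assuming both files have already been exported to tab-separated text; the file contents, the header names ("#OTU ID", "Feature ID", "Taxon", "Confidence") and the helper names are hypothetical stand-ins for your own exports:

```python
import csv
import io

# Hypothetical exported files (tab-separated). Header names are assumptions
# modelled on a biom TSV export and a QIIME 2 taxonomy export.
FEATURE_TABLE_TSV = """# Constructed from biom file
#OTU ID\tS1\tS2
otu1\t10.0\t0.0
otu2\t5.0\t7.0
"""
TAXONOMY_TSV = """Feature ID\tTaxon\tConfidence
otu1\tk__Bacteria; p__Firmicutes\t0.98
otu2\tk__Bacteria; p__Proteobacteria\t0.91
"""

def read_taxonomy(text):
    """Map feature ID -> taxon string, skipping the header row."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)
    return {row[0]: row[1] for row in reader}

def add_taxonomy_column(table_text, taxonomy):
    """Append a taxonomy column to each table row, joined on the first column."""
    lines = [line for line in table_text.splitlines()
             if line and not line.startswith("# Constructed")]
    header, *rows = [line.split("\t") for line in lines]
    merged = [header + ["taxonomy"]]
    for row in rows:
        taxon = taxonomy.get(row[0])
        if taxon is None:  # fail loudly instead of writing an empty value
            raise KeyError(f"no taxonomy for feature {row[0]!r}")
        merged.append(row + [taxon])
    return merged

merged = add_taxonomy_column(FEATURE_TABLE_TSV, read_taxonomy(TAXONOMY_TSV))
for row in merged:
    print("\t".join(row))
```

The feature table and taxonomy stay separate files; the join only happens at export time, and any feature missing from the taxonomy raises an error rather than silently producing an empty cell.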

I hope that helps!

Yes, thank you! I figured out the answer to my question almost as soon as I posted it, and then you confirmed my suspicions! But you are correct: what I'm really after is merging taxonomy information with my OTU (feature) table, and I haven't been able to find a way to do this without collapsing.

Hi @schmeltzy,

So I'm guessing you also need this information to export from QIIME2, e.g., to phyloseq, R, or another program. Is that correct?

A number of forum threads have discussed this (e.g., here, here, here)

I hope that helps!

Hi @Nicholas_Bokulich, I tried to use the biom add-metadata command to add my taxonomy (a .tsv) to the .biom file (feature table), but it keeps giving me an error. If I follow the code from the other forum post, with the .biom as my input and my taxonomy file as the "observation metadata", I get this error: "AttributeError: 'NoneType' object has no attribute 'encode'". Removing the headers and supplying my own has the same effect.

Hi @schmeltzy,

Could you post the full command and output (i.e. the stacktrace)? Thanks!

Sure, here is the command: biom add-metadata -i feature-table.biom -o table.w_omd.biom --observation-metadata-fp taxonomy.tsv --observation-header OTUID,taxonomy,confidence

The output is a long traceback through the packages in my QIIME 2 environment, ending with the AttributeError I mentioned above.

Thanks, could you include the traceback as well? That is super useful for us, since we can figure out what the code is doing and where things went wrong.

Traceback (most recent call last):
  File "/raid1/home/micro/mcmindsr/labhome/local/bin/miniconda3/envs/qiime2-2017.10/bin/biom", line 6, in <module>
    sys.exit(biom.cli.cli())
  File "/raid1/home/micro/mcmindsr/labhome/local/bin/miniconda3/envs/qiime2-2017.10/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/raid1/home/micro/mcmindsr/labhome/local/bin/miniconda3/envs/qiime2-2017.10/lib/python3.5/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/raid1/home/micro/mcmindsr/labhome/local/bin/miniconda3/envs/qiime2-2017.10/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/raid1/home/micro/mcmindsr/labhome/local/bin/miniconda3/envs/qiime2-2017.10/lib/python3.5/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/raid1/home/micro/mcmindsr/labhome/local/bin/miniconda3/envs/qiime2-2017.10/lib/python3.5/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/raid1/home/micro/mcmindsr/labhome/local/bin/miniconda3/envs/qiime2-2017.10/lib/python3.5/site-packages/biom_format-2.1.6-py3.5-linux-x86_64.egg/biom/cli/metadata_adder.py", line 114, in add_metadata
    write_biom_table(result, fmt, output_fp)
  File "/raid1/home/micro/mcmindsr/labhome/local/bin/miniconda3/envs/qiime2-2017.10/lib/python3.5/site-packages/biom_format-2.1.6-py3.5-linux-x86_64.egg/biom/cli/util.py", line 35, in write_biom_table
    table.to_hdf5(f, biom.parse.generatedby())
  File "/raid1/home/micro/mcmindsr/labhome/local/bin/miniconda3/envs/qiime2-2017.10/lib/python3.5/site-packages/biom_format-2.1.6-py3.5-linux-x86_64.egg/biom/table.py", line 4123, in to_hdf5
    self.group_metadata(axis='observation'), 'csr', compression)
  File "/raid1/home/micro/mcmindsr/labhome/local/bin/miniconda3/envs/qiime2-2017.10/lib/python3.5/site-packages/biom_format-2.1.6-py3.5-linux-x86_64.egg/biom/table.py", line 4095, in axis_dump
    formatter[category](grp, category, md, compression)
  File "/raid1/home/micro/mcmindsr/labhome/local/bin/miniconda3/envs/qiime2-2017.10/lib/python3.5/site-packages/biom_format-2.1.6-py3.5-linux-x86_64.egg/biom/table.py", line 274, in general_formatter
    data=[m[header].encode('utf8') for m in md],
  File "/raid1/home/micro/mcmindsr/labhome/local/bin/miniconda3/envs/qiime2-2017.10/lib/python3.5/site-packages/biom_format-2.1.6-py3.5-linux-x86_64.egg/biom/table.py", line 274, in <listcomp>
    data=[m[header].encode('utf8') for m in md],
AttributeError: 'NoneType' object has no attribute 'encode'

Hello,

Could you try biom add-metadata -i feature-table.biom -o table.w_omd.biom --observation-metadata-fp taxonomy.tsv --observation-header taxonomy ? If this doesn't work, would you mind posting your files, or at least the first 5 lines of the taxonomy.tsv file and the biom table?

Thanks!

@antgonza thanks for the suggestion- so, this runs without any errors, but it doesn't add my metadata. The resulting file looks exactly the same as the feature-table.biom.

Would you mind sharing your files? BTW, how are you checking that it is the same file? A quick way is: biom summarize-table --observations -i yourbiom.biom. Also, have you seen this adding metadata tutorial?

Also, remember that you only need --observation-header OTUID,taxonomy,confidence if your taxonomy file doesn't have a header. And --sc-separated taxonomy is needed if you want the taxonomy split into its separate levels; otherwise the whole lineage is stored as a single string.
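To illustrate the difference: without splitting, a lineage is one flat string; with splitting on semicolons, it becomes a list of ranks. A minimal Python sketch of that splitting step (the helper `split_lineage` is hypothetical, and dropping empty pieces is an assumption of this sketch, not necessarily what biom does); the example lineage is the one printed in the TypeError later in this thread:

```python
def split_lineage(taxon):
    """Split a semicolon-separated lineage into a list of ranks.

    Whitespace around each rank is stripped and empty pieces (e.g. from a
    trailing ';') are dropped; both choices are assumptions of this sketch.
    """
    return [rank.strip() for rank in taxon.split(";") if rank.strip()]

flat = ("k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
        "o__Enterobacteriales; f__Enterobacteriaceae")
ranks = split_lineage(flat)
print(len(ranks), ranks[-1])
```

The flat string counts as one value; the split version is a five-element list, which is the list-of-strings shape the "taxonomy" category is expected to have.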

Unfortunately I don't have authorization to share these files. Yes, I have seen the adding metadata tutorial, and yes, I have tried with and without headers on several different observation metadata files, trying the different variations for adding metadata. None of them actually added metadata, and whenever the command runs without error, the output file is always the same as the original .biom file.

@schmeltzy, would it be possible to send the output of the following two commands?

$ biom summarize-table -i feature-table.biom | head
$ wc -l taxonomy.tsv

My guess is that the taxonomy.tsv file does not have entries for everything in feature-table.biom, which could occur if, for instance, the taxonomy.tsv file stemmed from the collapse but the feature-table.biom was the full OTU table.
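The AttributeError above is consistent with this guess: it fails at m[header].encode('utf8'), i.e. a metadata value that is None rather than a string. A minimal Python sketch of that failure mode, assuming (as the traceback suggests, not verified against biom's source) that a feature with no taxonomy entry ends up with a metadata value of None; the helper names and IDs are hypothetical:

```python
from collections import defaultdict

def build_observation_metadata(feature_ids, taxonomy):
    """One metadata dict per feature; unknown keys default to None.

    Defaulting to None is an assumption about the failure mode, suggested by
    the traceback, not a description of biom's internals.
    """
    metadata = []
    for feature_id in feature_ids:
        entry = defaultdict(lambda: None)
        if feature_id in taxonomy:
            entry["taxonomy"] = taxonomy[feature_id]
        metadata.append(entry)
    return metadata

def encode_column(metadata, header):
    # Same shape as the failing line in the traceback: m[header].encode('utf8')
    return [m[header].encode("utf8") for m in metadata]

# Hypothetical data: the table has three features, the taxonomy only two.
feature_ids = ["otu1", "otu2", "otu3"]
taxonomy = {"otu1": "k__Bacteria", "otu2": "k__Bacteria"}
md = build_observation_metadata(feature_ids, taxonomy)
try:
    encode_column(md, "taxonomy")
except AttributeError as err:
    print("AttributeError:", err)
```

Features present in both inputs encode fine; only the feature without a taxonomy entry triggers the error, which is why confirming that the IDs match exactly is a useful first step.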

Just had a thought, @Nicholas_Bokulich: if the taxonomy assigner doesn't classify a given feature, is that feature included in the resulting taxonomy output as k__; p__; c__; o__; f__; g__; s__, or is it omitted from the file? If the latter, that could be the problem here. I can't recall right now how flexible the biom add-metadata command is, but historically it's been rough around the edges.

Best,
Daniel

Hi @wasade, the taxonomy.tsv definitely has all the same features as the .biom file, and did not stem from the collapse but from the original OTU features. They have the same number of lines as well.

Any other ideas of why the files look the same with the add-metadata function?

@wasade thanks for asking! I believe all q2-feature-classifier classifiers do report unclassified features (as "unclassified"). Sounds like @schmeltzy has ruled that out, anyway.

@schmeltzy what is the precise command that you are using to summarize your biom tables? It looks like you have shown us the add-metadata command but not the summarize command that you are using.

Perhaps you could also share the first few lines of this summary — with anonymized sample IDs if necessary. I certainly understand if you are not authorized to share (e.g., if patient information is involved) but a redacted sample output could help us diagnose this issue even if we are unable to see the complete file.

I've been having the same issue. Using the command suggested above (biom add-metadata -i feature-table.biom -o table.w_omd.biom --observation-metadata-fp taxonomy.tsv --observation-header taxonomy) and the variations tried previously, the resulting biom table does not seem to have taxonomy. I checked this by converting it to a .tsv with biom convert -i table.w_omd.biom -o table_w_tax.tsv --to-tsv --header-key taxonomy, which may not be the best check, but it worked on a QIIME 1 file I tried it on. I've shared my files; hopefully they help.
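A check like this on the converted TSV can be scripted. A minimal Python sketch, assuming the export has a header line beginning with "#OTU ID" and that, when a header key is requested, the metadata appears as the last column under that name; the function name and sample text are hypothetical, and the sketch does not model what biom convert does when the requested metadata is missing entirely:

```python
def count_taxonomy_rows(tsv_text, column="taxonomy"):
    """Return (has_column, rows_with_value) for a tab-separated table export.

    Assumes a header line starting with '#OTU ID' and, if observation metadata
    was written, a last column named `column` (both are assumptions).
    """
    lines = [line for line in tsv_text.splitlines() if line.strip()]
    header_idx = next(i for i, line in enumerate(lines) if line.startswith("#OTU ID"))
    header = lines[header_idx].split("\t")
    if header[-1] != column:
        return False, 0
    rows_with_value = sum(
        1 for line in lines[header_idx + 1:] if line.split("\t")[-1].strip()
    )
    return True, rows_with_value

# Hypothetical exports: one without metadata, one with a partly empty taxonomy column.
WITHOUT = "# Constructed from biom file\n#OTU ID\tS1\notu1\t3.0\n"
WITH_TAX = "#OTU ID\tS1\ttaxonomy\notu1\t3.0\tk__Bacteria; p__Firmicutes\notu2\t1.0\t\n"
print(count_taxonomy_rows(WITHOUT), count_taxonomy_rows(WITH_TAX))
```

Counting filled rows, rather than only looking for the column name, also catches the case where the column exists but some features have no taxonomy.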

Using 'biom add-metadata -i feature-table.biom -o tax_table2.biom --observation-metadata-fp taxonomy.tsv --observation-header OTUID,taxonomy,confidence' I get this traceback:

Traceback (most recent call last):
  File "/usr/bin/biom", line 9, in <module>
    load_entry_point('biom-format==2.1.6', 'console_scripts', 'biom')()
  File "/usr/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/biom/cli/metadata_adder.py", line 114, in add_metadata
    write_biom_table(result, fmt, output_fp)
  File "/usr/lib64/python2.7/site-packages/biom/cli/util.py", line 35, in write_biom_table
    table.to_hdf5(f, biom.parse.generatedby())
  File "/usr/lib64/python2.7/site-packages/biom/table.py", line 4123, in to_hdf5
    self.group_metadata(axis='observation'), 'csr', compression)
  File "/usr/lib64/python2.7/site-packages/biom/table.py", line 4095, in axis_dump
    formatter[category](grp, category, md, compression)
  File "/usr/lib64/python2.7/site-packages/biom/table.py", line 326, in vlen_list_of_str_formatter
    "below:\n%s" % (header, md[0][header]))
TypeError: Category 'taxonomy' is not formatted properly. The most common issue is when 'taxonomy' is represented as a flat string instead of a list. An attempt was made to split this field on a ';' to coerce it into a list but it failed. An example entry (which is not assured to be the problematic entry) is below:
k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacteriales; f__Enterobacteriaceae

Using 'biom add-metadata -i feature-table.biom -o tax_table2.biom --observation-metadata-fp taxonomy.tsv --observation-header OTUID,taxonomy,confidence --sc-separated taxonomy' I get the following traceback which seems to match schmeltzy:

Traceback (most recent call last):
  File "/usr/bin/biom", line 9, in <module>
    load_entry_point('biom-format==2.1.6', 'console_scripts', 'biom')()
  File "/usr/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/biom/cli/metadata_adder.py", line 114, in add_metadata
    write_biom_table(result, fmt, output_fp)
  File "/usr/lib64/python2.7/site-packages/biom/cli/util.py", line 35, in write_biom_table
    table.to_hdf5(f, biom.parse.generatedby())
  File "/usr/lib64/python2.7/site-packages/biom/table.py", line 4123, in to_hdf5
    self.group_metadata(axis='observation'), 'csr', compression)
  File "/usr/lib64/python2.7/site-packages/biom/table.py", line 4095, in axis_dump
    formatter[category](grp, category, md, compression)
  File "/usr/lib64/python2.7/site-packages/biom/table.py", line 274, in general_formatter
    data=[m[header].encode('utf8') for m in md],
AttributeError: 'NoneType' object has no attribute 'encode'

files.zip.tar.gz (709.0 KB)

Thanks, @eandersk and @schmeltzy!

From examining the files from @eandersk, it appears that there are entries in the feature-table.biom that are not present in the taxonomy.tsv file. The commands I used to determine this are below. To help detect this issue in the future, I've opened a PR with biom-format to improve the error checking here. It's not clear to me at this time why the taxonomy file is not in sync with the feature-table.biom, and whether this is an expected scenario. @Nicholas_Bokulich, given the result here, do you think we should open an investigative issue with q2-feature-classifier?

@schmeltzy, would it be possible to re-verify the ids in the taxonomy file overlap with the feature table exactly? It's entirely plausible that a different problem is causing the exception you're observing, but I want to make sure we rule this issue out first.

# grab the feature IDs from the BIOM table and sort
$ biom table-ids -i feature-table.biom --observations | sort - > features.ids

# grab the feature IDs from the taxonomy file, omit any comments / headers, and sort
$ cut -f 1 taxonomy.tsv | grep -v "^#" | sort - > taxonomy_features.ids

# ask what appears to be different between the ID files
$ diff features.ids taxonomy_features.ids
3823d3822
< bcb6d7b3650badd2be8198244c8310b4
4000d3998
< c5930b6ac06a91444e31e63a68133993
4067d4064
< c894fc908625a3e7df8ac644ffa215e0
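The same check can be written without diff. Reporting IDs in both directions also makes clear that two files with the same number of lines can still disagree on which IDs they contain. A minimal Python sketch; the helper names and example IDs are hypothetical, and in practice the ID lists would come from biom table-ids and the taxonomy file as in the commands above:

```python
def ids_from_tsv(lines):
    """First column of each non-blank line that does not start with '#'.

    Mirrors `cut -f 1 taxonomy.tsv | grep -v "^#"`: a header that does not
    start with '#' (e.g. "Feature ID") is kept, just as in the shell version.
    """
    return [line.split("\t", 1)[0] for line in lines
            if line.strip() and not line.startswith("#")]

def compare_ids(table_ids, taxonomy_ids):
    """Report IDs that appear in only one of the two collections."""
    table_set, taxonomy_set = set(table_ids), set(taxonomy_ids)
    return {
        "missing_from_taxonomy": sorted(table_set - taxonomy_set),
        "missing_from_table": sorted(taxonomy_set - table_set),
    }

# Hypothetical inputs: three IDs on each side, but not the same three.
table_ids = ["a1", "b2", "c3"]
taxonomy_lines = [
    "#OTUID\ttaxonomy\tconfidence",
    "a1\tk__Bacteria\t0.99",
    "b2\tk__Bacteria\t0.95",
    "d4\tk__Archaea\t0.90",
]
report = compare_ids(table_ids, ids_from_tsv(taxonomy_lines))
print(report)
```

Here both sides have three IDs, yet c3 has no taxonomy and d4 is not in the table, so matching line counts alone would not have revealed the mismatch.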

Best,
Daniel

So this is kind of weird, but I tried this again on my files and here is my output for the following commands:
biom summarize-table -i feature-table.biom | head
Num samples: 245
Num observations: 1,824
Total count: 245
Table density (fraction of non-zero values): 0.018

Counts/sample summary:
Min: 1.000
Max: 1.000
Median: 1.000
Mean: 1.000

Then:
wc -l taxonomy.tsv
2017 taxonomy.tsv

So, somehow my taxonomy file now has more lines in it than the feature table has OTUs? Also, in my feature-table.biom the OTU rows don't have a specific identifier (e.g., a sample name); instead, each has the full OTU sequence as its "sample ID".

@wasade Thanks for your help, I think there was a problem up-stream of this with some filtering I was doing. I've straightened that out and now get the expected results complete with taxonomy.
