FeatureTable[Frequency] with taxonomy names rather than codes

Sparkle · December 18, 2019, 9:47am

Good morning to everyone,
I'd like to obtain a FeatureTable[Frequency] object (like the one obtained from dada2 denoise-paired) with actual taxonomy identification rather than a self-assigned name made up of a sequence of numbers and letters (like the one below).

Of course, at this step of the analysis, taxonomic identification hasn't been performed yet; it is done later, and I'm not sure how to obtain such a table.

I've already checked out this previous thread and tried it as part of my pipeline, but I'd like to obtain a full file, without collapsing through qiime taxa collapse at any taxonomic level.

This would make the results of any further analysis definitely more readable, without having to group my taxa at a specified level (but having a general overview of how my taxa behave instead).

I've noticed that qiime taxa barplot outputs a plot showing taxa rather than the previously mentioned codes, and requires a previously created taxonomy.qza file as input. Other plugins I used (like qiime taxa filter-table, to remove mitochondrial and chloroplast contaminant reads from my features) only use the taxonomy to perform the filtering, and still output a FeatureTable[Frequency] containing the codes.

In particular, I'd be interested in obtaining such a table, free from contaminants, showing taxonomies rather than codes.

Is there any easy way to perform this, starting from the filtered table obtained from qiime taxa filter-table and the taxonomy.qza file resulting from qiime feature-classifier classify-sklearn?

Thanks in advance!

colinbrislawn · December 18, 2019, 2:59pm

Hello @Sparkle,

There are a lot of parts to this request. Let's take them one at a time.

Getting that full FeatureTable[Frequency] table with taxonomy names (instead of ASV IDs) could be done by exporting or extracting your .qza file. This can be a bit tricky, but it should give you the table with counts, the ASV names, and the taxonomy names.

2nd question:

Do that filtering with qiime taxa filter-table first, then export / extract your table.

Let me know how this works for you!
Colin

Sparkle · December 18, 2019, 3:24pm

Good afternoon, and thanks for your reply!

I've tried both, but I obtained a .biom file in both cases.
What should I do with it?

Moreover, if I try to visualize the original filtered table (as follows), I still obtain ASV codes, so I guess I somehow have to use the taxonomy file too while exporting/extracting?

qiime feature-table tabulate-seqs --i-data table-no-clo-mit.qza --o-visualization table-no-clo-mit.qzv

colinbrislawn · December 18, 2019, 3:37pm

You could convert the .biom table to a tsv file, if you wanted.
biom convert -i table.biom -o table.from_biom_w_taxonomy.txt --to-tsv --header-key taxonomy

That's right! You could export / extract the taxonomy table too, then merge your FeatureTable with your taxonomy table based on the ASV IDs.

Sparkle · December 18, 2019, 3:45pm

Thank you, I'll try this!

Merge through a QIIME plugin or something else?

colinbrislawn · December 18, 2019, 6:50pm

Something else: say, the biom-format package, R, Python, or a spreadsheet program like Excel or Google Sheets.
Both tables will have the same ASV IDs, so you can merge on this shared column.
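
A minimal sketch of that merge in Python, using only the standard library. It assumes the feature table was converted with biom convert --to-tsv (so it starts with a "# Constructed from biom file" comment line followed by a "#OTU ID" header) and that the exported taxonomy file has "Feature ID" and "Taxon" columns. The function names, the sample data, and the "Unassigned" fallback are illustrative choices, not part of QIIME 2.

```python
import csv
import io


def read_feature_table(handle):
    """Read a TSV made by `biom convert --to-tsv`.

    Returns (sample_ids, {asv_id: [counts as strings]}).
    """
    samples, counts = [], {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        if row[0] == "#OTU ID":
            samples = row[1:]          # header row with the sample IDs
        elif row[0].startswith("#"):
            continue                   # e.g. "# Constructed from biom file"
        else:
            counts[row[0]] = row[1:]
    return samples, counts


def read_taxonomy(handle):
    """Read an exported taxonomy TSV; returns {asv_id: taxon string}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Feature ID"]: row["Taxon"] for row in reader}


def merge_on_asv_id(table_handle, taxonomy_handle, out_handle):
    """Write the count table with a trailing taxonomy column, keyed by ASV ID."""
    samples, counts = read_feature_table(table_handle)
    taxonomy = read_taxonomy(taxonomy_handle)
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["#OTU ID"] + samples + ["taxonomy"])
    for asv_id, values in counts.items():
        # "Unassigned" is an assumed placeholder for IDs missing from the taxonomy file.
        writer.writerow([asv_id] + values + [taxonomy.get(asv_id, "Unassigned")])


# Tiny made-up inputs standing in for the exported files.
TABLE = (
    "# Constructed from biom file\n"
    "#OTU ID\tS1\tS2\n"
    "asv1\t3.0\t0.0\n"
    "asv2\t1.0\t4.0\n"
)
TAXONOMY = (
    "Feature ID\tTaxon\tConfidence\n"
    "asv1\tk__Bacteria; p__Firmicutes\t0.99\n"
    "asv2\tk__Bacteria; p__Proteobacteria\t0.87\n"
)

merged = io.StringIO()
merge_on_asv_id(io.StringIO(TABLE), io.StringIO(TAXONOMY), merged)
print(merged.getvalue())
```

With real files, the io.StringIO objects would be replaced by open(...) handles. The rows stay keyed by ASV ID, so no two rows share an identifier.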

Sparkle · December 19, 2019, 10:53am

I guessed something similar, and then re-importing them from .tsv to .biom again, and from .biom to .qza?

I tried to convert a .txt to .biom through biom convert but got this:

biom.exception.TableException: Duplicate observation IDs

I guess this means that different ASVs can be associated with the same taxonomic identification.

And that I have to sum the abundances for taxonomies showing the same name and collapse such duplicate rows into just one before providing the txt to biom convert again?

I checked in Excel and there were duplicate rows (same taxonomy).
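
For illustration, the summing step described above could look like the following Python sketch. The function name and the sample rows are made up, and it assumes each row has already been parsed into a taxonomy label plus a list of numeric counts in the same sample order.

```python
def sum_duplicate_rows(rows):
    """Collapse rows that share a label by summing their counts sample by sample.

    `rows` is an iterable of (label, [count per sample]) pairs.
    Returns {label: [summed counts]} in first-seen order.
    """
    totals = {}
    for label, counts in rows:
        if label in totals:
            totals[label] = [a + b for a, b in zip(totals[label], counts)]
        else:
            totals[label] = list(counts)
    return totals


# Hypothetical rows: two ASVs were assigned the same taxonomy string.
rows = [
    ("k__Bacteria; p__Firmicutes", [3, 0]),
    ("k__Bacteria; p__Proteobacteria", [1, 4]),
    ("k__Bacteria; p__Firmicutes", [2, 5]),
]

for label, counts in sum_duplicate_rows(rows).items():
    print(label, counts)
```

After this step every label is unique, which is what the Duplicate observation IDs error is asking for, at the cost of losing the distinction between the individual ASVs.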

timanix · December 19, 2019, 11:36am

That's right.
Now you can choose between several options, for example:

1. Collapse the table in QIIME 2 and then edit the biom table (replace the long names with the last available taxonomy level, such as genus).
2. Combine the taxa and the original hash names without collapsing the table.

In my analysis, I chose the second option to keep different ASVs separate rather than collapse them, since each ASV can behave differently and the taxonomic resolution is sometimes too low for the analysis.
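
A rough Python sketch of the second option, building one readable label per ASV from its deepest named taxonomy level plus its hash ID. The "|" separator, the function name, and the rule for skipping empty levels such as "s__" are assumptions made for illustration, not a QIIME 2 convention.

```python
def label_with_taxon(asv_id, taxon):
    """Return '<deepest named level>|<ASV ID>' for one feature.

    Keeping the full ASV ID in the label keeps every row identifier unique,
    even when several ASVs share the same taxonomy.
    """
    levels = [level.strip() for level in taxon.split(";")]
    # Assumption: levels that are blank or end in "__" (e.g. "s__") carry no name.
    named = [level for level in levels if level and not level.endswith("__")]
    deepest = named[-1] if named else "Unassigned"
    return f"{deepest}|{asv_id}"


# Hypothetical IDs and taxonomy strings.
taxon = "k__Bacteria; p__Firmicutes; g__Lactobacillus; s__"
print(label_with_taxon("abc123", taxon))
print(label_with_taxon("def456", taxon))
```

The two ASVs in the example share a genus but end up with different labels, so the table is not collapsed and the row names stay unique.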

lca123 · December 19, 2019, 12:35pm

I do a few parsing steps to get a table with counts and taxonomy assigned to each feature. As Timanix mentioned, I also like working with this table rather than a collapsed taxonomy table, because many taxonomies are assigned to more than one feature.
This is what I do:

export a feature-table:
qiime tools export --input-path feature-table.qza --output-path .

and convert .biom to .tsv:
biom convert --to-tsv -i feature-table.biom -o feature-table.tsv

tabulate taxonomies:
taxonomy.qza is the output from a classifier.
qiime metadata tabulate --m-input-file rep-seqs-from-dada2.qza --m-input-file taxonomy.qza --o-visualization viz.qzv

viz.qzv has the hash IDs, the DNA sequences, and the taxonomy assigned to them, but it still lacks the frequency per sample. So let's build a file that has all of it with a few parsing steps.

qiime tools export --output-path metadata --input-path viz.qzv
cp metadata/metadata.tsv .

I don't know why, but join complains that these 2 files are not sorted, so it won't join them... when I checked, they were sorted, but...
Let's also throw away the first line of both files, because join can only treat one line as a header.
sed -e '1d' feature-table.tsv | awk 'NR<2{print $0;next}{print $0|"sort"}' > a
sed -e '1d' metadata.tsv | awk 'NR<2{print $0;next}{print $0|"sort"}' > b
join --header a b > table-feature-counts-per-sample.tsv

turn it into a .tsv file (note that the last command replaces every period with a comma):
sed -i 's/#OTU ID/#OTUID/g' table-feature-counts-per-sample.tsv
sed -i 's/ /\t/g' table-feature-counts-per-sample.tsv
sed -i 's/\./,/g' table-feature-counts-per-sample.tsv

bye bye lots of files:
rm -rf viz.qzv a b metadata metadata.tsv feature-table.biom feature-table.tsv

Opening your table-feature-counts-per-sample.tsv in Excel you'll find it:

Hope that helps. Cheers.

Nicholas_Bokulich · December 19, 2019, 4:51pm

@Sparkle @lca123,
Here is a tutorial that I think would generate an OTU table text file of the style @lca123 uses, but in what I think is a more streamlined way:

If you want sequence data in the table (as @lca123 has), you can follow the taxonomy annotation steps a second time, but with the representative sequences file.

I hope this helps.