FeatureTable[Frequency] with taxonomy names rather than codes

lca123 · December 19, 2019, 12:35pm

I run a few parsing steps to get one table with the counts and the taxonomy assigned to each feature. As Timanix mentioned, I also prefer working with this table rather than a collapsed taxonomy table, because many taxonomies are assigned to more than one feature.
Here's what I do:

export a feature-table:
qiime tools export --input-path feature-table.qza --output-path .

and convert .biom to .tsv:
biom convert --to-tsv -i feature-table.biom -o feature-table.tsv

tabulate taxonomies:
taxonomy.qza is the output from a classifier.
qiime metadata tabulate --m-input-file rep-seqs-from-dada2.qza --m-input-file taxonomy.qza --o-visualization viz.qzv

viz.qzv has the hash IDs, the DNA sequence and the taxonomy assigned to each feature, but it still lacks the frequency per sample. So let's build one file that has all of it, using a few parsing steps.

qiime tools export --output-path metadata --input-path viz.qzv
cp metadata/metadata.tsv .

I don't know why, but join complains that these 2 files are not sorted and won't join them... When I checked, they were sorted, but sorting them again right before the join gets around it.
Let's also throw away the first line of both files, because join only understands one line as a header:
sed -e '1d' feature-table.tsv | awk 'NR<2{print $0;next}{print $0|"sort"}' > a
sed -e '1d' metadata.tsv | awk 'NR<2{print $0;next}{print $0|"sort"}' > b
join --header a b > table-feature-counts-per-sample.tsv
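
In case it helps, one common cause of join's "not sorted" complaint is that sort and join disagree on the collation order or on where the first field ends. Here is a minimal sketch on made-up toy data (the feature IDs, sample names and taxonomy strings are hypothetical, and sort_body is just a helper name I picked). It assumes GNU coreutils, forces the C locale for both tools, and uses tab as the delimiter so a taxonomy string containing spaces stays in one column:

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C   # same byte-wise ordering for both sort and join

work=$(mktemp -d)
tab=$(printf '\t')

# Toy stand-ins for the two files after their first line has been dropped
printf '#OTU ID\tS1\tS2\nb2\t0\t7\na1\t5\t3\n' > "$work/counts.tsv"
printf 'id\tTaxon\nb2\tk__Bacteria; p__Firmicutes\na1\tk__Bacteria; p__Proteobacteria\n' > "$work/tax.tsv"

# Hypothetical helper: keep the header first, sort the rest on the first tab-separated field
sort_body() {
  head -n 1 "$1"
  tail -n +2 "$1" | sort -t "$tab" -k1,1
}

sort_body "$work/counts.tsv" > "$work/a"
sort_body "$work/tax.tsv" > "$work/b"

# Join on the first field, treating line 1 of each file as a header
join --header -t "$tab" "$work/a" "$work/b" > "$work/joined.tsv"
cat "$work/joined.tsv"
```

With -t set to a tab, the joined output is already tab-separated, so on real files this variant would not need the space-to-tab replacement afterwards.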

turn it into a .tsv file:
sed -i 's/#OTU ID/#OTUID/g' table-feature-counts-per-sample.tsv
sed -i 's/ /\t/g' table-feature-counts-per-sample.tsv
sed -i 's/\./,/g' table-feature-counts-per-sample.tsv
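
One thing to watch: replacing every space with a tab also splits any value that itself contains a space (for example a taxonomy string written with "; " between ranks), so some rows can end up with more columns than others. A quick sanity check is to tally the number of tab-separated fields per line. The sketch below uses a made-up toy file with hypothetical IDs and taxa; on the real data you would point awk at table-feature-counts-per-sample.tsv instead:

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)

# Toy table: the last row has one extra column, as happens when a value
# containing a space gets split into two fields
printf '#OTUID\tS1\tTaxon\na1\t5\tk__Bacteria;\nb2\t7\tk__Bacteria;\tp__Firmicutes\n' > "$work/table.tsv"

# For each distinct field count, print "<fields> <number of lines>"
field_counts=$(awk -F'\t' '{ n[NF]++ } END { for (k in n) print k, n[k] }' "$work/table.tsv" | sort -n)
echo "$field_counts"
```

If more than one field count shows up, the rows do not line up and the columns will look shifted in Excel.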

bye bye lots of files:
rm -rf viz.qzv a b metadata metadata.tsv feature-table.biom feature-table.tsv

Open your table-feature-counts-per-sample.tsv in Excel and you'll find the feature IDs, the counts per sample, the DNA sequence and the taxonomy side by side.

Hope that helps. Cheers.