assign metadata to taxonomy.tsv

Maxy · January 31, 2020, 6:55pm

I have exported a taxonomy to a TSV file. It looks like this:

Feature ID	Taxon	Consensus
8baa954e2fd980ef025d2fd1ab21d690118ca4ee	k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Lachnospiraceae; g__; s__	1.0
6f7772dd6c2cdb55cd0ce09389612d1f99c17dd9	k__Bacteria; p__Tenericutes; c__Mollicutes; o__RF39; f__; g__; s__	1.0
a7ffe8d90f12d8e964875d5bf172f56f0e75ddd8	k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Ruminococcaceae; g__Faecalibacterium; s__prausnitzii	1.0
bb6025d4ad05b4881e9df78035a8017c4d62a275	k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Ruminococcaceae; g__Ruminococcus; s__	1.0
3901befced23c69200fa887df387e5e387f87e51	k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Lachnospiraceae; g__; s__	1.0
ffbfaf230b3ea67024f2289dadb4e8504cf93127	k__Bacteria; p__Tenericutes; c__Mollicutes; o__RF39; f__; g__; s__	1.0
6a33f442b2d30128a26bee69698af905d9817522	k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__; g__; s__	1.0

I would like to figure out which of my original samples those feature IDs came from. Can someone PLEASE step me through how to assign those IDs back to the original samples/metadata they came from? I don't understand the biom tutorial or how you link the metadata back to this TSV. I just want to know which of my samples these reads came from. Side note: why is it not possible to export with the metadata attached to each read?

Maxy · January 31, 2020, 7:03pm

I just want each of those lines to have the sample information included.

lca123 · January 31, 2020, 7:28pm

Hey there,
I'm not sure I understood your question correctly, but have a look at my answer here and see if it helps.

You'll end up with a table of counts per sample for each feature. Each count is the number of reads in that sample whose sequence is identical to that feature.
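As a rough illustration of reading such a table, here is a minimal Python sketch. It assumes the feature table has been exported from QIIME 2 and converted to text with `biom convert --to-tsv`, so that the sample IDs sit on a header line starting with `#OTU ID`. The sample names and counts below are made up for the example, and `samples_per_feature` is a hypothetical helper, not part of QIIME 2 or biom.

```python
import csv
import io


def samples_per_feature(table_tsv):
    """Map each feature ID to {sample: count} for samples with a non-zero count."""
    samples = None
    result = {}
    for row in csv.reader(io.StringIO(table_tsv), delimiter="\t"):
        if not row:
            continue
        if row[0] == "#OTU ID":
            # Header line: every column after the first is a sample ID.
            samples = row[1:]
        elif row[0].startswith("#"):
            # Other comment lines, e.g. "# Constructed from biom file".
            continue
        else:
            if samples is None:
                raise ValueError("no '#OTU ID' header line found before data rows")
            counts = {s: float(c) for s, c in zip(samples, row[1:])}
            result[row[0]] = {s: c for s, c in counts.items() if c > 0}
    return result


# Invented example: two feature IDs from this thread, two hypothetical samples.
table_tsv = (
    "# Constructed from biom file\n"
    "#OTU ID\tsample-A\tsample-B\n"
    "8baa954e2fd980ef025d2fd1ab21d690118ca4ee\t12.0\t0.0\n"
    "6f7772dd6c2cdb55cd0ce09389612d1f99c17dd9\t3.0\t7.0\n"
)

hits = samples_per_feature(table_tsv)
for feature_id, counts in hits.items():
    print(feature_id, counts)
```

For a real file, the same function could be called on the text read from the exported table, for example `samples_per_feature(open("feature-table.tsv").read())`, where the file name is whatever you passed to `biom convert`.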
Cheers,

thermokarst · January 31, 2020, 8:06pm

Can you explain this a bit more? I am not sure I understand what you mean here.

Maxy · January 31, 2020, 10:44pm

Thank you for getting back to me. I realized I am actually quite confused. Everyone only ever shows the top of the table, but if I grep for the ID there is a second entry that has the assigned sample the read is from.

2a615d34f6e059d96b1a33429d4722eebd08607e	k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Ruminococcaceae; g__; s__	1.0
f390ed661d3be54e7669bcd9f3b17603ac5d6d0b	k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__; g__; s__	1.0
c58224bf9f3e9d1a7f42f30ca665502b674754d0	k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__; g__; s__	1.0
bba07452c44417206bd6379d2c025cd9b9587e53	k__Bacteria; p__Tenericutes; c__Mollicutes; o__RF39; f__; g__; s__	1.0
8baa954e2fd980ef025d2fd1ab21d690118ca4ee 107_F7_15_0	Unassigned	0.0
3bc727c6b68111215709b88c0912da7a008a16f3 107_F7_15_1	Unassigned	0.0
6f7772dd6c2cdb55cd0ce09389612d1f99c17dd9 107_F7_15_10	Unassigned	0.0
b4669c26b6a777c4170ec6533148e3faca6aaa09 107_F7_15_100	Unassigned	0.0
a7ffe8d90f12d8e964875d5bf172f56f0e75ddd8 107_F7_15_101	Unassigned	0.0
bb6025d4ad05b4881e9df78035a8017c4d62a275 107_F7_15_102	Unassigned	0.0
3901befced23c69200fa887df387e5e387f87e51 107_F7_15_103	Unassigned	0.0

thermokarst · February 4, 2020, 6:27pm

No problem!

To clarify: the second column is not the sample that the read came from. It is the taxonomic annotation that was found to match the sequence identified by the Feature ID in the first column.

There is no information about the Sample Metadata in this FeatureData[Taxonomy]. To do any cross-referencing with your samples you will need to use the FeatureTable[Frequency | etc]. The feature table is a 2D data type: one axis holds the features in the analysis and the other holds the samples (this is essentially a contingency matrix).
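To make the cross-referencing concrete, here is a small Python sketch that joins the two exports on Feature ID. It assumes a taxonomy.tsv with the `Feature ID`, `Taxon` and `Consensus` columns shown earlier in this thread, and a feature table converted to text with `biom convert --to-tsv` (header line starting with `#OTU ID`). The taxonomy rows are copied from this thread, but the sample names and counts are invented, and `annotate_taxonomy` is a hypothetical helper rather than a QIIME 2 function.

```python
import csv
import io


def annotate_taxonomy(taxonomy_tsv, table_tsv):
    """Return (feature ID, taxon, samples with a non-zero count) for each taxonomy row."""
    # Step 1: read the feature table into {feature ID: [samples where count > 0]}.
    samples = []
    present = {}
    for row in csv.reader(io.StringIO(table_tsv), delimiter="\t"):
        if not row:
            continue
        if row[0] == "#OTU ID":
            samples = row[1:]
        elif not row[0].startswith("#"):
            present[row[0]] = [s for s, c in zip(samples, row[1:]) if float(c) > 0]

    # Step 2: walk the taxonomy rows and look each feature ID up in the table.
    annotated = []
    for rec in csv.DictReader(io.StringIO(taxonomy_tsv), delimiter="\t"):
        feature_id = rec["Feature ID"]
        # Features absent from the table get an empty sample list.
        annotated.append((feature_id, rec["Taxon"], present.get(feature_id, [])))
    return annotated


taxonomy_tsv = (
    "Feature ID\tTaxon\tConsensus\n"
    "8baa954e2fd980ef025d2fd1ab21d690118ca4ee\tk__Bacteria; p__Firmicutes; "
    "c__Clostridia; o__Clostridiales; f__Lachnospiraceae; g__; s__\t1.0\n"
    "6f7772dd6c2cdb55cd0ce09389612d1f99c17dd9\tk__Bacteria; p__Tenericutes; "
    "c__Mollicutes; o__RF39; f__; g__; s__\t1.0\n"
)

# Invented counts for two hypothetical samples.
table_tsv = (
    "# Constructed from biom file\n"
    "#OTU ID\tsample-A\tsample-B\n"
    "8baa954e2fd980ef025d2fd1ab21d690118ca4ee\t12.0\t0.0\n"
    "6f7772dd6c2cdb55cd0ce09389612d1f99c17dd9\t3.0\t7.0\n"
)

annotated = annotate_taxonomy(taxonomy_tsv, table_tsv)
for feature_id, taxon, found_in in annotated:
    print(feature_id, taxon, ",".join(found_in), sep="\t")
```

Because a feature can occur in many samples, each taxonomy line ends up with a list of samples rather than a single one, which is why the taxonomy export on its own cannot carry per-sample information.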

Hope that helps!
