Assign metadata to taxonomy.tsv

I have exported a taxonomy to a TSV file. It looks like this:

Feature ID Taxon Consensus
8baa954e2fd980ef025d2fd1ab21d690118ca4ee k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Lachnospiraceae; g__; s__ 1.0
6f7772dd6c2cdb55cd0ce09389612d1f99c17dd9 k__Bacteria; p__Tenericutes; c__Mollicutes; o__RF39; f__; g__; s__ 1.0
a7ffe8d90f12d8e964875d5bf172f56f0e75ddd8 k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Ruminococcaceae; g__Faecalibacterium; s__prausnitzii 1.0
bb6025d4ad05b4881e9df78035a8017c4d62a275 k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Ruminococcaceae; g__Ruminococcus; s__ 1.0
3901befced23c69200fa887df387e5e387f87e51 k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Lachnospiraceae; g__; s__ 1.0
ffbfaf230b3ea67024f2289dadb4e8504cf93127 k__Bacteria; p__Tenericutes; c__Mollicutes; o__RF39; f__; g__; s__ 1.0
6a33f442b2d30128a26bee69698af905d9817522 k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__; g__; s__ 1.0

I would like to figure out which of my original samples those feature IDs came from. Can someone PLEASE walk me through how to assign those IDs back to the original samples/metadata they came from? I don’t understand the biom tutorial or how to link the metadata back to this TSV. I just want to know which of my samples these reads came from. Side note: why is it not possible to export with the metadata attached to each read?

I just want each of those lines to include the sample information.

Hey there,
I'm not sure I understood your question correctly, but have a look at my answer here and see if it helps.

You'll end up with a table of counts per sample for each feature. Each count is the number of reads in that sample whose sequence matches that feature.
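
As a rough illustration of what such a table lets you do, here is a minimal Python sketch. It assumes the feature table has been exported to a TSV with features in rows and samples in columns. The shortened feature IDs, the sample names, the counts, and the function name `samples_per_feature` are all made up for the example.

```python
import csv
import io

# Hypothetical feature table as TSV text: one row per feature, one column per sample.
# Feature IDs are shortened; sample names and counts are invented.
FEATURE_TABLE_TSV = (
    "#OTU ID\tsampleA\tsampleB\tsampleC\n"
    "8baa954e\t12.0\t0.0\t3.0\n"
    "6f7772dd\t0.0\t0.0\t7.0\n"
)


def samples_per_feature(tsv_text):
    """Map each feature ID to {sample: count}, keeping only nonzero counts."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    sample_ids = header[1:]  # first column holds the feature ID
    result = {}
    for row in reader:
        feature_id, counts = row[0], row[1:]
        result[feature_id] = {
            sample: int(float(count))
            for sample, count in zip(sample_ids, counts)
            if float(count) > 0
        }
    return result


found = samples_per_feature(FEATURE_TABLE_TSV)
for feature_id, hits in found.items():
    print(feature_id, hits)
```

With a real export you would read the text from your file instead of the inline string, and you may need to skip a leading comment line if your export includes one.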
Cheers,

Can you explain this a bit more? I am not sure I understand what you mean here.

Thank you for getting back to me. I realized I am actually quite confused. Everyone only ever shows the top of the table, but if I grep for an ID there is a second entry that has the sample the read was assigned to.

2a615d34f6e059d96b1a33429d4722eebd08607e k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Ruminococcaceae; g__; s__ 1.0
f390ed661d3be54e7669bcd9f3b17603ac5d6d0b k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__; g__; s__ 1.0
c58224bf9f3e9d1a7f42f30ca665502b674754d0 k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__; g__; s__ 1.0
bba07452c44417206bd6379d2c025cd9b9587e53 k__Bacteria; p__Tenericutes; c__Mollicutes; o__RF39; f__; g__; s__ 1.0
8baa954e2fd980ef025d2fd1ab21d690118ca4ee 107_F7_15_0 Unassigned 0.0
3bc727c6b68111215709b88c0912da7a008a16f3 107_F7_15_1 Unassigned 0.0
6f7772dd6c2cdb55cd0ce09389612d1f99c17dd9 107_F7_15_10 Unassigned 0.0
b4669c26b6a777c4170ec6533148e3faca6aaa09 107_F7_15_100 Unassigned 0.0
a7ffe8d90f12d8e964875d5bf172f56f0e75ddd8 107_F7_15_101 Unassigned 0.0
bb6025d4ad05b4881e9df78035a8017c4d62a275 107_F7_15_102 Unassigned 0.0
3901befced23c69200fa887df387e5e387f87e51 107_F7_15_103 Unassigned 0.0

No problem!

To clarify: the second column is not the sample that the read is from. It is the taxonomic identifier that was found to match the sequence defined by the Feature ID in the first column.

There is no information about the sample metadata in this FeatureData[Taxonomy]. To cross-reference with your samples you will need to use the FeatureTable[Frequency | etc]. The feature table is a 2D data type: one axis holds the features in the analysis and the other axis holds the samples (this is essentially a contingency matrix).
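
To make the cross-referencing concrete, here is a minimal sketch in plain Python that joins a taxonomy TSV to a feature table TSV on the Feature ID column and adds a column listing the samples in which each feature has a nonzero count. It assumes both files have been exported as tab-separated text and share a `Feature ID` header. The shortened IDs, sample names, counts, the `Samples` column, and the function names are hypothetical.

```python
import csv
import io

# Hypothetical inputs: IDs shortened, sample names and counts invented.
TAXONOMY_TSV = (
    "Feature ID\tTaxon\tConsensus\n"
    "8baa954e\tk__Bacteria; p__Firmicutes; c__Clostridia\t1.0\n"
    "6f7772dd\tk__Bacteria; p__Tenericutes; c__Mollicutes\t1.0\n"
)
FEATURE_TABLE_TSV = (
    "Feature ID\tsampleA\tsampleB\n"
    "8baa954e\t12\t0\n"
    "6f7772dd\t3\t7\n"
)


def read_rows(tsv_text):
    """Parse TSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def annotate_taxonomy(taxonomy_tsv, table_tsv):
    """Add a 'Samples' field to each taxonomy row, joined on 'Feature ID'."""
    counts_by_feature = {row["Feature ID"]: row for row in read_rows(table_tsv)}
    annotated = []
    for tax in read_rows(taxonomy_tsv):
        counts = counts_by_feature.get(tax["Feature ID"], {})
        # Keep only samples where this feature was actually observed.
        samples = sorted(
            sample
            for sample, count in counts.items()
            if sample != "Feature ID" and float(count) > 0
        )
        annotated.append({**tax, "Samples": ",".join(samples)})
    return annotated


rows = annotate_taxonomy(TAXONOMY_TSV, FEATURE_TABLE_TSV)
for row in rows:
    print(row["Feature ID"], row["Taxon"], row["Samples"], sep="\t")
```

Note that a feature usually appears in several samples, which is why the taxonomy file on its own cannot carry a single sample per line.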

Hope that helps!
