Merge metadata from BMR genomics

Costanza_Diliberto · November 22, 2024, 2:08pm

Hello, I am using conda on WSL (Ubuntu) and it is the second day I am using this feature so please be patient.

I think I need to manually build my sample-metadata.tsv file. From the sequencing I have obtained 3 files: 1542453F1533371.taxonomy_table.tsv, 1548529F1544165.taxonomy_table.tsv and 1548530F1544166.taxonomy_table.tsv.

I used Keemei on each file individually and it said it worked. I then merged the files manually, but I still have a problem: the sample IDs are repeated.

This is my table summary:

and this is my taxonomy.qzv file.

Both have 551 features, so they should match, but no matter how I merge those 3 files, I can't seem to get a correct sample-metadata file.

The problem is that 120 entries are from sample 1548530F1544166 (so, same SampleID), 72 are from 1542453F1533371 and 118 are from 1548529F1544165.

How do I build the sample-metadata.tsv file?

I couldn't figure it out from other posts, and I'm sorry if I missed a similar topic.

Thank you in advance,
Costanza

jwdebelius · November 22, 2024, 6:11pm

Hi @Costanza_Diliberto,

The sample metadata file is (unfortunately) not something QIIME 2 can build for you. Your metadata is information about your samples, usually linked to your hypothesis, as well as information about sample collection and processing. This article talks about ways to design better metadata.

I work with humans, so my metadata usually includes information about their age, health status, sometimes diet or medications, and whatever I'm interested in studying. I also usually have information about how samples were collected, for example, did the person poop at home or at the clinic? Was it in a diaper or a potty hat? How long between when the sample was produced and when it was put into a preservative or freezer?

If you don't have this information in hand, now is the time to track it down! Your sequencing results will not make sense without it. I've worked for people who had the policy that they wouldn't sequence samples until they had metadata in hand because anything else was a waste of resources.
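
To make the shape of the file concrete, here is a minimal sketch in Python of a metadata skeleton: a tab-separated table with a `sample-id` header and exactly one row per sample. It assumes that each file prefix from the question is one sample ID. The column names (`subject_age`, `health_status`, `collection_site`) are hypothetical placeholders, not something QIIME 2 requires, and the cells are left empty because only you can fill them in.

```python
import csv
import io

# Sample IDs taken from the file names in the question. Treating each
# file prefix as exactly one sample is an assumption.
SAMPLE_IDS = ["1542453F1533371", "1548529F1544165", "1548530F1544166"]

# "sample-id" is an accepted ID header for QIIME 2 metadata. The other
# column names are hypothetical; use whatever describes your study.
COLUMNS = ["sample-id", "subject_age", "health_status", "collection_site"]


def build_metadata_rows(sample_ids, columns):
    """Return a header row plus one empty row per unique sample ID."""
    if len(set(sample_ids)) != len(sample_ids):
        # Metadata needs one row per sample, so repeated IDs are an error.
        raise ValueError("sample IDs must be unique")
    rows = [list(columns)]
    for sample_id in sample_ids:
        # Empty cells are placeholders to be filled in by hand.
        rows.append([sample_id] + [""] * (len(columns) - 1))
    return rows


def to_tsv(rows):
    """Serialize rows as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


metadata_tsv = to_tsv(build_metadata_rows(SAMPLE_IDS, COLUMNS))
print(metadata_tsv)
```

Saving `metadata_tsv` to a file such as sample-metadata.tsv gives you a template to complete in a spreadsheet. Note that the rows come from your list of samples, not from the rows of the taxonomy tables, which is why merging those tables produces repeated IDs.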

Best,
Justine