Hello everyone. Newbie here!
I hope I am in the right category and apologize in advance if I got it wrong.
I have some very basic questions that I cannot seem to clarify from researching on my own:
Quality scores: I see that this matter is very subjective and must be assessed by each researcher, but I wonder if anyone would recommend a tutorial or some kind of content from which I can learn how to make decisions about it.
Feature tables and metadata files: I cannot figure out a) how to make those, b) how necessary they are to the analyses, c) at what point of the workflow they should be used and d) what roles they play.
I have recently graduated in biological sciences and I was hired to learn to conduct 16S analyses on my own. I have no experience doing any of that, so I apologize for the basic questions. I would appreciate it a lot if anyone could lend me a hand on those matters.
Welcome to microbiome analysis! These are good questions! If you haven't gone through them already, you might find our tutorials or the online workshops useful.
It's not a tidy answer, but I would go through the forum, read posts, and see how people are making choices. There are a lot of quality score plots here, and a lot of conversations around the plots. Honestly, this is a big piece of how I've been getting a feel for them.
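If it helps to see what the numbers in those plots mean, here is a minimal sketch of how a quality score relates to an error probability. It assumes the common Phred+33 encoding of FASTQ quality lines; the quality string is made up, and the function names are just for illustration.

```python
# Decode a FASTQ quality line, assuming Phred+33 encoding (an assumption;
# check how your own files were encoded).
def phred_scores(quality_line, offset=33):
    """Convert each quality character to its integer Phred score."""
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q):
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


# Hypothetical quality line for a 10-base read (not real data).
example_quality = "IIIIIFFF:#"
for q in phred_scores(example_quality):
    print(q, error_probability(q))
```

So a score of 20 means roughly a 1 in 100 chance the base call is wrong, and 30 means roughly 1 in 1000. Where you draw the line is still the judgment call people discuss on the forum.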
But, let's talk about the real reason I'm here. (The other mods are rolling their eyes, and I've just poured myself a new cup of tea.)
Some of this is terminology...
Metadata (sometimes also called sample information, patient data, survey data, etc.) is non-omics information about your microbiome sample.
The feature table tells you which things are in a specific sample, and how many of each there are. (We use "feature" because we might represent the data as an ASV, an OTU, a genus, a gene, etc.)
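To make that concrete, here is a toy feature table written as plain Python. The sample IDs, feature IDs, and counts are all made up, and a real table would be much larger and stored in a dedicated file format rather than a dictionary.

```python
# Toy feature table: one entry per sample, mapping feature ID -> count.
# All IDs and counts are hypothetical.
feature_table = {
    "sample-1": {"ASV_a": 120, "ASV_b": 0, "ASV_c": 35},
    "sample-2": {"ASV_a": 10, "ASV_b": 88, "ASV_c": 0},
}


def features_present(table, sample_id):
    """Which features were seen at least once in this sample?"""
    return sorted(f for f, n in table[sample_id].items() if n > 0)


def total_reads(table, sample_id):
    """How many reads were assigned to features in this sample?"""
    return sum(table[sample_id].values())


print(features_present(feature_table, "sample-1"))
print(total_reads(feature_table, "sample-1"))
```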
Let's talk about the feature table first!
How do you make it?
Your feature table comes out of the denoising or OTU clustering pipeline. (Go back to the PD mouse tutorial, or one of the other tutorials for specific instructions.) Typically, this process is going to involve importing your data, doing QC, and then denoising and/or clustering.
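As a very rough illustration of the shape of that process (reads in, QC, table out), here is a toy sketch. To be clear, this is not what a real denoiser or clustering method does: it only drops reads with a low mean quality and counts exact duplicate sequences. The reads, the threshold, and the function names are all hypothetical, and it assumes Phred+33 quality encoding.

```python
from collections import Counter


def mean_quality(quality_line, offset=33):
    """Mean Phred score of one read, assuming Phred+33 encoding."""
    scores = [ord(ch) - offset for ch in quality_line]
    return sum(scores) / len(scores)


def toy_feature_table(reads_by_sample, min_mean_q=20):
    """Build a sample -> {sequence: count} table from raw reads.

    reads_by_sample maps a sample ID to a list of (sequence, quality) pairs.
    Reads below the mean-quality threshold are discarded (a crude QC step);
    the rest are counted by exact sequence. Real denoising is far more careful.
    """
    table = {}
    for sample_id, reads in reads_by_sample.items():
        kept = [seq for seq, qual in reads if mean_quality(qual) >= min_mean_q]
        table[sample_id] = dict(Counter(kept))
    return table


# Made-up reads: (sequence, quality line).
reads = {
    "sample-1": [("ACGT", "IIII"), ("ACGT", "IIII"), ("TTGA", "IIII"), ("ACGA", "####")],
    "sample-2": [("TTGA", "IIII"), ("TTGA", "FFFF")],
}
print(toy_feature_table(reads))
```

For actual data, follow the tutorials; the point here is only that the output is a table of counts per feature per sample.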
How necessary is it to the analyses?
I'd argue a feature table is kind of the first point in a microbiome analysis. You've gotten to the point where you're mapping the members of the community to the samples. The feature table is what you use to build other analyses, like diversity, and it goes directly into differential abundance.
If you're working at a sequencing company, this is one of the main bioinformatic products you will produce.
If you're handing the work off to an analyst, this is the point the statistician will wander over and think you might have data.
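As one small example of an analysis built directly on the feature table, here is a sketch of two alpha diversity measures computed from a single sample's counts. The counts are made up, and the choice of log base 2 for Shannon is an assumption for this sketch; in practice you would use the diversity tools from the tutorials rather than hand-rolled functions.

```python
import math


def observed_features(counts):
    """Richness: number of features with at least one read."""
    return sum(1 for n in counts if n > 0)


def shannon(counts, base=2):
    """Shannon index H = -sum(p_i * log(p_i)) over features with p_i > 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for n in counts:
        if n > 0:
            p = n / total
            h -= p * math.log(p, base)
    return h


# Hypothetical counts for one sample (one number per feature).
sample_counts = [50, 50, 0, 0]
print(observed_features(sample_counts))
print(shannon(sample_counts))
```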
At what point in the workflow should you use it?
This is kind of the mid-point in the workflow. Like, you get your data, you process it to a feature table, and then you get to actually do analysis. So, once you have it, it's the basis for pretty much everything you'll do down the line. (Well, the feature table, and the accompanying representative sequences, which get you to a tree and taxonomy.)
And, okay... metadata - the non-microbiome sample information.
How do you make it?
This should have been planned during the experimental design phase. Usually, it's collected around when the sample was collected (although this partly depends on the study design). What you need will depend on your research question. I study humans, so I want information like age, sex, diet, drug use, and medical history. Colleagues who work with mice often collect things like cage, genotype, diet, and/or treatment. Environmental studies might look at temperature, pH, rainfall... it will vary based on your question.
Your metadata will probably also contain information about how the sample was processed: what extraction kit did you use? Where did it sit in the extraction plate? Who did the work?
There are some community standards for how metadata should be formatted, as well as a new US government effort to make it more accessible. I'd highly recommend looking at the NMDC page as well as MIxS.
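In practice, a metadata file is usually just a tab-separated table with one row per sample and the sample ID in the first column. Here is a minimal sketch of what that can look like and how to read it; the column names other than the ID column, and all the values, are made up for illustration.

```python
import csv
import io

# Hypothetical metadata for a small mouse-style study; all values are made up.
metadata_tsv = (
    "sample-id\tcage\ttreatment\textraction-kit\n"
    "sample-1\tC1\tcontrol\tkit-A\n"
    "sample-2\tC1\ttreated\tkit-A\n"
    "sample-3\tC2\ttreated\tkit-B\n"
)


def read_metadata(text):
    """Parse tab-separated metadata into {sample_id: {column: value}}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["sample-id"]: row for row in reader}


metadata = read_metadata(metadata_tsv)
print(metadata["sample-2"]["treatment"])
```

The important part is that the sample IDs here match the sample IDs in your feature table exactly, so the two can be linked later.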
How necessary is it to the analysis?
If you want to answer a biological hypothesis, it's essential. Without metadata, microbiome analysis becomes augury a few steps removed from the bird, super expensive palmistry, or very creative storytelling.
If you're shipping the data off to someone else, they probably have the metadata already, and then you don't personally need it.
At what point in the workflow should it be used?
This comes in when you start to visualize and analyze the statistical aspects of the data. You need it to answer your biological question. So, once you have your feature table, you also need your metadata to do science!
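Here is a small sketch of what "feature table plus metadata" looks like in code: the metadata tells you which group each sample belongs to, and the feature table gives you something to compare between groups. Everything here (IDs, counts, the "treatment" column, the function name) is hypothetical, and a real comparison would use a proper statistical test rather than just group means.

```python
from collections import defaultdict

# Hypothetical feature table (sample -> feature counts) and metadata.
feature_table = {
    "sample-1": {"ASV_a": 5, "ASV_b": 0, "ASV_c": 3},
    "sample-2": {"ASV_a": 1, "ASV_b": 1, "ASV_c": 1},
    "sample-3": {"ASV_a": 9, "ASV_b": 0, "ASV_c": 0},
    "sample-4": {"ASV_a": 4, "ASV_b": 2, "ASV_c": 0},
}
metadata = {
    "sample-1": {"treatment": "control"},
    "sample-2": {"treatment": "control"},
    "sample-3": {"treatment": "treated"},
    "sample-4": {"treatment": "treated"},
}


def mean_observed_features_by_group(table, sample_metadata, column):
    """Average number of observed features per metadata group."""
    groups = defaultdict(list)
    for sample_id, counts in table.items():
        if sample_id not in sample_metadata:
            continue  # no metadata for this sample, so it cannot be grouped
        richness = sum(1 for n in counts.values() if n > 0)
        groups[sample_metadata[sample_id][column]].append(richness)
    return {group: sum(vals) / len(vals) for group, vals in groups.items()}


print(mean_observed_features_by_group(feature_table, metadata, "treatment"))
```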
Hopefully this helps; there are a lot more videos about these topics in the workshop I linked.
Thank you so much Justine! I have no words for the thoughtfulness of your reply. I appreciate it a lot!
I will read it again (several times) and follow your instructions!
I work for a company that produces phytobiotic additives for livestock animals and we are investigating the effects of those products on the animals' gut microbiota (bacteria) from fecal samples. We started the project working with a sequencing company that provided us with lists of the taxa found in the samples. Now I have this taxonomy that tells me how many reads each taxon got (xlsx files) and all the FASTQ files. I want to learn how to analyze the microbial community (starting with diversity analyses for instance) from the FASTQs.
So I'm mapping the bacterial taxa to the fecal samples they came from? Or did I misunderstand it?