In my analyses I am looking at four different datasets of the cucumber microbiome based on 16S rRNA sequences. I have merged all the feature tables and managed to produce a taxonomy barplot with all the studies; however, I want to look into the details. The first thing I am looking for is the similarities among the datasets. First, I want to know which family-level taxa they all have in common. I can easily see the most abundant shared ones just by looking at the taxa barplot, but I want to look deeper and get the average % relative abundance of these families across the samples in each study. Is there a way I can do that?
For example, in the image above you can see that Rhizobiaceae is present at a different percentage in each sample within this one study. Is there a feature I can use to obtain the average over all these samples, so I can compare it with the other studies?
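As a starting point for the "families they all have in common" part, here is a minimal sketch, assuming each study's table has already been collapsed to family level and loaded into plain Python dicts (sample ID mapped to family counts). The function names, study names and counts are hypothetical and only for illustration. It treats a family as "present" in a study if it reaches a minimum read count in at least one sample of that study, then intersects the per-study sets.

```python
# Minimal sketch: which families occur in every study?
# Assumption: each study's table was already collapsed to family level and
# loaded as {sample ID: {family name: read count}}. All names are hypothetical.


def families_present(table, min_count=1):
    """Families with at least min_count reads in at least one sample."""
    present = set()
    for counts in table.values():
        for family, n in counts.items():
            if n >= min_count:
                present.add(family)
    return present


def shared_families(studies, min_count=1):
    """Families present in every study; studies maps study name -> table."""
    per_study = [families_present(t, min_count) for t in studies.values()]
    return set.intersection(*per_study) if per_study else set()


# Toy counts for illustration only, not real data.
studies = {
    "study_A": {"A1": {"Rhizobiaceae": 120, "Bacillaceae": 30},
                "A2": {"Rhizobiaceae": 80, "Bacillaceae": 10}},
    "study_B": {"B1": {"Rhizobiaceae": 40, "Pseudomonadaceae": 60}},
}
print(sorted(shared_families(studies)))  # ['Rhizobiaceae']
```

Raising "min_count" is a simple way to ignore families that only appear as a handful of reads in one sample.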
I have another question on the same topic.
Among my four datasets, I suspect that two of them are more similar to each other than to the other two. Is there a way to make that comparison that you know of?
I think that grouping your table by the "study" column in your metadata can help you get a barplot of sample averages (average relative abundances) per study.
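If you would rather compute the per-study averages yourself (for example from a family-level table exported to TSV), the averaging can be sketched as below. This is a minimal sketch, assuming the table and metadata are already loaded into plain dicts; the names and numbers are hypothetical. Note that it averages per-sample relative abundances, so every sample weighs the same. As far as I understand, "qiime feature-table group" with "--p-mode mean-ceiling" averages raw counts instead, which gives more deeply sequenced samples more weight, so the two results can differ slightly.

```python
from collections import defaultdict


def relative_abundance(counts):
    """Convert one sample's {taxon: count} into {taxon: fraction of total}."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


def mean_relative_abundance(table, metadata, column):
    """Average per-sample relative abundances within each metadata group.

    table:    {sample ID: {taxon: count}}
    metadata: {sample ID: {column name: value}}
    returns:  {group value: {taxon: mean relative abundance, 0..1}}
    """
    sums = defaultdict(lambda: defaultdict(float))
    n_samples = defaultdict(int)
    for sample, counts in table.items():
        group = metadata[sample][column]
        n_samples[group] += 1
        for taxon, frac in relative_abundance(counts).items():
            sums[group][taxon] += frac
    # A taxon missing from a sample counts as 0, so divide by the group size.
    return {group: {taxon: s / n_samples[group] for taxon, s in taxa.items()}
            for group, taxa in sums.items()}


# Toy input for illustration only.
table = {"A1": {"Rhizobiaceae": 30, "Bacillaceae": 70},
         "A2": {"Rhizobiaceae": 10, "Bacillaceae": 90},
         "B1": {"Rhizobiaceae": 50, "Bacillaceae": 50}}
metadata = {"A1": {"study": "A"}, "A2": {"study": "A"}, "B1": {"study": "B"}}
means = mean_relative_abundance(table, metadata, "study")
print(round(100 * means["A"]["Rhizobiaceae"], 1))  # 20.0
```

Multiplying by 100 gives the average % per study that can be compared directly across datasets.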
Maybe a PCoA from the core diversity metrics, or Emperor biplots from the diversity plugin, is what you are looking for. They are great for showing similarities and dissimilarities between communities. You can also run a beta-group-significance test (PERMANOVA by default).
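To put a rough number on the suspicion that two datasets resemble each other more than the other two, one quick check is the Bray-Curtis dissimilarity between each study's mean family profile. This is a minimal sketch, assuming you already have per-study mean relative abundances; the profiles below are made up. Because it only compares averages, it ignores within-study variation, so the beta-group-significance test on a sample-level distance matrix remains the more rigorous way to check whether the grouping holds.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {taxon: abundance} profiles.

    0 means identical composition; 1 means no taxa in common.
    """
    taxa = set(a) | set(b)
    num = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    den = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return num / den if den else 0.0


def pairwise_bray_curtis(profiles):
    """Dissimilarity for every pair of studies, most similar pair first."""
    pairs = {(x, y): bray_curtis(profiles[x], profiles[y])
             for x, y in combinations(sorted(profiles), 2)}
    return sorted(pairs.items(), key=lambda item: item[1])


# Toy mean family profiles (fractions) for four studies; not real data.
profiles = {
    "study_1": {"Rhizobiaceae": 0.6, "Bacillaceae": 0.4},
    "study_2": {"Rhizobiaceae": 0.5, "Bacillaceae": 0.5},
    "study_3": {"Pseudomonadaceae": 0.7, "Bacillaceae": 0.3},
    "study_4": {"Pseudomonadaceae": 0.8, "Rhizobiaceae": 0.2},
}
for (x, y), d in pairwise_bray_curtis(profiles):
    print(f"{x} vs {y}: {d:.2f}")
```

If two studies sit much closer to each other than to the rest, that is a hint worth confirming with the sample-level test.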
I am looking into grouping, and I am interested in the 'sample.collection' column of my metadata, but I am stuck. This is the output I am getting:
Hi, Lily!
Sorry, I forgot to mention it before, but to obtain a grouped taxa barplot you also need to collapse your metadata file. Just create a new metadata file in which, instead of the sample IDs in the "sample-id" column, you have the values from the "sample.collection" column. Of course, this new metadata will be shorter, since each value in the "sample-id" column of the new metadata must be unique. The other columns are not very important at this step, since you need this new metadata only to create the barplot and bypass the error.
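The collapsed metadata can also be generated from the original file instead of being typed by hand. After grouping, the table's sample IDs are the values of the grouping column, so the new file just needs those unique values as IDs. This is a minimal sketch, assuming a tab-separated metadata file whose header is the first line; the function name and the extra "n-samples" column (which only records how many original samples fell into each group) are hypothetical.

```python
import csv
import io
from collections import Counter


def collapse_metadata(tsv_text, column, id_header="sample-id"):
    """Return a metadata TSV whose IDs are the unique values of `column`.

    Assumes the first non-empty line is the header; later lines starting
    with '#' (comments or a '#q2:types' row) are skipped. The extra
    'n-samples' column (a hypothetical name) just records group sizes.
    """
    lines = [ln for ln in tsv_text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty metadata")
    body = [ln for ln in lines[1:] if not ln.startswith("#")]
    reader = csv.DictReader([lines[0]] + body, delimiter="\t")
    if column not in reader.fieldnames:
        raise KeyError(column)
    sizes = Counter()
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:  # an empty string is not a valid ID, so skip it
            sizes[value] += 1
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([id_header, "n-samples"])
    for value in sorted(sizes):
        writer.writerow([value, sizes[value]])
    return out.getvalue()


original = ("sample-id\tsample.collection\tstudy\n"
            "S1\tleaf\tA\n"
            "S2\tleaf\tA\n"
            "S3\troot\tB\n")
print(collapse_metadata(original, "sample.collection"), end="")
```

The returned text can be saved to a file (for example "collapsed-metadata.tsv", a hypothetical name) and passed to "qiime taxa barplot" through "--m-metadata-file" together with the grouped table.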
One thing I noticed is that the command-line example I am following uses "k_bacteria" instead of "d_Bacteria". I guess k stands for kingdom, but I don't know what d stands for...
Double-check the spelling: it looks like you need a double underscore. You are using "_" in the command, while the taxonomy file uses "__". Or did you use two underscores? It's hard to tell from the screenshot.
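Each level in these taxonomy strings is the rank letter followed by two underscores. As far as I know, the d__ prefix comes from SILVA 138-style taxonomies, where d marks the domain, while older Greengenes (13_8) taxonomies start with k__ for kingdom, so an example written for one database may not match a table classified with the other letter for letter. The minimal sketch below, assuming the usual "; "-separated format, pulls one rank out of such a string and shows why matching on a single underscore leaves a stray character behind. The function name is hypothetical.

```python
def get_rank(taxonomy, letter):
    """Return the name at one rank of a 'd__...; p__...; f__...' taxonomy string.

    `letter` is the rank prefix without underscores, e.g. 'd', 'k' or 'f'.
    Each level is '<letter>__<name>' with TWO underscores; returns None
    if that rank is missing or left empty (e.g. 'g__').
    """
    tag = letter + "__"
    for level in taxonomy.split(";"):
        level = level.strip()
        if level.startswith(tag):
            return level[len(tag):] or None
    return None


silva_style = ("d__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; "
               "o__Rhizobiales; f__Rhizobiaceae")
print(get_rank(silva_style, "f"))  # Rhizobiaceae
print(get_rank(silva_style, "k"))  # None: this style has no k__ level
# Matching with a single underscore leaves a stray '_' behind:
print("f__Rhizobiaceae".replace("f_", "", 1))  # _Rhizobiaceae
```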