Is there a function to merge biological samples that share a category or feature?
For example, I want to merge the following data based on the group column, so that I end up with only 9 rows.
Hello, you can group your samples based on a metadata column. Note that the group names will become the sample IDs in the grouped table, so you will need a new metadata file.
qiime feature-table group \
--i-table table.qza \
--p-axis sample \
--m-metadata-file metadata.tsv \
--m-metadata-column group \
--p-mode sum \
--o-grouped-table table-merged.qza
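To make it concrete what grouping does to the counts, here is a minimal plain-Python sketch. It is not QIIME 2 code: the function `group_samples`, the dictionary layout, and the toy sample and feature IDs are all made up for illustration. It assumes that `sum` adds up each feature's counts across the samples of a group, and that `mean-ceiling` (another value accepted by `--p-mode`) divides that sum by the number of samples in the group and rounds up.

```python
import math
from collections import defaultdict


def group_samples(table, sample_to_group, mode="sum"):
    """Collapse per-sample feature counts into per-group counts.

    table: {sample_id: {feature_id: count}}
    sample_to_group: {sample_id: group_name}
    mode: "sum" or "mean-ceiling"
    """
    if mode not in ("sum", "mean-ceiling"):
        raise ValueError(f"unsupported mode: {mode}")

    totals = defaultdict(lambda: defaultdict(int))
    sizes = defaultdict(int)
    for sample_id, counts in table.items():
        group = sample_to_group[sample_id]
        sizes[group] += 1
        for feature_id, count in counts.items():
            totals[group][feature_id] += count

    grouped = {}
    for group, counts in totals.items():
        if mode == "sum":
            grouped[group] = dict(counts)
        else:
            # Mean over all samples in the group, rounded up to an integer.
            grouped[group] = {
                f: math.ceil(c / sizes[group]) for f, c in counts.items()
            }
    return grouped


# Hypothetical toy data: sample IDs, feature IDs and counts are made up.
table = {
    "S1": {"featA": 10, "featB": 0},
    "S2": {"featA": 5, "featB": 3},
    "S3": {"featA": 1, "featB": 7},
}
sample_to_group = {"S1": "Wt08", "S2": "Wt08", "S3": "Wt16"}

summed = group_samples(table, sample_to_group, mode="sum")
averaged = group_samples(table, sample_to_group, mode="mean-ceiling")
print(summed)
print(averaged)
```

Note that the keys of the result are the group names, which is why the grouped table needs its own metadata file.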
A small extension of the question: can I generate something like the output of the psmelt() function in the phyloseq package from this new table-merged.qza? The output was a metadata table with all the feature columns.
I ask because it is a bit hard to see what is inside the .biom file in the artifact.
You can export your feature table as a biom file and then convert it to a tsv table to take a closer look at the feature counts:
biom convert -i feature-table.biom -o feature-table.tsv --to-tsv
BTW, 'sum' mode adds up the counts of each feature within a group. As an alternative, you can use 'mean-ceiling' to get mean values instead.
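If a psmelt()-like long table is what you are after, the converted TSV can be reshaped with a few lines of Python. This is only a sketch under some assumptions: the function name `melt_feature_table` and the toy contents are hypothetical, the TSV is assumed to have a header line starting with `#OTU ID` (preceded by a comment line), the first metadata column is assumed to hold the sample IDs, and any extra non-sample columns such as taxonomy are not handled.

```python
import csv
import io


def melt_feature_table(table_tsv, metadata_tsv):
    """Return one row per (sample, feature) pair with the sample's metadata attached."""
    lines = [ln for ln in table_tsv.splitlines() if ln.strip()]
    # Skip leading comment lines until the header that names the samples.
    header_idx = next(i for i, ln in enumerate(lines) if ln.startswith("#OTU ID"))
    sample_ids = lines[header_idx].split("\t")[1:]

    # Index metadata rows by the first column (assumed to be the sample ID).
    reader = csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t")
    id_column = reader.fieldnames[0]
    metadata = {
        row[id_column]: {k: v for k, v in row.items() if k != id_column}
        for row in reader
    }

    long_rows = []
    for line in lines[header_idx + 1:]:
        fields = line.split("\t")
        feature_id = fields[0]
        for sample_id, value in zip(sample_ids, fields[1:]):
            row = {"sample": sample_id, "feature": feature_id, "abundance": float(value)}
            row.update(metadata.get(sample_id, {}))
            long_rows.append(row)
    return long_rows


# Hypothetical toy inputs standing in for feature-table.tsv and the new metadata file.
table_tsv = (
    "# Constructed from biom file\n"
    "#OTU ID\tWt08\tWt16\n"
    "featA\t15.0\t1.0\n"
    "featB\t3.0\t7.0\n"
)
metadata_tsv = "sample-id\ttimepoint\nWt08\t8\nWt16\t16\n"

rows = melt_feature_table(table_tsv, metadata_tsv)
for row in rows:
    print(row)
```

Each output row carries the sample ID, the feature ID, the abundance, and that sample's metadata values, which is easier to inspect than the wide table.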
Hello!
As I wrote earlier, you will need a new metadata file
That's right, just checking whether this is the behavior you wanted.
Another concern is that if you want to compare different groups, you need several samples in each group to be able to perform a statistical analysis. Grouped tables are usually used for taxa bar plots, but for most tests between groups you will need your original table.
Ah, you are right. How about longitudinal pairwise differences? Is it okay to use grouped tables? Let's say we can't ensure that the subject column is reliable (i.e. we mixed up some individuals during sampling).
Thank you for telling me this.
I am afraid that you still will not have enough samples to perform the analysis.
It would be better to double-check everything to rule out errors in the metadata file. If other people were responsible for sampling/processing, you can also consult with them.
How about pre-post treatment comparison? Will that be enough? Only two timepoints are needed, right?
Say, from the above data, I want to compare grouped Wt08 to grouped Wt16. Can qiime longitudinal pairwise-differences tell us the difference in the shannon_entropy metric?
Or is it okay if the subject column is not reliable, since we would treat the samples as biological replicates anyway? In that case, shouldn't they be grouped?
You will still have only 2 samples to compare, since you grouped all replicates. That is not enough.
Here I cannot say, since I do not know whether samples were mixed up or not. So it is up to you to decide whether you can trust the output, based on your knowledge about sampling / processing.
If you are sure that the 'subject' column cannot be trusted, you can still compare alpha diversity metrics with the Kruskal-Wallis test. It treats the groups as independent groups of samples and does not require an individual ID column (subject).
Never thought about it. I guess it will work with at least 3 samples in a group (not sure). But whether the statistical power will be sufficient is a completely different question.
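In QIIME 2 this kind of comparison is what `qiime diversity alpha-group-significance` is meant for, run on the ungrouped alpha diversity vector. To show what the test does with the values, here is a minimal standard-library sketch of the Kruskal-Wallis H statistic with tie correction and the usual chi-square approximation for the p-value. The helper names (`chi2_sf`, `kruskal_wallis`), the group name Wt12, and the Shannon values are made up for illustration, and the chi-square approximation is rough with only a few samples per group.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper tail of the chi-square distribution for integer df.

    Uses Q(df + 2) = Q(df) + (x/2)^(df/2) * exp(-x/2) / Gamma(df/2 + 1),
    starting from the closed forms for df = 1 and df = 2.
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        k, q = 2, math.exp(-half)
    else:
        k, q = 1, math.erfc(math.sqrt(half))
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def kruskal_wallis(groups):
    """groups: {group_name: [values]}; returns (H, p_value)."""
    tie_counts = Counter(v for values in groups.values() for v in values)
    n_total = sum(tie_counts.values())

    # Average rank for each distinct value; tied values share the mean rank.
    rank_of = {}
    position = 1
    for value in sorted(tie_counts):
        t = tie_counts[value]
        rank_of[value] = position + (t - 1) / 2.0
        position += t

    h = sum(
        sum(rank_of[v] for v in values) ** 2 / len(values)
        for values in groups.values()
    )
    h = 12.0 / (n_total * (n_total + 1)) * h - 3.0 * (n_total + 1)

    # Tie correction; it equals 1 when all values are distinct.
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    correction = 1.0 - tie_term / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all values are identical")
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical Shannon entropy values for the original (ungrouped) samples.
shannon = {
    "Wt08": [2.1, 2.4, 2.2],
    "Wt12": [3.0, 3.3, 3.1],
    "Wt16": [4.2, 4.0, 4.5],
}
h_stat, p_value = kruskal_wallis(shannon)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
```

The input is one value per original sample, labelled by group, so no subject column is involved anywhere.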
This Kruskal-Wallis test uses shannon_entropy as the measurement, right? Can I do the same with distances? Given a matrix (or data frame) with samples as rows and metadata + features as columns, I want to know how different each row is from the others.
I am familiar with clustering methods like PCA, but they do not give you any significance, right?
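On attaching a significance value to distances: one common option is a permutation test on the distance matrix (PERMANOVA), which in QIIME 2 is available through `qiime diversity beta-group-significance`. The following is only a sketch of the idea, not the QIIME 2 implementation; the function names and the toy one-dimensional data are made up. It computes a pseudo-F statistic from within-group and total squared distances, then shuffles the group labels to see how often a value at least as large arises by chance.

```python
import random
from itertools import combinations


def pseudo_f(dist, labels):
    """PERMANOVA pseudo-F for a square distance matrix and one label per sample."""
    n = len(labels)
    groups = {}
    for idx, label in enumerate(labels):
        groups.setdefault(label, []).append(idx)

    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = sum(
        sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
        for members in groups.values()
    )
    ss_among = ss_total - ss_within
    a = len(groups)
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dist, labels, permutations=999, seed=0):
    """Return (observed pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    # Add 1 to both counts so the observed labelling is included.
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical toy data: 8 samples on a line, two clearly separated groups.
points = [0, 1, 2, 3, 10, 11, 12, 13]
dist = [[abs(p - q) for q in points] for p in points]
labels = ["Wt08"] * 4 + ["Wt16"] * 4

f_obs, p_perm = permanova(dist, labels)
print(f"pseudo-F = {f_obs:.1f}, p = {p_perm:.3f}")
```

As with the alpha diversity test, this needs several samples per group, so it would be run on the original distance matrix rather than on a grouped table.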