Is there a QIIME 2 function to merge biological samples with a similar feature/group?

Hi everyone,

Is there a function to merge biological samples that share a category or feature?
For example, I want to merge the following data based on the group column, so that I end up with only 9 rows.

SampleID subject group genotype age batch subjectwhen
#q2:types categorical categorical categorical categorical categorical categorical
W01E08 W01 Wt08 Wildtype 8 1 W01Age8
W02E08 W02 Wt08 Wildtype 8 1 W02Age8
W03E08 W03 Wt08 Wildtype 8 1 W03Age8
W04E15 W04 Wt15 Wildtype 15 1 W04Age15
W05E15 W05 Wt15 Wildtype 15 1 W05Age15
W06E15 W06 Wt15 Wildtype 15 1 W06Age15
W01E16 W01 Wt16 Wildtype 16 2 W01Age16
W02E16 W02 Wt16 Wildtype 16 2 W02Age16
W03E16 W03 Wt16 Wildtype 16 2 W03Age16
W04E23 W04 Wt23 Wildtype 23 2 W04Age23
W05E23 W05 Wt23 Wildtype 23 2 W05Age23
W06E23 W06 Wt23 Wildtype 23 2 W06Age23
T01E08 T01 Tg08 Transgenic 8 4 T01Age8
T02E08 T02 Tg08 Transgenic 8 4 T02Age8
T03E08 T03 Tg08 Transgenic 8 4 T03Age8
T04E15 T04 Tg15 Transgenic 15 1 T04Age15
T05E15 T05 Tg15 Transgenic 15 1 T05Age15
T06E15 T06 Tg15 Transgenic 15 1 T06Age15
T01E16 T01 Tg16 Transgenic 16 2 T01Age16
T02E16 T02 Tg16 Transgenic 16 2 T02Age16
T03E16 T03 Tg16 Transgenic 16 2 T03Age16
T04E23 T04 Tg23 Transgenic 23 2 T04Age23
T05E23 T05 Tg23 Transgenic 23 2 T05Age23
T06E23 T06 Tg23 Transgenic 23 2 T06Age23
T07E70 T07 Tg70 Transgenic 70 2 T07Age70
T08E70 T08 Tg70 Transgenic 70 2 T08Age70
T09E70 T09 Tg70 Transgenic 70 2 T09Age70
T10E70 T10 Tg70 Transgenic 70 2 T10Age70
T11E70 T11 Tg70 Transgenic 70 2 T11Age70
T12E70 T12 Tg70 Transgenic 70 2 T12Age70

Thank you very much!

Hello, you can group your samples based on a metadata column with qiime feature-table group. Note that the group names will become the sample IDs in the grouped table, so you will need a new metadata file.


Hi,
Thank you, it works!

This was the command I used:

qiime feature-table group \
  --i-table table.qza \
  --p-axis sample \
  --m-metadata-file metadata.tsv \
  --m-metadata-column group \
  --p-mode sum \
  --o-grouped-table table-merged.qza

As a small follow-up: can I generate something like the psmelt() function from the phyloseq package from this new table-merged.qza, i.e. a single table that combines the sample metadata with the feature counts?
I ask because it is a bit hard to see what is inside the .biom file in the artifact.

Thank you

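You can export your feature table as a BIOM file and then convert it to a TSV table to take a closer look at the feature counts:
qiime tools export --input-path table-merged.qza --output-path exported-table
biom convert -i exported-table/feature-table.biom -o feature-table.tsv --to-tsv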
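If the goal is a psmelt()-like view, one option is to reshape the exported TSV yourself. Below is a minimal sketch, assuming feature-table.tsv was written by biom convert --to-tsv (a "# Constructed from biom file" line followed by a "#OTU ID" header). The function names melt_feature_table and add_sample_metadata are hypothetical helpers, not QIIME 2 commands. The first produces one row per feature and sample, which is the long format psmelt() returns; the second appends each sample's metadata columns.

```shell
# Hypothetical helpers approximating phyloseq's psmelt() (long format).
# Assumed earlier steps, which need QIIME 2 and the biom CLI:
#   qiime tools export --input-path table-merged.qza --output-path exported-table
#   biom convert -i exported-table/feature-table.biom -o feature-table.tsv --to-tsv

# One output row per (feature, sample) pair from the wide biom TSV.
melt_feature_table() {
  awk -F'\t' -v OFS='\t' '
    /^# Constructed from biom file/ { next }  # comment line written by biom convert
    /^#OTU ID/ {                              # header: sample IDs start in column 2
      for (i = 2; i <= NF; i++) sample[i] = $i
      print "FeatureID", "Sample", "Abundance"
      next
    }
    NF > 0 {
      for (i = 2; i <= NF; i++) print $1, sample[i], $i
    }
  ' "$1"
}

# Append each sample's metadata columns to the long table.
# $1: output of melt_feature_table
# $2: tab-separated metadata; line 1 is the header, first column is the sample ID,
#     #q2: directive lines (e.g. #q2:types) are skipped
add_sample_metadata() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR {                               # first file read: the metadata
      if ($1 ~ /^#q2:/ || NF == 0) next
      rest = $2
      for (i = 3; i <= NF; i++) rest = rest OFS $i
      if (FNR == 1) header = rest; else meta[$1] = rest
      next
    }
    FNR == 1 { print $0, header; next }       # header row of the long table
    { print $0, meta[$2] }                    # column 2 holds the sample ID
  ' "$2" "$1"
}
```

For example, run melt_feature_table feature-table.tsv > long.tsv and then add_sample_metadata long.tsv grouped-metadata.tsv, where grouped-metadata.tsv (a hypothetical name) is a metadata file whose IDs match the grouped table. Unlike psmelt(), no taxonomy columns are attached here.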

BTW, this sums the counts of each feature within a group. Alternatively, you can use --p-mode mean-ceiling to get the mean (rounded up) instead.
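To make the difference between the two modes concrete, here is a tiny sketch of the arithmetic for one feature in one group. It assumes mean-ceiling takes the mean across the grouped samples and rounds it up to an integer; combine_counts is a hypothetical helper, not part of QIIME 2.

```shell
# Hypothetical helper: how one feature's counts across the samples of a group
# would be combined by --p-mode sum vs --p-mode mean-ceiling
# (assumed here to be the mean rounded up to the next integer).
combine_counts() {
  printf '%s\n' "$@" | awk '
    { s += $1; n++ }
    END {
      m = s / n
      c = int(m)
      if (c < m) c++                  # ceiling (counts are non-negative)
      printf "sum=%d mean-ceiling=%d\n", s, c
    }'
}
```

For counts of 3, 4 and 6 in three replicates, combine_counts 3 4 6 prints sum=13 mean-ceiling=5 (the mean is 4.33, rounded up). One consequence of sum: a group with more replicates gets larger totals, e.g. Tg70 has six samples in this metadata while every other group has three.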


Hi, thank you for your reply.

I will use the new table-merged.qza artifact to redo the analysis following the Moving Pictures tutorial. But why can't I summarize it with this command?

qiime feature-table summarize \
  --i-table table-merged.qza \
  --o-visualization table-merged.qzv \
  --m-sample-metadata-file metadata.tsv

Is it because of the old metadata.tsv? Is there a way to also modify metadata.tsv through QIIME 2?

Yes, I want it that way (summed). Is this uncommon? In my simple mind, I would expect the counts to add up since the samples are combined, right?

Hello!
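As I wrote earlier, you will need a new metadata file: the sample IDs in the grouped table are now the group names (Wt08, Tg70, ...), so they no longer match the sample IDs in your old metadata.tsv.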
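A minimal sketch of building that new file from the old one, assuming metadata.tsv is tab-separated with the columns shown in the first post (SampleID, subject, group, genotype, age, batch, subjectwhen). It uses the group value as the new sample ID, keeps only the columns that should be constant within a group (genotype, age, batch), and warns if a group turns out not to be constant. Per-sample columns such as subject and subjectwhen are dropped because they no longer mean anything after grouping. make_grouped_metadata is a hypothetical helper name.

```shell
# Hypothetical helper: metadata for a table grouped on the "group" column.
# Assumes tab-separated input with the columns from the first post:
# SampleID subject group genotype age batch subjectwhen
make_grouped_metadata() {
  awk -F'\t' -v OFS='\t' '
    BEGIN {
      print "sample-id", "genotype", "age", "batch"
      print "#q2:types", "categorical", "categorical", "categorical"
    }
    FNR == 1 { next }                    # skip the original header row
    $1 ~ /^#q2:/ || NF == 0 { next }     # skip #q2:types and blank lines
    !($3 in seen) {
      seen[$3] = $4 OFS $5 OFS $6
      print $3, $4, $5, $6               # the group value becomes the sample ID
      next
    }
    seen[$3] != ($4 OFS $5 OFS $6) {     # columns that should be constant differ
      print "warning: group " $3 " mixes genotype/age/batch values" > "/dev/stderr"
    }
  ' "$1"
}
```

Usage: make_grouped_metadata metadata.tsv > grouped-metadata.tsv, then pass grouped-metadata.tsv to --m-sample-metadata-file in the summarize command instead of the original file.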

That's right; I was just checking that this is the behavior you wanted.

Another concern: if you want to compare groups, you need several samples in each group to be able to perform a statistical analysis. Grouped tables are usually used for taxa bar plots, but for most tests between groups you will need your original table.
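A quick way to see how many replicates each group actually has before deciding between the grouped and the original table. This is a sketch with a hypothetical helper samples_per_group, assuming a tab-separated metadata file whose first line is the header.

```shell
# Hypothetical helper: number of samples per value of a metadata column.
# $1: tab-separated metadata whose first line is the header
# $2: name of the column to count (e.g. group)
samples_per_group() {
  awk -F'\t' -v col="$2" '
    FNR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
      next
    }
    $1 ~ /^#q2:/ || NF == 0 { next }   # skip #q2:types and blank lines
    { n[$c]++ }
    END { for (g in n) print g "\t" n[g] }
  ' "$1" | sort
}
```

Running samples_per_group metadata.tsv group on the metadata from the first post should list nine groups, each with 3 samples except Tg70 with 6. After grouping, each of them becomes a single sample, which is why the grouped table leaves nothing to test within groups.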


Ah, you are right. What about longitudinal pairwise differences? Is it okay to use grouped tables there? Let's say we can't be sure that the subject column is reliable (i.e. we mixed up some individuals during sampling).
Thank you for telling me this.

I am afraid that you still will not have enough samples to perform the analysis.
It would be better to double-check everything to rule out errors in the metadata file. If other people were responsible for sampling/processing, you can also consult with them.
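Part of that double-checking can be scripted. The sketch below (hypothetical helper check_pairs, assuming a tab-separated metadata file with a header line) lists subjects that do not have exactly one sample at each of two states, which is what a paired comparison relies on. It can only catch inconsistencies inside the metadata file; it cannot detect animals that were physically swapped during sampling.

```shell
# Hypothetical check for paired comparisons: print subjects that do not have
# exactly one sample at each of two states.
# $1: tab-separated metadata (first line is the header)
# $2: subject column, $3: state column, $4 and $5: the two states (e.g. 8 16)
check_pairs() {
  awk -F'\t' -v subj="$2" -v state="$3" -v s1="$4" -v s2="$5" '
    FNR == 1 {
      for (i = 1; i <= NF; i++) { if ($i == subj) a = i; if ($i == state) b = i }
      if (!a || !b) { print "column not found" > "/dev/stderr"; exit 1 }
      next
    }
    $1 ~ /^#q2:/ || NF == 0 { next }
    $b == s1 { n1[$a]++; seen[$a] = 1 }
    $b == s2 { n2[$a]++; seen[$a] = 1 }
    END {
      for (s in seen)
        if (n1[s] != 1 || n2[s] != 1)
          printf "%s\t%s=%d\t%s=%d\n", s, s1, n1[s], s2, n2[s]
    }
  ' "$1" | sort
}
```

For the metadata in the first post, check_pairs metadata.tsv subject age 8 16 should print nothing: W01-W03 and T01-T03 each have one sample at age 8 and one at 16, and the other subjects are not sampled at either age.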


Hi, thank you for your response.

What about a pre/post-treatment comparison? Would that be enough? Only two time points are needed, right?
Say, from the data above, I want to compare grouped Wt08 with grouped Wt16. Can qiime longitudinal pairwise-differences tell us the difference in the shannon_entropy metric?
Or is it okay if the subject column is not reliable, since we would treat the samples as biological replicates anyway? And should they therefore not be grouped?

You would still have only 2 samples to compare, since you grouped all replicates. That is not enough.

Here I cannot say, since I do not know whether the samples were mixed up or not. It is up to you to decide whether you can trust the output, based on your knowledge about sampling/processing.

If you are sure that the 'subject' column cannot be trusted, you can still compare alpha diversity metrics with a Kruskal-Wallis test. It treats the groups as independent groups of samples and does not require an individual ID column (subject).
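For reference, this is the textbook Kruskal-Wallis statistic (in QIIME 2 it is the test reported by qiime diversity alpha-group-significance for a categorical metadata column). With N samples in total split into k groups, n_i samples in group i, and R_i the sum of the ranks (among all N values of the alpha metric) that fall into group i:

```latex
H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^{2}}{n_i} - 3(N+1),
\qquad
H_{\text{corrected}} = \frac{H}{1 - \dfrac{\sum_{j} \left(t_j^{3} - t_j\right)}{N^{3} - N}}
```

Here t_j is the number of tied values in the j-th set of ties. Under the null hypothesis, H approximately follows a chi-squared distribution with k - 1 degrees of freedom. Because only ranks and group labels enter the formula, no subject column is needed.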

Thank you for your reply.

Out of curiosity, how many samples is enough?

Thank you!

I never thought about it. I guess it will work with at least 3 samples per group (not sure). But whether the statistical power will be sufficient is a completely different question.

Thank you for your answer!

Does this Kruskal-Wallis test use shannon_entropy as the measurement? Can I do the same with distances? Given a matrix (or data frame) with samples as rows and metadata + features as columns, I want to know how different each row is from every other row.

I am familiar with ordination methods like PCA, but they do not give you a significance value, right?

Kindly correct me if I am wrong. Thank you.

It will use whichever alpha diversity metric you provide as input (Shannon, evenness, ...).
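To show what two of those metrics compute, here is a sketch for a single sample whose feature counts are passed as arguments. alpha_metrics and the BASE variable are hypothetical; the default log base of 2 is an assumption, so check which base your QIIME 2 / scikit-bio version uses for shannon_entropy. Pielou's evenness is H divided by the log of the number of observed features, so it does not depend on the base.

```shell
# Hypothetical helper: Shannon entropy and Pielou's evenness for one sample,
# with that sample's feature counts passed as arguments.
# The log base defaults to 2 (set BASE to change it); which base your QIIME 2 /
# scikit-bio version uses for shannon_entropy is an assumption to verify.
alpha_metrics() {
  printf '%s\n' "$@" | awk -v base="${BASE:-2}" '
    $1 > 0 { x[++s] = $1; total += $1 }   # zero counts do not contribute
    END {
      h = 0
      for (i = 1; i <= s; i++) {
        p = x[i] / total
        h -= p * log(p) / log(base)       # awk log() is the natural logarithm
      }
      if (s > 1)                          # evenness = H / log(number of observed features)
        printf "shannon=%.4f evenness=%.4f\n", h, h / (log(s) / log(base))
      else                                # undefined with fewer than two features
        printf "shannon=%.4f evenness=NA\n", h
    }'
}
```

For example, four features with equal counts (alpha_metrics 10 10 10 10) give shannon=2.0000 evenness=1.0000 in base 2.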

You can use these q2-diversity actions for beta diversity analysis:
beta-group-significance
adonis
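Both work on a distance matrix between samples, which is the "how different is each row from another" part of the question. In QIIME 2 that matrix comes from, for example, qiime diversity beta --p-metric braycurtis or the core-metrics pipelines. To show what such a distance is, here is a sketch with a hypothetical helper bray_curtis that computes Bray-Curtis dissimilarity for every pair of samples in a TSV written by biom convert --to-tsv (there, samples are columns rather than rows).

```shell
# Hypothetical helper: Bray-Curtis dissimilarity for every pair of samples in a
# TSV written by biom convert --to-tsv (features are rows, samples are columns).
# BC(x, y) = sum_i |x_i - y_i| / sum_i (x_i + y_i)
bray_curtis() {
  awk -F'\t' '
    /^# Constructed from biom file/ { next }
    /^#OTU ID/ { ns = NF - 1; for (j = 2; j <= NF; j++) name[j - 1] = $j; next }
    NF > 0 { nf++; for (j = 2; j <= NF; j++) x[nf, j - 1] = $j }
    END {
      for (a = 1; a <= ns; a++)
        for (b = a + 1; b <= ns; b++) {
          num = 0; den = 0
          for (f = 1; f <= nf; f++) {
            d = x[f, a] - x[f, b]
            if (d < 0) d = -d
            num += d
            den += x[f, a] + x[f, b]
          }
          if (den > 0) printf "%s\t%s\t%.4f\n", name[a], name[b], num / den
          else printf "%s\t%s\tNA\n", name[a], name[b]   # two empty samples: undefined
        }
    }
  ' "$1"
}
```

The output has one line per pair of samples (sample, sample, distance); 0 means identical composition and 1 means no shared features.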

Yes, but usually you want both: PCoA plots and a PERMANOVA-like analysis.
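On the significance side, beta-group-significance with --p-method permanova tests whether samples within groups are closer to each other than to samples in other groups. As a reference, the standard PERMANOVA pseudo-F (Anderson's formulation) for N samples in a groups, with n_g samples in group g and d_ij the distance between samples i and j, is:

```latex
SS_T = \frac{1}{N} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} d_{ij}^{2},
\qquad
SS_W = \sum_{g=1}^{a} \frac{1}{n_g} \sum_{\substack{i<j \\ i,j \in g}} d_{ij}^{2},
\qquad
SS_A = SS_T - SS_W,
\qquad
F = \frac{SS_A / (a-1)}{SS_W / (N-a)}
```

The p-value comes from recomputing F after randomly permuting the group labels many times and counting how often the permuted F is at least as large as the observed one. A PCoA plot shows the same distance matrix visually but provides no test, which is why the two are usually reported together.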

Please create a new topic next time if the subject of your questions changes.

I am sorry.

Thank you for your explanation!