Is there a QIIME 2 function to merge biological samples with the same feature/group?

Diki · September 30, 2021, 6:30am

Hi everyone,

Is there a function to merge biological samples that share a category or feature?
For example, I want to merge the following data based on the group column, so that I end up with only 9 rows.

SampleID	subject	group	genotype	age	batch	subjectwhen
#q2:types	categorical	categorical	categorical	categorical	categorical	categorical
W01E08	W01	Wt08	Wildtype	8	1	W01Age8
W02E08	W02	Wt08	Wildtype	8	1	W02Age8
W03E08	W03	Wt08	Wildtype	8	1	W03Age8
W04E15	W04	Wt15	Wildtype	15	1	W04Age15
W05E15	W05	Wt15	Wildtype	15	1	W05Age15
W06E15	W06	Wt15	Wildtype	15	1	W06Age15
W01E16	W01	Wt16	Wildtype	16	2	W01Age16
W02E16	W02	Wt16	Wildtype	16	2	W02Age16
W03E16	W03	Wt16	Wildtype	16	2	W03Age16
W04E23	W04	Wt23	Wildtype	23	2	W04Age23
W05E23	W05	Wt23	Wildtype	23	2	W05Age23
W06E23	W06	Wt23	Wildtype	23	2	W06Age23
T01E08	T01	Tg08	Transgenic	8	4	T01Age8
T02E08	T02	Tg08	Transgenic	8	4	T02Age8
T03E08	T03	Tg08	Transgenic	8	4	T03Age8
T04E15	T04	Tg15	Transgenic	15	1	T04Age15
T05E15	T05	Tg15	Transgenic	15	1	T05Age15
T06E15	T06	Tg15	Transgenic	15	1	T06Age15
T01E16	T01	Tg16	Transgenic	16	2	T01Age16
T02E16	T02	Tg16	Transgenic	16	2	T02Age16
T03E16	T03	Tg16	Transgenic	16	2	T03Age16
T04E23	T04	Tg23	Transgenic	23	2	T04Age23
T05E23	T05	Tg23	Transgenic	23	2	T05Age23
T06E23	T06	Tg23	Transgenic	23	2	T06Age23
T07E70	T07	Tg70	Transgenic	70	2	T07Age70
T08E70	T08	Tg70	Transgenic	70	2	T08Age70
T09E70	T09	Tg70	Transgenic	70	2	T09Age70
T10E70	T10	Tg70	Transgenic	70	2	T10Age70
T11E70	T11	Tg70	Transgenic	70	2	T11Age70
T12E70	T12	Tg70	Transgenic	70	2	T12Age70

Thank you very much!

timanix · September 30, 2021, 7:36am

Hello, you can group your samples based on a metadata column. Note that the group names will become the sample IDs in the grouped table, so you will need a new metadata file.

Diki · September 30, 2021, 8:19am

Hi,
Thank you, it works!

This was the command I used:

qiime feature-table group \
  --i-table table.qza \
  --p-axis sample \
  --m-metadata-file metadata.tsv \
  --m-metadata-column group \
  --p-mode sum \
  --o-grouped-table table-merged.qza

As a small extension: can I generate something like the output of the psmelt() function in the phyloseq package from this new table-merged.qza? Its output is a table that combines the metadata with all the feature columns.
I ask because it is a bit hard to see what is inside the .biom file within the artifact.

Thank you

timanix · September 30, 2021, 8:42am

You can export your feature table as a BIOM file and then convert it to a TSV table to take a closer look at the feature counts:
biom convert -i feature-table.biom -o feature-table.tsv --to-tsv
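
A minimal sketch of that export step, plus a rough long-format reshaping in the spirit of psmelt(). The output directory and both function names are hypothetical. The awk part assumes the two header lines that biom convert --to-tsv normally writes ("# Constructed from biom file" and "#OTU ID ..."), and it does not join sample metadata or taxonomy the way psmelt() does.

```shell
# Hypothetical wrapper: export a FeatureTable artifact and convert it to TSV.
# $1 = table artifact (e.g. table-merged.qza), $2 = output directory
export_table_to_tsv() {
  qiime tools export --input-path "$1" --output-path "$2"
  biom convert -i "$2/feature-table.biom" -o "$2/feature-table.tsv" --to-tsv
}

# Hypothetical helper: reshape the wide TSV into one row per sample/feature pair.
# $1 = TSV written by `biom convert --to-tsv`
melt_feature_table() {
  awk -F '\t' -v OFS='\t' '
    /^# Constructed/ { next }                  # skip the biom comment line
    /^#OTU ID/ {                               # header row: remember the sample IDs
      for (i = 2; i <= NF; i++) sample[i] = $i
      print "sample", "feature", "count"
      next
    }
    { for (i = 2; i <= NF; i++) print sample[i], $1, $i }
  ' "$1"
}
```

For example: export_table_to_tsv table-merged.qza exported-table, followed by melt_feature_table exported-table/feature-table.tsv > table-long.tsv.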

By the way, with 'sum' the counts of each feature are added up within each group. As an alternative, you can use 'mean-ceiling' to get mean values instead.

Diki · October 1, 2021, 6:55am

Hi, thank you for your reply.

I will use the new table-merged.qza artifact to redo the analysis following the Moving Pictures tutorial. But why can't I summarize it with this command?

qiime feature-table summarize \
  --i-table table-merged.qza \
  --o-visualization table-merged.qzv \
  --m-sample-metadata-file metadata.tsv

Is it because of the old metadata.tsv? Is there a way to modify metadata.tsv through QIIME 2 as well?

About the summed counts: yes, I want it that way. Is this uncommon? In my simple view, the counts should add up since the samples are combined, right?

timanix · October 1, 2021, 7:01am

Hello!
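As I wrote earlier, you will need a new metadata file: the sample IDs in the grouped table are now the group names, so they no longer match the old metadata.tsv.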
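
One way to build such a file outside QIIME 2, as a rough sketch: use the grouping column as the new ID column and carry over only columns that are constant within each group (here genotype and age; subject and subjectwhen no longer apply to a merged sample). The function name and the output file name are hypothetical. The sketch assumes a tab-separated file with the header on the first line, and it does not check that the named columns exist.

```shell
# Hypothetical helper: derive metadata for a grouped table.
# $1 = original metadata TSV
# $2 = column whose values became the new sample IDs (e.g. group)
# $3 = comma-separated columns to keep (each should be constant within a group)
make_group_metadata() {
  awk -F '\t' -v OFS='\t' -v idcol="$2" -v keep="$3" '
    NR == 1 {                                  # header: map column names to positions
      for (i = 1; i <= NF; i++) col[$i] = i
      n = split(keep, k, ",")
      line = $1
      for (j = 1; j <= n; j++) line = line OFS k[j]
      print line
      next
    }
    $1 == "#q2:types" {                        # carry over the types of the kept columns
      line = $1
      for (j = 1; j <= n; j++) line = line OFS $(col[k[j]])
      print line
      next
    }
    {
      id = $(col[idcol])
      if (id in seen) next                     # emit each group only once
      seen[id] = 1
      line = id
      for (j = 1; j <= n; j++) line = line OFS $(col[k[j]])
      print line
    }
  ' "$1"
}
```

For example, make_group_metadata metadata.tsv group genotype,age > metadata-grouped.tsv writes one row per group, and metadata-grouped.tsv can then be passed to --m-sample-metadata-file in place of the old file.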

That's right, I was just checking that this is the behavior you wanted.

Another concern is that if you want to compare different groups, you need several samples in each group to be able to perform statistical analysis. Grouped tables are usually used for taxa bar plots, but for most tests between groups you will need your original table.

Diki · October 1, 2021, 7:09am

Ah, you are right. How about longitudinal pairwise differences? Is it okay to use grouped tables for that? Let's say we can't be sure that the subject column is reliable (i.e. we may have mixed up some individuals during sampling).
Thank you for telling me this.

timanix · October 1, 2021, 7:18am

I am afraid that you still will not have enough samples to perform the analysis.
It would be better to double-check everything to rule out errors in the metadata file. If other people were responsible for sampling/processing, you can also consult them.

Diki · October 1, 2021, 9:11am

Hi, thank you for your response.

How about a pre-/post-treatment comparison? Will that be enough? Only two timepoints are needed, right?
Say, from the data above, I want to compare grouped Wt08 to grouped Wt16. Can qiime longitudinal pairwise-differences tell us the difference in the shannon_entropy metric?
Or is it okay if the subject column is not reliable, because we would treat the samples as biological replicates anyway? In that case, should they not be grouped?

timanix · October 1, 2021, 9:21am

You will still have only 2 samples to compare, since you grouped all the replicates. That is not enough.

Here I cannot say, since I do not know whether the samples were mixed up or not. So it is up to you to decide whether you can trust the output, based on your knowledge about the sampling/processing.

timanix · October 1, 2021, 9:27am

If you are sure that the 'subject' column cannot be trusted, you can still compare alpha diversity metrics with a Kruskal-Wallis test. It treats the groups as independent sets of samples and does not require an individual ID column (subject).
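
A sketch of how that test could be run with the diversity plugin, assuming the alpha diversity vector comes from an earlier diversity run on the original (ungrouped) table, for example core-metrics-results/shannon_vector.qza as in the Moving Pictures tutorial. The wrapper name and file names are hypothetical.

```shell
# Hypothetical wrapper around `qiime diversity alpha-group-significance`,
# which runs Kruskal-Wallis tests on the categorical metadata columns.
# $1 = alpha diversity vector (e.g. core-metrics-results/shannon_vector.qza)
# $2 = original, ungrouped metadata file
# $3 = output visualization
alpha_kruskal() {
  qiime diversity alpha-group-significance \
    --i-alpha-diversity "$1" \
    --m-metadata-file "$2" \
    --o-visualization "$3"
}
```

For example: alpha_kruskal core-metrics-results/shannon_vector.qza metadata.tsv shannon-group-significance.qzv.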

Diki · October 1, 2021, 9:56am

Thank you for your reply.

Out of curiosity, how many samples are enough?

Thank you!

timanix · October 1, 2021, 10:09am

I never thought about it. I guess it will work with at least 3 samples per group (not sure). But whether the statistical power will be sufficient is a completely different question.

Diki · October 5, 2021, 7:41am

Thank you for your answer!

This Kruskal-Wallis test uses shannon_entropy as the measurement, right? Can I do the same with distances? I want to know, given a matrix (or dataframe) with samples as rows and metadata + features as columns, how different each row is from the others.

I am familiar with clustering methods like PCA, but they do not give you a significance value, right?

Kindly correct me if I am wrong. Thank you.

timanix · October 5, 2021, 8:03am

It will use any alpha diversity metric you provide as an input (shannon, evenness, ...).

You can use these actions from the diversity plugin for beta diversity analysis:
beta-group-significance
adonis
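
A sketch of how the two actions could be called, assuming a distance matrix from an earlier diversity run on the original (ungrouped) table, for example core-metrics-results/bray_curtis_distance_matrix.qza as in the Moving Pictures tutorial. The wrapper names, file names, and the example formula are hypothetical.

```shell
# Hypothetical wrapper: PERMANOVA between the levels of one metadata column,
# including pairwise comparisons.
# $1 = distance matrix, $2 = metadata file, $3 = metadata column, $4 = output visualization
beta_permanova() {
  qiime diversity beta-group-significance \
    --i-distance-matrix "$1" \
    --m-metadata-file "$2" \
    --m-metadata-column "$3" \
    --p-method permanova \
    --p-pairwise \
    --o-visualization "$4"
}

# Hypothetical wrapper: adonis with a formula over several metadata columns.
# $1 = distance matrix, $2 = metadata file, $3 = formula (e.g. "genotype+age"),
# $4 = output visualization
beta_adonis() {
  qiime diversity adonis \
    --i-distance-matrix "$1" \
    --m-metadata-file "$2" \
    --p-formula "$3" \
    --o-visualization "$4"
}
```

For example: beta_permanova core-metrics-results/bray_curtis_distance_matrix.qza metadata.tsv group bray-curtis-group-significance.qzv.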

Yes, but usually you want both: PCoA plots and a PERMANOVA-like analysis.

Next time, please create a new topic if the subject of your questions changes.

Diki · October 6, 2021, 4:25am

I am sorry.

Thank you for your explanation!
