How to create a list of all observed features (ASVs) to compare it with core-features

Paulina_Srednicka · July 28, 2021, 2:31pm

Hi all,

I am currently working on a dataset of human gut microbiomes, and I want to check how pooling samples affects the microbiota obtained from individual people. I have samples from 16 people and 3 samples of pooled microbiota from all 16 people.

I would like to create a Venn diagram comparing the ASVs of “Individual core microbiome”, “Pooled core microbiome”, “Individual total microbiome” and “Pooled total microbiome”.

I was able to get a list of ASVs for “Individual core microbiome” and “Pooled core microbiome”, but I have a problem creating a list of all observed features for “Individual total microbiome” (all ASVs from the samples from individual people) and “Pooled total microbiome” (all ASVs from the pooled samples). I only get the number of observed_features in each sample, but I don’t know how to create a list of all observed features to compare with the core features.

Any info would be much appreciated.

Thank you for any help you can offer!

Paulina

core-features-1.000-indyvidual.tsv (987 Bytes) core-features-1.000-mix.tsv (6.7 KB)

timanix · July 28, 2021, 2:44pm

Hi @Paulina_Srednicka ,
Welcome to the forum!
One option is to use metadata-based filtering (check out this tutorial) to filter your feature table so that it contains only samples from one individual, and then use all features remaining in the filtered table as the complete list of features detected in samples from that individual.

If you are comfortable with scripting (R, Python), it may be easier to convert the feature table to .tsv, read it into a dataframe, merge it with the metadata file, and use dataframe filtering to split the data by individual and get a list of features for each.
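The scripting route can be sketched roughly as follows. This is a minimal, dependency-free illustration rather than a definitive recipe: it assumes the feature table was exported to TSV with a `#OTU ID` header row (the layout I would expect from a BIOM-to-TSV conversion), and the metadata column `sample_type`, the sample IDs, and the ASV IDs are all made up. A dataframe library would work just as well; plain sets are used here so the example runs anywhere.

```python
import csv
import io

# Toy stand-ins for the exported feature table and the sample metadata.
# The column name "sample_type" and every ID below are hypothetical.
FEATURE_TABLE_TSV = """# Constructed from biom file
#OTU ID\tP01\tP02\tMIX1\tMIX2
asv_a\t10\t0\t5\t7
asv_b\t0\t3\t0\t0
asv_c\t0\t0\t2\t0
asv_d\t4\t6\t1\t9
"""

METADATA_TSV = """sample-id\tsample_type
P01\tindividual
P02\tindividual
MIX1\tpooled
MIX2\tpooled
"""


def read_metadata(handle, column):
    """Map each sample ID to its value in the given metadata column."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["sample-id"]: row[column] for row in reader}


def observed_features_by_group(table_handle, sample_to_group):
    """Return {group: set of feature IDs with a non-zero count in any sample of that group}."""
    # Drop comment lines, but keep the header row that starts with "#OTU ID".
    lines = [ln for ln in table_handle
             if ln.startswith("#OTU ID") or not ln.startswith("#")]
    reader = csv.reader(lines, delimiter="\t")
    sample_ids = next(reader)[1:]
    groups = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        feature_id, counts = row[0], row[1:]
        for sample_id, count in zip(sample_ids, counts):
            if float(count) > 0 and sample_id in sample_to_group:
                groups.setdefault(sample_to_group[sample_id], set()).add(feature_id)
    return groups


sample_to_group = read_metadata(io.StringIO(METADATA_TSV), "sample_type")
features = observed_features_by_group(io.StringIO(FEATURE_TABLE_TSV), sample_to_group)

for group, ids in sorted(features.items()):
    print(group, sorted(ids))
```

With real files, the `io.StringIO(...)` objects would be replaced by `open(path, newline="")` handles, and each resulting set can be written out one ID per line for use in a Venn diagram tool.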

In addition, I got a hint that you can use Jaccard distances (presence/absence) for this purpose and compare the distribution of within-individual vs. between-individual distances, but it may be a little tricky.
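For reference, the Jaccard distance between two presence/absence profiles is one minus the size of their intersection divided by the size of their union. A small sketch with made-up ASV sets (the variable names and IDs are hypothetical, and this shows only the distance itself, not the within vs. between comparison):

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |A & B| / |A | B|."""
    union = a | b
    if not union:
        return 0.0  # convention used here for two empty sets
    return 1.0 - len(a & b) / len(union)


# Made-up presence/absence profiles (sets of ASV IDs) for illustration.
person_1 = {"asv_a", "asv_b", "asv_d"}
person_2 = {"asv_a", "asv_d"}
pooled = {"asv_a", "asv_c", "asv_d"}

d_between_people = jaccard_distance(person_1, person_2)
d_person_vs_pool = jaccard_distance(person_1, pooled)

print(f"person_1 vs person_2: {d_between_people:.3f}")
print(f"person_1 vs pooled:   {d_person_vs_pool:.3f}")
```

A distance of 0 means the two samples contain exactly the same ASVs, and 1 means they share none.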

Paulina_Srednicka · July 28, 2021, 4:06pm

Hi @timanix ,
Thank you for the quick response.

As you suggested, I used the feature table (containing samples from all individuals, because I want to compare all observed ASVs from all people with the pooled samples) to make a Venn diagram.

But surprisingly, about 900 ASVs from the pooled samples are not found in the individual samples. Is that even possible? Or am I doing something wrong?
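The comparison described here comes down to set operations on the two ID lists. A minimal sketch, using made-up ASV IDs and hypothetical variable names, of how the pooled-only features could be listed rather than just counted:

```python
# Made-up ID sets standing in for the two "total microbiome" lists.
individual_total = {"asv_a", "asv_b", "asv_d"}
pooled_total = {"asv_a", "asv_c", "asv_d", "asv_e"}

shared = individual_total & pooled_total           # found in both
pooled_only = pooled_total - individual_total      # found only in pooled samples
individual_only = individual_total - pooled_total  # found only in individual samples

for name, ids in [("shared", shared),
                  ("pooled only", pooled_only),
                  ("individual only", individual_only)]:
    print(f"{name}: {len(ids)} -> {sorted(ids)}")
```

Having the actual IDs of the pooled-only region makes it possible to look up how abundant those ASVs are in the feature table.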

timanix · August 2, 2021, 8:16am

Hi @Paulina_Srednicka
Sorry for the long silence.

Did you filter your feature table after DADA2 to remove abnormally rare features, i.e. features with frequencies lower than some threshold?

Paulina_Srednicka · August 2, 2021, 9:00am

Hi @timanix,
Thank you for your response.

No, I didn't filter my feature table. Should I? What is the appropriate threshold?

timanix · August 2, 2021, 9:02am

It is recommended to filter out rare features (those below some frequency threshold) after DADA2, since they most probably represent errors.
I prefer to remove all sequences with frequencies < 50, but you can decrease or increase this number depending on your preferences.
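As a rough illustration of what such a filter does, here is a sketch in plain Python. It assumes "frequency" means a feature's total count summed over all samples; the counts and ASV IDs are made up. Within QIIME 2 itself, `qiime feature-table filter-features` with `--p-min-frequency` applies the same kind of threshold.

```python
MIN_FREQUENCY = 50  # threshold mentioned above; adjust to your data

# Hypothetical per-sample counts for each feature (made-up numbers).
counts = {
    "asv_a": [120, 80, 45],
    "asv_b": [0, 3, 0],
    "asv_c": [20, 20, 10],
    "asv_d": [0, 0, 49],
}


def filter_rare_features(table, min_frequency):
    """Keep features whose total count across all samples is >= min_frequency."""
    return {fid: c for fid, c in table.items() if sum(c) >= min_frequency}


filtered = filter_rare_features(counts, MIN_FREQUENCY)
removed = sorted(set(counts) - set(filtered))

print("kept:", sorted(filtered))
print("removed:", removed)
```

Rebuilding the total feature lists from the filtered table and redrawing the Venn diagram then shows how many of the unexpected pooled-only ASVs were low-frequency features.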

Paulina_Srednicka · August 2, 2021, 9:50am

OK, I will. Thank you very much for your help.
