How to create a list of all observed features (ASVs) to compare it with core-features

Hi all,

I am currently working on a dataset of human gut microbiomes, and I want to check how pooling samples affects the microbiota obtained from individual people. I have samples from 16 people and 3 samples of pooled microbiota from all 16 people.

I would like to create a Venn diagram comparing the ASVs of “Individual core microbiome”, “Pooled core microbiome”, “Individual total microbiome” and “Pooled total microbiome”.

I was able to get lists of ASVs for the “Individual core microbiome” and the “Pooled core microbiome”, but I have a problem creating a list of all observed features for the “Individual total microbiome” (all ASVs from the samples from individual people) and the “Pooled total microbiome” (all ASVs from the pooled samples). I only get the number of observed_features in each sample, but I don’t know how to create a list of all observed features to compare with the core features.

Any info would be much appreciated.

Thank you for any help you can offer!


core-features-1.000-indyvidual.tsv (987 Bytes) core-features-1.000-mix.tsv (6.7 KB)

Hi @Paulina_Srednicka ,
Welcome to the forum!
One option is to use metadata-based filtering (check out this tutorial) to filter your feature table so that it contains only samples from one individual, and then use all features remaining in the filtered table as the complete list of features detected in samples from that individual.

If you are comfortable with scripting (R, Python), it may be easier to convert the feature table to .tsv, read it into a dataframe, merge it with the metadata file, and use dataframe filtering methods to split it by individual and get a list of features for each.
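
A minimal sketch of that scripting route, using plain Python sets instead of a dataframe so it has no dependencies. The sample IDs, ASV names, counts, and the `individual`/`pooled` group labels are all made up for illustration, and the two header lines are assumed to look like those written when a BIOM table is converted to TSV. In practice the text would come from your exported table and the mapping from your metadata file.

```python
import csv

# Hypothetical feature table exported to TSV
# (rows = ASVs, columns = samples, values = read counts).
FEATURE_TABLE_TSV = (
    "# Constructed from biom file\n"
    "#OTU ID\tP01\tP02\tMIX1\tMIX2\n"
    "ASV_a\t120\t80\t95\t60\n"
    "ASV_b\t0\t15\t0\t0\n"
    "ASV_c\t40\t0\t22\t0\n"
    "ASV_d\t0\t0\t3\t0\n"
)

# Hypothetical metadata: which group each sample belongs to.
SAMPLE_GROUP = {
    "P01": "individual",
    "P02": "individual",
    "MIX1": "pooled",
    "MIX2": "pooled",
}


def features_by_group(tsv_text, sample_group):
    """Return (total, core) dicts mapping group -> set of feature IDs.

    total: features with a nonzero count in at least one sample of the group.
    core:  features with a nonzero count in every sample of the group.
    """
    rows = list(csv.reader(tsv_text.splitlines(), delimiter="\t"))
    # Drop blank lines and the leading comment line; keep the "#OTU ID" header.
    rows = [r for r in rows if r and not r[0].startswith("# ")]
    header, body = rows[0], rows[1:]
    samples = header[1:]

    groups = sorted(set(sample_group.values()))
    total = {g: set() for g in groups}
    core = {g: set() for g in groups}

    for row in body:
        feature_id = row[0]
        counts = dict(zip(samples, (float(v) for v in row[1:])))
        for g in groups:
            # Presence/absence of this feature in each sample of group g.
            present = [counts[s] > 0 for s in samples if sample_group.get(s) == g]
            if any(present):
                total[g].add(feature_id)
            if present and all(present):
                core[g].add(feature_id)
    return total, core


total, core = features_by_group(FEATURE_TABLE_TSV, SAMPLE_GROUP)

# Features seen in pooled samples but in none of the individual samples.
pooled_only = total["pooled"] - total["individual"]

print("individual total:", sorted(total["individual"]))
print("pooled total:    ", sorted(total["pooled"]))
print("pooled only:     ", sorted(pooled_only))
```

The resulting sets can be passed directly to any Venn diagram tool, and set differences such as `pooled_only` show which ASVs fall outside the overlap.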

In addition, I got a hint that you can use Jaccard distances (presence/absence) for this purpose and compare the distribution of within- vs. between-individual distances, but it may be a little tricky.
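
To illustrate what the Jaccard distance measures, here is a small sketch on made-up presence/absence sets (the sample IDs and ASV names are hypothetical). It only computes the pairwise distances. Splitting them into within- and between-individual groups would additionally need the sample metadata.

```python
from itertools import combinations

# Hypothetical presence/absence data: sample ID -> set of ASVs detected.
SAMPLE_FEATURES = {
    "P01": {"ASV_a", "ASV_b", "ASV_c"},
    "P02": {"ASV_a", "ASV_c", "ASV_d"},
    "MIX1": {"ASV_a", "ASV_b", "ASV_c", "ASV_d"},
}


def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B|; taken as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Distance for every unordered pair of samples, keyed by sorted sample IDs.
distances = {
    (s1, s2): jaccard_distance(SAMPLE_FEATURES[s1], SAMPLE_FEATURES[s2])
    for s1, s2 in combinations(sorted(SAMPLE_FEATURES), 2)
}

for pair, dist in distances.items():
    print(pair, dist)
```

A distance of 0 means two samples contain exactly the same ASVs, and 1 means they share none.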


Hi @timanix ,
Thank you for the quick response :slight_smile:

As you suggested, I used the feature table (containing samples from all individuals, because I want to compare all observed ASVs from all people with the pooled samples) to make the Venn diagram.

But surprisingly, about 900 ASVs from the pooled samples are not found in the individual samples :astonished: Is that even possible? Or am I doing something wrong? :see_no_evil:

Hi @Paulina_Srednicka
Sorry for the long silence.

Did you filter your feature table after DADA2 to remove abnormally rare features, for example features with frequencies lower than some threshold?

Hi @timanix,
Thank you for your response :slight_smile:

No, I didn't filter my feature table. Should I? What is the appropriate threshold?

It is recommended to filter out rare features (those below some frequency threshold) after DADA2, since they most probably represent errors.
I prefer to remove all sequences with frequencies < 50, but you can decrease or increase this number depending on your preferences.
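
As a rough sketch of what such a filter does, assuming "frequency" means a feature's total count summed across all samples (the table contents below are made up):

```python
# Hypothetical feature table: feature ID -> {sample ID: read count}.
TABLE = {
    "ASV_a": {"P01": 120, "P02": 80, "MIX1": 95, "MIX2": 60},
    "ASV_b": {"P01": 0, "P02": 15, "MIX1": 0, "MIX2": 0},
    "ASV_c": {"P01": 40, "P02": 0, "MIX1": 22, "MIX2": 0},
    "ASV_d": {"P01": 0, "P02": 0, "MIX1": 3, "MIX2": 0},
}


def filter_rare_features(table, min_frequency=50):
    """Keep only features whose total count across samples is >= min_frequency."""
    return {
        feature_id: counts
        for feature_id, counts in table.items()
        if sum(counts.values()) >= min_frequency
    }


filtered = filter_rare_features(TABLE, min_frequency=50)
print(sorted(filtered))
```

Within QIIME 2 itself, `qiime feature-table filter-features` with `--p-min-frequency` applies the same kind of total-frequency cutoff directly to the feature table artifact.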

Ok, I will. Thank you very much for your help :slight_smile:
