I need some help with core-features. I have just run my diversity analysis, and now I'm stuck on how to proceed with the core microbiome analysis.
First: I think I'll run filter-samples, removing all samples with a total frequency below the rarefaction depth that I established in core-metrics-phylogenetic.
Second: I don't know whether I should filter features based on contingency (eliminating singletons) or use total-frequency-based filtering. I consider this step important, because I don't want rare features introducing bias into my core microbiome analysis.
Third: After filtering my table correctly, I'll run core-features. I'm a bit confused about how to proceed with the outputs from there, because I want to make Venn diagrams.
So I looked at this thread, and maybe @timanix could help me with this step.
Additionally, I'm working with ASVs (I've run DADA2).
This step is optional and depends on what you want to achieve and how many samples you will lose to it. If you are satisfied with the number of samples that will be retained, then you can proceed like this to be consistent with the diversity analysis.
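If you go this way, here is a minimal sketch using the QIIME 2 Artifact API (the file names and the depth of 10000 are placeholders for your own artifacts and rarefaction depth; the CLI action `qiime feature-table filter-samples` does the same thing):

```python
# Keep only samples whose total frequency is at least the rarefaction
# depth used in core-metrics-phylogenetic (10000 is a placeholder).
from qiime2 import Artifact
from qiime2.plugins import feature_table

table = Artifact.load('table.qza')  # your FeatureTable[Frequency]
result = feature_table.methods.filter_samples(table=table, min_frequency=10000)
result.filtered_table.save('filtered-table.qza')
```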
That also depends on you, and you can choose any approach. Usually I filter out all bacterial ASVs that are found in fewer than 3-5 samples and that have a total frequency of less than 100 (not a standard of any kind). But you can also filter based on a ratio, a percentage, or prevalence.
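For example, the thresholds I mentioned would look roughly like this (a sketch with the Artifact API; adjust the numbers to your own data):

```python
# Drop ASVs that appear in fewer than 3 samples or that have a total
# frequency below 100 (both thresholds are illustrative, not standards).
from qiime2 import Artifact
from qiime2.plugins import feature_table

table = Artifact.load('filtered-table.qza')
result = feature_table.methods.filter_features(
    table=table,
    min_samples=3,
    min_frequency=100,
)
result.filtered_table.save('filtered-feature-table.qza')
```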
You will need to filter your table based on the groups you want to compare, run core-features for each table, and choose your threshold as a percentage (100, 90, or other). Then you can use these ASVs to create Venn diagrams in R, Python, or the online tools linked in the post you already referred to.
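Here is a minimal sketch of that whole workflow, assuming a metadata column named `group` with two levels `A` and `B` (hypothetical names; swap in your own column, levels, and file names). It filters the table per group, runs core-features for each, and draws a Venn diagram of the ASVs found in 100% of each group's samples with matplotlib-venn:

```python
# Sketch: per-group core ASVs and a Venn diagram. The metadata column
# 'group' and its levels 'A'/'B' are assumptions for illustration.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib_venn import venn2
from qiime2 import Artifact, Metadata
from qiime2.plugins import feature_table

table = Artifact.load('filtered-feature-table.qza')
metadata = Metadata.load('metadata.tsv')

core_sets = {}
for level in ('A', 'B'):
    # Keep only the samples belonging to this group.
    sub = feature_table.methods.filter_samples(
        table=table, metadata=metadata, where=f"[group]='{level}'",
    ).filtered_table

    # The core-features visualization for this group's table.
    feature_table.visualizers.core_features(
        table=sub,
    ).visualization.save(f'core-features-{level}.qzv')

    # Core set computed directly: ASVs present in 100% of this
    # group's samples (lower 1.0 to 0.9 for a 90% core).
    df = sub.view(pd.DataFrame)  # rows = samples, columns = ASVs
    presence = (df > 0).mean(axis=0)
    core_sets[level] = set(presence[presence >= 1.0].index)

venn2([core_sets['A'], core_sets['B']], set_labels=('A', 'B'))
plt.savefig('venn.png', dpi=300)
```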
> This step is optional and depends on what you want to achieve and how many samples you will lose to it. If you are satisfied with the number of samples that will be retained, then you can proceed like this to be consistent with the diversity analysis.
Here I filtered my samples with a threshold equal to the one I used for rarefaction; I lost very few features and 4 samples. filtered-new-table.qzv (544.4 KB)
> That also depends on you, and you can choose any approach. Usually I filter out all bacterial ASVs that are found in fewer than 3-5 samples and that have a total frequency of less than 100 (not a standard of any kind). But you can also filter based on a ratio, a percentage, or prevalence.
I tried to filter with the contingency-based approach, because I noticed that most of my low-frequency features were singletons. So I decided to filter the singletons out, but the resulting table shows that I've retained only ~600 features, which is a massive loss. filtered-decon-table.qzv (450.8 KB)
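For reference, the contingency-based filter I applied looks roughly like this (a sketch; the file names stand in for my actual artifacts):

```python
# Contingency-based filter: drop ASVs that are present in only one
# sample (min_samples=2 keeps ASVs observed in at least 2 samples).
from qiime2 import Artifact
from qiime2.plugins import feature_table

table = Artifact.load('table.qza')
result = feature_table.methods.filter_features(table=table, min_samples=2)
result.filtered_table.save('filtered-decon-table.qza')
```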
If I don't remove these singletons, would that affect my Venn diagram a lot?
You can filter by absolute count to remove singletons instead.
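By absolute count I mean the total frequency of each feature across all samples; a minimal sketch (file names are placeholders):

```python
# Absolute-count filter: a true singleton has a total frequency of 1
# across the whole dataset, so min_frequency=2 removes exactly those.
from qiime2 import Artifact
from qiime2.plugins import feature_table

table = Artifact.load('table.qza')
result = feature_table.methods.filter_features(table=table, min_frequency=2)
result.filtered_table.save('no-singletons-table.qza')
```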
You may end up with a lot of ASVs that are unique to a certain group. Singletons are potential errors, and I would prefer to remove them unless there is some special interest in them.
> You can filter by absolute count to remove singletons instead.
I don't know if I understand what a filter by absolute count is, so I tried to filter out features with a total frequency below 100, and again I had a massive loss of features (~460 ASVs). filtered-new-table_2.qzv (441.1 KB)
> You may end up with a lot of ASVs that are unique to a certain group. Singletons are potential errors, and I would prefer to remove them unless there is some special interest in them.
If it's okay to go ahead with this low number of ASVs, then I'll be comfortable, but it seems weird to me.
It's worth noting that my table has already been through other filters: I ran cutadapt, dada2 denoise-single, and exclude-seqs.
Even after these filters, can the singletons that remain still be considered errors that could bias my results?