I need some help with core-features. I have just run my diversity analysis, and now I'm stuck on how to proceed with the core microbiome analysis.
First: I think I'll run filter-samples, removing all samples with a total frequency below the rarefaction depth that I established in core-metrics-phylogenetic.
Second: I don't know whether I should filter features based on contingency (eliminating singletons) or use total-frequency-based filtering. I consider this step important, because I don't want rare features introducing bias into my core microbiome analysis.
Third: After filtering my table correctly, I'll run core-features. I'm a bit confused about how to proceed with the outputs from there, because I want to make Venn diagrams.
So I looked at this thread, and maybe @timanix could help me with this step.
Additionally, I'm working with ASVs (I've run DADA2).
This step is optional and depends on what you want to achieve and how many samples you will lose to it. If you are satisfied with the number of samples that will be retained, then you can proceed like this to be consistent with the diversity analysis.
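If you go this way, here is a minimal sketch using the QIIME 2 Artifact API (the file names and the depth of 10000 are placeholders for your own artifacts and rarefaction depth; the CLI action `qiime feature-table filter-samples` does the same thing):

```python
# Keep only samples whose total frequency is at least the rarefaction
# depth used in core-metrics-phylogenetic (10000 is a placeholder).
from qiime2 import Artifact
from qiime2.plugins import feature_table

table = Artifact.load('table.qza')  # your FeatureTable[Frequency]
result = feature_table.methods.filter_samples(table=table, min_frequency=10000)
result.filtered_table.save('filtered-table.qza')
```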
That also depends on you, and you can choose any approach. Usually I filter out all bacterial ASVs that are found in fewer than 3-5 samples and that have a total frequency of less than 100 (not a standard of any kind). But you can also filter based on a ratio, a percentage, or prevalence.
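For example, the thresholds I mentioned would look roughly like this (a sketch with the Artifact API; adjust the numbers to your own data):

```python
# Drop ASVs that appear in fewer than 3 samples or that have a total
# frequency below 100 (both thresholds are illustrative, not standards).
from qiime2 import Artifact
from qiime2.plugins import feature_table

table = Artifact.load('filtered-table.qza')
result = feature_table.methods.filter_features(
    table=table,
    min_samples=3,
    min_frequency=100,
)
result.filtered_table.save('filtered-feature-table.qza')
```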
You will need to filter your table based on the groups you want to compare, run core-features for each table, and choose your threshold as a percentage (100, 90, or other). Then you can use these ASVs to create Venn diagrams in R, Python, or the online tools linked in the post you already referred to.
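Here is a minimal sketch of that whole workflow, assuming a metadata column named `group` with two levels `A` and `B` (hypothetical names; swap in your own column, levels, and file names). It filters the table per group, runs core-features for each, and draws a Venn diagram of the ASVs found in 100% of each group's samples with matplotlib-venn:

```python
# Sketch: per-group core ASVs and a Venn diagram. The metadata column
# 'group' and its levels 'A'/'B' are assumptions for illustration.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib_venn import venn2
from qiime2 import Artifact, Metadata
from qiime2.plugins import feature_table

table = Artifact.load('filtered-feature-table.qza')
metadata = Metadata.load('metadata.tsv')

core_sets = {}
for level in ('A', 'B'):
    # Keep only the samples belonging to this group.
    sub = feature_table.methods.filter_samples(
        table=table, metadata=metadata, where=f"[group]='{level}'",
    ).filtered_table

    # The core-features visualization for this group's table.
    feature_table.visualizers.core_features(
        table=sub,
    ).visualization.save(f'core-features-{level}.qzv')

    # Core set computed directly: ASVs present in 100% of this
    # group's samples (lower 1.0 to 0.9 for a 90% core).
    df = sub.view(pd.DataFrame)  # rows = samples, columns = ASVs
    presence = (df > 0).mean(axis=0)
    core_sets[level] = set(presence[presence >= 1.0].index)

venn2([core_sets['A'], core_sets['B']], set_labels=('A', 'B'))
plt.savefig('venn.png', dpi=300)
```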
> This step is optional and depends on what you want to achieve and how many samples you will lose to it. If you are satisfied with the number of samples that will be retained, then you can proceed like this to be consistent with the diversity analysis.
Here I filtered my samples with a threshold equal to the one I used for rarefaction; I lost very few features and 4 samples. filtered-new-table.qzv (544.4 KB)
> That also depends on you, and you can choose any approach. Usually I filter out all bacterial ASVs that are found in fewer than 3-5 samples and that have a total frequency of less than 100 (not a standard of any kind). But you can also filter based on a ratio, a percentage, or prevalence.
I tried to filter with the contingency-based approach, because I noticed that most of my low-frequency features were singletons. So I decided to filter the singletons out, but the resulting table shows that I've retained only ~600 features, which is a massive loss. filtered-decon-table.qzv (450.8 KB)
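For reference, the contingency-based filter I applied looks roughly like this (a sketch; the file names stand in for my actual artifacts):

```python
# Contingency-based filter: drop ASVs that are present in only one
# sample (min_samples=2 keeps ASVs observed in at least 2 samples).
from qiime2 import Artifact
from qiime2.plugins import feature_table

table = Artifact.load('table.qza')
result = feature_table.methods.filter_features(table=table, min_samples=2)
result.filtered_table.save('filtered-decon-table.qza')
```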
If I don't remove these singletons, would that affect my Venn diagram a lot?
You can filter by absolute count to remove singletons instead.
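By absolute count I mean the total frequency of each feature across all samples; a minimal sketch (file names are placeholders):

```python
# Absolute-count filter: a true singleton has a total frequency of 1
# across the whole dataset, so min_frequency=2 removes exactly those.
from qiime2 import Artifact
from qiime2.plugins import feature_table

table = Artifact.load('table.qza')
result = feature_table.methods.filter_features(table=table, min_frequency=2)
result.filtered_table.save('no-singletons-table.qza')
```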
You may end up with a lot of ASVs that are unique to a certain group. Singletons are potential errors, and I would prefer to remove them unless there is some special interest in them.
> You can filter by absolute count to remove singletons instead.
I don't know if I understand what a filter by absolute count is, so I tried to filter out features with a total frequency below 100, and again I had a massive loss of features (~460 ASVs). filtered-new-table_2.qzv (441.1 KB)
> You may end up with a lot of ASVs that are unique to a certain group. Singletons are potential errors, and I would prefer to remove them unless there is some special interest in them.
If it's okay to go ahead with this low number of ASVs, then I'll be comfortable, but it seems weird to me.
It's worth noting that my table has already been through other filters: I ran cutadapt, dada2 denoise-single, and exclude-seqs.
Even after these filters, can the singletons that remain still be considered errors that could bias my results?