ANCOM tutorial — do I need to filter my table?

Nisha · March 3, 2020, 5:04am

I have found the mistake. Thank you.

I also wanted to ask: in the differential abundance testing with ANCOM, there is a filtering step that keeps only the gut samples.

Since I only have gut microbiome data, can I use my existing table.qza file and skip this step?

timanix · March 3, 2020, 6:00am

Hi! The filtering step is just an example; if you don't need to filter your samples, just skip it. You can also filter by another parameter if your analysis needs it.

jwdebelius · March 3, 2020, 7:28pm

Hi @Nisha,

I like filtering my features before ANCOM, but not necessarily my samples. So, I tend to get rid of things that don't have a lot of counts or aren't present in many samples, because they tend to be "different" and add noise.

Best,
Justine

Nisha · March 4, 2020, 7:51am

Hi, I think the filtering step should be introduced right after the feature table and FeatureData summaries are generated. Since I am new to QIIME 2, I followed the tutorial and only learned about filtering just before ANCOM. Now I find it necessary to go back and filter right after DADA2.
Please comment.

regards

Nisha · March 4, 2020, 7:51am

Hello @jwdebelius,
Do you mean you drop the samples that have low sequence counts?

jwdebelius · March 4, 2020, 7:57am

Hi @Nisha,

No, I do at least a two-part filtering. First, I drop any sample with counts below my rarefaction depth, because those are deemed "bad quality" samples by this study's definition of bad quality.
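As a rough illustration of that first step, here is a minimal Python sketch (not QIIME 2 code). It assumes the table is a plain dict mapping sample IDs to per-feature counts; `drop_shallow_samples` and the toy data are hypothetical names made up for this example. Within QIIME 2, `qiime feature-table filter-samples` with `--p-min-frequency` should give the equivalent result.

```python
def drop_shallow_samples(table, rarefaction_depth):
    """Keep only samples whose total count is at least the rarefaction depth."""
    return {
        sample: counts
        for sample, counts in table.items()
        if sum(counts.values()) >= rarefaction_depth
    }


# Toy table (hypothetical data): sample ID -> {feature ID: count}
table = {
    "s1": {"asv1": 900, "asv2": 300},
    "s2": {"asv1": 40, "asv3": 10},
    "s3": {"asv2": 700, "asv3": 500},
}

kept = drop_shallow_samples(table, rarefaction_depth=1000)
print(sorted(kept))  # s2 has only 50 reads in total, so it is dropped
```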

Then, I double check my samples in PCoA space and may drop samples which do not cluster, period. I may also filter at this step to split or remove samples that aren't relevant to my current analysis. (For example, sometimes people will send samples of two different types but only want to look at one of them, so then we filter.)

Third, before feature-based analysis, I filter my table to get rid of anything with less than (1/rarefaction depth) in less than 10% of my communities. My suggestion here, since the joint filtering isn't implemented in QIIME (yet... it's on my list), is to first filter out any feature present in fewer than 10% of your samples and see where that gets you.

This decreases the overall number of features you test while discarding things that are likely either noise or underpowered.
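To make the two feature filters concrete, here is a minimal Python sketch (not QIIME 2 code). The table is assumed to be a plain dict of sample ID to per-feature counts, and every function name is hypothetical. The joint rule is my reading of the description above: keep a feature only if its relative abundance reaches 1/rarefaction depth in at least 10% of samples. Treat that reading as an assumption.

```python
def features_in(table):
    """Set of feature IDs with a nonzero count in at least one sample."""
    return {f for counts in table.values() for f, c in counts.items() if c > 0}


def keep_features(table, keep):
    """Return a copy of the table restricted to the feature IDs in `keep`."""
    return {
        sample: {f: c for f, c in counts.items() if f in keep}
        for sample, counts in table.items()
    }


def filter_by_prevalence(table, min_fraction=0.10):
    """Drop features observed in fewer than min_fraction of the samples."""
    n_samples = len(table)
    seen = {}
    for counts in table.values():
        for feature, count in counts.items():
            if count > 0:
                seen[feature] = seen.get(feature, 0) + 1
    keep = {f for f, n in seen.items() if n >= min_fraction * n_samples}
    return keep_features(table, keep)


def filter_jointly(table, rarefaction_depth, min_fraction=0.10):
    """ASSUMED reading of the joint rule: keep a feature only if its relative
    abundance is at least 1/rarefaction_depth in at least min_fraction of samples."""
    n_samples = len(table)
    hits = {}
    for counts in table.values():
        total = sum(counts.values())
        for feature, count in counts.items():
            # count * depth >= total  is the same as  count / total >= 1 / depth
            if count > 0 and count * rarefaction_depth >= total:
                hits[feature] = hits.get(feature, 0) + 1
    keep = {f for f, n in hits.items() if n >= min_fraction * n_samples}
    return keep_features(table, keep)


# Toy, unrarefied table (hypothetical data): 20 samples of about 1000 reads each.
table = {f"s{i}": {"asv_common": 995} for i in range(20)}
for i in range(3):
    table[f"s{i}"]["asv_faint"] = 5  # in 15% of samples, but always < 1% of reads
table["s0"]["asv_rare"] = 50         # abundant, but in only 5% of samples

print(sorted(features_in(filter_by_prevalence(table))))
print(sorted(features_in(filter_jointly(table, rarefaction_depth=100))))
```

The depth of 100 is only there to keep the toy numbers small. Under this reading, a table already rarefied to that depth gives the same result for both filters, since any nonzero count is at least 1/depth of the sample; the joint rule only removes more on an unrarefied table.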

Best,
Justine

Nisha · March 4, 2020, 12:49pm

@jwdebelius
I was able to drop 1 sample during the rarefaction analysis.
The minimum frequency per feature is 1, which I only found out just before ANCOM. Should I filter this data for low-abundance features and re-run all the steps?
I ask because I used the same table.qza file for all steps.

regards

timanix · March 4, 2020, 1:07pm

Hi!
In my opinion, you can either do that, or remove low-abundance features just before ANCOM and use that filtered table only for ANCOM, if you don't want to redo all the steps you already did. But I'd like to see what @jwdebelius says.

jwdebelius · March 4, 2020, 1:55pm

Hi @Nisha,

No, I pass a table where I haven't filtered features based on prevalence/abundance into rarefaction, and then filter my features before I do differential abundance.

Best,
Justine

jng · March 19, 2020, 4:23pm

Hi @jwdebelius

Thank you for providing details on how to filter our tables before ANCOM. I have now filtered out the features present in fewer than 10% of my samples.

However, I was also following the ANCOM tutorial (Parkinson's mouse), and was wondering whether you could explain why p-min-frequency is 50.

This is from the pd-mice tutorial:
qiime feature-table filter-features \
  --i-table ./table_2k.qza \
  --p-min-frequency 50 \
  --p-min-samples 4 \
  --o-filtered-table ./table_2k_abund.qza

I'm having difficulty deciding what my p-min-frequency should be. I also have a minimum frequency per feature of 1, and I know that low-count ASVs should be filtered out to limit FDR. Could you please explain what numbers should be taken into consideration when choosing p-min-frequency before running ANCOM?

Thank you!

Best

jwdebelius · March 19, 2020, 9:56pm

Hi @jng,

If you have run DADA2 or Deblur, there is no ASV in your original table which contains fewer than 10 counts. If you've since filtered your data (for example, to split experiments, discard samples, etc.) then this may no longer be true.

I was involved in writing the PD mice tutorial and, to be honest, I can't remember the exact logic behind that specific depth. (I double checked my notes from that period, and... it's not recorded :/. Sometimes that's the way it goes in this field: you pick something and work with it.) The mean/total abundance and prevalence co-vary pretty closely, so most of the low-abundance features will be picked up in your first filtering.
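One way to check this on your own data is to compare what a prevalence threshold alone removes with what adding a total-frequency threshold removes. Here is a minimal Python sketch (not QIIME 2 code), assuming a plain dict of sample ID to per-feature counts and hypothetical function names. It treats `min_frequency` as the total count of a feature summed over all samples and `min_samples` as the number of samples it is observed in, which is how I understand the two `filter-features` parameters. The thresholds 50 and 4 are just the ones from the tutorial command.

```python
def feature_stats(table):
    """Return {feature: (total_frequency, n_samples_observed)}."""
    stats = {}
    for counts in table.values():
        for feature, count in counts.items():
            if count > 0:
                total, n_obs = stats.get(feature, (0, 0))
                stats[feature] = (total + count, n_obs + 1)
    return stats


def passing_features(table, min_frequency=0, min_samples=0):
    """Features whose total count and sample prevalence both meet the thresholds."""
    return {
        feature
        for feature, (total, n_obs) in feature_stats(table).items()
        if total >= min_frequency and n_obs >= min_samples
    }


# Toy table (hypothetical data): sample ID -> {feature ID: count}
table = {
    "s1": {"asv1": 40, "asv2": 1},
    "s2": {"asv1": 25, "asv3": 30},
    "s3": {"asv1": 10, "asv3": 5},
    "s4": {"asv1": 30, "asv3": 20},
    "s5": {"asv1": 15, "asv3": 10, "asv4": 60},
}

by_prevalence = passing_features(table, min_samples=4)
by_both = passing_features(table, min_frequency=50, min_samples=4)

# Features the frequency threshold removes on top of the prevalence threshold
print(sorted(by_prevalence - by_both))
```

If that difference is empty or very small on your real table, the exact value of the frequency threshold matters little once the prevalence filter has been applied.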

Best,
Justine

jng · March 24, 2020, 7:29am

Thank you very much @jwdebelius for your clarification regarding the specific value of our p-min-frequency. I will keep that strategy in mind when running the rest of my analysis.

Best
