I'm exploring a method to more fully annotate organelle-derived sequences in 16S datasets, and ran ANCOMBC on a dataset before and after filtering a large proportion of mitochondrial sequences.
The results are exactly the same - the spreadsheets underlying the barplot data pre- and post-filtering are identical.
How reasonable is this? Intuitively, it seems that if I'm truly removing exactly what I'm targeting, when I take a chunk of features out of each sample, the proportions of the other features will all increase together, so it makes sense to me why the results wouldn't change. On the other hand, the ANCOMBC data are calculated out to like a dozen decimal places, and every single cell is identical, which is a little spooky.
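To make that intuition concrete, here is a toy sketch with made-up counts. The feature names and numbers are hypothetical, and this only illustrates the renormalization argument; it does not reproduce what ANCOM-BC computes internally. Dropping one feature and renormalizing scales every remaining proportion by the same factor, so the ratio between any two remaining features stays the same.

```python
# Hypothetical counts for one sample; "mito" stands in for the filtered feature.
counts = {"taxon_a": 30, "taxon_b": 60, "mito": 10}

def proportions(table):
    """Convert raw counts to relative abundances that sum to 1."""
    total = sum(table.values())
    return {name: value / total for name, value in table.items()}

before = proportions(counts)
after = proportions({k: v for k, v in counts.items() if k != "mito"})

# Each remaining feature is scaled by the same factor ...
scale = {name: after[name] / before[name] for name in after}

# ... so the ratio between any two remaining features does not change.
ratio_before = before["taxon_a"] / before["taxon_b"]
ratio_after = after["taxon_a"] / after["taxon_b"]

print(scale)
print(ratio_before, ratio_after)
```

Whether a given differential abundance method is exactly invariant to this kind of filtering depends on how it estimates per-sample bias, so the sketch supports the intuition but does not by itself explain identical output.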
Hi @dylan,
I looked at your provenance and I am a little confused. It looks like both tables went through similar steps:
2 tables were merged together
filtered by "d__Bacteria;p__Cyanobacteria;c__Cyanobacteriia;o__Chloroplast,d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rickettsiales;f__Mitochondria"
filtered by metadata.
ran ancombc
According to your Artifact UUIDs, these are not the same files, but I am not sure that your filtering was applied to only one of the artifacts.
Sorry, this statement was misleading. They’re both filtered, one is just more filtered. The idea is that the annotation step is much better at tagging organelles. A lot more sequences get tagged (and thus filtered) with the modified annotation scheme vs the basic one.
Can you elaborate on how one was filtered more? According to the provenance, the exact same taxa filtering command was applied to both tables. Was this filtering not applied at the qiime taxa filter-table step?
I used different references for the annotation method (VSEARCH here) in each workflow. This resulted in more sequences being annotated as "d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rickettsiales;f__Mitochondria" in one taxonomy artifact compared to the other (those sequences were mostly annotated as "Unassigned" in the other). Thus the taxonomic filtering step removed a substantially different number of sequences between workflows despite the specific filtering command being identical.
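As a toy illustration of that point (the feature IDs, counts, and per-feature annotations below are invented, and the filter function is only a stand-in for substring-based exclusion, not QIIME 2's implementation): the same exclusion terms remove different amounts of data when the taxonomy they are matched against differs.

```python
# Exclusion string copied from the filtering step; split into its two terms.
EXCLUDE = (
    "d__Bacteria;p__Cyanobacteria;c__Cyanobacteriia;o__Chloroplast,"
    "d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rickettsiales;f__Mitochondria"
).split(",")
MITO = EXCLUDE[1]

# Hypothetical feature table: feature ID -> total count.
table = {"f1": 100, "f2": 40, "f3": 25, "f4": 10}

# Hypothetical taxonomies from the two reference sets.
basic_taxonomy = {"f1": "d__Bacteria;p__Firmicutes", "f2": MITO,
                  "f3": "Unassigned", "f4": "Unassigned"}
modified_taxonomy = {"f1": "d__Bacteria;p__Firmicutes", "f2": MITO,
                     "f3": MITO, "f4": MITO}

def filter_table(table, taxonomy, exclude=EXCLUDE):
    """Keep features whose annotation contains none of the exclusion terms."""
    return {fid: n for fid, n in table.items()
            if not any(term in taxonomy[fid] for term in exclude)}

kept_basic = filter_table(table, basic_taxonomy)
kept_modified = filter_table(table, modified_taxonomy)

removed_basic = sum(table.values()) - sum(kept_basic.values())
removed_modified = sum(table.values()) - sum(kept_modified.values())
print(removed_basic, removed_modified)
```

The command is identical in both calls; only the taxonomy passed to it changes, which is why the provenance looks the same for both tables.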
Could you try running ancombc without this filtering step? Logically we should see a different result that way. I think this would be a good sanity check.
I'm happy to, running it now ... but would we in fact expect a different result? Isn't 0% filtered just another point on the line with, say, 1% filtered and 5% filtered?
[edit]
Interesting, the completely unfiltered barplots seem to show the same results, but there are different underlying data.
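For anyone wanting to check this kind of thing without eyeballing barplots, here is a minimal sketch of a cell-by-cell comparison. It assumes the underlying data have been exported to CSV with a shared ID column; the column names and values below are hypothetical placeholders, not my actual output.

```python
import csv
import io
import math

def load(text, id_col="id"):
    """Parse CSV text into {row_id: {column: float}}."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[id_col]: {k: float(v) for k, v in row.items() if k != id_col}
            for row in reader}

def differing_cells(a, b, tol=0.0):
    """Return (row_id, column) pairs whose values differ by more than tol.

    Cells present in only one table are treated as NaN and always reported.
    """
    diffs = []
    for row_id in sorted(set(a) | set(b)):
        cols = set(a.get(row_id, {})) | set(b.get(row_id, {}))
        for col in sorted(cols):
            x = a.get(row_id, {}).get(col, math.nan)
            y = b.get(row_id, {}).get(col, math.nan)
            if not math.isclose(x, y, rel_tol=0.0, abs_tol=tol):
                diffs.append((row_id, col))
    return diffs

# Hypothetical exports that look the same when plotted but are not identical.
filtered = "id,lfc\ntaxon_a,0.512345678901\ntaxon_b,-1.2\n"
unfiltered = "id,lfc\ntaxon_a,0.512345678902\ntaxon_b,-1.2\n"

exact_diffs = differing_cells(load(filtered), load(unfiltered))
loose_diffs = differing_cells(load(filtered), load(unfiltered), tol=1e-9)
print(exact_diffs, loose_diffs)
```

With `tol=0.0` the comparison is exact, which distinguishes "identical to every decimal place" from "the same to plotting precision".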
You are right, we didn't really see a different result. I guess the takeaway is that the microbes you are filtering out are not altering the composition at all.