This might be a silly question, but is it recommended to remove samples that may not be needed prior to entering a pipeline? Or should the samples be excluded using the filter-samples plugin?

I want to later use ANCOM for differential analysis, but there are samples from those who did not complete all of the treatments of the experiment. Even though there will be a loss of data, I wanted to exclude those that did not undergo all of the treatments.

However, another question is whether it would be possible to retain all of the samples.
Let's say there were 3 treatments (A, B, C) and 5 individuals who each donated a sample before and after a treatment, but only Treatments A and B had all 5 participants, while Treatment C had 3 of the 5. Would it be possible to compare the before-and-after treatment effect across all of the treatments? Or must those who did not participate in all 3 treatments be filtered out (bringing me back to my first question)?

There are no silly questions here :smile:

This really comes down to individual circumstances. In general, I'd say that it is frowned upon to remove samples without a good reason. A few good reasons:

  1. Sample does not have enough sequence depth
  2. Sample is contaminated
  3. Sample is a significant outlier and something obviously went wrong

I think you could go either way on this.

If you wanted to run some type of paired sample analysis on subjects pre- and post-treatment, then you will need to drop the samples from subjects who are missing one of the two time points (actually, that method will drop those samples and issue a warning about which samples were dropped).

Otherwise, you are comparing differences between bulk groups, so in my opinion there is no obligation to drop subjects that were not sampled at both time points.

(However, note that paired sample analysis will probably be much more powerful, as it will account for baseline differences between subjects. In humans these differences will be substantial, but probably less so in animals, though beware cage effects.) :rat:
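
If it helps to see which subjects would survive each choice, here is a rough pandas sketch using the design from the question (5 subjects, Treatments A and B complete, Treatment C with 3 of 5). The column names (`sample_id`, `subject`, `treatment`, `timepoint`) and the helper `complete_subjects` are made up for illustration, so swap in whatever your own metadata file uses.

```python
import pandas as pd

# Hypothetical metadata mirroring the example in the question.
# Column names and sample IDs are assumptions, not a required format.
all_subjects = ["S1", "S2", "S3", "S4", "S5"]
design = {"A": all_subjects, "B": all_subjects, "C": all_subjects[:3]}

rows = []
for treatment, subjects in design.items():
    for subject in subjects:
        for timepoint in ("pre", "post"):
            rows.append(
                {
                    "sample_id": f"{subject}.{treatment}.{timepoint}",
                    "subject": subject,
                    "treatment": treatment,
                    "timepoint": timepoint,
                }
            )
metadata = pd.DataFrame(rows)


def complete_subjects(md, treatments, timepoints=("pre", "post")):
    """Return subjects that have every timepoint for every listed treatment."""
    needed = len(treatments) * len(timepoints)
    subset = md[md["treatment"].isin(treatments) & md["timepoint"].isin(timepoints)]
    # Count distinct (treatment, timepoint) combinations per subject.
    counts = (
        subset.drop_duplicates(["subject", "treatment", "timepoint"])
        .groupby("subject")
        .size()
    )
    return sorted(counts[counts == needed].index)


# Subjects usable for a paired comparison within a single treatment.
paired_a = complete_subjects(metadata, ["A"])
paired_c = complete_subjects(metadata, ["C"])

# Subjects that completed all three treatments (the strictest filter).
complete_all = complete_subjects(metadata, ["A", "B", "C"])

# Samples that remain if only fully complete subjects are kept.
kept = metadata[metadata["subject"].isin(complete_all)]

print("Paired within A:", paired_a)
print("Paired within C:", paired_c)
print("Completed A, B and C:", complete_all)
print("Samples kept:", len(kept), "of", len(metadata))
```

With this toy design, the strict filter keeps only the 3 subjects who did everything, while a per-treatment paired comparison would still use all 5 subjects for A and B. The sample IDs in `kept` could then be used to filter the table, for example with the filter-samples plugin mentioned above.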

I hope that helps!


