In theory, yes: if you have a group of samples in both batches that are biologically “the same” (e.g. healthy patients, samples all treated with condition A, etc.), then you should be able to use q2-perc-norm for that purpose.
Again, in theory, yes, you could do this, but you’re right that it probably won’t work. The sequencing controls will likely have very few OTUs in common with the real samples (and any shared OTUs may be at very low abundance), which makes the percentile normalization meaningless. For example, an OTU that is absent from your sequencing controls but present in your real data will be converted to 1.0 (the top percentile) in all of your real samples, regardless of how abundant it actually is. I also don’t think you would trust the OTU abundances in your negative sequencing controls, since these are (by definition) just noise, right?
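To make that degenerate case concrete, here is a tiny Python sketch with made-up counts for a single OTU that never appears in the controls. It uses `scipy.stats.percentileofscore` with `kind="mean"` and rescales the result to 0-1; both of those choices are my assumptions about how the percentiles are computed, not a description of q2-perc-norm's internals.

```python
import numpy as np
from scipy.stats import percentileofscore

# Made-up counts for a single OTU that never shows up in the sequencing controls.
control_counts = np.array([0, 0, 0, 0])
sample_counts = np.array([3, 57, 1200])

# Percentile of each real sample within the control distribution, rescaled from 0-100 to 0-1.
percentiles = [float(percentileofscore(control_counts, x, kind="mean")) / 100 for x in sample_counts]
print(percentiles)  # [1.0, 1.0, 1.0]
```

Whether the OTU shows up at 3 reads or 1200, every real sample lands at the top of the (all-zero) control distribution, so the abundance information is gone.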
Definitely not. The point of the method is to identify a group of samples that can be considered “the same” in both of your batches, use them as an anchor to compare all the other samples to (i.e. as the null distribution against which all other samples are normalized), and then combine the data across batches. If you treat one run as cases and the other as controls, you not only fail to identify a group of comparable samples present in both batches, but you also no longer have two batches to combine. Does that make sense?
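Roughly, the anchor-based normalization described above looks like the following Python sketch. It is not the q2-perc-norm source: converting to relative abundances, breaking ties among zeros with tiny uniform noise, using `kind="mean"`, rescaling to 0-1, and the helper names `percentile_normalize_batch` and `percentile_normalize` are all assumptions for illustration. The key point it shows is that each batch is normalized only against its own anchor samples, and the batches are pooled afterwards.

```python
import numpy as np
import pandas as pd
from scipy.stats import percentileofscore


def percentile_normalize_batch(counts, is_control, rng):
    """Convert one batch's abundances to percentiles of its own anchor samples.

    counts: samples x OTUs DataFrame for a single batch.
    is_control: boolean Series (same index) marking the anchor samples.
    """
    # Assumption: work on relative abundances so sequencing depth drops out.
    rel = counts.div(counts.sum(axis=1), axis=0)
    # Assumption: replace zeros with tiny uniform noise to break ties.
    noise = pd.DataFrame(rng.uniform(0, 1e-9, size=rel.shape),
                         index=rel.index, columns=rel.columns)
    rel = rel.mask(rel == 0, noise)
    anchors = rel[is_control]
    out = pd.DataFrame(index=rel.index, columns=rel.columns, dtype=float)
    for otu in rel.columns:
        ref = anchors[otu].to_numpy()
        # Percentile of every sample (anchors included) within the anchor
        # distribution for this OTU, rescaled from 0-100 to 0-1.
        out[otu] = [percentileofscore(ref, x, kind="mean") / 100 for x in rel[otu]]
    return out


def percentile_normalize(counts, batch, is_control, seed=0):
    """Normalize each batch against its own anchors, then pool the batches."""
    rng = np.random.default_rng(seed)
    parts = [
        percentile_normalize_batch(counts[batch == b], is_control[batch == b], rng)
        for b in batch.unique()
    ]
    return pd.concat(parts).loc[counts.index]


# Toy example: two batches, each with two anchor samples and one other sample.
counts = pd.DataFrame(
    {"otu1": [10, 12, 30, 100, 110, 300], "otu2": [5, 0, 8, 50, 40, 0]},
    index=["a1", "a2", "a3", "b1", "b2", "b3"],
)
batch = pd.Series(["A", "A", "A", "B", "B", "B"], index=counts.index)
is_control = pd.Series([True, True, False, True, True, False], index=counts.index)

normalized = percentile_normalize(counts, batch, is_control)
print(normalized.round(2))
```

Because every value ends up as a percentile of its own batch's anchor distribution, the pooled table is on a common 0-1 scale. Without an anchor group present in both batches, there is nothing to put the two runs on that shared scale.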
I think you have three options here:
- Just do your analyses and discard any results that could be due to batch. For example, if you’re doing beta diversity analyses, only consider within-batch comparisons.
- If you have a subset of samples from the same condition in both sequencing runs, then use that as your “controls” to normalize against. But that depends on the exact experiment you’re running, if any: it won’t work if all 380 samples are just cross-sectional samples from different people.
- You might be able to use batch correction methods that have been developed for other 'omics data. In our paper, we used ComBat and saw that it worked okay to reduce batch effects. As I understand it, ComBat models the batch effect as a per-feature shift in the mean and change in the variance, and it assumes the data are roughly normally distributed (raw OTU counts are not), so it might work depending on what type of batch effect you have. There are other methods as well, though I’m less familiar with them. @seangibbons might have more to say about this as well!
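For the first option, one way to enforce within-batch comparisons is to compute the full distance matrix and blank out every cross-batch pair before running any downstream test. This Python sketch uses Bray-Curtis distances on relative abundances via `scipy.spatial.distance.pdist`; the metric, the helper name `within_batch_distances`, and the toy counts are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform


def within_batch_distances(counts, batch):
    """Bray-Curtis distances, keeping only pairs sequenced in the same batch."""
    rel = counts.div(counts.sum(axis=1), axis=0)
    dm = pd.DataFrame(
        squareform(pdist(rel.to_numpy(), metric="braycurtis")),
        index=counts.index,
        columns=counts.index,
    )
    labels = batch.to_numpy()
    same_batch = labels[:, None] == labels[None, :]
    # Cross-batch pairs become NaN so they cannot drive any comparison.
    return dm.where(same_batch)


# Toy counts: two samples per batch.
counts = pd.DataFrame(
    [[10, 0, 5], [8, 2, 4], [1, 9, 3], [0, 10, 2]],
    index=["a1", "a2", "b1", "b2"],
    columns=["otu1", "otu2", "otu3"],
)
batch = pd.Series(["A", "A", "B", "B"], index=counts.index)
dm = within_batch_distances(counts, batch)
print(dm.round(3))
```

Any test you then run (e.g. comparing groups) should only use the non-missing, within-batch entries.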
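To illustrate what a mean/variance (location/scale) correction does, here is a much-simplified Python sketch in the spirit of ComBat: for each feature, it re-centers and rescales each batch to the grand mean and the pooled within-batch standard deviation. It leaves out ComBat's empirical Bayes shrinkage and covariate handling, it assumes the input is already log- or CLR-transformed, and `naive_location_scale_adjust` is a hypothetical name. For real analyses, use an actual implementation such as `ComBat` in the Bioconductor `sva` package. Also note that if batch is confounded with condition, a correction like this removes the biological signal along with the batch effect.

```python
import numpy as np
import pandas as pd


def naive_location_scale_adjust(data, batch):
    """Give every batch the same per-feature mean and SD (no EB shrinkage)."""
    grand_mean = data.mean()
    # Pooled within-batch SD: spread of each value around its own batch mean.
    resid = data - data.groupby(batch).transform("mean")
    pooled_sd = resid.std(ddof=1)
    adjusted = data.astype(float).copy()
    for b in batch.unique():
        rows = batch == b
        sub = data[rows]
        sd = sub.std(ddof=1).replace(0, 1.0)  # guard against constant features
        adjusted.loc[rows] = (sub - sub.mean()) / sd * pooled_sd + grand_mean
    return adjusted


# Toy log-scale data where batch B is shifted up and more spread out.
rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=(20, 3))
b = rng.normal(2.0, 2.0, size=(20, 3))
data = pd.DataFrame(np.vstack([a, b]), columns=["otu1", "otu2", "otu3"])
batch = pd.Series(["A"] * 20 + ["B"] * 20)

adjusted = naive_location_scale_adjust(data, batch)
print(adjusted.groupby(batch).agg(["mean", "std"]).round(2))
```

After adjustment both batches share the same per-feature mean and SD, which is exactly why this kind of correction only helps when the batch effect really is a shift and rescaling of otherwise similarly distributed data.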