Run differences in a longitudinal study

Jesus_Marin_Miret · February 22, 2019, 4:51pm

Hello,

I am analyzing a dataset containing samples from two separate sequencing runs. Each run contains a different set of samples (from different time points), and we did not include mock communities. From what I have read on the forum, I now know that these are bad strategies for analyzing metagenomes, but I did not know that beforehand.

I processed the runs separately, and after dada2 I merged them with qiime feature-table merge. I then analyzed alpha and beta diversity with qiime diversity alpha-group-significance, qiime diversity beta, and qiime diversity beta-group-significance. All of them show differences between the experimental conditions (season: Autumn-Run1, Spring-Run1 and Summer-Run2).

My problem comes when I look at PCoA plots (qiime emperor plot). The samples are clearly separated by run. I am afraid that runs are affecting my results.

I have tried deleting the second run (Summer) from the analysis to see whether my results would be affected or not. In the end I can still see differences between seasons.

So my questions are:
Can I do anything to solve the batch effect problem?
Should I analyze the runs separately?
Can I ignore the run effect (I got the same result with two runs and with only one run)?

This is the Jaccard Emperor plot, where the run differences can be seen with the scatter option.

jaccard_emperor.qzv (756.1 KB)
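
For readers who want to put a number on the separation seen in the plot, here is a minimal sketch that computes Jaccard distances from presence/absence feature sets and compares the mean within-run distance to the mean between-run distance. The sample IDs, feature IDs, and helper names are hypothetical and are not taken from the attached data. Note that in this design a larger between-run distance cannot tell a run effect apart from a season effect, because Summer is the only season on Run2.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two sets of observed features."""
    union = a | b
    if not union:
        return 0.0  # two empty samples are treated as identical
    return 1.0 - len(a & b) / len(union)


def mean_within_and_between(features, run_of):
    """Mean pairwise distance for same-run pairs and for different-run pairs."""
    within, between = [], []
    for s, t in combinations(sorted(features), 2):
        d = jaccard_distance(features[s], features[t])
        if run_of[s] == run_of[t]:
            within.append(d)
        else:
            between.append(d)
    return sum(within) / len(within), sum(between) / len(between)


# Hypothetical presence/absence data: two samples per run.
features = {
    "A1": {"f1", "f2", "f3"},
    "A2": {"f1", "f2", "f4"},
    "U1": {"f1", "f5", "f6"},
    "U2": {"f5", "f6", "f7"},
}
run_of = {"A1": "Run1", "A2": "Run1", "U1": "Run2", "U2": "Run2"}

within, between = mean_within_and_between(features, run_of)
print(f"mean within-run distance:  {within:.2f}")
print(f"mean between-run distance: {between:.2f}")
```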

Thank you in advance!

Nicholas_Bokulich · February 23, 2019, 12:31am

Hi @Jesus_Marin_Miret,
Thank you for sharing the emperor plot, that makes your problem much clearer.

Honestly, it sounds like the effect you are seeing is probably genuine and not a run effect, but you cannot rule a run effect out, since you separated seasons by runs (Summer was sequenced on its own run).

On whether you can do anything to solve the batch effect problem: nothing except re-sequencing at least a subset of samples. You could re-sequence a subset of each group (say 10 per group) on a new run to confirm that you replicate the same results for those samples. That would be enough in my mind to trust the original run results.

On analyzing the runs separately: yes, that is another option. If your goal is not to test seasonal effects, go for it! But it sounds like the season effect is an important part of your experiment.

On ignoring the run effect: no, definitely not. Just because Spring and Autumn are different does not mean that the Summer difference is not a run effect.
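
To make the confounding concrete, here is a minimal sketch that checks which season pairs share a sequencing run. The metadata rows and function names are hypothetical; they only mirror the design described in this thread (Autumn and Spring on Run1, Summer on Run2). Any pair of seasons with no run in common cannot be compared without also comparing runs.

```python
from collections import defaultdict

# Hypothetical sample metadata mirroring the design in this thread.
samples = [
    {"id": "S1", "season": "Autumn", "run": "Run1"},
    {"id": "S2", "season": "Autumn", "run": "Run1"},
    {"id": "S3", "season": "Spring", "run": "Run1"},
    {"id": "S4", "season": "Spring", "run": "Run1"},
    {"id": "S5", "season": "Summer", "run": "Run2"},
    {"id": "S6", "season": "Summer", "run": "Run2"},
]


def runs_per_season(rows):
    """Map each season to the set of runs it was sequenced on."""
    runs = defaultdict(set)
    for row in rows:
        runs[row["season"]].add(row["run"])
    return dict(runs)


def split_season_pairs(rows):
    """Split season pairs into those sharing a run and those fully confounded with run."""
    runs = runs_per_season(rows)
    seasons = sorted(runs)
    shared, confounded = [], []
    for i, a in enumerate(seasons):
        for b in seasons[i + 1:]:
            if runs[a] & runs[b]:
                shared.append((a, b))
            else:
                confounded.append((a, b))
    return shared, confounded


shared, confounded = split_season_pairs(samples)
print("comparable within a run:", shared)
print("confounded with run:    ", confounded)
```

With this layout only Autumn vs Spring can be tested within a single run; both comparisons involving Summer are confounded with run.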

I'd say go with the limited re-sequencing approach that I described above. It would result in minimal cost (provided a colleague is doing a sequencing run that can accommodate a small number of additional samples), and would be enough to validate the initial runs IF the new run does not suffer from batch effects of its own.
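
As a rough illustration of choosing the subset to re-sequence, here is a minimal sketch that draws a reproducible random sample of up to 10 samples per group. The sample IDs, the function name, and the fixed seed are assumptions for the example; random selection is one reasonable choice, not a requirement of the approach.

```python
import random
from collections import defaultdict


def pick_resequencing_subset(sample_to_group, per_group=10, seed=0):
    """Randomly choose up to `per_group` sample IDs from each group, reproducibly."""
    rng = random.Random(seed)  # fixed seed so the selection can be regenerated
    by_group = defaultdict(list)
    for sample_id, group in sample_to_group.items():
        by_group[group].append(sample_id)

    chosen = {}
    for group in sorted(by_group):
        ids = sorted(by_group[group])  # sort first so the draw does not depend on input order
        k = min(per_group, len(ids))   # small groups contribute all of their samples
        chosen[group] = sorted(rng.sample(ids, k))
    return chosen


# Hypothetical IDs: 20 samples in each of the three seasons.
sample_to_group = {
    f"{season}_{i:02d}": season
    for season in ("Autumn", "Spring", "Summer")
    for i in range(1, 21)
}

subset = pick_resequencing_subset(sample_to_group)
for group, ids in subset.items():
    print(group, len(ids), ids)
```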

Good luck!

Jesus_Marin_Miret · February 25, 2019, 1:05pm

Hi @Nicholas_Bokulich, your help is much appreciated. I think we can re-sequence some of the samples to validate the results.