I hope you are well.
I have many samples from different batches of sequencing that arrive at different times (usually once a month). These are paired-end Illumina MiSeq data (2 x 300 bp).
I remember reading somewhere on this forum that ideally I should analyze all of the samples together or, failing that, separately (per batch) but with the same set of parameters. So I analyzed each batch separately, using the same parameters.
However, I recently noticed that the set of parameters that worked well for my first 3 batches does not work well for my last batch. This seems to be caused by differences in the quality of the raw reads.
For example, with the first 3 batches, the majority of the reads were retained after filtering with these parameters. But with the last batch, I lost a lot of reads after filtering (in some cases, 85% of the reads), and this seems to cause problems in the downstream analyses.
Ideally, I would like to analyze each batch separately rather than all of the samples together, because analyzing everything at once would take too much time and too many computational resources (I have hundreds of samples in a batch).
May I ask for your advice on this please?
Do you think sticking with the same set of parameters for each batch is a good way to do this?
But if the quality of the raw reads in a batch differs so much that the same set of parameters produces a bad result (as I’m facing right now), what should I do?
Thank you so much for your generous time and help!