I'm wondering what the best approach is for long-running microbiome experiments. We are following communities in wastewater treatment (WWT) plants over a long period, with regular sampling. Due to COVID we have a substantial backlog: there is a gap since March 2020 in the actual analysis (but fortunately not in the sampling).
I'm currently running QIIME 2 2021.11 with the SILVA 138 classifier. The previous experiments were analyzed with 2020.2 and SILVA 132. Unfortunately the raw data for the early runs are no longer available (long story), but I do have the DADA2 output. We only have raw data from June 2020 onwards.
We want to see how the community evolves over long periods, and we plan to keep adding data for the next year or so.
Should I retrain the SILVA 132 classifier under 2021.11 and use it on the new data combined with the old samples for which I still have the rep-seqs? I ask because it is recommended to use a classifier trained with the scikit-learn version that ships with the conda environment of the specific QIIME 2 release.
Should I rerun everything with 2020.2 and merge the rep-seqs, using the SILVA 132 classifier? Or should I forget about 132 and rerun the whole dataset with 2021.11 and SILVA 138, or whatever comes in the future?
What would be the best approach? Or should I maybe drop the limited amount of data from early 2020 altogether?
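To make the "merge the old and new DADA2 output and classify once" option concrete, this is roughly the bookkeeping I have in mind, written as a plain-Python sketch rather than actual QIIME 2 commands. It assumes that the DADA2 feature IDs are MD5 hashes of the sequences (the q2-dada2 default, as far as I know) and that the old and new runs used the same primers and truncation lengths, so identical ASVs end up with identical IDs. The function names and toy sequences are made up for illustration.

```python
import hashlib


def feature_id(seq):
    """Hypothetical helper: MD5 hash of the sequence, used as the feature ID
    (assumed to match the hashed IDs that DADA2 output carries)."""
    return hashlib.md5(seq.encode("ascii")).hexdigest()


def merge_rep_seqs(*batches):
    """Merge representative sequences from several runs.

    Identical sequences hash to the same ID, so they collapse into one entry.
    """
    merged = {}
    for batch in batches:
        for seq in batch:
            merged[feature_id(seq)] = seq
    return merged


def merge_tables(*tables):
    """Merge per-run tables of the form {sample_id: {feature_id: count}}.

    Sample IDs are assumed to be unique across runs; a clash raises an error.
    """
    merged = {}
    for table in tables:
        for sample, counts in table.items():
            if sample in merged:
                raise ValueError(f"duplicate sample ID: {sample}")
            merged[sample] = dict(counts)
    return merged


# Toy data: one ASV shared between the old and the new run.
old_seqs = ["ACGTACGT", "GGCCGGCC"]
new_seqs = ["ACGTACGT", "TTAATTAA"]

merged_seqs = merge_rep_seqs(old_seqs, new_seqs)
print(len(merged_seqs))  # 3 unique ASVs

# The merged set would then be classified once, with a single classifier,
# so all samples share the same taxonomy release.
```

If the trimming settings differed between the old and new runs, the IDs would not line up and the merged table would treat the same organism as different features, which is part of why I am unsure about this route.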