Using an older SILVA release or just the latest in long-timespan microbiome experiments

fenny · December 10, 2021, 12:46pm

I'm wondering what the best approach is for long-running microbiome experiments. We are following communities in wastewater treatment (WWT) plants over a long period, sampling regularly. Due to COVID we have a substantial backlog: the analysis has a gap since March 2020 (but fortunately the sampling does not).

I'm currently running QIIME 2 2021.11 with the 138 release of the SILVA classifier. The previous experiments were analyzed with 2020.2 and SILVA 132. Unfortunately the raw data for the early runs are no longer available (long story), but I do have the DADA2 output. We only have raw data from June 2020 and later.

We want to see how the community evolves over long periods and plan to add data again for the upcoming year or so.

Should I retrain the SILVA 132 classifier under 2021.11 and combine the results with the old samples for which I still have the rep-seqs, since it is suggested to use the scikit-learn version that comes with the conda env for a specific QIIME 2 version?

Should I rerun everything with 2020.2 and merge the rep-seqs using SILVA 132? Or should I forget about 132 and rerun the whole dataset with 2021.11 and SILVA 138, or whatever comes in the future?

What would be the best approach? Maybe forget about the limited amount of data from early 2020 altogether?

colinbrislawn · December 13, 2021, 3:10am

Hello @fenny,

Welcome to the forums! :qiime2:

I don't have all the answers, but hopefully I can help assemble this data set. Let's dive in!

I'm glad you and your team were able to continue sampling over this last year.

Losing the raw data for the early runs is unfortunate, but still workable. Because DADA2 produces stable ASVs, if you process future runs using the same version of DADA2 (same QIIME 2 version, q2-dada2 version, etc.), then you can merge the feature tables and merge the ASV sequences to get a unified output.

In fact, it's best to run DADA2 separately on each sequencing run, so you are off to a good start!
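
To illustrate why separately denoised runs can be merged, here is a minimal Python sketch. It is not QIIME 2's actual implementation. It assumes feature IDs are MD5 hashes of the ASV sequence (which, as far as I know, is the q2-dada2 default) and that every run was trimmed and truncated the same way, so the same organism yields the same sequence in each run. The function names, sample IDs, and sequences are made up for the example.

```python
import hashlib


def feature_id(sequence: str) -> str:
    """MD5 hex digest of an ASV sequence, used as a run-independent feature ID."""
    return hashlib.md5(sequence.encode("ascii")).hexdigest()


def merge_tables(*tables):
    """Merge per-run tables shaped like {sample_id: {feature_id: count}}.

    Raises ValueError if the same sample ID shows up in more than one run.
    """
    merged = {}
    for table in tables:
        for sample_id, counts in table.items():
            if sample_id in merged:
                raise ValueError(f"sample {sample_id!r} appears in more than one run")
            merged[sample_id] = dict(counts)
    return merged


def merge_seqs(*rep_seqs):
    """Union of per-run {feature_id: sequence} maps; identical ASVs collapse."""
    merged = {}
    for seqs in rep_seqs:
        merged.update(seqs)
    return merged


# Hypothetical toy data: one ASV seen in both runs, one unique to each run.
shared = "ACGTACGTAC"
only_old = "TTGACCGTAA"
only_new = "GGCATTCAGA"

old_seqs = {feature_id(s): s for s in (shared, only_old)}
new_seqs = {feature_id(s): s for s in (shared, only_new)}
old_table = {"plantA_2019": {feature_id(shared): 120, feature_id(only_old): 30}}
new_table = {"plantA_2021": {feature_id(shared): 95, feature_id(only_new): 12}}

merged_table = merge_tables(old_table, new_table)
merged_seqs = merge_seqs(old_seqs, new_seqs)
print(len(merged_seqs), "unique ASVs across", len(merged_table), "samples")
```

The shared ASV gets the same ID in both runs, so it lines up in the merged table without the old raw reads ever being needed.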

While the version of QIIME 2 should not make a huge difference, reviewer three is going to ask why the database changed.

I think you have outlined your options well. Here is what I would do:

I like this option because it lets you incorporate all your data into a unified output, even the DADA2 results from that first run. When you merge and process at the very end, you get to use the newest databases and pre-trained taxonomic classifiers, along with any new features!

(Of course, you could also process and merge the data you have now to demo this pipeline and see how the project is going!)
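
As a rough sketch of the merge-then-classify order, the Python snippet below only assembles the QIIME 2 command lines and prints them; it does not run anything. The file names (`run-2019-table.qza`, `silva-138-classifier.qza`, and so on) and the `build_pipeline` helper are hypothetical, so adapt them to your own artifacts, and use a classifier trained for the QIIME 2 release you are running.

```python
import shlex


def build_pipeline(run_prefixes, classifier, out_prefix="merged"):
    """Return QIIME 2 CLI commands (as argument lists) for merge-then-classify.

    Assumes each run has <prefix>-table.qza and <prefix>-rep-seqs.qza
    (a hypothetical naming scheme used only for this example).
    """
    merge_table = ["qiime", "feature-table", "merge"]
    merge_seqs = ["qiime", "feature-table", "merge-seqs"]
    for prefix in run_prefixes:
        merge_table += ["--i-tables", f"{prefix}-table.qza"]
        merge_seqs += ["--i-data", f"{prefix}-rep-seqs.qza"]
    merge_table += ["--o-merged-table", f"{out_prefix}-table.qza"]
    merge_seqs += ["--o-merged-data", f"{out_prefix}-rep-seqs.qza"]

    # Classify once, on the merged sequences, with the newest classifier.
    classify = [
        "qiime", "feature-classifier", "classify-sklearn",
        "--i-classifier", classifier,
        "--i-reads", f"{out_prefix}-rep-seqs.qza",
        "--o-classification", f"{out_prefix}-taxonomy.qza",
    ]
    return [merge_table, merge_seqs, classify]


commands = build_pipeline(
    ["run-2019", "run-2020", "run-2021"],  # hypothetical per-run prefixes
    "silva-138-classifier.qza",            # hypothetical classifier file
)
for cmd in commands:
    print(shlex.join(cmd))
```

Adding next year's runs then just means appending another prefix and regenerating the commands, so taxonomy is always assigned to the full dataset with one database.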

fenny · January 4, 2022, 12:58pm

Hi Colin,

Thanks for the lengthy explanation. I tried the merging option and it indeed works perfectly. All the results are now classified against the SILVA 138 release.

Thanks, Fenny