I am working on a project where part of the data was already analyzed with QIIME 2 version 2020.8 and the pre-trained full-length 16S classifier. Are the results of this analysis comparable to results that would be generated with the current QIIME 2 2021.2 version and its pre-trained classifier?
Are there any problems that could occur if the results from different versions are used together in downstream analyses?
Those two specific classifiers (2020.8 and 2021.2) should be comparable, assuming that the underlying methodology and database did not change and that both classifiers were trained on the same region (you have already indicated that both are full-length 16S). If you are using the Greengenes or SILVA pre-trained classifiers from the QIIME 2 data resources site, you should be in the clear. If you are using a different database, I cannot guarantee that.
In general, we do some validation each time new classifiers are released to ensure that they return the same results. Changes can still occur, however, for two reasons:
1. When a new database version is released. SILVA releases a new version every year or two; Greengenes has not been updated in a while.
2. More rarely, when the methodology for preparing the databases changes. Starting with the 2020.6 release, we began using RESCRIPt to process, filter, and format the SILVA database released on the QIIME 2 website. We also use RESCRIPt to train and evaluate the SILVA and Greengenes classifiers released there, but this does not affect the results, because it uses the same method for classifier training.
In your case, both 2020.8 and 2021.2 postdate the switch to RESCRIPt, so the methodology for processing the SILVA database has not changed. If you are using SILVA, you may still want to check the version number to make sure that both classifiers were built from the same SILVA release (presumably 138).
Finally, if you are ever in doubt, you could run a small test to confirm that the classifiers yield the same results. Any sequences will do: classify them with both classifiers, then use the evaluate-taxonomy action in q2-quality-control to compare the results. An F-score of 1.0 would indicate that exactly the same classification is made for each query sequence.
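If you would also like a quick check outside of QIIME 2, here is a minimal Python sketch of the same idea. It assumes each classification has been exported to a taxonomy.tsv file (for example with qiime tools export) with the usual "Feature ID", "Taxon", "Confidence" header. The helper names, the seven-rank default, and the per-rank F-measure below are simplified stand-ins written for illustration; this is not the code that evaluate-taxonomy runs, so its numbers may differ from the plugin's for partially classified features.

```python
import csv
from typing import Dict, Tuple


def load_taxonomy(path: str) -> Dict[str, str]:
    """Read a taxonomy.tsv exported from a FeatureData[Taxonomy] artifact.

    Assumes the header 'Feature ID<TAB>Taxon<TAB>Confidence'.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["Feature ID"]: row["Taxon"] for row in reader}


def truncate(taxon: str, depth: int) -> Tuple[str, ...]:
    """Keep only the first `depth` ranks of a 'd__...; p__...; ...' string."""
    ranks = [rank.strip() for rank in taxon.split(";") if rank.strip()]
    return tuple(ranks[:depth])


def f_score(expected: Dict[str, str], observed: Dict[str, str], depth: int) -> float:
    """Simplified F-measure comparing two classifications at one rank depth.

    A feature counts as classified at `depth` if its lineage has at least
    `depth` ranks; a true positive is a shared feature whose truncated
    lineages match exactly. Illustrative stand-in, not q2-quality-control's formula.
    """
    shared = expected.keys() & observed.keys()
    exp = {fid: truncate(expected[fid], depth) for fid in shared}
    obs = {fid: truncate(observed[fid], depth) for fid in shared}
    n_expected = sum(1 for lineage in exp.values() if len(lineage) == depth)
    n_observed = sum(1 for lineage in obs.values() if len(lineage) == depth)
    true_pos = sum(
        1 for fid in shared if len(exp[fid]) == depth and exp[fid] == obs[fid]
    )
    if true_pos == 0:
        return 0.0
    precision = true_pos / n_observed
    recall = true_pos / n_expected
    return 2 * precision * recall / (precision + recall)


def main(old_path: str, new_path: str, max_depth: int = 7) -> None:
    """Print agreement between two exported classifications, rank by rank."""
    old = load_taxonomy(old_path)
    new = load_taxonomy(new_path)
    for depth in range(1, max_depth + 1):
        print(f"rank {depth}: F = {f_score(old, new, depth):.3f}")
```

For example, main("classified_2020.8/taxonomy.tsv", "classified_2021.2/taxonomy.tsv") would print one line per rank (both paths are hypothetical). A value of 1.0 at every rank corresponds to the identical-classification case; any rank below 1.0 points to where the two classifiers start to disagree.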