I'm using QIIME 2 v. 2021.11. I used DADA2 denoise-paired for quality filtering and denoising, and feature-classifier with the SILVA database for taxonomy. My taxonomy artifact has the vsearch citation in its provenance. Is that because of the SILVA database, or am I misunderstanding something else?
Hi @Emily_Sprague,
You can check the provenance graph to see where this might be introduced.
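If you'd rather search the archive directly, here is a minimal sketch. It assumes a .qza is a zip archive in which every artifact in the provenance tree has its own citations.bib (the artifact's own under <uuid>/provenance/, each ancestor's under <uuid>/provenance/artifacts/<ancestor-uuid>/). find_citation_sources is a hypothetical helper, not part of the QIIME 2 API.

```python
import re
import zipfile
from typing import List


def find_citation_sources(qza, keyword: str) -> List[str]:
    """Hypothetical helper: list citations.bib files in a .qza that mention `keyword`.

    `qza` may be a filesystem path or a binary file-like object; zipfile
    accepts either. Assumes one citations.bib per artifact in the provenance tree.
    """
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    hits = []
    with zipfile.ZipFile(qza) as archive:
        for name in archive.namelist():
            if not name.endswith("/citations.bib"):
                continue
            text = archive.read(name).decode("utf-8", errors="replace")
            if pattern.search(text):
                hits.append(name)
    return sorted(hits)
```

For example, find_citation_sources("taxonomy.qza", "vsearch") should point at the ancestor artifact that cites vsearch; the action/action.yaml in that same folder should record which plugin and action produced it.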
My guess is that you are using the pre-trained SILVA classifiers or sequences from the QIIME 2 website data-resources. These SILVA-based databases are built using RESCRIPt and include a dereplication step with vsearch (to remove redundant sequences from the database).
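To illustrate what "dereplication" means for a reference database, here is a simplified sketch that collapses sequences that are exactly identical (ignoring case) and records which IDs were merged. This is not RESCRIPt's code: RESCRIPt runs vsearch for this step and also has to reconcile the taxonomy labels of collapsed sequences, which this sketch ignores. dereplicate_exact is a hypothetical name.

```python
from typing import Dict, List, Tuple


def dereplicate_exact(seqs: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Hypothetical helper: collapse identical sequences (exact, case-insensitive match only).

    Returns (unique_seqs, members): unique_seqs maps each representative ID to its
    sequence; members maps each representative ID to all input IDs sharing it.
    """
    rep_for_seq: Dict[str, str] = {}
    unique_seqs: Dict[str, str] = {}
    members: Dict[str, List[str]] = {}
    for seq_id, seq in seqs.items():
        key = seq.upper()
        rep = rep_for_seq.get(key)
        if rep is None:
            # First time this exact sequence is seen: it becomes the representative.
            rep_for_seq[key] = seq_id
            unique_seqs[seq_id] = seq
            members[seq_id] = [seq_id]
        else:
            # Redundant copy: drop it from the output but remember the merge.
            members[rep].append(seq_id)
    return unique_seqs, members
```

For example, {"r1": "ACGT", "r2": "acgt", "r3": "GGCC"} collapses to r1 and r3, with r2 recorded as a member of r1.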
This was one motivation behind RESCRIPt: making the underlying steps used to generate custom reference databases visible, and hence reproducible by others who want to build their own databases. But I suppose it creates an issue for scraping citations from provenance when re-using previously generated data, so some citation curation may be in order.
I did use the silva-138-99-seqs.qza file, so that makes sense. Thanks for explaining it. Another thought on citation curation: artifacts generated later in my workflow had two or three duplicates of some citations, so when I imported the BibTeX files into my reference manager I had to delete the extras.
Hi @Emily_Sprague,
For sure. This happens because some underlying packages are key parts of multiple plugins or actions, so parsing all citations in a provenance tree yields duplicate entries that need to be dereplicated.
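If you want to clean up the BibTeX yourself before importing it, here is a rough sketch of that dereplication step. It is not provenance-replay's implementation; it assumes each entry starts with @type{key, at the beginning of a line, and it treats two entries as the same work when they share a DOI or, failing that, a normalized title, because the same paper may appear under different citation keys when different actions cite it. dereplicate_bibtex and _identity are hypothetical helpers.

```python
import re
from typing import Dict, Iterable

# Assumed entry start: "@article{some-key," at the beginning of a line.
ENTRY_START = re.compile(r"^@\w+\s*\{\s*([^,\s]+)\s*,", re.MULTILINE)
DOI_FIELD = re.compile(r'\bdoi\s*=\s*[{"]([^}"]+)[}"]', re.IGNORECASE)
TITLE_FIELD = re.compile(r'\btitle\s*=\s*[{"](.+?)[}"]\s*,?\s*$', re.IGNORECASE | re.MULTILINE)


def _identity(entry: str, key: str) -> str:
    """Hypothetical rule for 'same work': DOI, else normalized title, else key."""
    doi = DOI_FIELD.search(entry)
    if doi:
        return "doi:" + doi.group(1).strip().lower()
    title = TITLE_FIELD.search(entry)
    if title:
        # Ignore case, braces, punctuation and spacing differences in titles.
        return "title:" + re.sub(r"[^a-z0-9]", "", title.group(1).lower())
    return "key:" + key


def dereplicate_bibtex(bib_texts: Iterable[str]) -> str:
    """Merge several .bib texts, keeping only the first entry for each work."""
    entries: Dict[str, str] = {}
    for text in bib_texts:
        starts = list(ENTRY_START.finditer(text))
        for i, match in enumerate(starts):
            end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
            entry = text[match.start():end].strip()
            # setdefault keeps the first occurrence and drops later duplicates.
            entries.setdefault(_identity(entry, match.group(1)), entry)
    return "\n\n".join(entries.values()) + "\n"
```

You could pass it the text of each exported citations.bib and write the result to a single file for your reference manager.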
This plugin might be of use to you here: provenance-replay has a replay citations action (mentioned but not shown in this tutorial?) that parses all citations in the provenance tree and dereplicates them.