Didn't use vsearch but its citation is showing up in my provenance

I'm using QIIME 2 v. 2021.11. I used dada2 denoise-paired for quality control and the SILVA database with feature-classifier for taxonomy. My taxonomy artifact has the vsearch citation in its provenance. Is that because of the SILVA database, or am I misunderstanding something else?

Hi @Emily_Sprague ,
You can check the provenance graph to see where this might be introduced.

My guess is that you are using the pre-trained SILVA classifiers or sequences from the QIIME 2 data resources page. These SILVA-based databases are built using RESCRIPt, and that process includes a dereplication step with vsearch (to remove redundant sequences from the database).

This was one motivation behind RESCRIPt: so that the underlying steps used for generating custom reference databases would become visible and hence reproducible by others who want to build their own databases. But I suppose it creates an issue for scraping citations from provenance when re-using previously generated data. So some citation curation may be in order :grin:
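
If you want to check this outside the provenance viewer, here is a minimal Python sketch. It assumes that a .qza file is an ordinary zip archive and that each provenance node stores its citations in a file named citations.bib. The function name `find_citation_sources` and the file name in the usage note are hypothetical, not part of QIIME 2.

```python
import zipfile


def find_citation_sources(qza_path, term):
    """Return archive paths of citations.bib files whose text mentions `term`.

    Assumption: the artifact is a zip archive and every provenance node keeps
    its citations in a file whose name ends with 'citations.bib'.
    """
    hits = []
    with zipfile.ZipFile(qza_path) as archive:
        for name in archive.namelist():
            if not name.endswith("citations.bib"):
                continue
            text = archive.read(name).decode("utf-8", errors="replace")
            # Case-insensitive substring match, e.g. "vsearch" matches "VSEARCH".
            if term.lower() in text.lower():
                hits.append(name)
    return sorted(hits)
```

Calling `find_citation_sources("taxonomy.qza", "vsearch")` would list the citation files that mention vsearch. If the archive follows the layout I expect, a hit nested under an `artifacts/` directory belongs to an upstream input (such as the reference sequences) rather than to the action you ran yourself.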

I did use the silva-138-99-seqs.qza file, so that makes sense. Thanks for explaining it. Another thought for citation curation: artifacts generated later in my workflow had two or three duplicates of some citations, so when I imported the BibTeX files into my reference manager I had to delete the extras by hand.

Hi @Emily_Sprague ,

For sure. This happens because some underlying packages are key parts of multiple plugins or actions, so parsing all citations in a provenance tree yields duplicate entries that then need to be dereplicated.
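
If you end up doing this by hand, a rough Python sketch of the dereplication could look like the following. It is not how any QIIME 2 tool does it. It assumes duplicates share a DOI, or failing that have identical fields once the citation key and whitespace are ignored. All function names are hypothetical.

```python
import re


def split_bibtex_entries(text):
    """Split BibTeX text into entries by matching braces after each '@'."""
    entries = []
    i = 0
    while True:
        start = text.find("@", i)
        if start == -1:
            break
        open_brace = text.find("{", start)
        if open_brace == -1:
            break
        depth = 0
        end = None
        for j in range(open_brace, len(text)):
            if text[j] == "{":
                depth += 1
            elif text[j] == "}":
                depth -= 1
                if depth == 0:
                    end = j
                    break
        if end is None:
            break  # unbalanced braces: stop rather than guess
        entries.append(text[start:end + 1])
        i = end + 1
    return entries


def entry_fingerprint(entry):
    """Identify an entry by DOI if present, else by its key-less body."""
    doi = re.search(r'\bdoi\s*=\s*[{"]\s*([^}"]+)', entry, flags=re.IGNORECASE)
    if doi:
        return "doi:" + doi.group(1).strip().lower()
    body = entry[entry.find(",") + 1:]  # drop '@type{citation_key,'
    return "body:" + re.sub(r"\s+", "", body).lower()


def dereplicate_bibtex(text):
    """Keep the first occurrence of each distinct entry, in original order."""
    seen = set()
    unique = []
    for entry in split_bibtex_entries(text):
        fingerprint = entry_fingerprint(entry)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(entry)
    return "\n\n".join(unique) + "\n"
```

Matching on content rather than on the citation key is deliberate here: the same reference can appear under different keys when it is recorded by more than one action, so comparing keys alone would miss those duplicates.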

The provenance-replay plugin might be useful to you here. It has a replay citations action (mentioned but possibly not shown in the tutorial) that parses all citations in the provenance tree and dereplicates them.
