How can I get reference sequences for closed-reference clustering?

I want to analyze two groups of soil samples,
so I amplified the V3 and V4 regions and started the analysis.

But I have a problem clustering the sequences.

qiime vsearch cluster-features-closed-reference \
  --i-table derep_table.qza \
  --i-sequences derep_seqs.qza \
  --i-reference-sequences sepp-refs-silva-128.qza \
  --p-perc-identity 0.99 \
  --o-clustered-table table-dn99.qza \
  --o-clustered-sequences rep-seqs-dn99.qza \
  --o-unmatched-sequences unmatched-seq-dn99.qza

This command printed the following error:
"Invalid value for '--i-reference-sequences': Expected an artifact of at
least type FeatureData[Sequence]. An artifact of type SeppReferenceDatabase
was provided."

I read "OTU picking strategies".
If I use non-overlapping amplicons, like V2 and V4 of the rRNA gene, I should use closed-reference clustering.

But I don't know what a reference sequence is or how to get one. I searched the forum and Google but couldn't find this information.
Please help me! Thank you.

Hi @svbreqwaiu01, the sepp-refs-silva-128.qza file is for fragment insertion, which is why its type (SeppReferenceDatabase) is rejected by this command. If you'd like to perform closed-reference OTU picking, follow the instructions here. The reference files to use (Greengenes and SILVA sequence files) can be found on the Data resources page.
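
A rough sketch of what that could look like, in case it helps. The names silva_ref_seqs.fasta and silva-ref-seqs.qza are placeholders I made up, so substitute whichever SILVA or Greengenes sequence file you actually download. If the file you download is already a .qza of type FeatureData[Sequence], you can skip the import step. The input and output names in the clustering step are the ones from your original command.

```shell
# Placeholder file names (assumptions): replace with the reference you download.
REF_FASTA="silva_ref_seqs.fasta"
REF_QZA="silva-ref-seqs.qza"

if command -v qiime >/dev/null 2>&1 && [ -f "$REF_FASTA" ]; then
  # Import the reference FASTA as a FeatureData[Sequence] artifact.
  qiime tools import \
    --type 'FeatureData[Sequence]' \
    --input-path "$REF_FASTA" \
    --output-path "$REF_QZA"

  # Check that the artifact type is FeatureData[Sequence].
  qiime tools peek "$REF_QZA"

  # Closed-reference clustering against the imported reference at 99% identity.
  qiime vsearch cluster-features-closed-reference \
    --i-table derep_table.qza \
    --i-sequences derep_seqs.qza \
    --i-reference-sequences "$REF_QZA" \
    --p-perc-identity 0.99 \
    --o-clustered-table table-dn99.qza \
    --o-clustered-sequences rep-seqs-dn99.qza \
    --o-unmatched-sequences unmatched-seq-dn99.qza
else
  echo "Activate your QIIME 2 environment and download the reference FASTA first." >&2
fi
```

Running qiime tools peek on sepp-refs-silva-128.qza in the same way should report SeppReferenceDatabase, which matches the error message you saw.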


Sorry, and thank you very much for your reply.
Do you mean that I can do closed-reference clustering with the marker gene reference databases?

As explained in the linked resources, this is exactly what closed-reference clustering means. You are only retaining those sequences that cluster within / match a given reference database within a defined percent similarity. The following article does a great job comparing some of these approaches:

Callahan, Benjamin J., Paul J. McMurdie, and Susan P. Holmes. 2017. “Exact Sequence Variants Should Replace Operational Taxonomic Units in Marker-Gene Data Analysis.” The ISME Journal 11 (12): 2639–2643.
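
If you want to see how much of your data was retained versus discarded by closed-reference clustering, one option is to summarize the clustered table and tabulate the unmatched sequences. This is only a sketch: it assumes the output names from the command in the original post, and the .qzv names are ones I chose for illustration.

```shell
# Output names below are taken from the command in the original post (assumption:
# the clustering step has already completed and written these files).
CLUSTERED_TABLE="table-dn99.qza"
UNMATCHED_SEQS="unmatched-seq-dn99.qza"

if command -v qiime >/dev/null 2>&1 && [ -f "$CLUSTERED_TABLE" ] && [ -f "$UNMATCHED_SEQS" ]; then
  # Summarize the features and counts that matched the reference.
  qiime feature-table summarize \
    --i-table "$CLUSTERED_TABLE" \
    --o-visualization table-dn99.qzv

  # List the sequences that did not match any reference sequence at the chosen identity.
  qiime feature-table tabulate-seqs \
    --i-data "$UNMATCHED_SEQS" \
    --o-visualization unmatched-seq-dn99.qzv
else
  echo "Run the clustering step in an active QIIME 2 environment first." >&2
fi
```

The resulting .qzv files can be opened with qiime tools view or at view.qiime2.org.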


