Hi all!
I was hoping you guys could help me out. This is my first time working with 16S samples. I am trying to train a classifier using the SILVA 132 database, so I need a reference sequence file and a taxonomy file. However, when I look in the SILVA_132_QIIME_release directory, I see that there are many options I could select.
If I look in the taxonomy/16S_only directory, there are directories that I assume correspond to the percent identity at which the database OTUs are clustered. Within each of those directories (let’s look at the 99 directory as an example), there are 7 different taxonomy files. My question is: how would I know which taxonomy file to use?
I have a similar issue when trying to choose a reference sequence file. From what I can see in SILVA_132_QIIME_release, there are two directories to choose from: rep_set and rep_set_aligned. How would I know which directory to look in to make sure I am using the proper reference sequence file?
Last question: how would I know which clustering percentage to select? I would assume 99% clustering gives the most accurate identification, so is there any reason not to use the files in the respective 99 directories?
I am running QIIME2-2020.11 which was installed using conda.
Essentially, I am looking for which files to use in place of the INSERT_REF_SEQ_FILE.fasta and INSERT_TAXONOMY_FILE.txt placeholders here:
qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path INSERT_REF_SEQ_FILE.fasta \
  --output-path ref_seq.qza
qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path INSERT_TAXONOMY_FILE.txt \
  --output-path ref-taxonomy.qza
Thank you in advance for your support!