Primer sequences used for trained Silva119 515F/806R?

Hi,

I was just wondering what primer sequences were used for the trained Silva 119 515F/806R data resource available here

When I look at the primers used in the Training Feature Classifier Tutorial, they differ from the 515F/806R primers I used.

From the tutorial:
--p-f-primer GTGCCAGCMGCCGCGGTAA \
--p-r-primer GGACTACHVGGGTWTCTAAT \

My primers:

FWD: GTGYCAGCMGCCGCGGTAA; REV: GGACTACNVGGGTWTCTAAT

If different primers were used to extract the amplicon region out of Silva 119 than my 515F/806R primers, would you recommend I extract the region again using the primer sequences I used for the V4 region? Or do you think it would not make much of a difference…?

Thank you!

colleen


Hi @ctekellogg!

I will answer the first part of your question here, and then turn it over to @Nicholas_Bokulich for part two!

We can use provenance to learn about the parameters that were used to create that artifact! First, load up the Silva 119 515F/806R trained classifier at view.qiime2.org:

https://view.qiime2.org/provenance/?src=https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fqiime2-data%2F2017.12%2Fcommon%2Fsilva-119-99-515-806-nb-classifier.qza

Now, navigate to the "Provenance" tab and click on the box that corresponds to extract_reads.

Over on the right-hand side you can see the parameters we used when training this classifier:

f_primer:"GTGCCAGCMGCCGCGGTAA"
r_primer:"GGACTACHVGGGTWTCTAAT"

Pretty cool! Provenance is baked into every file output from QIIME 2!
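
The same information can also be pulled out of the file without the viewer. Here is a minimal sketch in Python, assuming a .qza file is an ordinary zip archive that stores its provenance records as action.yaml text files under a provenance/ directory. The function name find_primer_lines is hypothetical and is not part of QIIME 2:

```python
import zipfile


def find_primer_lines(qza_path):
    """Return (archive member, line) pairs that mention f_primer or r_primer.

    Assumption: the artifact is a zip archive and its provenance is stored
    in files named action.yaml somewhere under a provenance/ directory.
    """
    hits = []
    with zipfile.ZipFile(qza_path) as archive:
        for name in archive.namelist():
            # Only look at provenance records, not the artifact's data files.
            if "/provenance/" not in name or not name.endswith("action.yaml"):
                continue
            text = archive.read(name).decode("utf-8")
            for line in text.splitlines():
                if "f_primer" in line or "r_primer" in line:
                    hits.append((name, line.strip()))
    return hits
```

Run on a downloaded copy of the classifier, this would be expected to list the same f_primer and r_primer values that the viewer shows for extract_reads.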

Now, on to @Nicholas_Bokulich for part 2...

Thanks! :t_rex:

Honestly, I don't expect it will make a big difference. It looks like the difference between your primers and those used for the pre-trained classifier is a single degenerate base near the 5' end of each primer. The extract-reads step is not that sensitive (e.g., looking for an exact match) that this will impact what reads are extracted.
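
To make the comparison concrete, here is a small Python sketch that lines up the two primer pairs position by position using the standard IUPAC degenerate-base codes. The helper primer_differences is hypothetical. It only illustrates where the primers differ and is not how extract-reads itself matches sequences:

```python
# IUPAC nucleotide codes mapped to the set of bases each one matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_differences(primer_a, primer_b):
    """List (1-based position, code_a, code_b, shared bases) where two primers differ."""
    if len(primer_a) != len(primer_b):
        raise ValueError("primers must be the same length for a position-wise comparison")
    diffs = []
    for pos, (a, b) in enumerate(zip(primer_a, primer_b), start=1):
        if a != b:
            shared = "".join(sorted(set(IUPAC[a]) & set(IUPAC[b])))
            diffs.append((pos, a, b, shared))
    return diffs


# Tutorial primers first, the poster's primers second.
fwd = primer_differences("GTGCCAGCMGCCGCGGTAA", "GTGYCAGCMGCCGCGGTAA")
rev = primer_differences("GGACTACHVGGGTWTCTAAT", "GGACTACNVGGGTWTCTAAT")
print(fwd)  # [(4, 'C', 'Y', 'C')]
print(rev)  # [(8, 'H', 'N', 'ACT')]
```

Each pair differs at exactly one position, and in both cases the tutorial's code (C, H) is a subset of the more degenerate code in the other primer (Y, N).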

If you do have sufficient memory to train your own classifier, I would suggest doing so just for peace of mind. But if you do not, or run into problems training your own classifier, then the pre-trained classifiers we provide should be fine.

Good luck!


@thermokarst, I was wondering if there was some way to see how the files were generated. Obviously I am still figuring out QIIME2 (trying to force myself to do this rather than defaulting to QIIME1.9). This is awesome. Thank you!

Colleen


Hi @Nicholas_Bokulich, thanks for helping me think about this a bit more. Perhaps I'll give training the classifier myself a try, and otherwise I'll use the pre-trained classifier, hoping that what you suggest

The extract-reads step is not that sensitive (e.g., looking for an exact match) that this will impact what reads are extracted

is indeed the case.

Thanks!
colleen


Hi @thermokarst, I have one more quick question. How can I figure out what source files are used for the classifier if all I have is the MD5?

For example, I am just trying to confirm which Silva reference_reads and reference_taxonomy files were used, so I can use the same ones (or the equivalent from newer Silva versions). Was it rep_set/99/Silva_119_rep_set99.fna (and the associated taxonomy)...?

Thanks again for your quick responses!
Colleen

Hi @ctekellogg, click on the boxes above your currently selected ones. They should correspond to the import steps for the input artifacts used to fit the classifier:

The md5 sum is over to the right (a86c94ce8d58ea9154fb88b05c123b02), along with the name of the imported file. You can compute the md5sum of the file in question and compare the hashes to verify whether it is the same file.
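
If you would rather do the comparison in Python than with a command-line tool, here is a minimal sketch using the standard library's hashlib. The function name md5_of_file and the path in the usage note are placeholders:

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large reference files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `md5_of_file("path/to/your_reference_file") == "a86c94ce8d58ea9154fb88b05c123b02"` would be True only if your local file is byte-for-byte identical to the one that was imported.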

Also worth noting: Silva 119 is the latest version of Silva that is easily imported into QIIME 2. See this thread for more details:

Thanks! :t_rex:
