SILVA 97% classifier

David_Bradshaw · July 17, 2018, 9:46pm

To Whom It May Concern,

I have run into a problem with the SILVA 97% classifier: the taxa bar plots show that most of the OTUs are assigned to just Bacteria;;;;;_. Is this a problem with my classifier, or is it just that the data cannot be resolved any further? I am using OTU clustering because I have read that you need more sequencing depth than I have (10x) to really be able to use ESV/Deblur. You have trained classifiers at 99% using SILVA; could you train one at 97% and post it or send it to me, just so that I can see whether I have done it correctly? I am unsure if this is an okay request. Thank you for your time and help.

Sincerely,

David Bradshaw

Nicholas_Bokulich · July 18, 2018, 4:21pm

Classification that stops at the kingdom level is almost always a sign that the wrong classifier was used.

If you are using one of the pre-trained classifiers and your reads were not amplified with the 515f-806r primers, make sure you are using the full-length SILVA classifier, not the 515f-806r classifier.

I am unfamiliar with that advice but may be misunderstanding. In any case, it is irrelevant — the classifier does not know and does not care where the sequence came from. It will classify it either way.

Sounds like you may be mixing up classification with OTU clustering, or assuming that they are somehow connected. Just because you did 97% OTU clustering does not mean that you need to use a classifier trained on the 97% OTUs. Use one of the pre-trained 99% SILVA classifiers and it will save you much trouble.
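As a rough sketch of what that looks like in practice, the commands below classify the 97% OTU representative sequences with a pre-trained 99% classifier and then rebuild the bar plot. All file names are placeholders assumed for illustration (they are not taken from this thread), and the small `run` helper is hypothetical: it only prints each command unless `RUN=1` is set, so the plan can be checked first.

```shell
# Placeholder file names (assumptions, adjust to your own files).
CLASSIFIER="pretrained-silva-99-classifier.qza"  # a pre-trained 99% SILVA classifier
REP_SEQS="rep-seqs-dn-97.qza"                    # 97% OTU representative sequences
TABLE="table-dn-97.qza"                          # matching feature table
METADATA="sample-metadata.tsv"                   # sample metadata

# Hypothetical helper: print the command unless RUN=1, in which case execute it.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

classify_and_plot() {
  # Assign taxonomy to the OTU sequences with the 99% classifier.
  run qiime feature-classifier classify-sklearn \
    --i-classifier "$CLASSIFIER" \
    --i-reads "$REP_SEQS" \
    --o-classification taxonomy.qza
  # Rebuild the taxa bar plot from the new taxonomy.
  run qiime taxa barplot \
    --i-table "$TABLE" \
    --i-taxonomy taxonomy.qza \
    --m-metadata-file "$METADATA" \
    --o-visualization taxa-bar-plots.qzv
}

classify_and_plot
```

The clustering threshold of the OTUs (97%) and the reference set behind the classifier (99%) are independent here; only the choice between the full-length and the 515f-806r classifier depends on the primers.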

I hope that helps!

David_Bradshaw · July 18, 2018, 6:48pm

Dear Dr. Nicholas Bokulich,

Thank you for your help. I used the Earth Microbiome Project modified 515f-806r primers. Would using the 515f-806r classifier still be okay then?

Honestly, my PIs believe that doing ESV analysis via Deblur or DADA2 may be trying to get too much out of my data (water and sediments). I guess that is what I mean by sequencing depth: with higher depth we can be more sure that a variation is real and not due to PCR error. I hope our reasoning is correct?

Okay, thank you very much for the clarification. I figured that since SILVA is released at various clustering percentages, the one used for the classifier would have to match my OTU clustering.

Sincerely,

David Bradshaw

Nicholas_Bokulich · July 18, 2018, 7:03pm

Absolutely! Sounds like this one is right for you.

You can never get too much out of your data.

That's sort of the point of these denoising methods. With OTU picking you still have the issue of low read depth and spurious OTUs, so stringent filtering should be used; that filtering is not needed for denoising methods (they still toss singletons). I'm not sure I understand/agree with your rationale.

It's not a bad assumption, since so much else about OTU picking is finicky! But no, you do not need these to match.

Good luck!

David_Bradshaw · July 19, 2018, 4:40pm

Dear Dr. Nicholas Bokulich,

Thank you very much for the information and advice; it has been very helpful. I honestly need to talk to my PIs more about using OTUs vs Deblur/DADA2, and I was planning on running both versions to compare anyway. I imagine your opinion is that there is no reason not to use the exact sequence variant techniques? Is there any reason to use Deblur vs DADA2?

Yes, I have been using filtering to account for that, per the paper you cited, with the following command (I currently have about 9.08 million sequences across 392 samples after quality filtering and chimera removal):

qiime feature-table filter-features --i-table '/home/microbiology/QIIME2_1/table-nonchimeric-dn-97.qza' --o-filtered-table table-nc-dn-97-0.005.qza --p-min-frequency 494

This is what the paper is referring to, correct? I am removing any OTU that does not account for at least 0.005% of the total number of sequences.
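For reference, the arithmetic behind a 0.005% cutoff can be sketched as below. The function name `min_frequency` is made up for illustration, and rounding up to the next whole read is an assumption rather than something the paper or QIIME 2 prescribes. The share is passed in parts per 100,000 so that 0.005% becomes the integer 5 and the shell's integer arithmetic is enough.

```shell
# Hypothetical helper: smallest whole-number frequency that is at least
# the given share (in parts per 100,000) of the total read count.
min_frequency() {
  total_reads=$1
  parts_per_100k=${2:-5}   # default: 5 per 100,000 = 0.005%
  # Round up (assumption) so no feature below the exact cutoff is kept.
  echo $(( (total_reads * parts_per_100k + 99999) / 100000 ))
}

min_frequency 9080000   # approximate total quoted above; prints 454
```

With a total of about 9.08 million this gives 454, while a `--p-min-frequency` of 494 corresponds to a total of about 9.88 million, so the value to use depends on the exact total frequency reported in the feature-table summary.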

Thank you for your time and help.

Sincerely,

David Bradshaw

Nicholas_Bokulich · July 19, 2018, 4:51pm

It's always good to compare (though note that unless you know what your samples actually contain, e.g., you are testing on a mock community, your choice will be based on whichever method makes you feel better, not necessarily on which is more accurate).

"no reason to not use" ≠ "reason to use"

I feel there is reason to use ESV methods. The results I have seen — both in the publications for deblur and dada2 but also in my own benchmarks on mock communities — show that these methods perform better than OTU picking at weeding out noise.

This is not to say that OTU clustering methods cannot be used — and indeed they have their place, e.g., for sequencing technologies that are incompatible with ESV methods; and when clustering is explicitly intended against a reference set (in which case personally I would denoise then cluster).
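A minimal sketch of the "denoise then cluster" route, assuming single-end reads, DADA2 as the denoiser and closed-reference clustering at 97% against a SILVA reference: the file names, the truncation length and the `run` helper (which only prints each command unless `RUN=1` is set) are all assumptions for illustration, not settings recommended in this thread.

```shell
# Hypothetical helper: print the command unless RUN=1, in which case execute it.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

denoise_then_cluster() {
  # Step 1: denoise into exact sequence variants (parameter values are placeholders).
  run qiime dada2 denoise-single \
    --i-demultiplexed-seqs demux.qza \
    --p-trim-left 0 \
    --p-trunc-len 150 \
    --o-table table-dada2.qza \
    --o-representative-sequences rep-seqs-dada2.qza \
    --o-denoising-stats stats-dada2.qza
  # Step 2: cluster the denoised variants against a reference set.
  run qiime vsearch cluster-features-closed-reference \
    --i-table table-dada2.qza \
    --i-sequences rep-seqs-dada2.qza \
    --i-reference-sequences silva-ref-seqs.qza \
    --p-perc-identity 0.97 \
    --o-clustered-table table-cr-97.qza \
    --o-clustered-sequences rep-seqs-cr-97.qza \
    --o-unmatched-sequences unmatched-cr-97.qza
}

denoise_then_cluster
```

Paired-end data would use `qiime dada2 denoise-paired` in the first step instead, with its own truncation parameters.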

Correct, you are using the correct command and min frequency.