Hello,
When I analyzed my 16S sequences, the taxa bar plot showed a good number of taxonomic groups for my water samples.
For my 18S sequences, however, it showed only two taxa. I am not sure what I am missing. Can anyone please help?
Sequencing was done using a v3 600 cycle sequencing kit to produce 300 bp paired-end reads.
The primers used for PCR amplification of the 18S rRNA region were F: TTGTACACACCGCCC and R: CCTTCYGCAGGTTCACCTAC.
I ran the denoising step:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs paired-end-demux.qza \
  --p-trunc-len-f 209 \
  --p-trunc-len-r 174 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza
I used silva-138-99-nb-classifier. Checking the file with shasum -a 256... gave c08a1aa4d56b449b511f7215543a43249ae9c54b57491428a7e5548a62613616.
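A checksum is only informative when it is compared against a published value. The sketch below wraps that comparison in a small helper. The function name `verify_sha256` is made up for illustration, and the expected hash should be copied from wherever the classifier was downloaded; the value quoted in this post is simply what was computed locally.

```shell
# verify_sha256 FILE EXPECTED_HASH
# Returns 0 when the SHA-256 digest of FILE equals EXPECTED_HASH, 1 otherwise.
# (Hypothetical helper; it uses sha256sum when available, else shasum -a 256.)
verify_sha256() {
    local file="$1" expected="$2" actual
    if command -v sha256sum >/dev/null 2>&1; then
        actual=$(sha256sum "$file" | awk '{print $1}')
    else
        actual=$(shasum -a 256 "$file" | awk '{print $1}')
    fi
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual)" >&2
        return 1
    fi
}
```

It would be called as `verify_sha256 silva-138-99-nb-classifier.qza <hash from the download page>`.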
I used this to generate a bar plot:
qiime taxa barplot \
  --i-table table-no-singletons.qza \
  --i-taxonomy taxonomy.qza \
  --m-metadata-file metadata.tsv \
  --o-visualization taxa-bar-plots.qzv
The SILVA database includes both 16S and 18S sequences, as stated by @colinbrislawn in the post linked below, so you should be able to train an 18S classifier using that database.
However, should you wish to use an alternative to SILVA, there is the Protist Ribosomal Reference database (PR2), which is an SSU rRNA gene database.
Hello,
I downloaded SILVA_132_QIIME_release. It contains the subfolders rep set and taxonomy, and I chose the 18S-only data from each. Both have further subfolders (90, 94, 97, 99); I chose 99 in both. The rep set folder has the file silva_132_99_18S in fna format, and the taxonomy folder has the files taxonomy 7 levels, taxonomy all levels, raw taxonomy, and consensus taxonomy 7 levels. I chose taxonomy 7 levels.
After this, I ran:
qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path silva_132_99_18S.fna \
  --output-path silva-132-18S-rep-seqs.qza
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads silva-132-18S-rep-seqs.qza \
  --i-reference-taxonomy silva-132-18S-taxonomy.qza \
  --o-classifier silva-132-18S-classifier.qza
Are these the appropriate steps to train an 18S classifier? I just want to make sure I am on the right track.
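The commands in this post import the reference sequences but do not show how silva-132-18S-taxonomy.qza was created. A minimal sketch of the full sequence of steps, including the taxonomy import, might look like the following. The wrapper name `train_18s_classifier` is hypothetical, and the taxonomy file name in the usage line (taxonomy_7_levels.txt) is an assumption about how the "taxonomy 7 levels" file is named on disk.

```shell
# train_18s_classifier SEQS_FNA TAXONOMY_TXT PREFIX
# Hypothetical wrapper: imports reference sequences and taxonomy, then trains
# a naive Bayes classifier. Requires an activated QIIME 2 environment.
train_18s_classifier() {
    local seqs_fna="$1" tax_txt="$2" prefix="$3"

    # Reference sequences (same import as in the post).
    qiime tools import \
        --type 'FeatureData[Sequence]' \
        --input-path "$seqs_fna" \
        --output-path "${prefix}-rep-seqs.qza" || return 1

    # Reference taxonomy: assumed to be a two-column TSV without a header
    # line, hence HeaderlessTSVTaxonomyFormat.
    qiime tools import \
        --type 'FeatureData[Taxonomy]' \
        --input-format HeaderlessTSVTaxonomyFormat \
        --input-path "$tax_txt" \
        --output-path "${prefix}-taxonomy.qza" || return 1

    qiime feature-classifier fit-classifier-naive-bayes \
        --i-reference-reads "${prefix}-rep-seqs.qza" \
        --i-reference-taxonomy "${prefix}-taxonomy.qza" \
        --o-classifier "${prefix}-classifier.qza"
}
```

For the files named in the post this would be `train_18s_classifier silva_132_99_18S.fna taxonomy_7_levels.txt silva-132-18S`.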
Based on your taxonomy plot, it appears that your sequences are in the reverse orientation. This means taxonomy assignment will not work with a classifier trained using qiime feature-classifier fit-classifier-naive-bayes, because the sequences must match the orientation of the reference database. I recommend orienting your sequences with RESCRIPt (qiime rescript orient-seqs).
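The exact command from this reply is not preserved in the thread. As a rough sketch, an orientation step that aligns against a reference could be wrapped like this. The function name `orient_rep_seqs` and the reference file in the usage line are assumptions; the output names match those used later in the thread. It is worth checking `qiime rescript orient-seqs --help` for how your RESCRIPt version behaves when no reference is given.

```shell
# orient_rep_seqs REP_SEQS REFERENCE_SEQS [THREADS]
# Hypothetical wrapper around RESCRIPt's orient-seqs action.
# REFERENCE_SEQS is a FeatureData[Sequence] artifact in the orientation you
# want, e.g. the SILVA reference sequences the classifier was built from.
orient_rep_seqs() {
    local rep_seqs="$1" ref_seqs="$2" threads="${3:-1}"
    qiime rescript orient-seqs \
        --i-sequences "$rep_seqs" \
        --i-reference-sequences "$ref_seqs" \
        --p-threads "$threads" \
        --o-oriented-seqs rep-seqs-orient.qza \
        --o-unmatched-seqs rep-seqs-non-orient.qza
}
```

A call might look like `orient_rep_seqs rep-seqs.qza silva-138-99-seqs.qza 8`, where silva-138-99-seqs.qza stands in for whichever reference sequence artifact you have.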
Then use the premade SILVA classifier on rep-seqs-orient.qza and see what you get.
Alternatively, try qiime feature-classifier classify-consensus-vsearch ... , since the orientation of your sequences will not matter there, and you should be able to obtain taxonomy.
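For the vsearch route, a minimal sketch is shown below. The wrapper name `classify_with_vsearch` and the default output directory are made up for illustration. It uses `--output-dir` so that the same call works whether or not your QIIME 2 version produces outputs beyond the classification itself.

```shell
# classify_with_vsearch QUERY REF_SEQS REF_TAXONOMY [OUT_DIR]
# Hypothetical wrapper. OUT_DIR must not exist yet; QIIME 2 creates it and
# writes every output artifact of the action into it.
classify_with_vsearch() {
    local query="$1" ref_seqs="$2" ref_tax="$3" out_dir="${4:-vsearch-taxonomy}"
    qiime feature-classifier classify-consensus-vsearch \
        --i-query "$query" \
        --i-reference-reads "$ref_seqs" \
        --i-reference-taxonomy "$ref_tax" \
        --p-strand both \
        --output-dir "$out_dir"
}
```

Searching both strands is what makes the query orientation irrelevant here. The reference sequences and taxonomy are passed as separate artifacts, not as a trained classifier.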
Also, when I chose "majority_taxonomy_all_levels" as the taxonomy input path, I got this barplot. One taxonomy file has 7 rank levels and the other has up to 14. Which one is best to use and interpret? taxa-bar-plots.qzv (437.6 KB)
The differences in ranks between 132 and later versions have to do with our improved parsing. I would not use 132 anyway, as it is old. For a historical view of the 14 ranks, please read this post. Many of the recent versions of QIIME 2 make use of RESCRIPt to prepare SILVA. You can also do the same yourself as outlined here.
I am not sure why one of your taxonomy files has truncated taxonomy.
I'd stick with later versions of QIIME 2 and SILVA (138.1). Again, you can curate the SILVA database yourself too.
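As a starting point for curating SILVA 138.1 yourself, the download step with RESCRIPt might look like the sketch below. The wrapper name `fetch_silva_138_1` and the output file names are illustrative choices. Further curation steps are up to you and are not shown.

```shell
# fetch_silva_138_1
# Hypothetical wrapper: downloads SILVA 138.1 SSU NR99 via RESCRIPt and
# converts the RNA sequences to DNA. Requires network access and an
# activated QIIME 2 environment with RESCRIPt installed.
fetch_silva_138_1() {
    qiime rescript get-silva-data \
        --p-version '138.1' \
        --p-target 'SSURef_NR99' \
        --o-silva-sequences silva-138.1-ssu-nr99-rna-seqs.qza \
        --o-silva-taxonomy silva-138.1-ssu-nr99-tax.qza || return 1

    # SILVA distributes RNA sequences; downstream steps expect DNA.
    qiime rescript reverse-transcribe \
        --i-rna-sequences silva-138.1-ssu-nr99-rna-seqs.qza \
        --o-dna-sequences silva-138.1-ssu-nr99-seqs.qza
}
```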
I do not think these are the correct files. I do not see where you ran rescript orient-seqs in your provenance.
Hello,
I oriented my sequences using:
qiime rescript orient-seqs \
  --i-sequences rep-seqs.qza \
  --p-threads 8 \
  --o-oriented-seqs rep-seqs-orient.qza \
  --o-unmatched-seqs rep-seqs-non-orient.qza
Then I ran the feature classifier with silva-138-99-nb-classifier.qza:
qiime feature-classifier classify-sklearn \
  --i-classifier silva-138-99-nb-classifier.qza \
  --i-reads rep-seqs-orient.qza \
  --o-classification taxonomy.qza
Here is the resulting barplot from the above step. taxa-bar-plots.qzv (439.5 KB)
One important question:
Is there a pre-made classifier for 18S, as there is for 16S?
For 16S, I believe the pre-made classifier is the one available at https://resources.qiime2.org/
I want to document the 16S and 18S communities in my water sample. Amplification targeted the 18S rRNA V9 region, with the primers F: TTGTACACACCGCCC and R: CCTTCYGCAGGTTCACCTAC.
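Since orientation came up earlier in the thread, one quick sanity check is to compute the reverse complements of these two primers and look for them in an exported FASTA of the representative sequences. This assumes the primers are still present in the sequences, which the thread does not confirm. The helper name `revcomp` is hypothetical.

```shell
# revcomp SEQ
# Prints the reverse complement of an upper-case IUPAC nucleotide string.
revcomp() {
    printf '%s\n' "$1" \
        | awk '{ for (i = length($0); i > 0; i--) printf "%s", substr($0, i, 1); print "" }' \
        | tr 'ACGTRYKMBVDHSWN' 'TGCAYRMKVBHDSWN'
}

revcomp TTGTACACACCGCCC        # forward primer -> GGGCGGTGTGTACAA
revcomp CCTTCYGCAGGTTCACCTAC   # reverse primer -> GTAGGTGAACCTGCRGAAGG
```

Under that assumption, a sequence in the forward orientation would begin with the forward primer and end with the reverse complement of the reverse primer, while a reverse-oriented sequence would begin with the reverse primer itself.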
SILVA contains all of the small subunit (SSU) sequence data (i.e. 16S and 18S), since these genes are homologous. So go ahead and try that classifier. If that does not work out well, try making your own classifier via RESCRIPt, as there is a chance that the curation for the premade classifier is too aggressive (discarding too many eukaryotic reference sequences).
My concern is that you are sequencing the V9 region. Many do not sequence the tail ends of the rRNA gene, so there might not be as many reference sequences that cover this area for optimal classification compared to other regions. But I could be wrong... give it a try.
For 16S, the classifier gives quite good taxonomic classification, like this, but I am not sure why I am not getting good taxonomy for 18S.
But after running
qiime rescript orient-seqs \
  --i-sequences rep-seqs.qza \
  --p-threads 8 \
  --o-oriented-seqs rep-seqs-orient.qza \
  --o-unmatched-seqs rep-seqs-non-orient.qza
and then classifying with silva-138-99-nb-classifier.qza, I am still confused by the results: taxa-bar-plots.qzv (1.7 MB) taxa-bar-plots.qzv (422.8 KB) taxa-bar-plots.qzv (439.5 KB)
Can I use the same pre-made classifier for both 16S and 18S taxonomic classification? I am sorry for asking about the same situation again. I really appreciate your time.
I think the example premade classifiers might be too aggressive in their filtering. That is, any eukaryotic sequence less than 1400 bp is removed. This may be removing too many valid reference sequences. I am thinking that you should make your own classifier as outlined here. Then let us know how it works.
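To get a feel for how much a 1400 bp cutoff would remove, one option is to export the reference sequences and taxonomy (for example with `qiime tools export`) and count the short eukaryotic entries. The sketch below is one way to do that with awk. The function name `count_short_by_taxon` is hypothetical, and it assumes the taxonomy is a tab-separated file with the feature ID in column 1 and the taxon string in column 2.

```shell
# count_short_by_taxon FASTA TAXONOMY_TSV LABEL MIN_LEN
# Prints "<short> <total>": among sequences whose taxon string contains LABEL,
# how many are shorter than MIN_LEN, and how many there are in total.
count_short_by_taxon() {
    awk -F'\t' -v label="$3" -v min="$4" '
        function tally() { if (keep) { total++; if (len < min) short++ } }
        # First file: taxonomy. Remember IDs whose taxon contains the label.
        NR == FNR { if (index($2, label)) want[$1] = 1; next }
        # Second file: FASTA. Headers start a new record.
        /^>/ {
            tally()
            id = substr($1, 2); sub(/[ \t].*/, "", id)
            keep = (id in want); len = 0
            next
        }
        { len += length($0) }
        END { tally(); print short + 0, total + 0 }
    ' "$2" "$1"
}
```

With a SILVA 138 style taxonomy, a call such as `count_short_by_taxon dna-sequences.fasta taxonomy.tsv d__Eukaryota 1400` would report how many eukaryotic references fall under the cutoff; the file names are whatever your export produced.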
Hello @SoilRotifer
Thank you so much for your time and suggestions.
I tried making my own classifier. I followed exactly the steps in the tutorial on processing, filtering, and evaluating the SILVA database (138.1).
I filtered the sequences by length as in the post (Archaea, Bacteria, and Eukaryota: 900, 1200, and 1400) and then dereplicated in uniq mode. Then I extracted the reads using my 18S forward and reverse primers and dereplicated the extracted region. Finally, I trained a naive Bayes classifier.
I got this barplot.
The classifier I got is 7.41 MB in size. Is it OK to have such a small classifier? The pre-made classifiers seem to be much larger. I have attached my classifier along with a bar plot and my representative sequences. Does this classifier look good?
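For readers following along, the amplicon-specific part of the workflow described in this post (extract the primer-bounded region, dereplicate, train) could be sketched roughly as follows. The wrapper name `build_v9_classifier` and all output file names are made up for illustration; the primers are the ones given in the thread. The earlier curation steps are deliberately left out.

```shell
# build_v9_classifier REF_SEQS REF_TAXONOMY [PREFIX]
# Hypothetical wrapper: extracts the region between the thread's 18S V9
# primers from curated reference sequences, dereplicates in uniq mode, and
# trains a naive Bayes classifier on the result.
build_v9_classifier() {
    local ref_seqs="$1" ref_tax="$2" prefix="${3:-silva-138.1-18S-V9}"

    qiime feature-classifier extract-reads \
        --i-sequences "$ref_seqs" \
        --p-f-primer TTGTACACACCGCCC \
        --p-r-primer CCTTCYGCAGGTTCACCTAC \
        --o-reads "${prefix}-seqs.qza" || return 1

    qiime rescript dereplicate \
        --i-sequences "${prefix}-seqs.qza" \
        --i-taxa "$ref_tax" \
        --p-mode 'uniq' \
        --o-dereplicated-sequences "${prefix}-seqs-uniq.qza" \
        --o-dereplicated-taxa "${prefix}-tax-uniq.qza" || return 1

    qiime feature-classifier fit-classifier-naive-bayes \
        --i-reference-reads "${prefix}-seqs-uniq.qza" \
        --i-reference-taxonomy "${prefix}-tax-uniq.qza" \
        --o-classifier "${prefix}-classifier.qza"
}
```

An amplicon-only classifier covers a short region, so it would be expected to be considerably smaller than a full-length one.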
Just to clarify, I was not suggesting that you follow the SILVA example tutorial completely, but rather that you skip several steps, as outlined in the other link. That is, do not perform the sequence length filtering, etc.
I recommended this because there are quite a few short eukaryotic sequences (i.e. amplicon length) that are likely important for you, and these will be removed by the length filtering. So only run the four steps I mentioned; then you can perform your amplicon extraction for an amplicon-specific classifier (you could just use the full-length classifier too).
Again, the RESCRIPt tutorial is not a standard operating procedure, but just provides many examples of what you can do when curating your reference database.