Only two taxa from 18S taxa barplot

Namraj_Jaishi · July 20, 2024, 4:30pm

Hello,
While analyzing my 16S sequences, the taxa bar plot showed a good number of taxonomic groups for my water samples.
But for my 18S sequences, it showed only two taxa. I am not sure what I am missing. Can anyone please help?
Sequencing was done using a v3 600-cycle sequencing kit to produce 300 bp paired-end reads.
The primers used for PCR amplification of the 18S rRNA region were F: TTGTACACACCGCCC and R: CCTTCYGCAGGTTCACCTAC.
I ran the denoising step:
qiime dada2 denoise-paired \
    --i-demultiplexed-seqs paired-end-demux.qza \
    --p-trunc-len-f 209 \
    --p-trunc-len-r 174 \
    --o-table table.qza \
    --o-representative-sequences rep-seqs.qza \
    --o-denoising-stats denoising-stats.qza

I used silva-138-99-nb-classifier; checking it with shasum -a 256... gave c08a1aa4d56b449b511f7215543a43249ae9c54b57491428a7e5548a62613616.
I used this to generate a bar plot:
qiime taxa barplot \
    --i-table table-no-singletons.qza \
    --i-taxonomy taxonomy.qza \
    --m-metadata-file metadata.tsv \
    --o-visualization taxa-bar-plots.qzv

Here are the QZV files:
denoising-stats209174.qzv (1.2 MB)
paired-end-demux.qzv (321.5 KB)
taxa-bar-plots.qzv (422.8 KB)
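
As a side check on the truncation lengths used above (209 and 174), here is a minimal sketch of the overlap arithmetic for paired reads. The amplicon lengths in the loop are hypothetical placeholders, not measurements from this dataset, and the minimum overlap of 12 is an assumed value that should be checked against your DADA2 / QIIME 2 version.

```python
def check_truncation(trunc_len_f: int, trunc_len_r: int,
                     amplicon_len: int, min_overlap: int = 12) -> dict:
    """Estimate how truncated paired reads cover an amplicon of a given length."""
    # A read cannot contribute more amplicon bases than the amplicon contains.
    covered_f = min(trunc_len_f, amplicon_len)
    covered_r = min(trunc_len_r, amplicon_len)
    overlap = covered_f + covered_r - amplicon_len
    return {
        "overlap": overlap,
        "enough_overlap": overlap >= min_overlap,
        # True when a truncated read is longer than the amplicon itself,
        # so its tail would extend past the far end of the insert.
        "read_through": max(trunc_len_f, trunc_len_r) > amplicon_len,
    }


# Hypothetical amplicon lengths, for illustration only.
for amplicon_len in (170, 400):
    print(amplicon_len, check_truncation(209, 174, amplicon_len))
```

If the amplicon is shorter than the truncated reads, it may be worth checking whether the read tails still contain non-target sequence.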

Mike_Stevenson · July 22, 2024, 10:30am

Hi @Namraj_Jaishi

The Silva database includes both 16S and 18S reads as stated by @colinbrislawn in the post linked below, so you should be able to train an 18S classifier using that database.

Qiime2 18S workflow

However, should you wish to use an alternative to Silva, there is a database called the Protist Ribosomal Reference (PR2) database, which is an SSU rRNA gene database.

18S reference dataset

I hope that helps!

Namraj_Jaishi · July 23, 2024, 3:30pm

Hello,
I downloaded SILVA_132_QIIME_release. Inside it, I found the subfolders rep set and taxonomy. I chose 18S only from those subfolders. Both folders have further subfolders such as 90, 94, 97, and 99; I chose 99 for both. One contains the file silva_132_99_18S in fna format, and the taxonomy folder contains the files taxonomy 7 levels, taxonomy all levels, raw taxonomy, and consensus taxonomy 7 levels. I chose taxonomy 7 levels.
After this, I ran:
qiime tools import \
    --type 'FeatureData[Sequence]' \
    --input-path silva_132_99_18S.fna \
    --output-path silva-132-18S-rep-seqs.qza

qiime tools import \
    --type 'FeatureData[Taxonomy]' \
    --input-format HeaderlessTSVTaxonomyFormat \
    --input-path taxonomy_7_levels.tsv \
    --output-path silva-132-18S-taxonomy.qza

qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads silva-132-18S-rep-seqs.qza \
    --i-reference-taxonomy silva-132-18S-taxonomy.qza \
    --o-classifier silva-132-18S-classifier.qza
Are these appropriate steps for training the 18S classifier? I just want to make sure I am on the right track.

Best,
Namraj
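
One cheap check before importing a reference FASTA and a headerless taxonomy TSV, as in the steps above, is to confirm that every sequence ID has a taxonomy entry. The sketch below is only an illustration: the function names and the in-memory example records are hypothetical, and it assumes the two-column `<id><TAB><taxonomy>` layout.

```python
import io


def fasta_ids(handle):
    """Collect sequence IDs: the text after '>' up to the first whitespace."""
    return {
        line[1:].split()[0]
        for line in handle
        if line.startswith(">") and line[1:].strip()
    }


def taxonomy_ids(handle):
    """Collect IDs from a headerless two-column TSV: <id><TAB><taxonomy>."""
    return {line.split("\t", 1)[0].strip() for line in handle if line.strip()}


def missing_taxonomy(fasta_handle, taxonomy_handle):
    """Return the sorted sequence IDs that have no taxonomy line."""
    return sorted(fasta_ids(fasta_handle) - taxonomy_ids(taxonomy_handle))


# Made-up records standing in for the real reference files.
example_fasta = io.StringIO(">ref1\nACGT\n>ref2 some description\nGGCC\n")
example_taxonomy = io.StringIO("ref1\tD_0__Eukaryota;D_1__SAR\n")
print(missing_taxonomy(example_fasta, example_taxonomy))
```

With real files, the `io.StringIO` objects would be replaced by `open(...)` handles on the FASTA and taxonomy files.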

SoilRotifer · July 23, 2024, 7:32pm

Hi @Namraj_Jaishi,

Based on your taxonomy plot, it appears that your sequences are in the reverse orientation. This means taxonomy assignment will not work with a classifier trained using qiime feature-classifier fit-classifier-naive-bayes, as the sequences must match the orientation of the reference database. I recommend orienting your sequences using RESCRIPt as follows:

qiime rescript orient-seqs \
    --i-sequences  rep-seqs.qza \
    --p-threads 8 \
    --o-oriented-seqs   rep-seqs-orient.qza \
    --o-unmatched-seqs   rep-seqs-non-orient.qza

Then use the premade silva classifier on rep-seqs-orient.qza, and see what you get.

Alternatively, try using qiime feature-classifier classify-consensus-vsearch ... , as orientation of your sequences will not matter, and you should be able to obtain taxonomy.

Let us know if either works.

-Mike
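
To illustrate what "reverse orientation" means here, the following is a minimal sketch that guesses a sequence's orientation from which primer appears near its start. It assumes the primers are still present at the start of the sequences and uses exact matching with IUPAC degenerate bases; the function names and the example sequences are hypothetical, and this is not how RESCRIPt itself orients sequences.

```python
import re

# IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string, including IUPAC degenerate codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer: str):
    """Compile a primer with degenerate bases into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def guess_orientation(seq: str, fwd_primer: str, rev_primer: str,
                      window: int = 40) -> str:
    """Return 'forward', 'reverse' or 'unknown' from the first `window` bases."""
    head = seq.upper()[:window]
    if primer_regex(fwd_primer).search(head):
        return "forward"
    if primer_regex(rev_primer).search(head):
        return "reverse"
    return "unknown"


FWD = "TTGTACACACCGCCC"       # forward primer from this thread
REV = "CCTTCYGCAGGTTCACCTAC"  # reverse primer from this thread

# Made-up example reads, for illustration only.
print(guess_orientation("TTGTACACACCGCCCGTCGCTACTACC", FWD, REV))
print(guess_orientation("CCTTCTGCAGGTTCACCTACGGAAACC", FWD, REV))
```

A sequence that starts with the reverse primer is the reverse complement of what a forward-oriented reference would contain.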

Namraj_Jaishi · July 23, 2024, 7:45pm

@SoilRotifer,
When I ran those steps, I got this barplot.
taxa-bar-plots.qzv (362.9 KB)

Also, when I chose "majority_taxonomy_all_levels" as the taxonomy input path, I got this barplot. One taxonomy file has 7 taxonomic levels and the other has up to 14. Which one is best to use and interpret?
taxa-bar-plots.qzv (437.6 KB)
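
To compare the rank depth of two taxonomy files such as the 7-level and all-levels ones mentioned above, a small tally like the sketch below can help. It assumes a headerless `<id><TAB><taxonomy>` layout with semicolon-separated ranks; the example lines are made up and only imitate that layout.

```python
from collections import Counter


def rank_depth(taxonomy: str) -> int:
    """Count the non-empty, semicolon-separated ranks in a taxonomy string."""
    return len([rank for rank in taxonomy.split(";") if rank.strip()])


def depth_histogram(taxonomy_lines) -> dict:
    """Map rank depth to the number of reference entries with that depth."""
    counts = Counter()
    for line in taxonomy_lines:
        if not line.strip():
            continue
        # partition() tolerates lines without a tab (their depth is then 0).
        _seq_id, _tab, taxonomy = line.rstrip("\n").partition("\t")
        counts[rank_depth(taxonomy)] += 1
    return dict(counts)


# Made-up lines, for illustration only.
example = [
    "ref1\tD_0__Eukaryota;D_1__SAR;D_2__Alveolata\n",
    "ref2\tD_0__Bacteria;D_1__Proteobacteria\n",
    "ref3\tD_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa\n",
]
print(depth_histogram(example))
```

With a real file, `depth_histogram(open(path))` would give the distribution of depths across the whole reference taxonomy.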

Namraj_Jaishi · July 23, 2024, 7:46pm

I will try this one and keep you posted.
Thank you.

SoilRotifer · July 23, 2024, 8:54pm

The differences in ranks between 132 and later versions have to do with our improved parsing. I would not use 132 anyway, as it is old. For historical background on the 14 ranks, please read this post. Many recent versions of QIIME 2 make use of RESCRIPt to prepare SILVA. You can also do the same yourself, as outlined here.

I am not sure why one of your taxonomy files has truncated taxonomy.

I'd stick with later versions of QIIME 2 and SILVA (138.1). Again, you can curate the SILVA database yourself too.

I do not think these are the correct files. I do not see where you ran rescript orient-seqs in your provenance.

Namraj_Jaishi · July 24, 2024, 7:12pm

Hello,
After orienting my sequences using
qiime rescript orient-seqs \
    --i-sequences rep-seqs.qza \
    --p-threads 8 \
    --o-oriented-seqs rep-seqs-orient.qza \
    --o-unmatched-seqs rep-seqs-non-orient.qza
I then ran the feature classifier with silva-138-99-nb-classifier.qza:

qiime feature-classifier classify-sklearn \
    --i-classifier silva-138-99-nb-classifier.qza \
    --i-reads rep-seqs-orient.qza \
    --o-classification taxonomy.qza
Here is the resulting barplot from the above step.
taxa-bar-plots.qzv (439.5 KB)

SoilRotifer · July 24, 2024, 7:44pm

Can you run the taxonomy assignment using the SILVA 138.1 classifier with the sequences as they are, that is, without using orient-seqs?

Also, what gene / amplicon region are you targeting? What eukaryotes are you expecting?

Namraj_Jaishi · July 25, 2024, 1:24pm

One important question:
Is there a pre-made classifier for 18S, as there is for 16S?
For 16S, I believe the pre-made classifier is available at https://resources.qiime2.org/

I want to document the 16S and 18S communities in my water samples. Amplification targeted the 18S rRNA V9 region. The 18S primers were F: TTGTACACACCGCCC and R: CCTTCYGCAGGTTCACCTAC.

SoilRotifer · July 25, 2024, 2:48pm

Hi @Namraj_Jaishi,

SILVA contains all the small subunit (SSU) sequence data (i.e. 16S and 18S), as they are homologous. But go ahead and try that classifier. If that does not work out too well, then try making your own classifier via RESCRIPt, as there is a chance that the curation for the premade classifier might be too aggressive (discarding too many eukaryotic reference sequences).

My concern is that you are sequencing the V9 region. Many do not sequence the tail ends of the rRNA gene, so there might not be as many reference sequences that cover this area for optimal classification compared to other regions. But I could be wrong... give it a try.

Namraj_Jaishi · July 26, 2024, 8:01pm

When I tried the classifier (silva-138-99-nb-classifier.qza) for 18S, it assigned only Bacteria and Unassigned.

For 16S, it gives quite a good taxonomic classification, but I am not sure why I am not getting good taxonomy for 18S.

But after running
qiime rescript orient-seqs \
    --i-sequences rep-seqs.qza \
    --p-threads 8 \
    --o-oriented-seqs rep-seqs-orient.qza \
    --o-unmatched-seqs rep-seqs-non-orient.qza
I then ran the feature classifier with silva-138-99-nb-classifier.qza:

qiime feature-classifier classify-sklearn \
    --i-classifier silva-138-99-nb-classifier.qza \
    --i-reads rep-seqs-orient.qza \
    --o-classification taxonomy.qza

I am still confused.
taxa-bar-plots.qzv (1.7 MB)
taxa-bar-plots.qzv (422.8 KB)
taxa-bar-plots.qzv (439.5 KB)
Can I use the same pre-made classifier for both 16S and 18S taxonomic classification? I am sorry for repeating the same question. I really appreciate your time.

Best,
Namraj

SoilRotifer · July 26, 2024, 9:35pm

Hi @Namraj_Jaishi,

I think the example premade classifiers might be too aggressive in their filtering. That is, any eukaryotic sequence less than 1400 bp is removed. This may be removing too many valid reference sequences. I am thinking that you should make your own classifier as outlined here. Then let us know how it works.
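
To make the effect of such length filtering concrete, here is a minimal sketch of a per-domain minimum-length filter. The thresholds follow the values quoted in this thread (Archaea 900, Bacteria 1200, Eukaryota 1400); the function names, the `d__` prefix handling, and the example sequences are assumptions for illustration, not RESCRIPt's actual implementation.

```python
# Minimum reference lengths per domain, as quoted in this thread.
MIN_LENGTH = {"Archaea": 900, "Bacteria": 1200, "Eukaryota": 1400}


def domain_of(taxonomy: str) -> str:
    """Return the top-level label, e.g. 'd__Eukaryota; p__X' -> 'Eukaryota'."""
    first_rank = taxonomy.split(";")[0].strip()
    return first_rank.split("__")[-1]


def passes_length_filter(seq: str, taxonomy: str, min_length=MIN_LENGTH) -> bool:
    """Keep a reference only if it meets the threshold for its domain."""
    threshold = min_length.get(domain_of(taxonomy), 0)
    return len(seq) >= threshold


# Made-up references: a short, amplicon-length eukaryotic entry is dropped,
# while a near-full-length one is kept.
short_euk = ("A" * 170, "d__Eukaryota; p__Example")
long_euk = ("A" * 1500, "d__Eukaryota; p__Example")
print(passes_length_filter(*short_euk), passes_length_filter(*long_euk))
```

Under these thresholds, any eukaryotic reference that only covers a short region such as V9 would be removed before the classifier is trained.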

Namraj_Jaishi · July 27, 2024, 9:41am

Hello @SoilRotifer
Thank you so much for your time and suggestions.
I tried making my own classifier. I followed the steps in the tutorial on processing, filtering, and evaluating the SILVA database (138.1) exactly.
I filtered the sequences by length as in the post (Archaea, Bacteria, and Eukaryota: 900, 1200, and 1400) and then dereplicated in uniq mode. Then I extracted the reads using my forward and reverse 18S primers, followed by dereplication of the extracted region. I then trained a naive Bayes classifier.
I got this barplot.
The classifier I got is 7.41 MB. Is it OK to have such a small classifier? The pre-made classifiers seem much larger. I have attached my classifier along with a bar plot and my representative sequences. Does this classifier look good?

rep-seqs.qza (395.0 KB)
taxa-bar-plots.qzv (1.7 MB)

silva-138.1-ssu-nr99-515f-806r-classifier.qza (7.4 MB)

Once again, thank you so much.

Best,
Namraj

SoilRotifer · July 28, 2024, 9:14pm

Hi @Namraj_Jaishi,

The classifications look much better!

Just to clarify, I was not suggesting that you follow the SILVA example tutorial completely, but that you skip several steps, as outlined in the other link. That is, do not perform the sequence length filtering, etc.

I recommended this because there are quite a few short eukaryotic sequences (i.e. amplicon length) that are likely important for you, but these will be removed by the length filtering. So run only the four steps I mentioned; then you can perform your amplicon extraction for an amplicon-specific classifier (you could just use the full-length classifier too).

Again, the RESCRIPt tutorial is not a standard operating procedure, but just provides many examples of what you can do when curating your reference database.

But your taxonomy looks much improved!
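
For readers unfamiliar with the amplicon extraction step mentioned above, the sketch below shows the basic idea: find the forward primer in a reference sequence, find the reverse complement of the reverse primer downstream of it, and keep what lies between. It uses exact matching with IUPAC degenerate bases, so it is stricter than real extraction tools, which typically tolerate mismatches. The function names and the example reference are hypothetical.

```python
import re
from typing import Optional

# IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string, including IUPAC degenerate codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer: str):
    """Compile a primer with degenerate bases into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def extract_amplicon(reference: str, fwd_primer: str,
                     rev_primer: str) -> Optional[str]:
    """Return the region between the primers (primers excluded), or None."""
    ref = reference.upper()
    fwd = primer_regex(fwd_primer).search(ref)
    if fwd is None:
        return None
    # On the forward strand the reverse primer appears reverse-complemented.
    rev = primer_regex(reverse_complement(rev_primer)).search(ref, fwd.end())
    if rev is None:
        return None
    return ref[fwd.end():rev.start()]


FWD = "TTGTACACACCGCCC"       # forward primer from this thread
REV = "CCTTCYGCAGGTTCACCTAC"  # reverse primer from this thread

# Made-up reference: flank + forward primer + insert + reverse-primer site + flank.
reference = "GGG" + FWD + "GTCGCTACTACCGATTG" + "GTAGGTGAACCTGCAGAAGG" + "ATCA"
print(extract_amplicon(reference, FWD, REV))
```

References in which either primer site is missing return `None`, which is one reason an amplicon-specific reference set can end up much smaller than the full-length one.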

Namraj_Jaishi · August 27, 2024, 7:21pm

Hello Mike,

Thank you so much for your help. I really appreciate it.

Best,
Namraj

system · September 28, 2024, 1:22am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.