unassigned ASVs with greengenes2 2022.10 trained for non-v4 region

Hi,

I have a zymo sample that I processed with QIIME2 (2022.2). I trained my classifier using the greengenes2 2022.10 version

qiime feature-classifier extract-reads
--i-sequences .2022.10.backbone.full-length.fna.qza
--p-f-primer CCTAYGGGRBGCASCAG --p-r-primer GGACTACNNGGGTATCTAAT
--p-read-orientation both
--o-reads 2022.10.backbone.targeted.fna.qza
--p-n-jobs 1

qiime feature-classifier fit-classifier-naive-bayes
--i-reference-reads 2022.10.backbone.targeted.fna.qza
--i-reference-taxonomy 2022.10.backbone.tax.qza
--o-classifier 2022.10.backbone.targeted.classifier.qza

When I ran the classifier, all my ASVs are unassigned. However, when I export 2022.10.backbone.targeted.fna.qza into a fasta file , index it as a Blast db and execute blast of the representative sequences vs the 2022.10.backbone.targeted.fna db, I get hits covering the entire query ASV. I also get classifications using the silva database trained for the same region. I looked at the taxonomy of the top blast hits and observed that they share a family or order level, however, I haven't checked all of them. Running the classification using the full length 16S classifier, produced results. is there a way to identify the driving taxa that lead to the classification?

Hi @Alexandra_Bastkowska,

Just to check, do the BLAST hits correspond to the expected region, such that the fwd primer is observed just upstream of the hits?

Which features map to backbone sequences regrettably isn't supported yet by q2-vsearch, see Expose `FeatureMap` data when clustering · Issue #93 · qiime2/q2-vsearch · GitHub

Best,
Daniel

The BLAST db is 2022.10.backbone.targeted.fna.qza, which I exported into fasta format. The representative sequences of my zymo samples completely cover the reference in the db. Below are the top hits (sorry for the format). The taxonomy of the best hit (at genus) corresponds to what I expect from the zymo control.
qaccver saccver pident length mismatch gapopen qstart qend sstart send evalue bitscore
899f472bcb20419a21a9af29154f02ea RS-GCF-014229295.1-NZ-JAARZL010000002.1 100.000 429 0 0 1 429 1 429 0.0 793 s__Listeria_A marthii
448dbd891a66d6cbc432a90f68113186 RS-GCF-017582425.1-NZ-JAGFEC010000042.1 100.000 429 0 0 1 429 1 429 0.0 793 s__Pseudomonas_E_650326 aeruginosa_A
f2498d6f9a80f8edbd73769914d8a38a RS-GCF-000245335.1-NZ-JH600280.1 100.000 430 0 0 1 430 1 430 0.0 795 s__Bacillus_P_294101 mojavensis
c4ccade04576f4aae5d5c45583628687 LC054227 98.605 430 6 0 1 430 1 430 0.0 761 s__Fermentibacillus polygoni
64daa1b4744dec744c722df4e5a6a6b1 RS-GCF-014229295.1-NZ-JAARZL010000002.1 99.534 429 2 0 1 429 1 429 0.0 782 s__Listeria_A marthii
1b8e2ed29e52f09d7e4d5e9a8a709184 RS-GCF-014229295.1-NZ-JAARZL010000002.1 99.301 429 3 0 1 429 1 429 0.0 776 s__Listeria_A marthii
8c4f987abe9e0282b34f0ea83e68f941 RS-GCF-000245335.1-NZ-JH600280.1 99.302 430 3 0 1 430 1 430 0.0 778 s__Bacillus_P_294101 mojavensis

The other observation I made is that my project samples have assignments, it is only the zymo sample that fails.

Thanks, @Alexandra_Bastkowska! Sorry, for the delay, I thought I had clicked reply on this message a few days ago

If I read that right, those BLAST results look to start at the first bp give or take of the backbone sequences. The forward primer specified in the seed of this thread is I believe 341F. Is it possible the wrong primer was used in the extract-reads step?

sorry,I just noticed your reply.
The blast db was based on the output of the extracted region:
qiime feature-classifier extract-reads
--i-sequences 2022.10.backbone.full-length.fna.qza
--p-f-primer CCTAYGGGRBGCASCAG --p-r-primer GGACTACNNGGGTATCTAAT
--p-read-orientation both
--o-reads 2022.10.backbone.targeted.fna.qza
--p-n-jobs 1
qiime tools export --input-path 2022.10.backbone.targeted.fna.qza --output-path sequences_targeted
makeblastdb -in sequences_targeted/dna-sequences.fasta -dbtype nucl
blastn -query rep-seqs/dna-sequences.fasta -db sequences_targeted/dna-sequences.fasta -outfmt 6

I did not blast the rep seq against the full-length backbone, but just the subsequence (starting from 341f) that I also used as an input for training the classifier. This way I wanted to check if I match anything at all in the trimmed sequences. Hence, the input for the blast search as well for the classifier training is the same sequence fragment. Hope this clarifies my approach. I am happy to provide my representative sequence file, if that helps

Hi Daniel,

If I execute blast on the full length backbone, this is the result:
899f472bcb20419a21a9af29154f02ea RS-GCF-014229295.1-NZ-JAARZL010000002.1 100.000 429 0 0 1 429 361 789 0.0 793
448dbd891a66d6cbc432a90f68113186 RS-GCF-017582425.1-NZ-JAGFEC010000042.1 100.000 429 0 0 1 429 347 775 0.0 793
f2498d6f9a80f8edbd73769914d8a38a RS-GCF-000245335.1-NZ-JH600280.1 100.000 430 0 0 1 430 360 789 0.0 795
c4ccade04576f4aae5d5c45583628687 LC054227 98.605 430 6 0 1 430 355 784 0.0 761
64daa1b4744dec744c722df4e5a6a6b1 RS-GCF-014229295.1-NZ-JAARZL010000002.1 99.534 429 2 0 1 429 361 789 0.0 782
1b8e2ed29e52f09d7e4d5e9a8a709184 RS-GCF-014229295.1-NZ-JAARZL010000002.1 99.301 429 3 0 1 429 361 789 0.0 776
8c4f987abe9e0282b34f0ea83e68f941 RS-GCF-000245335.1-NZ-JH600280.1 99.302 430 3 0 1 430 360 789 0.0 778
bac7d287f41b28596b47ed2b00ac6b61 RS-GCF-000245335.1-NZ-JH600280.1 99.535 430 2 0 1 430 360 789 0.0 784
f7a6fb6659834998b5f4fcc7d001ba99 RS-GCF-014229295.1-NZ-JAARZL010000002.1 99.767 429 1 0 1 429 361 789 0.0 787

Hi @Alexandra_Bastkowska ,

Could you share these results? As a boxplot QZV or as an output from qiime metadata tabulate?

1 Like

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.