Strange results based on taxa-bar-plots.qzv. Classifier problem?

Based on taxa-bar-plots.qzv, I see that many of my features are classified only to the domain level (see the graph below). This was unexpected. I thought it had something to do with the classifier, but after reading the QIIME 2 forum it looks like I did everything correctly (see the scripts below). Do you know what the reason might be?

image

qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path SILVA_132_QIIME_release/rep_set/rep_set_16S_only/99/silva_132_99_16S.fna \
  --output-path SILVA_132.qza

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path SILVA_132_QIIME_release/taxonomy/16S_only/99/taxonomy_7_levels.txt \
  --output-path SILVA_132_ref-taxonomy.qza

qiime feature-classifier extract-reads \
  --i-sequences SILVA_132.qza \
  --p-f-primer GTTYGATYMTGGCTCAG \
  --p-r-primer GCWGCCTCCCGTAGGWGT \
  --o-reads ref-seqs.qza

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads ref-seqs.qza \
  --i-reference-taxonomy SILVA_132_ref-taxonomy.qza \
  --o-classifier classifier.qza

I am using QIIME 2 2019.4, installed via Miniconda.

Hi @Joanna,

Welcome to the forum!

It looks like there may be an issue with your data rather than your scripts, possibly in either the database or the sequences you’re using. Can you explain a bit more of your upstream pipeline to help diagnose the problem? What region are you sequencing? What kind of sequencing, importing, and denoising/clustering did you use?

Best,
Justine

Hi Justine,

Thank you for your answer.

I targeted the V1-V2 region of the 16S rRNA gene, and the samples were sequenced in 4 MiSeq runs (2x300 bp). First, I imported the multiplexed sequences. Then I demultiplexed them with “qiime cutadapt demux-paired”, trimmed the adapters, ran “qiime dada2 denoise-paired”, and merged the denoised data. Below are the commands I used:

qiime tools import \
  --type MultiplexedPairedEndBarcodeInSequence \
  --input-path L4_fastq \
  --output-path multiplexed-seqs_L4.qza

qiime cutadapt demux-paired --i-seqs multiplexed-seqs_L4.qza --m-forward-barcodes-file mapping_L4.txt --m-forward-barcodes-column BarcodeSequence --o-per-sample-sequences demultiplexed-seqs_L4.qza --o-untrimmed-sequences untrimmed_L4.qza --verbose

qiime cutadapt trim-paired --i-demultiplexed-sequences demultiplexed-seqs_L4.qza --p-front-f GTTYGATYMTGGCTCAG --p-front-f GCWGCCWCCCGTAGGWGT --p-front-r CTGAGCCAKRATCRAAC --p-front-r ACWCCTACGGGWGGCWGC --p-front-r GCWGCCWCCCGTAGGWGT --o-trimmed-sequences trimmed-seqs_L4.qza --verbose
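
The first two --p-front-r sequences in the command above appear to be the reverse complements of the two primers. A minimal sketch for checking this, assuming the common `rev` and `tr` utilities are available; `revcomp` is a hypothetical helper name, not part of QIIME 2 or cutadapt:

```shell
# Reverse-complement a primer, including IUPAC degenerate bases.
# Pairs swapped: A<->T, C<->G, R<->Y, K<->M, B<->V, D<->H.
# S, W and N are their own complements and pass through unchanged.
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTRYKMBVDH' 'TGCAYRMKVBHD'
}

revcomp GTTYGATYMTGGCTCAG    # prints CTGAGCCAKRATCRAAC
revcomp GCWGCCWCCCGTAGGWGT   # prints ACWCCTACGGGWGGCWGC
```

Both outputs match sequences passed to --p-front-r, so the command searches for each primer in both orientations.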

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs trimmed-seqs_L1.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 255 \
  --p-trunc-len-r 224 \
  --o-table table_L1.qza \
  --o-representative-sequences rep-seqs_L1.qza \
  --o-denoising-stats denoising-stats_L1.qza

To merge the data I used qiime feature-table merge and qiime feature-table merge-seqs.
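
The exact merge commands are not shown in the thread. A rough sketch of what merging four runs could look like is below. The file names for runs L2 and L3 and the merged output names are assumptions made by analogy with the L1 and L4 names above, and `merge_runs` is a hypothetical wrapper used only to group the two calls:

```shell
# Sketch only: merge per-run DADA2 outputs into one table and one set of
# representative sequences. Input names for L2/L3 and both output names
# are assumed, not taken from the original post.
merge_runs() {
  qiime feature-table merge \
    --i-tables table_L1.qza \
    --i-tables table_L2.qza \
    --i-tables table_L3.qza \
    --i-tables table_L4.qza \
    --o-merged-table table.qza

  qiime feature-table merge-seqs \
    --i-data rep-seqs_L1.qza \
    --i-data rep-seqs_L2.qza \
    --i-data rep-seqs_L3.qza \
    --i-data rep-seqs_L4.qza \
    --o-merged-data rep-seqs.qza
}

# To run it inside an activated QIIME 2 environment:
# merge_runs
```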

Please let me know if you need more information.

Best,
Joanna

Hi @Joanna,

Thanks! That sounds fairly normal. Do you have reasonable sequencing depth?

Could you also try running the full-length SILVA classifier from the resources page to see if it gives you what you expect? It’s not specific to a region, but it might help narrow down whether the problem is in your classifier or your data.
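
For reference, a minimal sketch of that test, assuming the pre-trained classifier has already been downloaded from the resources page. The function names and every file name in the example calls are placeholders, not values taken from this thread:

```shell
# Sketch only: classify representative sequences with a pre-trained
# classifier, then rebuild the bar plot for comparison.
# Arguments: classifier.qza, rep-seqs.qza, output taxonomy.qza
classify_pretrained() {
  qiime feature-classifier classify-sklearn \
    --i-classifier "$1" \
    --i-reads "$2" \
    --o-classification "$3"
}

# Arguments: table.qza, taxonomy.qza, sample metadata file, output .qzv
make_barplot() {
  qiime taxa barplot \
    --i-table "$1" \
    --i-taxonomy "$2" \
    --m-metadata-file "$3" \
    --o-visualization "$4"
}

# Example calls (file names are assumptions):
# classify_pretrained silva-132-99-nb-classifier.qza rep-seqs.qza taxonomy-full-length.qza
# make_barplot table.qza taxonomy-full-length.qza sample-metadata.tsv taxa-bar-plots-full-length.qzv
```

If the bar plot looks the same with the pre-trained classifier, the custom training steps are less likely to be the cause.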

Best,
Justine

Hi Justine,

> Do you have reasonable sequencing depth?
I think so.

image

> Could you also try running the full length Silva classifier on the resources page to see if it gives you what you expect?
I got the same results as in the case of only_16S.

Best,
Joanna

This table summary looks okay to me. Can you read through this post (and the posts linked from it, too): "Some samples are classified with only kingdom-level"?

Also, have you tried using a full-length classifier? If so, how does that compare?

Hi Matthew,

Thanks for your help. I really appreciate it.

Based on the suggested posts and this one I tried to use a full-length classifier (by skipping qiime feature-classifier extract-reads) and I got the same results:

image

I also tried different databases (GG vs. SILVA): still the same strange results.
I tried SILVA_97 vs. SILVA_99: still the same.
I tried a pre-trained classifier vs. my own trained classifier: the same results.

Some time ago I ran QIIME 1 on the same datasets and I didn't observe such results. That is why I thought it had something to do with the classification.

Just to check, I randomly chose one unclassified sequence and one Bacteria__ sequence and ran them through BLAST and SILVA_db, and both returned a taxonomic classification.
For example, here are the BLAST results for Bacteria__:
image
and here for SILVA:
image

So I wonder why I do not see any assigned classification when using QIIME 2. Any ideas?

Many thanks,
Joanna

Presumably you followed different steps in QIIME 1, so there are broader pipeline differences that could explain this.

Can you exclude uncultured hits from your BLAST results? Notably, those results show uncultured organisms, which are not particularly useful for diagnosing this.

Your SILVA results are a better indicator. This leads me to suggest a different approach:

How about you try classifying with classify-consensus-vsearch instead? This will mirror what you did with the SILVA webtool, but for every sequence. I suspect your reads are in mixed orientations (do you happen to know whether they are?). That would confound the sklearn classifier, because it is trained on the reads as they exist in the reference database, which usually occur in a single direction. The vsearch classifier, on the other hand, works just fine with mixed-orientation reads.
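
A minimal sketch of such a run, reusing the full-length reference artifacts imported earlier in this thread (SILVA_132.qza and SILVA_132_ref-taxonomy.qza). The query and output file names are assumptions, and `classify_vsearch` is a hypothetical wrapper name. The --p-strand both setting is believed to be the default and is written out only to make the both-orientation search explicit:

```shell
# Sketch only: consensus taxonomy assignment with vsearch, searching both
# strands so that reverse-complemented reads can still find their hits.
# Arguments: query rep-seqs.qza, reference reads, reference taxonomy,
#            output taxonomy.qza
classify_vsearch() {
  qiime feature-classifier classify-consensus-vsearch \
    --i-query "$1" \
    --i-reference-reads "$2" \
    --i-reference-taxonomy "$3" \
    --p-strand both \
    --o-classification "$4"
}

# Example call (rep-seqs.qza and taxonomy-vsearch.qza are assumed names):
# classify_vsearch rep-seqs.qza SILVA_132.qza SILVA_132_ref-taxonomy.qza taxonomy-vsearch.qza
```

This matches the options available around the 2019.4 release used here; later releases may expect additional outputs, so check `qiime feature-classifier classify-consensus-vsearch --help` for the installed version.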

Please give that a try and let me know how it goes!

Dear Nicholas,

You are right, I might have the reads in mixed orientations and the classification with classify-consensus-vsearch helped to solve the problem.

image

Thank you all for your help!

Best,
Joanna
