Reverse Reads throwing Taxa Histogram off

Hello,

When I produce a taxa histogram, the majority of the output is categorized as k__Fungi;;
My question is: why are there so many uncategorized species, and how do I fix this?

Hello @Hmickelson,

Welcome to the forums! :qiime2:

Give us all the details you got!

What database did you use?
What classification method did you use?
What did you expect to be in that sample?

Hi @colinbrislawn

I will try my best!

I used QIIME 2 and the UNITE database to classify my reads. I was expecting more diversity within the sample, instead of the majority being non-classified or no-name. My thesis advisor thinks it's due to reverse reads, although we have not figured out how to get QIIME or the classifier to read the sample properly to get around this problem.

qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads unite-ver8-97-seqs-10.05.2021.qza \
    --i-reference-taxonomy unite-ver8-97-tax-10.05.2021.qza \
    --o-classifier unite-ver8-97-classifier-10.05.2021.qza

Thanks
Hollie

Thank you for the added detail!

There are a bunch of things that can mess up a database. For example, it was necessary to reformat the taxonomy labels within UNITE before they would match for QIIME 2.

To build and test your database, you could try using the RESCRIPt pipeline or download one of my pre-trained classifiers for the UNITE database.

You could also try a totally different approach!
https://docs.qiime2.org/2023.5/plugins/available/feature-classifier/classify-consensus-vsearch/

Using classify-consensus-vsearch will also solve this problem, as it searches the database in both the forward and reverse directions by default!

@colinbrislawn how would you use the classify-consensus-vsearch command? Would you add it here:
qiime feature-classifier classify-sklearn
--i-classifier unite-ver8-97-classifier-10.05.2021.qza
--i-reads rep-seqs-dada2.qza
--o-classification taxonomy-single-end.qza
-- classify-consensus-vsearch
or would it be added here:
qiime feature-classifier fit-classifier-naive-bayes
--i-reference-reads unite-ver8-97-seqs-10.05.2021.qza
--i-reference-taxonomy unite-ver8-97-tax-10.05.2021.qza
--o-classifier unite-ver8-97-classifier-10.05.2021.qza
-- classify-consensus-vsearch

The command classify-consensus-vsearch replaces classify-sklearn.

View all the options you can use with it here, though the defaults should be good: classify-consensus-vsearch: VSEARCH-based consensus taxonomy classifier — QIIME 2 2023.5.1 documentation

(No 'fit-classifier' step is needed with vsearch because it indexes the database on the fly.)

This is my modified command:
qiime feature-classifier classify-consensus-vsearch
--i-classifier unite-ver8-97-classifier-10.05.2021.qza
--p-threads rep-seqs-dada2.qza
--o-classification taxonomy-single-end.qza

I am using the pre-trained classifier from UNITE, but I am getting this message:

(1/1?) No such option: --i-classifier

Should I take --i-classifier out of the command, or am I using this wrong?

Hi @Hmickelson, yes, that option is not needed for the vsearch classifier. You can see the docs for this command here (or by calling qiime feature-classifier classify-consensus-vsearch --help) - @colinbrislawn linked to this above.

In place of the classifier artifact, this command takes the FeatureData[Sequence] and FeatureData[Taxonomy] artifacts that you used to train the scikit-learn classifier with qiime feature-classifier fit-classifier-naive-bayes. Your command should look something like:

qiime feature-classifier classify-consensus-vsearch \
    --i-reference-reads unite-ver8-97-seqs-10.05.2021.qza \
    --i-reference-taxonomy unite-ver8-97-tax-10.05.2021.qza \
    --i-query rep-seqs-dada2.qza \
    --o-classification vsearch-taxonomy.qza

Also, which primers are you using for ITS sequencing? It's possible that you're getting non-fungal eukaryote reads in here that can't be classified to taxa in the database (and, as a result, are just being given the least specific assignment in the database). You could run qiime feature-table tabulate-seqs on your rep-seqs-dada2.qza. The resulting visualization will contain links that you can use to BLAST your sequences against the NCBI nr database. That will give you an idea of whether you have an issue with non-fungal sources in your data, in which case you might just need to filter those out after doing taxonomic assignment.
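
As a minimal sketch of the tabulate-seqs step described above: the helper name tabulate_rep_seqs and the output file name rep-seqs-dada2.qzv are placeholders of mine, not something from this thread, and I'm assuming the command's usual --i-data and --o-visualization options.

```shell
# Hypothetical helper: tabulate a FeatureData[Sequence] artifact ($1)
# into a visualization ($2) that can be opened to reach the BLAST links.
tabulate_rep_seqs() {
    qiime feature-table tabulate-seqs \
        --i-data "$1" \
        --o-visualization "$2"
}
```

You would call it as tabulate_rep_seqs rep-seqs-dada2.qza rep-seqs-dada2.qzv and then open the resulting .qzv with qiime tools view or at https://view.qiime2.org/.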

@gregcaporaso I went ahead and worked through my pipeline from scratch and used the above command but I am getting a "command not found: --o-search-results"

I updated to this command:
qiime feature-classifier classify-consensus-vsearch \
    --i-reference-reads unite-ver8-97-seqs-10.05.2021.qza \
    --i-reference-taxonomy unite-ver8-97-tax-10.05.2021.qza \
    --i-query rep-seqs-dada2.qza \
    --o-classification vsearch-taxonomy.qza \
    --o-search-results taxonomy-single-end.qza

and it seems to be doing something and not spitting back error messages. Does this look right?

I also do have the command you recommended within my pipeline to filter out any non-fungal reads.

I feel that we are getting close! Thank you for the help

This can happen in multi-line commands when the backslashes are not quite right.

multi-line Linux commands \
use a backslash to go on to the next line, \
then end without a backslash
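
To see why a missing backslash gives that error, here is a small sketch that does not need QIIME at all. count_args is a made-up stand-in for a real command; it only reports how many arguments it was given.

```shell
# Hypothetical stand-in for a real command: prints its argument count.
count_args() { echo "$#"; }

# Every line except the last ends with a backslash, so the shell reads
# all three lines as one command with six arguments.
n=$(count_args --i-query rep-seqs-dada2.qza \
    --o-classification vsearch-taxonomy.qza \
    --o-search-results taxonomy-single-end.qza)
echo "$n"
```

This prints 6. If the backslash after vsearch-taxonomy.qza is missing (or is followed by a space), the shell ends the command there and tries to run --o-search-results as its own command, which is where "command not found: --o-search-results" comes from.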

@colinbrislawn @gregcaporaso I was able to get classify-consensus-vsearch to work! It significantly lowered the percentage of reads labeled K;;.

Thank you!!!!

I have one more question. Now that I have been able to reduce the sequences labeled K;;, I am seeing more Unassigned and plant sequences. I am using the UNITE version 9 all-eukaryotes database (216,528 rep seqs). I have worked through all of the filters in the QIIME 2 2017.11.0 "Filtering data" tutorial, but I think I might be using them in the wrong part of the pipeline. They all come back with a green notification, but when I open the taxa bar plot I still see them. Any tricks?

Pipeline commands used:

qiime feature-classifier classify-consensus-vsearch \
    --i-reference-reads unite-ver9-99-seqs-16.10.2022.qza \
    --i-reference-taxonomy unite-ver9-99-tax-16.10.2022.qza \
    --i-query rep-seqs-dada2.qza \
    --o-classification vsearch-taxonomy.qza \
    --o-search-results taxonomy-single-end.qza

qiime taxa filter-table \
    --i-table table.qza \
    --i-taxonomy taxonomy.qza \
    --p-include p__ \
    --p-exclude mitochondria,chloroplast \
    --o-filtered-table table-with-phyla-no-mitochondria-no-chloroplast.qza

qiime taxa barplot \
    --i-table table-dada2.qza \
    --i-taxonomy vsearch-taxonomy.qza \
    --m-metadata-file SB2.tsv \
    --o-visualization taxa-bar-plots.qzv

qiime tools view taxa-bar-plots.qzv

After you run qiime taxa filter-table, the main output file is table-with-phyla-no-mitochondria-no-chloroplast.qza.

I would try using that as the input for the barplot visualization, like this:

qiime taxa barplot \
  --i-table table-with-phyla-no-mitochondria-no-chloroplast.qza \
  --i-taxonomy vsearch-taxonomy.qza \
  --m-metadata-file SB2.tsv \
  --o-visualization table-with-phyla-no-mitochondria-no-chloroplast-plots.qzv

Here is what I've been adding.

qiime taxa filter-table \
    --i-table table-dada2.qza \
    --i-taxonomy vsearch-taxonomy.qza \
    --p-mode exact \
    --p-exclude "Unassigned; k__Viridiplantae" \
    --o-filtered-table table-no-mitochondria-exact.qza

qiime taxa barplot \
    --i-table table-no-mitochondria-exact.qza \
    --i-taxonomy vsearch-taxonomy.qza \
    --m-metadata-file SB2.tsv \
    --o-visualization table-no-mitochondria-exact.qza

qiime tools view taxa-bar-plots.qza

These all come back green, although I am still getting the Unassigned and k__Viridiplantae assignments that I want to exclude. The same goes for the no-chloroplast and no-mitochondria commands.

Would you be willing to post table-no-mitochondria-exact.qza or a screenshot so we can take a look?

Huh! Strange!

I think something is broken...

What happens when you view that .qzv file using https://view.qiime2.org/?

If you think it's safe and appropriate to do so, you could also upload the .qzv file so we can take a look and see what's wrong.

When I use the drag and drop method on all of the files created using the different filters given earlier (i.e., exact, no mitochondria, table-with-phyla, etc.), I get the same bar plot over and over again.
taxa-bar-plots.qzv (350.3 KB)
table-with-phyla-no-mitochondria-no-chloroplast.qzv (350.3 KB)
table-with-phyla-no-mitochondria-no-chloroplast.qza.qzv (350.3 KB)
table-no-mitochondria.qza.qzv (356.0 KB)
table-no-mitochondria-exact.qza.qzv (356.0 KB)
table-no-chloroplast.qza.qzv (350.3 KB)

I think the issue might be with the --p-exclude value in this command:

--p-exclude "Unassigned; k__Viridiplantae"

Instead of using a ; to separate the two labels you want to exclude, you should use a ,. @Hmickelson, can you try again with the following command:

qiime taxa filter-table \
 --i-table table-dada2.qza \
 --i-taxonomy vsearch-taxonomy.qza \
 --p-mode exact \
 --p-exclude "Unassigned,k__Viridiplantae" \
 --o-filtered-table table-no-mitochondria-exact.qza
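
For what it's worth, here is a rough sketch of why the separator matters, assuming the option value is split on commas (the default delimiter) before any matching happens. count_terms is a hypothetical helper that only mimics that splitting in plain shell; it is not QIIME code.

```shell
# Hypothetical helper: count the search terms in a comma-delimited value
# by counting the commas and adding one.
count_terms() {
    commas=$(printf '%s' "$1" | tr -cd ',' | wc -c)
    echo $((commas + 1))
}

with_semicolon=$(count_terms "Unassigned; k__Viridiplantae")
with_comma=$(count_terms "Unassigned,k__Viridiplantae")
echo "semicolon: $with_semicolon term(s), comma: $with_comma term(s)"
```

With the semicolon, the whole string is treated as a single term, and in exact mode that term matches no annotation, so nothing gets removed.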

This worked only for removing Unassigned; k__Viridiplantae is still present. Getting closer!
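
One possible explanation, offered as a guess since the taxonomy file isn't shown here: with --p-mode exact, a term only matches when it equals the complete annotation, so plant features annotated below kingdom level would be kept. The sketch below mimics exact and contains matching with grep on three made-up annotations; it is only an illustration, not QIIME code.

```shell
# Hypothetical annotations, one per feature (not taken from the real data).
annotations='Unassigned
k__Viridiplantae;p__Anthophyta
k__Fungi;p__Ascomycota'

# "exact"-style matching: the whole annotation must equal the term.
exact_hits=$(printf '%s\n' "$annotations" | grep -c -x -F 'k__Viridiplantae' || true)

# "contains"-style matching: the term may appear anywhere in the annotation.
contains_hits=$(printf '%s\n' "$annotations" | grep -c -F 'k__Viridiplantae' || true)

echo "exact: $exact_hits, contains: $contains_hits"
```

If that is what is happening, dropping --p-mode exact (so the default contains matching is used) should catch every annotation that includes k__Viridiplantae.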