Kingdom-level classification of fungal ITS sequences

bsen2018 · January 15, 2018, 9:22am

Hi @Nicholas_Bokulich,
I have a similar problem with my fungal sequences: I am only getting classification down to the kingdom and phylum level. Continuing from the earlier threads, I would like to ask for your help with the next steps once I have the output from the exclude-seqs command, so that I can obtain a taxonomy.qza file containing only the feature IDs that match at the perc-identity (e.g. 0.8) set in the exclude-seqs step. I am stuck at the output of the exclude-seqs command. Should I train the classifier before or after the exclude-seqs step? Basically, I need a taxonomy file for the feature IDs that have 0.8/0.9 perc-identity to the UNITE database sequences. It would be great to see the commands I should follow. Please help. Thanks!

Nicholas_Bokulich · January 16, 2018, 5:18pm

Hi @bsen2018,

Please check out this thread. The issue may be how you are training your classifier and how your sequences have been processed. A very common cause of poor ITS classification is sequencing adapters that are still present in the reads.

exclude-seqs does not filter the reference sequences, so it does not affect the classifier training step. You only want to filter out query sequences with exclude-seqs.

This tutorial covers the commands that you want. exclude-seqs outputs both a hits.qza and a misses.qza file; you will want to use the "hits".
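
As a rough sketch of that order of operations (train on the full reference, filter only the query sequences, then classify the hits), the small Python script below just assembles and prints the three command lines; it does not run anything. All .qza file names are hypothetical placeholders, the choice of blast as the search method is an assumption, and the option names are the ones I believe these plugins use, so please check them against `--help` for your installed QIIME 2 version.

```python
# Sketch only: builds the command lines as lists and prints them; nothing is executed.
# All .qza file names are hypothetical placeholders.

def train_classifier_cmd(ref_seqs, ref_taxonomy, classifier="classifier.qza"):
    """Train a naive Bayes classifier on the full, unfiltered reference."""
    return [
        "qiime", "feature-classifier", "fit-classifier-naive-bayes",
        "--i-reference-reads", ref_seqs,
        "--i-reference-taxonomy", ref_taxonomy,
        "--o-classifier", classifier,
    ]

def exclude_seqs_cmd(query, ref_seqs, perc_identity=0.8,
                     hits="hits.qza", misses="misses.qza"):
    """Split the query sequences into hits and misses against the reference."""
    return [
        "qiime", "quality-control", "exclude-seqs",
        "--i-query-sequences", query,
        "--i-reference-sequences", ref_seqs,
        "--p-method", "blast",  # assumption: blast; other search methods exist
        "--p-perc-identity", str(perc_identity),
        "--o-sequence-hits", hits,
        "--o-sequence-misses", misses,
    ]

def classify_hits_cmd(classifier, hits="hits.qza", taxonomy="taxonomy.qza"):
    """Classify only the sequences that passed the identity filter."""
    return [
        "qiime", "feature-classifier", "classify-sklearn",
        "--i-classifier", classifier,
        "--i-reads", hits,
        "--o-classification", taxonomy,
    ]

steps = [
    train_classifier_cmd("unite-ref-seqs.qza", "unite-ref-taxonomy.qza"),
    exclude_seqs_cmd("rep-seqs.qza", "unite-ref-seqs.qza", perc_identity=0.8),
    classify_hits_cmd("classifier.qza"),
]

for step in steps:
    print(" ".join(step))
```

The first step is independent of the second, so the classifier can be trained before or after running exclude-seqs.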

Another option would be to not use exclude-seqs, and instead use the classify-consensus-blast or classify-consensus-vsearch methods in q2-feature-classifier with a perc-identity setting of 0.8 (or whatever you want to use). Any sequences with a lower % identity will be unclassified. That way you can first visualize what proportion of a sample is unclassified, before removing these.
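
To illustrate this second route, here is a minimal Python sketch. The first function only assembles a classify-consensus-vsearch command line (file names are hypothetical, and newer QIIME 2 releases may require an additional search-results output, so check `--help`). The second function shows the kind of check I mean: given per-sample read counts and a taxonomy mapping, it computes the fraction of each sample's reads that were left unclassified. The toy counts are made up for illustration, and the "Unassigned" label is an assumption about how unclassified features are reported.

```python
# Sketch only: toy data and file names are hypothetical; no QIIME 2 command is run.

def consensus_vsearch_cmd(query, ref_seqs, ref_taxonomy,
                          perc_identity=0.8, taxonomy="taxonomy.qza"):
    """Build a classify-consensus-vsearch command line as a list of arguments."""
    return [
        "qiime", "feature-classifier", "classify-consensus-vsearch",
        "--i-query", query,
        "--i-reference-reads", ref_seqs,
        "--i-reference-taxonomy", ref_taxonomy,
        "--p-perc-identity", str(perc_identity),
        "--o-classification", taxonomy,
    ]

def unassigned_fraction(counts, taxonomy, label="Unassigned"):
    """Return {sample: fraction of reads whose feature is unclassified}.

    counts:   {sample_id: {feature_id: read_count}}
    taxonomy: {feature_id: taxon_string}
    Features missing from the taxonomy are treated as unclassified.
    """
    fractions = {}
    for sample, feature_counts in counts.items():
        total = sum(feature_counts.values())
        unassigned = sum(
            n for fid, n in feature_counts.items()
            if taxonomy.get(fid, label) == label
        )
        fractions[sample] = unassigned / total if total else 0.0
    return fractions

cmd = consensus_vsearch_cmd("rep-seqs.qza", "unite-ref-seqs.qza",
                            "unite-ref-taxonomy.qza", perc_identity=0.8)
print(" ".join(cmd))

# Toy example: two samples, three features, two of them unclassified.
toy_counts = {
    "sampleA": {"f1": 80, "f2": 20},
    "sampleB": {"f1": 10, "f3": 30},
}
toy_taxonomy = {
    "f1": "k__Fungi;p__Ascomycota",
    "f2": "Unassigned",
    "f3": "Unassigned",
}
fractions = unassigned_fraction(toy_counts, toy_taxonomy)
print(fractions)
```

If a large share of a sample turns out to be unclassified, that is worth looking into (adapters, non-target amplicons) before simply filtering those features out.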

I hope that helps!
