Two-step classification

Filipe_Matteoli · March 20, 2020, 8:05pm

Hello everyone,
My issue is similar to 2525. I have AMF amplicons. Following Nicholas_Bokulich's suggestion in that issue, I am using a two-step approach, which consists of:

Classifying my repseqs.qza against MaarjAM with classify-consensus-blast
Classifying the Unassigned sequences with classify-sklearn, using the SILVA 132 18S dataset

I have everything set up and was able to import the MaarjAM sequences and classify with BLAST, using the following command:

qiime feature-classifier classify-consensus-blast --i-reference-reads maarjAM_seq.qza --i-reference-taxonomy maarjAM_taxonomy.qza --p-perc-identity 0.99 --i-query 2-dada2_repseqs.qza --o-classification 3-consensus-blast.qza

Next, I generated a taxa barplot and confirmed that several ASVs are Unassigned.

The problem is that I cannot figure out the next step. Do I have to filter the Unassigned ASVs from 3-consensus-blast.qza? Can I do this using filter-seqs?
After that, do I have to merge the objects generated by classify-consensus-blast and classify-sklearn into one?

Nicholas_Bokulich · March 21, 2020, 1:52am

Welcome to the forum @Filipe_Matteoli!

Yes and yes to the first two questions.

Yes to merging as well: see qiime feature-table merge-taxa --help and note in the instructions how the order impacts merging (the first file listed takes priority, so list the output from classify-sklearn first).
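
The filter, classify, and merge steps can be sketched roughly as follows. This is a minimal sketch, not a tested pipeline: it assumes qiime taxa filter-seqs is the filter-seqs action meant above, that a pre-trained SILVA 132 18S classifier artifact is available (silva-132-18S-classifier.qza is a hypothetical name), and the output names unassigned-seqs.qza, 4-sklearn.qza, and 5-merged-taxonomy.qza are placeholders.

```shell
# Sketch only: every file name except 2-dada2_repseqs.qza and
# 3-consensus-blast.qza is a hypothetical placeholder.
classify_unassigned_and_merge() {
  # 1. Keep only the sequences that BLAST left Unassigned.
  qiime taxa filter-seqs \
    --i-sequences 2-dada2_repseqs.qza \
    --i-taxonomy 3-consensus-blast.qza \
    --p-include Unassigned \
    --o-filtered-sequences unassigned-seqs.qza || return 1

  # 2. Classify those sequences with the (assumed) SILVA 132 18S classifier.
  qiime feature-classifier classify-sklearn \
    --i-classifier silva-132-18S-classifier.qza \
    --i-reads unassigned-seqs.qza \
    --o-classification 4-sklearn.qza || return 1

  # 3. Merge both taxonomies. The first --i-data listed takes priority for
  #    feature IDs present in both, so the sklearn result goes first.
  qiime feature-table merge-taxa \
    --i-data 4-sklearn.qza \
    --i-data 3-consensus-blast.qza \
    --o-merged-data 5-merged-taxonomy.qza
}
```

Calling classify_unassigned_and_merge from the directory that holds the artifacts runs the three steps in order and stops at the first failure.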

Good luck!

Filipe_Matteoli · March 26, 2020, 7:47pm

Hello @Nicholas_Bokulich, thanks for your help! I have successfully filtered the Unassigned OTUs and merged the taxonomies. However, I would like to point out, for those working with AMF-specific amplicons such as AMV4.5NF (5'-AAGCTCGTAGTTGAATTTCG-3') and AMDGR (5'-CCCAACTATCCCTATTAATCAT-3'), that, using the up-to-date MaarjAM database, blast-consensus classification provided much more insight into the data than classify-sklearn. Thus, in the end, I used only consensus-blast to classify the AMF amplicons.

Best,

Nicholas_Bokulich · March 26, 2020, 8:43pm

classify-sklearn does not handle mixed-orientation reads, so it will leave many sequences unassigned if either the query or the reference sequences are in mixed orientations. I am not sure whether that applies to MaarjAM, but it is one possible reason why you would see performance differences between the two methods. Another is that the result depends on which parameters you use with classify-consensus-blast.
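
To illustrate the parameter point, here is a hedged sketch of the same BLAST command with the main tunable options written out. The values are illustrative assumptions, not recommendations, and the output name 3-consensus-blast-relaxed.qza is a placeholder. --p-strand both asks BLAST to search both orientations of each query.

```shell
# Sketch only: parameter values are illustrative, not tuned for AMF data.
blast_with_explicit_settings() {
  # --p-perc-identity: minimum identity for a hit (0.99 was used earlier)
  # --p-maxaccepts:    maximum number of hits kept per query
  # --p-min-consensus: fraction of hits that must agree on a taxonomic label
  # --p-strand:        search plus, minus, or both orientations
  qiime feature-classifier classify-consensus-blast \
    --i-query 2-dada2_repseqs.qza \
    --i-reference-reads maarjAM_seq.qza \
    --i-reference-taxonomy maarjAM_taxonomy.qza \
    --p-perc-identity 0.97 \
    --p-maxaccepts 10 \
    --p-min-consensus 0.51 \
    --p-strand both \
    --o-classification 3-consensus-blast-relaxed.qza
}
```

Comparing the barplots from a run like this against the 0.99 run would show how much of the Unassigned fraction comes from the identity threshold alone.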
