ITS classification - nothing beyond phylum level

spongebob · May 19, 2020, 11:59pm

Hi there,

I recently migrated to qiime2 from mothur. I am processing the same ITS sequence data that I processed with mothur, and I am using the same UNITE database for ITS classification. With the mothur pipeline I get good family- and genus-level resolution, but with qiime2 >90% of the sequences are not classified beyond “kingdom; fungi”. I am a bit puzzled and wonder if I have done something incorrect along the way. I have posted my command log below. Can anyone help me see where I have gone wrong?

I am using qiime2-2019.10, installed through conda on ubuntu.

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path //path_to_file/samples.txt \
  --input-format PairedEndFastqManifestPhred33V2 \
  --output-path /path_to_file_output/samples_demux-paired-end.qza

qiime demux summarize \
  --i-data samples_demux-paired-end.qza \
  --o-visualization samples_demux.qzv

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences samples_demux-paired-end.qza \
  --p-cores 2 \
  --p-front-f GAACGCAGCRAANNGYGA \
  --p-front-r TCCTCCGCTTATTGATATGC \
  --p-adapter-r TCRCNNTTYGCTGCGTTC \
  --p-adapter-f GCATATCAATAAGCGGAGGA \
  --o-trimmed-sequences samples_demux-paired-end_PT.qza \
  --verbose

qiime demux summarize \
  --i-data samples_demux-paired-end_PT.qza \
  --o-visualization samples_demux-paired-end_PT.qzv

DENOISE WITH DADA2 - did not trim as ITS
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs samples_demux-paired-end_PT.qza \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-max-ee-f 2 \
  --p-max-ee-r 2 \
  --p-n-threads 2 \
  --o-table samples_demux-paired-end_PT_table-dada2.qza \
  --o-representative-sequences samples_demux-paired-end_PT_rep-seqs-dada2.qza \
  --o-denoising-stats samples_demux-paired-end_PT_denoising-stats.qza

qiime metadata tabulate \
  --m-input-file samples_demux-paired-end_PT_denoising-stats.qza \
  --o-visualization samples_demux-paired-end_PT_denoising-stats.qzv

REMOVE SINGLETONS FROM TABLE
qiime feature-table filter-features \
  --i-table samples_demux-paired-end_PT_table-dada2.qza \
  --p-min-frequency 2 \
  --o-filtered-table samples_demux-paired-end_PT_table-dada2_single.qza

REMOVE SINGLETONS FROM SEQUENCE FILE
qiime feature-table filter-seqs \
  --i-data samples_demux-paired-end_PT_rep-seqs-dada2.qza \
  --i-table samples_demux-paired-end_PT_table-dada2_single.qza \
  --o-filtered-data samples_demux-paired-end_PT_rep-seqs-dada2_single.qza

qiime feature-table summarize \
  --i-table samples_demux-paired-end_PT_table-dada2_single.qza \
  --o-visualization samples_demux-paired-end_PT_table-dada2_single.qzv

qiime feature-table tabulate-seqs \
  --i-data samples_demux-paired-end_PT_rep-seqs-dada2_single.qza \
  --o-visualization samples_demux-paired-end_PT_rep-seqs-dada2_single.qzv

make UNITE database following https://github.com/gregcaporaso/2017.06.23-q2-fungal-tutorial:

https://doi.org/10.15156/BIO/786334
Includes singletons set as RefS (in dynamic files).
sh_refs_qiime_ver8_dynamic_s_02.02.2019.fasta
sh_taxonomy_qiime_ver8_dynamic_s_02.02.2019.txt

qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path sh_refs_qiime_ver8_dynamic_s_02.02.2019.fasta \
  --output-path UNITE_seqs_v8_dynamic_02022019.qza
Imported sh_refs_qiime_ver8_dynamic_s_02.02.2019.fasta as DNASequencesDirectoryFormat to UNITE_seqs_v8_dynamic_02022019.qza

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path sh_taxonomy_qiime_ver8_dynamic_s_02.02.2019.txt \
  --output-path UNITE_tax_v8_dynamic_02022019.qza

Imported sh_taxonomy_qiime_ver8_dynamic_s_02.02.2019.txt as HeaderlessTSVTaxonomyFormat to UNITE_tax_v8_dynamic_02022019.qza

Train the classifier on this region
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads UNITE_seqs_v8_dynamic_02022019.qza \
  --i-reference-taxonomy UNITE_tax_v8_dynamic_02022019.qza \
  --o-classifier classifier_UNITE_v8.qza \
  --verbose

UserWarning: The TaxonomicClassifier artifact that results from this method was trained using scikit-learn version 0.21.2. It cannot be used with other versions of scikit-learn. (While the classifier may complete successfully, the results will be unreliable.)
warnings.warn(warning, UserWarning)
Saved TaxonomicClassifier to: classifier_UNITE_v8.qza

Classify reads by taxon using a fitted classifier
qiime feature-classifier classify-sklearn \
  --i-classifier /path_to_database/qimme2_databases/UNITE/UNITEv8_DynamicClassifier/dynamic/classifier_UNITE_v8.qza \
  --i-reads samples_demux-paired-end_PT_rep-seqs-dada2_single.qza \
  --o-classification samples_demux-paired-end_PT_rep-seqs-dada2_single_classification.qza

qiime metadata tabulate \
  --m-input-file samples_demux-paired-end_PT_rep-seqs-dada2_single_classification.qza \
  --o-visualization samples_demux-paired-end_PT_rep-seqs-dada2_single_classification_taxonomy.qzv

I also tried classifying the samples with qiime feature-classifier classify-consensus-vsearch, using the developer version of UNITE, as suggested on the forum.

The results were still unclassified.

Thanks for any help

Nicholas_Bokulich · May 20, 2020, 3:19am

Welcome @spongebob !

What a puzzling problem! The fact that classify-consensus-vsearch is also failing is a pretty good indicator that something has gone seriously wrong.

A few things could be at fault, but it sounds like it’s probably an issue with the query sequences themselves. Could you walk through the following to troubleshoot? Stop once you find the problem.

1. This is a better tutorial to work from: Fungal ITS analysis tutorial
2. That tutorial and the older one you linked to both use mock community data as a toy dataset. Want to try classifying the mock community data with your classifier? If you get the same issue, we know it’s a classifier issue. If it works, then we know that your data are at fault!
3. Try NCBI BLASTing a few of your sequences to see what you hit. If you hit non-fungal sequences, that’s a clear answer. Most ITS primers also hit plants and other euks, so maybe it’s just non-target? Those should come out unclassified or classified only at kingdom level, since such seqs are not in the UNITE db.
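
For the BLAST check, one way to grab a handful of representative sequences is sketched below. It assumes the rep-seqs artifact has first been exported with qiime tools export, which writes a dna-sequences.fasta file. The helper name first_fasta_records and the directory name exported-rep-seqs are made up for illustration.

```shell
# Hypothetical helper: print the first N records of a FASTA file (default 5),
# so they can be pasted into NCBI BLAST.
# Assumes the representative sequences were exported beforehand, e.g.:
#   qiime tools export \
#     --input-path samples_demux-paired-end_PT_rep-seqs-dada2_single.qza \
#     --output-path exported-rep-seqs
# which writes exported-rep-seqs/dna-sequences.fasta
first_fasta_records() {
  awk -v n="${2:-5}" '
    /^>/ { count++ }        # a header line starts a new record
    count > n { exit }      # stop once we are past the Nth record
    { print }
  ' "$1"
}

# Usage (paths are illustrative):
#   first_fasta_records exported-rep-seqs/dna-sequences.fasta 5
```

Taking the first few records is only a convenience. The most abundant features from the feature-table summary would arguably be a more informative sample.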

If those steps don’t shake out the issue, could you please share all of the QZVs that you listed above? Then I can work backwards through them to see if I spot any clues… you can share via DM if you don’t want these posted publicly here.

Let me know what you find!

spongebob · May 22, 2020, 1:44am

Thanks for the response. I have tried as you suggested and here are the results:

I tried using my classifier with the mock community and the results are the same. So not a classifier problem.
I did BLAST some of the representative sequences and many were not fungal. Looking back at my mothur log, I used “remove.lineage”, so I only retained the fungal sequences. Would it make sense to use the UNITE database version called “All eukaryotes” (https://doi.org/10.15156/BIO/786386) to re-classify my data? And is it then possible to remove the non-fungal sequences from the analysis?

Thanks again for your helpful suggestions.

Nicholas_Bokulich · May 22, 2020, 2:54am

That would explain the issue! A few options:

Yes, reclassifying with the all-eukaryotes database is certainly one option. The disadvantage is that including all eukaryote sequences makes for a much larger and slower classifier, when you presumably don't care what the other euks are and just want to get rid of them. If you know which non-fungal reads you expect to find, you could selectively add just those to your database as an "outgroup" for identifying non-fungal reads.

Option 2 is to take the data you have now and filter out all unclassified or phylum-only sequences with qiime taxa filter-table. The advantage is that you don't need to re-classify; the disadvantage is that you are assuming these sequences are indeed non-fungal.
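
Option 2 might look roughly like the sketch below. It assumes the taxonomy strings carry the rank prefixes used in the QIIME-formatted UNITE release (k__, p__, c__, and so on), so that keeping only features whose annotation contains c__ drops everything that is unassigned or stops at kingdom or phylum. The function name and the output file name are placeholders.

```shell
# Hypothetical wrapper around `qiime taxa filter-table`.
# $1 = feature table (.qza), $2 = taxonomy from the classifier (.qza),
# $3 = output table (.qza), $4 = rank prefix a feature must contain (default c__).
# Assumption: UNITE-style labels such as k__Fungi;p__Ascomycota;c__...
keep_classified_features() {
  qiime taxa filter-table \
    --i-table "$1" \
    --i-taxonomy "$2" \
    --p-include "${4:-c__}" \
    --o-filtered-table "$3"
}

# Usage (input names from the commands earlier in the thread; output name is made up):
#   keep_classified_features \
#     samples_demux-paired-end_PT_table-dada2_single.qza \
#     samples_demux-paired-end_PT_rep-seqs-dada2_single_classification.qza \
#     table_classified_only.qza
```

Labels such as c__unidentified would still pass this filter, so it is worth looking at the tabulated taxonomy before and after filtering.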

Option 3 is to use qiime quality-control exclude-seqs to filter out sequences that do not align against the UNITE fungal ITS database. The advantage is that you are explicitly filtering out anything that does not hit within a specified % identity, the disadvantage is that it adds more time.
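
A sketch of option 3 follows, under the assumption that the UNITE reference reads imported earlier in the thread serve as the reference and that vsearch is the alignment method. The identity and coverage thresholds are illustrative values, not recommendations, and the function name and output file names are invented here.

```shell
# Hypothetical wrapper: split rep-seqs into hits/misses against a reference,
# then keep only the features whose sequences hit the reference.
# $1 = rep-seqs (.qza), $2 = reference reads (.qza), $3 = feature table (.qza)
# $4 = percent identity (0-1), $5 = fraction of query aligned (0-1); both illustrative.
exclude_non_reference() {
  qiime quality-control exclude-seqs \
    --i-query-sequences "$1" \
    --i-reference-sequences "$2" \
    --p-method vsearch \
    --p-perc-identity "${4:-0.80}" \
    --p-perc-query-aligned "${5:-0.80}" \
    --o-sequence-hits hits.qza \
    --o-sequence-misses misses.qza &&
  qiime feature-table filter-features \
    --i-table "$3" \
    --m-metadata-file hits.qza \
    --o-filtered-table table_hits_only.qza
}

# Usage (file names from the commands earlier in the thread):
#   exclude_non_reference \
#     samples_demux-paired-end_PT_rep-seqs-dada2_single.qza \
#     UNITE_seqs_v8_dynamic_02022019.qza \
#     samples_demux-paired-end_PT_table-dada2_single.qza
```

The second command keeps only the feature IDs present in hits.qza, so the table and the retained sequences stay in sync. How strict the thresholds should be depends on how well the reference covers your fungi.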

I'd personally try option 2, see what I'm left with, and then maybe move onto option 3 if I'm still harboring some doubts.

spongebob · May 25, 2020, 7:54pm

Has anyone else had issues getting the “all euks” UNITE databases from PlutoF?

If you try to access the databases below, you don’t end up downloading anything.

https://plutof.ut.ee/#/doi/10.15156/BIO/786388
https://plutof.ut.ee/#/doi/10.15156/BIO/786386

I have emailed them but no response…thoughts?

Nicholas_Bokulich · May 25, 2020, 7:56pm

You need to click on the “media” tab to get the file download link. So e.g., this is probably what you are looking for:
https://files.plutof.ut.ee/public/orig/99/D0/99D026A5A6EE65E6F8D736A1B02BB18D44724F8BAD7B103D6CB14FC09BFD20A8.gz

spongebob · June 6, 2020, 2:20am

seemed to be a browser issue! thanks
