Trouble with Taxonomic Assignment using UNITE database (ITS-fungi)

Hello! I am quite a new user of QIIME 2 and I'm analyzing 6 samples of S. siderea (coral), looking for insights into the fungal communities. I used primers ITS-1F and ITS-2R and sequenced on Illumina equipment. I will describe my pipeline in the hope that someone can help me figure out my problem.
I am using QIIME 2 version 2022.2 (qiime2-2022.2).

First I downloaded the archive "sh_qiime_release_16.10.2022.tgz" from UNITE (PlutoF DOI link), then I used the following commands to train a classifier:

Clean

awk '/^>/ {print($0)}; /^[^>]/ {print(toupper($0))}' sh_refs_qiime_ver9_99_16.10.2022.fasta | tr -d ' ' > sh_refs_qiime_ver9_99_16.10.2022_dev_uppercase.fasta

Import sequences

qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path sh_refs_qiime_ver9_99_16.10.2022_dev_uppercase.fasta \
  --output-path unite-ver9-seqs_99_16.10.2022.qza

Import taxonomy file

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-path sh_taxonomy_qiime_ver9_99_16.10.2022.txt \
  --output-path unite-ver9-taxonomy_99_16.10.2022.qza \
  --input-format HeaderlessTSVTaxonomyFormat

Train

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads unite-ver9-seqs_99_16.10.2022.qza \
  --i-reference-taxonomy unite-ver9-taxonomy_99_16.10.2022.qza \
  --o-classifier unite-ver9-99-classifier-16.10.2022.qza

Up to this point I had no problems. I looked at my classifier file and it looks fine: it is approx. 100,000 MB and has plenty of rows with taxonomic information. The problem is in the next part, where I try to use the classifier on my data.

Import raw files of my samples (6)

qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path Raw --input-format CasavaOneEightSingleLanePerSampleDirFmt --output-path ../Run1/Qiime2/Run1.qza

Check Qscores

qiime tools view Run1.qzv
Quality score and number of reads:


Trimming

nohup qiime dada2 denoise-paired --i-demultiplexed-seqs Run1.qza --p-trim-left-f 30 --p-trim-left-r 30 --p-trunc-len-f 250 --p-trunc-len-r 250 --p-n-threads 10 --output-dir dada_run1 --verbose &

Classifier

qiime feature-classifier classify-skl \
  --i-classifier /home/xibalba/Sabina/UNITE/developer/unite-ver9-99-classifier-16.10.2022.qza \
  --i-reads representative_sequences.qza \
  --o-classification classified_seq.qza

Export file taxonomy.tsv

qiime tools export --input-path /home/xibalba/Sabina/Run1/Qiime2/dada_run1/classified_seq.qza --output-path /home/xibalba/Sabina/Run1/Qiime2/dada_run1/aln_tree

When I check on the file taxonomy.tsv I get this:

It appears that the only classification I get is at the kingdom level, so the results seem very wrong. Thank you very much for your time; I hope someone can point me in the right direction on how to analyze my data.
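To put a number on "only kingdom level", here is a minimal sketch that tallies the deepest rank reached by each feature in the exported taxonomy.tsv. It assumes the usual QIIME 2 export layout (a header row, then tab-separated Feature ID, Taxon, Confidence, with ranks joined by ";"). The function name `rank_depth_summary` is made up for illustration.

```shell
# Tally the deepest taxonomic rank assigned to each feature.
# Assumes: first line is a header; column 2 holds the taxon string,
# e.g. "k__Fungi;p__Ascomycota", or "Unassigned".
rank_depth_summary() {
  awk -F'\t' '
    NR == 1 { next }                   # skip the header row
    {
      n = split($2, ranks, ";")        # split the taxon string into ranks
      last = ranks[n]                  # deepest rank that was assigned
      gsub(/^ +| +$/, "", last)        # trim stray spaces around the label
      sub(/__.*/, "", last)            # keep only the rank prefix, e.g. "k"
      count[last]++
    }
    END { for (r in count) printf "%s\t%d\n", r, count[r] }
  ' "$1" | LC_ALL=C sort
}

# Example call (path taken from the export step above):
# rank_depth_summary /home/xibalba/Sabina/Run1/Qiime2/dada_run1/aln_tree/taxonomy.tsv
```

If nearly every feature lands in the `k` row, the classifier is stopping at kingdom for almost all sequences, which matches what is described above.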

Based on the primer names, I think you have ITS1 amplicons and 2x 250 bp PE reads. In the dada2 step you merge the forward and reverse reads. The ITS region is known for its variability in length, especially ITS1. So the ITS1 amplicons in your samples can be longer than what dada2 can merge from 2x 250 bp reads. ITS1 can also be shorter than 250 bp, in which case the reads end with the reverse complement of the opposite primer. For ITS1 amplicons the best option is to use only the forward reads, which in your case are also of better quality (that is very common). Re-import the forward reads as single-end reads and run cutadapt to remove the reverse-complemented ITS2R primer sequence. Then run dada2 as denoise-single with trunc-len set to 0, and maybe also add --p-max-ee 8.0 --p-trunc-q 8, as in the script from: [Customization of a DADA2-based pipeline for fungal internal transcribed spacer 1 (ITS1) amplicon data sets - PMC](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8765055/).
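A rough sketch of that forward-only route, in case it helps. Everything here is an assumption to adapt: the primer sequence is the commonly published ITS2 primer and may not match the exact ITS-2R oligo used, and the names `Raw_R1_only`, `Run1_fwd.qza`, `Run1_fwd_trimmed.qza` and `dada_run1_fwd` are placeholders. The `revcomp` helper only builds the reverse complement to hand to cutadapt. The QIIME 2 steps are wrapped in a function so nothing runs until it is called.

```shell
# Reverse-complement a primer (A/C/G/T plus common IUPAC codes) so it can
# be trimmed from the 3' end of forward reads that run through the amplicon.
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTRYKMBVDHacgtrykmbvdh' 'TGCAYRMKVBHDtgcayrmkvbhd'
}

# Assumed ITS2 primer sequence; replace with the oligo actually ordered.
ITS2_PRIMER="GCTGCGTTCTTCATCGATGC"

forward_only_its1() {
  # 1. Re-import only the forward (R1) fastq files as single-end reads.
  #    Raw_R1_only is a hypothetical directory holding just the R1 files.
  qiime tools import \
    --type 'SampleData[SequencesWithQuality]' \
    --input-path Raw_R1_only \
    --input-format CasavaOneEightSingleLanePerSampleDirFmt \
    --output-path Run1_fwd.qza

  # 2. Trim read-through into the reverse primer at the 3' end.
  qiime cutadapt trim-single \
    --i-demultiplexed-sequences Run1_fwd.qza \
    --p-adapter "$(revcomp "$ITS2_PRIMER")" \
    --o-trimmed-sequences Run1_fwd_trimmed.qza

  # 3. Denoise without a fixed truncation length, with relaxed filters.
  qiime dada2 denoise-single \
    --i-demultiplexed-seqs Run1_fwd_trimmed.qza \
    --p-trunc-len 0 \
    --p-max-ee 8.0 \
    --p-trunc-q 8 \
    --p-n-threads 10 \
    --output-dir dada_run1_fwd
}
```

With the assumed primer, `revcomp "$ITS2_PRIMER"` gives GCATCGATGAAGAACGCAGC. Calling `forward_only_its1` inside an activated QIIME 2 environment would then run the three steps in order.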
Another thing to consider is to use the UNITE all-eukaryotes database instead of the fungi-only UNITE database you used for training your classifier. Your primers will also amplify other eukaryotic targets in your samples; the primers are not fungi-specific. With a classifier trained on the fungi-only UNITE database, those other eukaryotic targets will be classified as k__Fungi.
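A quick way to see which kingdoms a reference taxonomy file covers before training on it is sketched below. It assumes the headerless UNITE QIIME-format layout used above (ID, tab, then "k__...;p__...;..."). The function name `kingdom_counts` is made up for illustration.

```shell
# Count reference entries per kingdom in a headerless taxonomy TSV.
# Column 2 is the taxon string; the first ';'-separated field is the kingdom.
kingdom_counts() {
  cut -f2 "$1" | cut -d';' -f1 | LC_ALL=C sort | uniq -c \
    | awk '{ print $2 "\t" $1 }'     # reorder to "kingdom<TAB>count"
}

# Example call on the file imported earlier in this thread:
# kingdom_counts sh_taxonomy_qiime_ver9_99_16.10.2022.txt
```

If the only row printed is k__Fungi, the classifier has no other kingdom to choose from, so any non-fungal amplicon can only be labelled k__Fungi or left unassigned.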
