ITS Classifier: Extract reference reads - HiSeq:

Hello,

I have a quick question. Our lab decided to try HiSeq to test whether we get similar or better taxonomic resolution for our samples as we do with ASVs. However, the DADA2 parameters that give me the best outputs leave quite short reads, as expected (70-90 bp). Since the reverse reads look pretty bad, we are testing the pipeline with only the forward reads. My question is: because the reads are so short, when I assign taxonomy, most hits differ from the ASV results (dominant ASVs) and/or only hit “Fungi”.

In the old Moving Pictures tutorial, there is a suggestion to extract reads and train the classifier in this manner (qiime feature-classifier extract-reads), but it also states that this should not be done with fungal sequences. Can you please provide some guidelines? I am running a test on this now, but I don’t want to do something wrong and report results incorrectly.

I am currently working with QIIME 2 (v2026.1) installed on an HPCC.

Thank you so much for your help,

Fabi

Hi @fabipc ,

70-90 nt of ITS should still yield reasonable results for most fungi, even if it might not reach species. Of course, a few issues are possible:

  1. The poorly classified reads are not actually fungi. These could be off-target hits, e.g., plants or other eukaryotes (which will amplify with most ITS primers). Use the UNITE version containing all eukaryotes to diagnose this.
  2. You could be getting poor matches in the database. UNITE does contain some accessions that are not fully annotated. This does not sound like the case here, but removing incomplete accessions from the database may be one option to improve classifications.
  3. Your reads could be in mixed orientations, which could lead to poor classification of the mixed reads. Try adjusting the read-orientation parameter to same or both to observe how this impacts classification.
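To put a number on how many features stop at “Fungi” versus reaching deeper ranks, something like the following rough sketch may help when comparing runs (e.g., before and after switching databases or orientation settings). It assumes UNITE-style taxonomy strings such as `k__Fungi;p__Ascomycota;...`, taken for instance from the Taxon column of an exported taxonomy table. The function names, the list of "uninformative" labels, and the example strings are all illustrative assumptions, not part of QIIME 2.

```python
from collections import Counter

# Rank names in the order UNITE-style strings usually list them (assumption).
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]

# Labels treated as "no information at this rank" (assumption; extend as needed).
UNINFORMATIVE = {"", "unidentified", "unassigned"}


def deepest_rank(taxon: str) -> str:
    """Return the deepest rank that carries an informative name."""
    depth = 0
    for level in taxon.split(";"):
        level = level.strip()
        # Drop a rank prefix such as "k__" or "g__" when present.
        name = level.split("__", 1)[1] if "__" in level else level
        if name.lower() in UNINFORMATIVE:
            break
        depth += 1
    if depth == 0:
        return "unassigned"
    # Cap at species in case the string carries extra levels.
    return RANKS[min(depth, len(RANKS)) - 1]


def summarize(taxa):
    """Count how many taxonomy strings resolve to each deepest rank."""
    return Counter(deepest_rank(t) for t in taxa)


if __name__ == "__main__":
    # Hypothetical taxonomy strings, for illustration only.
    example = [
        "k__Fungi",
        "k__Fungi;p__Ascomycota;c__Sordariomycetes;o__Hypocreales;"
        "f__Nectriaceae;g__Fusarium;s__unidentified",
        "k__Fungi;p__unidentified",
        "Unassigned",
    ]
    for rank, count in summarize(example).items():
        print(f"{rank}: {count}")
```

If most features land in the "kingdom" bin, that points toward issues 1 or 3 above rather than read length alone.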

Yes, I recommend training the classifier on the full database without extracting reads, as specified in the tutorial.

Please give that a try and let us know what you find! If you are still getting poor results, please share a QZV of the results and the full commands that you are running.

Good luck!