Hello!
I am working with ITS1 data and have used a workflow borrowing heavily from the Langille lab’s microbiome helper, with a few fungal modifications including extracting the ITS1 region (Microbiome Helper 2 Marker gene workflow · LangilleLab/microbiome_helper Wiki · GitHub).
For step 3, Assign taxonomy to ASVs, I trained and used a classifier based on the most recent reference available on UNITE (PlutoF DOI) using qiime feature-classifier fit-classifier-naive-bayes and qiime feature-classifier classify-sklearn. I now have a taxonomy.tsv that appears to match what I expected to see based on my samples.
My question involves the different types of feature IDs provided. I have a mix of representative sequences (SH0016447.10FU_KX515298_reps) and reference sequences (SH0016832.10FU_MG593539_refs). From what I understand, these are artifacts of how UNITE selects sequences to represent species hypotheses, but that including both can inflate diversity. I am wondering what common practice is to mitigate this? I have read that I can use qiime collapse to group by species hypothesis, but there seems to be a lot of caveats, and I am not confident in how I can do this while maintaining taxonomic information and abundance data.
Any insight would be extremely appreciated.
Thank you very much.
