Hi QIIME 2 community,
I am working on a fungal ITS meta-analysis using data from 77 BioProjects. The data include Illumina paired-end reads as well as Ion Torrent, GridION, and PacBio single-end reads. The datasets target different ITS regions: ITS1, ITS2, and the full ITS region (ITS1-5.8S-ITS2, "ALL" in ITSxpress).
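With this many platforms, it can help to profile each dataset's quality scores before choosing between a denoiser and a clustering workflow. A minimal sketch in Python, assuming standard 4-line FASTQ records with Phred+33 encoding. The "binned" cutoff of 4 distinct values is an arbitrary assumption for illustration, not a DADA2 rule:

```python
import gzip
from collections import Counter


def fastq_quality_lines(path):
    """Yield the quality line of each record, assuming plain 4-line FASTQ (optionally gzipped)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 3:  # 4th line of each record holds the quality string
                yield line.rstrip("\r\n")


def quality_profile(quality_lines, offset=33):
    """Count how often each Phred score occurs, assuming Phred+33 encoding."""
    counts = Counter()
    for line in quality_lines:
        counts.update(ord(ch) - offset for ch in line)
    return counts


def looks_binned(counts, max_distinct=4):
    """Heuristic flag: very few distinct scores leave an error model little to learn from.
    The default cutoff is an arbitrary assumption."""
    return len(counts) <= max_distinct


# In-memory example: '?' encodes Q30 and '$' encodes Q3 under Phred+33.
profile = quality_profile(["??$??", "$$???"])
print(dict(profile))          # {30: 7, 3: 3}
print(looks_binned(profile))  # True
```

On a real file this would be `quality_profile(fastq_quality_lines("sample_R1.fastq.gz"))`, where the filename is a placeholder.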
Because some datasets have abnormal or low-information quality scores, DADA2 is not suitable for all of them. For example, in one dataset nearly every base call has a quality score of either Q3 or Q30, so DADA2 cannot learn a reliable error model from it. I am therefore considering a VSEARCH-based workflow:
1. ITSxpress trimming to ITS1, ITS2, or ALL
2. Dereplication
3. Chimera removal
4. De novo OTU clustering at 97%
5. Taxonomy assignment using UNITE
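The dereplication and clustering steps of this workflow can be illustrated with a toy pure-Python version: exact dereplication followed by abundance-sorted greedy centroid clustering, roughly in the spirit of `vsearch --derep_fulllength` and `vsearch --cluster_size`. This is a simplified sketch, not VSEARCH's algorithm: identity here is 1 minus edit distance over the longer sequence length, whereas VSEARCH uses alignment-based identity (see its `--iddef` option), k-mer prefiltering, and its own hit ordering. Chimera removal and ITSxpress trimming are not modeled, and the sequences are made up:

```python
from collections import Counter


def dereplicate(seqs):
    """Collapse identical sequences into (sequence, abundance), most abundant first."""
    return sorted(Counter(seqs).items(), key=lambda kv: (-kv[1], kv[0]))


def edit_distance(a, b):
    """Levenshtein distance via a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def identity(a, b):
    """Simplified identity: 1 - edit distance / longer length (not VSEARCH's definition)."""
    return 1 - edit_distance(a, b) / max(len(a), len(b))


def greedy_cluster(derep, threshold=0.97):
    """Each sequence joins the first existing centroid at >= threshold identity,
    otherwise it becomes a new centroid. Input must be sorted by abundance."""
    centroids = []  # entries are [centroid_sequence, summed_abundance]
    for seq, abundance in derep:
        for c in centroids:
            if identity(seq, c[0]) >= threshold:
                c[1] += abundance
                break
        else:
            centroids.append([seq, abundance])
    return [(s, n) for s, n in centroids]


base = "ACGT" * 10                       # 40 nt toy sequence
variant = base[:20] + "T" + base[21:]    # one substitution: 97.5% identity to base
other = "TTGCA" * 8                      # unrelated 40 nt sequence
reads = [base] * 5 + [variant] * 2 + [other] * 3

otus = greedy_cluster(dereplicate(reads))
print([n for _, n in otus])  # [7, 3]
```

In QIIME 2 these steps would map onto q2-vsearch actions such as `dereplicate-sequences`, `uchime-denovo`, and `cluster-features-de-novo`.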
My goal is to obtain one final table for downstream comparison across all BioProjects and platforms.
I am unsure whether I should:
1. Combine the ITS1, ITS2, and full ITS reads after ITSxpress trimming and run one global 97% OTU clustering, producing a single mixed-region OTU table; or
2. Process ITS1, ITS2, and full ITS separately: generate region-specific OTU tables, assign taxonomy to each with the same UNITE database, collapse each table to a shared taxonomic level (e.g., genus or species), and then merge the collapsed tables.
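Option 2 can be sketched in plain Python on toy dictionaries: within each sample, sum OTU counts by taxonomy truncated to a given rank, then stack the region-specific collapsed tables so that taxon strings (not OTU IDs) become the shared features. This roughly mirrors what `qiime taxa collapse` (with `--p-level 6` for genus on UNITE's 7-rank strings) followed by `qiime feature-table merge` would do; the sample IDs, OTU IDs, and counts below are invented for illustration:

```python
from collections import defaultdict


def collapse(table, taxonomy, level=6):
    """Sum OTU counts per sample by taxonomy truncated to `level` ranks
    (6 = genus for k/p/c/o/f/g/s strings).

    table:    {sample_id: {otu_id: count}}
    taxonomy: {otu_id: "k__Fungi;p__...;g__...;s__..."}
    """
    collapsed = {}
    for sample, counts in table.items():
        per_taxon = defaultdict(int)
        for otu, n in counts.items():
            ranks = [r.strip() for r in taxonomy[otu].split(";")]
            per_taxon[";".join(ranks[:level])] += n
        collapsed[sample] = dict(per_taxon)
    return collapsed


def merge(*tables):
    """Stack collapsed tables from different regions; sample IDs must be unique."""
    merged = {}
    for t in tables:
        for sample, counts in t.items():
            if sample in merged:
                raise ValueError(f"duplicate sample ID: {sample}")
            merged[sample] = dict(counts)
    return merged


ASP = "k__Fungi;p__Ascomycota;c__Eurotiomycetes;o__Eurotiales;f__Aspergillaceae;g__Aspergillus"
its1_table = {"BP01_S1": {"otu1": 10, "otu2": 5}}
its1_tax = {"otu1": ASP + ";s__Aspergillus_fumigatus", "otu2": ASP + ";s__Aspergillus_niger"}
its2_table = {"BP04_S1": {"otuA": 8}}
its2_tax = {"otuA": ASP + ";s__Aspergillus_flavus"}

merged = merge(collapse(its1_table, its1_tax), collapse(its2_table, its2_tax))
print({s: c[ASP] for s, c in merged.items()})  # {'BP01_S1': 15, 'BP04_S1': 8}
```

OTUs not classified down to genus keep a shorter lineage string in this sketch, so they stay separate from genus-level features. Because sequencing depth differs widely between BioProjects, you would probably compare relative abundances or presence/absence derived from the merged counts rather than raw counts.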
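I understand that ITS1, ITS2, and full ITS are different regions, so even if a single mixed-region OTU table is technically possible, its OTUs may not be biologically comparable across studies: the same taxon could end up in different OTUs depending on which region a study targeted.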
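If option 1 were tried anyway, one way to test this concern is to record, for each OTU, which regions contributed reads to it. If almost every OTU turns out to be region-specific, differences in OTU composition between studies would be confounded with the targeted region. A minimal sketch, assuming you have already built hypothetical (OTU ID, region) pairs per read, for example by joining cluster membership with a per-sample region column:

```python
from collections import defaultdict


def otu_region_composition(read_records):
    """Map each OTU to the set of regions whose reads were assigned to it.

    read_records: iterable of (otu_id, region) pairs (hypothetical input format).
    """
    regions = defaultdict(set)
    for otu, region in read_records:
        regions[otu].add(region)
    return dict(regions)


def summarize(composition):
    """Count OTUs per region combination, e.g. 'ITS1' alone vs. 'ALL+ITS1' shared."""
    summary = defaultdict(int)
    for regs in composition.values():
        summary["+".join(sorted(regs))] += 1
    return dict(summary)


records = [
    ("OTU1", "ITS1"), ("OTU1", "ITS1"),
    ("OTU2", "ITS2"),
    ("OTU3", "ALL"), ("OTU3", "ITS1"),
]
print(summarize(otu_region_composition(records)))
# {'ITS1': 1, 'ITS2': 1, 'ALL+ITS1': 1}
```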
What would be the recommended QIIME 2 approach for this kind of heterogeneous fungal ITS meta-analysis? Is a single mixed-region 97% OTU table acceptable, or should I merge only after taxonomy assignment and collapsing?
Thank you!