Oops, sorry! Processing PGM data is a learning process for me as well.
Perfect!
I also just saw this post crop up on the forum:
They are using Illumina data, but their reorientation script might be useful for you. Perhaps you should follow that thread and/or contact that user for help.
Interesting. There are obviously two different changes here between QIIME1 and QIIME2 that could both be causing this disparity:
- The OTU picking pipelines should be very similar, but do use different algorithms (uclust vs. vsearch).
- The taxonomy classifiers are different. I have found that vsearch performs similarly to or better than uclust on mock communities, but that was with 16S and fungal ITS... it could be a very different story for 18S reads.
The vsearch classifier does have a number of parameters you can change. In particular, --p-perc-identity, --p-maxaccepts, and --p-min-consensus will alter its behavior, and may improve classification. If you are not already, I would recommend using a reference database with sequences clustered at 99% rather than 97%.
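To give a feel for how those three parameters interact, here is a rough Python sketch of consensus taxonomy assignment. This is only my simplified illustration, not the actual QIIME2 or vsearch code: the function name, the input format (a list of identity/taxonomy pairs in the order a search might return them), and the default values are all assumptions made for the example.

```python
from collections import Counter


def consensus_taxonomy(hits, perc_identity=0.8, maxaccepts=10, min_consensus=0.51):
    """Toy consensus classifier (hypothetical helper, not a QIIME2 function).

    hits: list of (identity, "rank1;rank2;...") tuples for one query sequence.
    """
    # Drop hits below the identity threshold, then keep at most maxaccepts.
    kept = [tax.split(";") for ident, tax in hits if ident >= perc_identity]
    kept = kept[:maxaccepts]
    if not kept:
        return "Unassigned"

    consensus = []
    for level in range(min(len(t) for t in kept)):
        # Count full lineages down to this rank across the kept hits.
        lineages = Counter(tuple(t[: level + 1]) for t in kept)
        lineage, count = lineages.most_common(1)[0]
        # Stop at the first rank where agreement falls below min_consensus.
        if count / len(kept) < min_consensus:
            break
        consensus = list(lineage)
    return ";".join(consensus) if consensus else "Unassigned"


# Made-up hits for a single 18S read.
example_hits = [
    (0.99, "Eukaryota;Alveolata;Ciliophora"),
    (0.98, "Eukaryota;Alveolata;Dinoflagellata"),
    (0.97, "Eukaryota;Alveolata;Ciliophora"),
    (0.85, "Eukaryota;Fungi;Ascomycota"),
]

print(consensus_taxonomy(example_hits))                      # Eukaryota;Alveolata
print(consensus_taxonomy(example_hits, perc_identity=0.97))  # Eukaryota;Alveolata;Ciliophora
print(consensus_taxonomy(example_hits, min_consensus=1.0))   # Eukaryota
```

In this toy example, raising the identity threshold removes the distant fungal hit and gives a deeper classification, while demanding full consensus truncates the assignment to the top rank. Real data will behave less neatly, so it is worth trying a few combinations on a mock community if you have one.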
Sorry I can't provide clearer answers. Since your data type is currently constrained from using dada2/deblur and the sklearn classifier in QIIME2, there are limited options for trying alternative approaches. It is particularly troubling because those would be our preferred/recommended approaches for Illumina 16S data, so you are sort of "stuck" doing QIIME1-style analyses in QIIME2 right now. The fact that QIIME1 is doing better is a bit concerning... but it seems that there may be more optimization possible with the vsearch classifier for 18S data, which I hope may help.
Please let us know if that helps!