I am using QIIME 2 OTU/Naive Bayes classifier to process ITS amplicons. I am looking for confirmation of which code actually removes homopolymers. I've looked through the website and tutorials and cannot find it. The closest thing I've found is --p-max-ambiguous from q-score (q-score: Quality filter based on sequence quality scores, QIIME 2 2023.5.1 documentation), but this is for ambiguous bases, not homopolymers. Can someone with better knowledge of the QIIME 2 code please confirm and link to any code that does remove homopolymers? Thanks so much!
The cull-seqs action is part of the RESCRIPt plugin, which must be installed separately. Note that it will only remove sequences containing homopolymers from FeatureData[Sequence] or RNASequence artifacts, i.e., after reads have been denoised or clustered.
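To illustrate what this kind of filter does, here is a minimal pure-Python sketch. It is not RESCRIPt's implementation and does not call QIIME 2. The function names, the threshold of 8, and the "this length or longer" rule are assumptions made for illustration only.

```python
from itertools import groupby


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base (0 for an empty string)."""
    return max((sum(1 for _ in run) for _, run in groupby(seq.upper())), default=0)


def cull_by_homopolymer(seqs: dict, max_run: int = 8) -> dict:
    """Keep only sequences whose longest homopolymer is shorter than max_run.

    max_run=8 is a hypothetical threshold; check the plugin's own help text
    for the real default and its exact meaning.
    """
    return {sid: s for sid, s in seqs.items() if longest_homopolymer(s) < max_run}
```

The input here is a plain dictionary of ID to sequence, standing in for sequences that have already been denoised or clustered, as described above.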
Just to confirm, in order to remove reads that have homopolymers with n > x (either user-defined x, or by default x), one would have to install and apply the RESCRIPt plugin? I.e., QIIME2 does not by default remove reads with homopolymers?
I believe this should be clarified, since QIIME 1 had split_libraries.py, which had the option to exclude reads with --max_homopolymer over a certain number. I could not find any documentation in QIIME2 that refers to homopolymers.
I just want to make sure reads with homopolymers are not being removed in the background in some hidden step.
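One rough way to check this yourself is to count homopolymer-containing records in FASTA files exported before and after a given step. The sketch below is plain Python, not a QIIME 2 command. The function name and the min_run value are hypothetical, and it assumes FASTA input (not FASTQ).

```python
import re


def count_homopolymer_records(lines, min_run=6):
    """Return (total records, records with a run of >= min_run identical bases).

    `lines` is any iterable of FASTA lines; min_run must be at least 2.
    """
    # A letter followed by at least (min_run - 1) copies of itself.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_run - 1))
    total = flagged = 0
    seq_parts = None  # stays None until the first header line is seen
    for line in list(lines) + [">"]:  # sentinel header flushes the last record
        line = line.strip()
        if line.startswith(">"):
            if seq_parts is not None:
                total += 1
                if pattern.search("".join(seq_parts).upper()):
                    flagged += 1
            seq_parts = []
        elif line and seq_parts is not None:
            seq_parts.append(line)
    return total, flagged
```

If the flagged count is well above zero going into a step and exactly zero coming out, that step is probably filtering on homopolymers. Denoising also changes and merges sequences, so treat the comparison as indicative rather than conclusive.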
This is important in the analysis of ITS sequences, and a 2022 review paper states that QIIME2 does remove reads with homopolymers n>5 in its script.